Day 7: Document AI Studio - Intelligent Document Processing with State-of-the-Art AI
Why Document AI Studio
Intelligent document processing should be this simple:
- Upload your document
- Tell us what you need (text, structured data, classifications, PII detection)
- Get results powered by state-of-the-art document AI models
All through a clean web interface. No code required. No setup. No training. No infrastructure to manage.
That's Document AI Studio. Production-grade document intelligence, accessible to everyone.
The Six Primitives, One Interface
Remember the six core primitives we discussed on Day 1 of this series? Document AI Studio gives you all of them in a visual playground:
1. OCR - Text Extraction
Upload any document (scanned, native PDF, handwritten, images) and extract text, choosing a quality tier and an output format.
Three quality tiers:
- Fast: Perfect for clean documents, lowest cost
- Moderate: Handles most real-world documents, best balance of cost and quality
- Premium: Maximum accuracy for degraded scans and handwriting
Three output formats:
- Text: Plain text extraction
- Markdown: Preserves headings, lists, formatting
- HTML: Full layout fidelity with styles
2. Extract - Structured Data
Define your schema visually, get typed JSON automatically:
In the UI:
- Upload your document
- Define schema: {"vendor": "string", "total": "number", "date": "string"}
- Select mode
- Click "Extract"
- Get validated JSON
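To make "validated JSON" concrete, here is a minimal Python sketch of checking an extracted record against the example schema above. It is not the Studio's validator: the `TYPE_MAP` mapping, the `validate` helper, and the sample record are all assumptions made up for illustration.

```python
import json

# Maps the schema's type names to Python types. This mapping is an assumption
# about what "string" and "number" mean; the Studio's own rules may differ.
TYPE_MAP = {"string": str, "number": (int, float)}

def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for field, type_name in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, TYPE_MAP[type_name]):
            problems.append(
                f"{field}: expected {type_name}, got {type(value).__name__}"
            )
    return problems

schema = {"vendor": "string", "total": "number", "date": "string"}

# Hypothetical extraction result, invented for this example.
extracted = json.loads('{"vendor": "Acme Corp", "total": 1249.5, "date": "2024-03-01"}')
print(validate(extracted, schema))  # prints []
```

A record with a missing or mistyped field comes back with one message per problem, which is usually enough to decide whether to retry the extraction or flag the document for review.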
3. Classify - Document Categorization
Handles multi-page documents with automatic splitting.
Use case: a 200-page medical record with intake forms, lab results, and treatment notes mixed together.
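One way to picture the splitting step: once every page has a label, consecutive pages with the same label can be grouped into sections. The Python sketch below assumes the per-page labels are already available as a list; the `split_by_label` helper and the label names are hypothetical, and the Studio's actual splitting logic is not described here.

```python
from itertools import groupby

def split_by_label(page_labels: list[str]) -> list[tuple[str, int, int]]:
    """Group consecutive same-label pages into (label, first_page, last_page).

    Page numbers are 1-indexed and inclusive.
    """
    sections = []
    page = 1
    for label, run in groupby(page_labels):
        length = len(list(run))
        sections.append((label, page, page + length - 1))
        page += length
    return sections

# Hypothetical labels for a six-page record.
labels = ["intake", "intake", "lab", "lab", "lab", "notes"]
print(split_by_label(labels))
# prints [('intake', 1, 2), ('lab', 3, 5), ('notes', 6, 6)]
```

Because only adjacent pages are merged, a label that reappears later in the file produces a separate section, which matches how mixed records tend to be organized.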
4. Chunk - Semantic Segmentation
Intelligent document splitting for RAG and LLMs.
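For a rough sense of what chunking for RAG involves, here is a deliberately simple Python sketch that packs whole paragraphs into size-limited chunks. It is a plain paragraph-boundary heuristic, not the semantic segmentation the Studio performs; the `chunk_paragraphs` helper and its `max_chars` parameter are assumptions for illustration.

```python
def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars where possible."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A paragraph longer than max_chars becomes its own oversized chunk.
            current = para
    if current:
        chunks.append(current)
    return chunks

sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for chunk in chunk_paragraphs(sample, max_chars=40):
    print(repr(chunk))
```

Keeping paragraphs intact avoids cutting a sentence in half, which is the minimum a retrieval pipeline needs; semantic chunking goes further by keeping related paragraphs together.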
5. Count Tokens - Cost Estimation
Know the cost before you process.
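As a back-of-the-envelope illustration, the Python sketch below estimates tokens from character count and multiplies by a price. Both numbers are assumptions: roughly four characters per token is a common rule of thumb for English text, not the Studio's tokenizer, and the price used here is a made-up placeholder rather than real pricing.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real tokenizers will give different counts."""
    return math.ceil(len(text) / chars_per_token)

def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Cost for a token count at a given price per one million tokens."""
    return tokens / 1_000_000 * price_per_million

document_text = "word " * 2_400  # stand-in for extracted text, 12,000 characters
tokens = estimate_tokens(document_text)

# Hypothetical price per million tokens, for illustration only.
cost = estimate_cost(tokens, price_per_million=2.0)
print(f"{tokens} tokens, estimated cost ${cost:.4f}")
```

The estimate is only useful for budgeting; an exact count requires the tokenizer of the model that will process the document.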
6. PII Detection - Privacy & Compliance
Actions:
- Detect: Find all PII, with locations and types.
- Redact: Replace sensitive data with [REDACTED] or [EMAIL] tags.
- Replace: Substitute synthetic fake data for testing.
Modes:
- Local: Fast, offline regex for basic PII (emails, phones, SSNs, etc.).
- Sentinel: Our high-accuracy PII model (names, addresses, financial/medical info).
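The regex-based approach of Local mode can be sketched in a few lines of Python. The patterns below are simplified assumptions (US-style phone numbers and SSNs, a loose email pattern), not the patterns the Studio ships, and the `detect` and `redact` helpers are hypothetical names.

```python
import re

# Simplified illustrative patterns; production rules need to be far stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def detect(text: str) -> list[tuple[str, int, int, str]]:
    """Return (type, start, end, matched_text) for each hit, in document order."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((kind, match.start(), match.end(), match.group()))
    return sorted(findings, key=lambda finding: finding[1])

def redact(text: str) -> str:
    """Replace each hit with a tag naming its type, such as [EMAIL]."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[{kind}]", text)
    return text

sample = "Mail jane@example.com or call 555-123-4567. SSN 123-45-6789."
print(redact(sample))
# prints Mail [EMAIL] or call [PHONE]. SSN [SSN].
```

Regexes like these are fast and run offline, but they cannot recognize names, addresses, or free-text medical details, which is the gap a trained model such as Sentinel is meant to cover.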
Live demo: cernisintelligence.com/studio
Open Source Components
Document AI Studio is built on open-source foundations:
Docuglean SDK (MIT License)
The six primitives as code:
Python:
pip install docuglean
TypeScript:
npm install docuglean-ocr
Repos:
Sentinel PII Model
Model card: huggingface.co/cernis-intelligence/sentinel
Repo: github.com/cernis-intelligence/sentinel-pii-sdk
CernisOCR & Cernis-Thinking (Open Weights)
- CernisOCR and Cernis-Thinking: huggingface.co/cernis-intelligence