Why Document AI Studio

Intelligent document processing should be this simple:

Upload your document
Tell us what you need (text, structured data, classifications, PII detection)
Get results powered by state-of-the-art document AI models

All through a clean web interface. No code required. No setup. No training. No infrastructure to manage.

That's Document AI Studio. Production-grade document intelligence, accessible to everyone.

The Six Primitives, One Interface

Remember the six core primitives we discussed on Day 1 of this series? Document AI Studio gives you all of them in a visual playground:

1. OCR - Text Extraction

Upload any document (scanned or native PDFs, handwritten pages, or images) and extract text with:

Three quality tiers:

Fast: Perfect for clean documents, lowest cost
Moderate: Handles most real-world documents, best cost/quality balance
Premium: Maximum accuracy for degraded scans and handwriting

Three output formats:

Text: Plain text extraction
Markdown: Preserves headings, lists, formatting
HTML: Full layout fidelity with styles
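Read together, the tier descriptions above amount to a simple decision rule. The sketch below writes that rule down, along with a request object that carries the chosen tier and output format. Everything in it is an assumption for illustration: `Tier`, `OutputFormat`, `pick_tier`, `OcrRequest`, and the payload keys are hypothetical names, not the Studio's or Docuglean's API, and the mapping from document traits to tiers is one reading of the descriptions rather than a documented policy.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FAST = "fast"          # clean documents, lowest cost
    MODERATE = "moderate"  # most real-world documents
    PREMIUM = "premium"    # degraded scans and handwriting


class OutputFormat(Enum):
    TEXT = "text"          # plain text
    MARKDOWN = "markdown"  # headings, lists, formatting
    HTML = "html"          # layout and styles


def pick_tier(scanned: bool, degraded: bool, handwritten: bool) -> Tier:
    """Hypothetical rule mapping document traits onto the three tiers."""
    if degraded or handwritten:
        return Tier.PREMIUM
    if scanned:
        return Tier.MODERATE
    return Tier.FAST


@dataclass(frozen=True)
class OcrRequest:
    filename: str
    tier: Tier
    output_format: OutputFormat = OutputFormat.MARKDOWN

    def to_dict(self) -> dict:
        # Hypothetical payload shape; these keys are not a documented API.
        return {
            "file": self.filename,
            "tier": self.tier.value,
            "format": self.output_format.value,
        }
```

Starting at the cheapest tier that fits and escalating only for degraded or handwritten input keeps cost proportional to how hard the document actually is.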

2. Extract - Structured Data

Define your schema visually, get typed JSON automatically:

In the UI:

Upload your document
Define schema: {"vendor": "string", "total": "number", "date": "string"}
Select mode
Click "Extract"
Get validated JSON
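What "validated" means depends on the product, but for a flat schema like the one above a minimal check is: parse the model output, confirm every field is present, and confirm each value has the declared type. The sketch below assumes the schema uses only the type names `string`, `number`, and `boolean`; `validate` and `TYPE_CHECKS` are hypothetical helpers, not part of the Studio or Docuglean.

```python
import json

# The schema from the UI example: field name -> expected JSON type name.
SCHEMA = {"vendor": "string", "total": "number", "date": "string"}

# How each schema type name is checked against values produced by json.loads.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate(raw: str, schema: dict[str, str]) -> tuple[dict, list[str]]:
    """Parse raw JSON and return (data, errors) against a flat schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        return {}, ["top-level value must be an object"]
    errors = []
    for field, type_name in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not TYPE_CHECKS[type_name](data[field]):
            got = type(data[field]).__name__
            errors.append(f"{field}: expected {type_name}, got {got}")
    return data, errors
```

Returning a list of errors instead of raising on the first one makes it easy to show every problem at once, or to feed them back into a retry.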

3. Classify - Document Categorization

Classify multi-page documents and split them automatically by document type.

Use case: a 200-page medical record that mixes intake forms, lab results, and treatment notes.
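Splitting typically follows from classification: once each page has a label, consecutive pages that share a label form one sub-document. The sketch assumes a classifier has already produced one label per page (the labels shown are hypothetical) and illustrates only the grouping step. It does not describe how the Studio itself classifies or splits.

```python
from itertools import groupby


def split_by_label(page_labels: list[str]) -> list[tuple[str, int, int]]:
    """Collapse per-page labels into (label, first_page, last_page) runs.

    Pages are 1-indexed; a label that reappears later starts a new segment.
    """
    segments = []
    page = 1
    for label, run in groupby(page_labels):
        length = sum(1 for _ in run)
        segments.append((label, page, page + length - 1))
        page += length
    return segments


if __name__ == "__main__":
    # Hypothetical classifier output for a short excerpt of a medical record.
    labels = ["intake_form", "intake_form", "lab_result",
              "treatment_note", "treatment_note", "lab_result"]
    for label, first, last in split_by_label(labels):
        print(f"pages {first}-{last}: {label}")
```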

4. Chunk - Semantic Segmentation

Split documents into coherent chunks for RAG pipelines and LLM context windows.
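The section does not say how the Studio segments documents. As a point of reference, the sketch below is a plain structure-aware chunker, not an embedding-based semantic one: it splits Markdown (such as the OCR Markdown output) at headings, then packs paragraphs into chunks of at most `max_chars` characters. `chunk_markdown` and its parameter are hypothetical.

```python
import re


def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split Markdown at headings, then pack paragraphs into chunks of <= max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    # Zero-width split before each ATX heading line (#, ##, ... ######).
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: list[str] = []
    for section in sections:
        current = ""
        for para in (p.strip() for p in section.split("\n\n")):
            if not para:
                continue
            candidate = f"{current}\n\n{para}" if current else para
            if not current or len(candidate) <= max_chars:
                current = candidate  # start a chunk or grow the current one
            else:
                chunks.append(current)  # current is full: emit it and start over
                current = para
        if current:
            chunks.append(current)  # flush per section, so chunks never span sections
    return chunks
```

Splitting at headings first keeps each chunk inside a single section, which tends to make retrieved chunks easier for an LLM to interpret. An oversized paragraph is kept whole rather than cut mid-sentence.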

5. Count Tokens - Cost Estimation

Count a document's tokens to know what it will cost before you process it.
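Cost estimation is arithmetic on a token count: cost = tokens / 1,000,000 × price per million tokens. Exact counts require the target model's own tokenizer. The sketch below substitutes the common rough heuristic of about four characters per token for English text. The price argument is a placeholder, not a Studio rate, and both helper names are hypothetical.

```python
import math

# Rough heuristic for English text: about 4 characters per token. Real counts
# depend on the model's tokenizer and can differ substantially.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost = tokens / 1,000,000 * price per million tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    extracted = "word " * 2_000  # stand-in for 10,000 characters of extracted text
    tokens = estimate_tokens(extracted)
    # 2.50 is a placeholder price, not a Studio rate.
    print(f"~{tokens} tokens, ~${estimate_cost(tokens, 2.50):.4f}")
```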

6. PII Detection - Privacy & Compliance

Actions: Detect PII (with locations and types), redact it by replacing each match with a tag such as [REDACTED] or [EMAIL], or replace it with synthetic fake data for testing.

Modes:

Local: Fast, offline regex matching for basic PII (emails, phone numbers, SSNs, etc.).
Sentinel: Our high-accuracy PII model (names, addresses, financial/medical info).
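The Local mode is described as offline regex matching. The sketch below shows the three actions in that style: `detect` returns typed findings with character offsets, `redact` replaces each finding with a type tag such as `[EMAIL]` (or a generic `[REDACTED]`), and `synthesize` substitutes fixed placeholder values. The patterns are simplified, US-formatted assumptions (no international phone numbers, and no names or addresses, which is the gap a model like Sentinel is meant to cover). All function names are hypothetical, not the Studio's or Sentinel's API.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Simplified, US-formatted patterns (assumptions for illustration only).
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Fixed placeholder values used by synthesize(); hypothetical, not realistic fakes.
FAKE_VALUES = {"EMAIL": "user@example.com", "PHONE": "555-555-0100", "SSN": "000-00-0000"}


@dataclass(frozen=True)
class Finding:
    kind: str   # PII type, e.g. "EMAIL"
    start: int  # character offset where the match begins
    end: int    # character offset just past the match
    text: str   # the matched text


def detect(text: str) -> list[Finding]:
    """Run every pattern and return findings ordered by position."""
    findings = [
        Finding(kind, m.start(), m.end(), m.group())
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


def _rewrite(text: str, replace: Callable[[Finding], str]) -> str:
    """Rebuild text with each non-overlapping finding swapped for replace(finding)."""
    parts, last = [], 0
    for f in detect(text):
        if f.start < last:  # skip a match that overlaps one already replaced
            continue
        parts.append(text[last:f.start])
        parts.append(replace(f))
        last = f.end
    parts.append(text[last:])
    return "".join(parts)


def redact(text: str, tag_by_type: bool = True) -> str:
    return _rewrite(text, lambda f: f"[{f.kind}]" if tag_by_type else "[REDACTED]")


def synthesize(text: str) -> str:
    return _rewrite(text, lambda f: FAKE_VALUES[f.kind])
```

A production synthetic-data step would generate varied, realistic values (for example with a library such as Faker) rather than one constant per type.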

Live demo: cernisintelligence.com/studio

Open Source Components

Document AI Studio is built on open-source foundations:

Docuglean SDK (MIT License)

The six primitives as code:

Python:

pip install docuglean

TypeScript:

npm install docuglean-ocr

Repos:

github.com/cernis-intelligence/docuglean-ocr

Sentinel PII Model

Model card: huggingface.co/cernis-intelligence/sentinel

Repo: github.com/cernis-intelligence/sentinel-pii-sdk

CernisOCR & Cernis-Thinking (Open Weights)

CernisOCR and Cernis-Thinking: huggingface.co/cernis-intelligence


Day 7: Document AI Studio - Intelligent Document Processing with State-of-the-Art AI