Day 7: Document AI Studio - Intelligent Document Processing with State-of-the-Art AI

2025-12-07 · Engineering

Why Document AI Studio

Intelligent document processing should be this simple:

  1. Upload your document
  2. Tell us what you need (text, structured data, classifications, PII detection)
  3. Get results powered by state-of-the-art document AI models

All through a clean web interface. No code required. No setup. No training. No infrastructure to manage.

That's Document AI Studio. Production-grade document intelligence, accessible to everyone.


The Six Primitives, One Interface

Remember the six core primitives we discussed on Day 1 of this series? Document AI Studio gives you all of them in a visual playground:

1. OCR - Text Extraction

Upload any document (scanned, native PDF, handwritten, images) and extract text, choosing a quality tier and an output format.

Three quality tiers:

  • Fast: Perfect for clean documents, lowest cost
  • Moderate: Handles most real-world documents, best balance of cost and quality
  • Premium: Maximum accuracy for degraded scans and handwriting

Three output formats:

  • Text: Plain text extraction
  • Markdown: Preserves headings, lists, formatting
  • HTML: Full layout fidelity with styles

2. Extract - Structured Data

Define your schema visually and get typed JSON automatically.

In the UI:

  1. Upload your document
  2. Define schema: {"vendor": "string", "total": "number", "date": "string"}
  3. Select mode
  4. Click "Extract"
  5. Get validated JSON
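The schema in step 2 maps field names to type names. As a minimal sketch of what "validated JSON" could mean for a flat schema like this one, the check below assumes only the "string" and "number" types. The helper `validate_against_schema` and the sample record are hypothetical illustrations, not part of Studio or the Docuglean SDK.

```python
import json

# Hypothetical mapping from schema type names to Python types.
TYPE_MAP = {"string": str, "number": (int, float)}

def validate_against_schema(record: dict, schema: dict) -> list[str]:
    """Return a list of validation errors (empty if the record conforms)."""
    errors = []
    for field, type_name in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        value = record[field]
        expected = TYPE_MAP[type_name]
        # bool is a subclass of int in Python, so booleans are rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{field}: expected {type_name}, got {type(value).__name__}")
    return errors

schema = {"vendor": "string", "total": "number", "date": "string"}
# Made-up extraction result, for illustration only.
result = json.loads('{"vendor": "Acme Corp", "total": 1249.5, "date": "2025-11-30"}')
print(validate_against_schema(result, schema))
```

An empty list means every field in the schema is present with the expected type. A real pipeline would likely also validate formats such as dates.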

3. Classify - Document Categorization

Classify multi-page documents and split them automatically by document type.

Use case: a 200-page medical record with intake forms, lab results, and treatment notes mixed together.
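One way to picture the splitting step: once each page has a label, consecutive pages with the same label can be grouped into segments. The sketch below assumes per-page labels are already available. The label names and the `split_by_label` helper are hypothetical, and this is not a description of how Studio classifies pages internally.

```python
from itertools import groupby

def split_by_label(page_labels: list[str]) -> list[tuple[str, int, int]]:
    """Group consecutive pages with the same label into (label, first_page, last_page)."""
    segments = []
    page = 1  # pages are numbered from 1
    for label, run in groupby(page_labels):
        length = len(list(run))
        segments.append((label, page, page + length - 1))
        page += length
    return segments

# Hypothetical per-page labels for a short mixed record.
labels = ["intake_form", "intake_form", "lab_result",
          "lab_result", "lab_result", "treatment_note"]
for label, first, last in split_by_label(labels):
    print(f"{label}: pages {first}-{last}")
```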

4. Chunk - Semantic Segmentation

Intelligent document splitting for RAG and LLMs.
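To make the idea concrete, here is a deliberately simple stand-in: a greedy chunker that packs whole paragraphs up to a character budget. It is a sketch only. Semantic chunking as offered in Studio presumably uses more than paragraph boundaries, and `chunk_paragraphs` and its `max_chars` parameter are hypothetical names.

```python
def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole paragraphs into chunks of up to max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty paragraphs produced by extra blank lines
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which usually matters more for retrieval quality than hitting an exact size.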

5. Count Tokens - Cost Estimation

Know the cost before you process.
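A back-of-the-envelope version of this estimate is a token count multiplied by a per-token price. The sketch below assumes roughly four characters per token, a rough rule of thumb for English text rather than a real tokenizer, and takes the price as an input. Both helper names are hypothetical, and actual Studio pricing is not stated here.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: characters divided by an assumed chars-per-token ratio."""
    return math.ceil(len(text) / chars_per_token)

def estimate_cost(text: str, usd_per_million_tokens: float) -> float:
    """Estimated cost in USD for processing `text` at the given (assumed) price."""
    return estimate_tokens(text) * usd_per_million_tokens / 1_000_000
```

For exact numbers, count with the tokenizer of the model that will actually process the document.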

6. PII Detection - Privacy & Compliance

Actions:

  • Detect: Find all PII, with locations and types.
  • Redact: Replace sensitive data with [REDACTED] or [EMAIL] tags.
  • Replace: Swap in synthetic fake data for testing.

Modes:

  • Local: Fast, offline regex matching for basic PII (emails, phone numbers, SSNs, etc.).
  • Sentinel: Our high-accuracy PII model (names, addresses, financial/medical info).
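The Local mode can be pictured as a handful of regular expressions. The sketch below shows detection with locations and types, plus tag-style redaction. The patterns are simplified illustrations (US-style phone numbers and SSNs only) and are assumptions, not the patterns Studio actually ships.

```python
import re

# Simplified, illustrative patterns; real-world PII regexes need to be stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}

def detect_pii(text: str) -> list[dict]:
    """Return every match with its type and character offsets, ordered by position."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append({"type": kind, "start": m.start(),
                             "end": m.end(), "text": m.group()})
    return sorted(findings, key=lambda f: f["start"])

def redact_pii(text: str) -> str:
    """Replace each match with a tag naming its type, e.g. [EMAIL]."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[{kind}]", text)
    return text

# Made-up sample text, for illustration only.
sample = "Contact jane@example.com or 555-123-4567. SSN: 123-45-6789."
print(redact_pii(sample))
```

Regexes like these cannot recognize names, addresses, or free-form medical details, which is the gap a model-based mode such as Sentinel is meant to cover.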

Live demo: cernisintelligence.com/studio

Open Source Components

Document AI Studio is built on open-source foundations:

Docuglean SDK (MIT License)

The six primitives as code:

Python:

pip install docuglean

TypeScript:

npm install docuglean-ocr

Repos:

Sentinel PII Model

Model card: huggingface.co/cernis-intelligence/sentinel

Repo: github.com/cernis-intelligence/sentinel-pii-sdk

CernisOCR & Cernis-Thinking (Open Weights)