September 15, 2025 · Announcement

Introducing Cernis-Thinking: A Multi-Task Vision Language Model for Document Understanding

We're excited to release Cernis-Thinking, an open-source vision language model trained with reinforcement learning to understand and reason about documents. Built on Qwen2.5-VL-7B and fine-tuned using GRPO (Group Relative Policy Optimization), Cernis-Thinking can handle mathematical reasoning, LaTeX OCR, invoice extraction, and handwriting transcription — all in a single 7B parameter model.

Why Another Document Model?

Earlier this year, we saw the Allen Institute's olmOCR — an impressive open-source document OCR model. Inspired by their work and projects like RolmOCR, we wanted to explore whether reinforcement learning could create a more versatile document understanding model that doesn't just extract text, but actually reasons about what it sees.

Traditional document models often struggle with:

  • Understanding context beyond pure OCR
  • Solving problems that require multi-step reasoning
  • Handling diverse document types without task-specific fine-tuning
  • Providing structured, parseable outputs

Cernis-Thinking addresses these challenges by learning from rewards rather than just mimicking training examples.

Key Approach: Reinforcement Learning with Smart Rewards

Unlike supervised fine-tuning, where models learn to copy examples, we used GRPO, a reinforcement learning technique that trains the model to maximize rewards. This lets us optimize directly for what matters: correct, well-structured outputs.
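
To make the training signal concrete, the sketch below shows the group-relative advantage at the core of GRPO: several completions are sampled for the same prompt, each is scored by the reward functions, and rewards are normalized within the group so that above-average completions are reinforced. This is a simplified illustration, not our trainer code.

import statistics

# Simplified GRPO-style advantage: for one prompt, sample a group of
# completions, score each with the reward functions, and normalize the
# rewards within the group.
def group_relative_advantages(rewards, eps=1e-6):
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four completions sampled for the same document prompt.
rewards = [2.1, 0.4, 1.7, 0.9]
advantages = group_relative_advantages(rewards)
# Completions that beat their group average get positive advantages and are
# reinforced; below-average completions are pushed down.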

Our Reward Functions

We designed three complementary reward functions:

1. Formatting Rewards

  • Rewards proper structure with <REASONING> and <SOLUTION> tags
  • Penalizes excessive artifacts
  • Makes outputs easily parseable downstream
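
A minimal sketch of what such a formatting reward can look like is shown below; the regular expressions and weights are illustrative choices, not the exact values used in training.

import re

# Illustrative formatting reward: reward well-formed <REASONING>/<SOLUTION>
# structure and penalize stray artifacts. The weights are placeholders.
def format_reward(completion: str) -> float:
    score = 0.0
    if re.search(r"<REASONING>.*?</REASONING>", completion, re.DOTALL):
        score += 0.5
    if re.search(r"<SOLUTION>.*?</SOLUTION>", completion, re.DOTALL):
        score += 0.5
    # Penalize leftovers such as markdown fences or duplicated solution tags.
    artifacts = completion.count("```") + max(0, completion.count("<SOLUTION>") - 1)
    return score - 0.1 * artifacts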

2. Task-Specific Correctness Rewards

  • Math: Exact numeric matching
  • LaTeX/Handwriting: Fuzzy string matching
  • Invoices: Partial credit for key fields
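
As a rough illustration of the per-task scoring, the sketch below uses exact matching for math, difflib-based fuzzy matching for LaTeX and handwriting, and per-field partial credit for invoices; the thresholds and field handling are assumptions rather than the exact reward we trained with.

from difflib import SequenceMatcher

# Illustrative per-task correctness rewards; tolerances and field handling
# are placeholders, not the values used in training.
def correctness_reward(task: str, prediction: str, target) -> float:
    if task == "math":
        # Exact numeric match.
        try:
            return 1.0 if abs(float(prediction) - float(target)) < 1e-6 else 0.0
        except ValueError:
            return 0.0
    if task in ("latex", "handwriting"):
        # Fuzzy string similarity in [0, 1].
        return SequenceMatcher(None, prediction.strip(), target.strip()).ratio()
    if task == "invoice":
        # target is assumed to be a dict of key fields,
        # e.g. {"Invoice No": "53553822", "Total": "$247.50"}.
        hits = sum(1 for value in target.values() if str(value) in prediction)
        return hits / max(1, len(target))
    return 0.0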

3. ROUGE-Style Word Overlap (for OCR tasks)

  • Prevents wasted training on gibberish
  • Provides feedback even when outputs aren't perfect
  • Accelerates learning on text-heavy tasks
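
The word-overlap term can be as simple as ROUGE-1 recall over whitespace tokens, as in the sketch below; a full ROUGE implementation also handles n-grams and stemming, which this toy version omits.

# Illustrative ROUGE-1-style overlap reward for OCR tasks: the fraction of
# reference words recovered by the prediction, giving partial credit even
# when the transcription is not exact.
def word_overlap_reward(prediction: str, reference: str) -> float:
    ref_words = reference.lower().split()
    pred_words = set(prediction.lower().split())
    if not ref_words:
        return 0.0
    return sum(1 for w in ref_words if w in pred_words) / len(ref_words)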

Training Details

Datasets (2,000 samples total)

We mixed four document understanding datasets:

Dataset                                  Samples   Task Type
AI4Math/MathVista                        ~500      Math word problems with images
unsloth/LaTeX_OCR                        ~500      Mathematical formula recognition
mychen76/invoices-and-receipts_ocr_v1    ~500      Invoice extraction
corto-ai/handwritten-text                ~500      Handwriting transcription
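
For readers who want to assemble a similar mixture, a sketch using the Hugging Face datasets library is below; the split names and the ~500-per-dataset cap are assumptions for illustration, not our exact preprocessing recipe.

from datasets import load_dataset

# Split names below are assumptions; check each dataset card for the
# available splits and column names.
SOURCES = {
    "AI4Math/MathVista": "testmini",
    "unsloth/LaTeX_OCR": "train",
    "mychen76/invoices-and-receipts_ocr_v1": "train",
    "corto-ai/handwritten-text": "train",
}

parts = []
for name, split in SOURCES.items():
    ds = load_dataset(name, split=split).shuffle(seed=42)
    parts.append(ds.select(range(min(500, len(ds)))))

# The four datasets use different column schemas, so a real pipeline would map
# each to a shared (image, prompt, answer) format before interleaving them.
print(sum(len(ds) for ds in parts))  # ~2,000 samples across the four tasks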

Results: Learning to Think

The training metrics tell a compelling story:

Step Range   Avg Reward   Best Reward   Observations
1-50         0.8-1.2      1.97          Early learning, inconsistent
51-150       1.2-1.6      2.25          Stable improvement
151-300      1.4-1.8      2.62          Strong multi-task performance
301-375      1.5-2.0      2.87          Consistent high performance

Example Outputs

Math Reasoning:

<REASONING>
In this parallelogram problem, I need to use:
1. Opposite sides are equal
2. The angle bisector creates a specific relationship
3. Using properties of triangles formed...
</REASONING>

<SOLUTION>
42
</SOLUTION>

LaTeX OCR:

<SOLUTION>
\frac{2}{3} < a^{2} \alpha^{2} \leq 1
</SOLUTION>

Invoice Extraction:

<SOLUTION>
Invoice No: 53553822
Date: 07/24/2012  
Vendor: Leo Brown
Total: $247.50
Tax ID: 926-74-9803
</SOLUTION>

Try Cernis-Thinking

We're releasing everything under Apache 2.0.
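
Because Cernis-Thinking is built on Qwen2.5-VL-7B, inference should follow the standard Qwen2.5-VL flow in transformers. The sketch below assumes that; the repository id cernis/cernis-thinking-7b and the image path are placeholders, not actual release links.

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

# "cernis/cernis-thinking-7b" is a placeholder id; substitute the released
# checkpoint. The flow mirrors standard Qwen2.5-VL usage in transformers.
MODEL_ID = "cernis/cernis-thinking-7b"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(MODEL_ID, device_map="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "invoice.png"},  # placeholder image path
        {"type": "text", "text": "Extract the invoice number, date, vendor, and total."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
answer = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)  # expected to contain <REASONING>...</REASONING> and <SOLUTION>...</SOLUTION>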

Why Open Source?

We believe the future of document understanding requires models that can:

  1. Reason about what they see, not just transcribe it
  2. Adapt to diverse document types without retraining
  3. Explain their outputs for trust and auditability

By open-sourcing Cernis-Thinking, we hope to inspire more research into RL-trained vision language models and practical document AI systems.