Introducing Cernis-Thinking: A Multi-Task Vision Language Model for Document Understanding
We're excited to release Cernis-Thinking, an open-source vision language model trained with reinforcement learning to understand and reason about documents. Built on Qwen2.5-VL-7B and fine-tuned using GRPO (Group Relative Policy Optimization), Cernis-Thinking can handle mathematical reasoning, LaTeX OCR, invoice extraction, and handwriting transcription — all in a single 7B parameter model.
Why Another Document Model?
Earlier this year, we saw the Allen Institute's olmOCR — an impressive open-source document OCR model. Inspired by their work and projects like RolmOCR, we wanted to explore whether reinforcement learning could create a more versatile document understanding model that doesn't just extract text, but actually reasons about what it sees.
Traditional document models often struggle with:
- Understanding context beyond pure OCR
- Solving problems that require multi-step reasoning
- Handling diverse document types without task-specific fine-tuning
- Providing structured, parseable outputs
Cernis-Thinking addresses these challenges by learning from rewards rather than just mimicking training examples.
Key Approach: Reinforcement Learning with Smart Rewards
Unlike supervised fine-tuning where models learn to copy examples, we used GRPO (Group Relative Policy Optimization) — a reinforcement learning technique that teaches the model to maximize rewards. This lets us directly optimize for what matters: correct, well-structured outputs.
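Concretely, GRPO samples a group of completions for each prompt, scores every completion with the reward functions, and standardizes the rewards within the group so that above-average completions are reinforced. A rough sketch of that advantage computation (illustrative only, not our released training code):

```python
# Minimal sketch of GRPO's group-relative advantage. For each prompt the model
# samples a group of completions, each gets a scalar reward, and the advantage
# is the reward standardized within that group -- no learned value network needed.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Standardize rewards within one group of sampled completions."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + 1e-6) for r in rewards]

# Example: four completions for the same document prompt, scored by our rewards.
print(group_relative_advantages([0.8, 1.2, 2.25, 0.4]))
# Completions above the group mean get positive advantages; the policy update
# then follows a PPO-style clipped objective with a KL penalty toward the base model.
```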
Our Reward Functions
We designed three complementary reward functions (a simplified code sketch follows the list):
1. Formatting Rewards
- Rewards proper structure with `<REASONING>` and `<SOLUTION>` tags
- Penalizes excessive artifacts
- Makes outputs easily parseable downstream
2. Task-Specific Correctness Rewards
- Math: Exact numeric matching
- LaTeX/Handwriting: Fuzzy string matching
- Invoices: Partial credit for key fields
3. ROUGE-Style Word Overlap (for OCR tasks)
- Prevents wasted training on gibberish
- Provides feedback even when outputs aren't perfect
- Accelerates learning on text-heavy tasks
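To make these rewards concrete, here is a simplified sketch of what each one could look like. The tag pattern, thresholds, and weights are illustrative, not the exact values used in training:

```python
import re
from difflib import SequenceMatcher

def formatting_reward(completion: str) -> float:
    """Reward well-formed <REASONING>...</REASONING> / <SOLUTION>...</SOLUTION> output."""
    score = 0.0
    if re.search(r"<REASONING>.*?</REASONING>", completion, re.DOTALL):
        score += 0.5
    if re.search(r"<SOLUTION>.*?</SOLUTION>", completion, re.DOTALL):
        score += 0.5
    # Penalize excessive text spilling outside the expected tags.
    leftover = re.sub(r"<REASONING>.*?</REASONING>|<SOLUTION>.*?</SOLUTION>",
                      "", completion, flags=re.DOTALL)
    if len(leftover.strip()) > 50:
        score -= 0.25
    return score

def fuzzy_match_reward(prediction: str, reference: str) -> float:
    """Fuzzy string similarity for LaTeX OCR and handwriting transcription."""
    return SequenceMatcher(None, prediction.strip(), reference.strip()).ratio()

def word_overlap_reward(prediction: str, reference: str) -> float:
    """ROUGE-1-style unigram F1: partial credit even when OCR output isn't perfect."""
    pred_tokens, ref_tokens = prediction.lower().split(), reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    overlap = len(set(pred_tokens) & set(ref_tokens))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_tokens), overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```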
Training Details
Datasets (2,000 samples total)
We mixed four document understanding datasets:
| Dataset | Samples | Task Type |
|---|---|---|
| AI4Math/MathVista | ~500 | Math word problems with images |
| unsloth/LaTeX_OCR | ~500 | Mathematical formula recognition |
| mychen76/invoices-and-receipts_ocr_v1 | ~500 | Invoice extraction |
| corto-ai/handwritten-text | ~500 | Handwriting transcription |
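Roughly, the mix could be assembled like this with the datasets library. Split names, the shared-schema mapping, and the 500-sample cap per source are assumptions taken from the table; the model card has the actual training code.

```python
# Illustrative sketch of the ~2,000-sample training mix.
from datasets import load_dataset, concatenate_datasets

SOURCES = [
    "AI4Math/MathVista",
    "unsloth/LaTeX_OCR",
    "mychen76/invoices-and-receipts_ocr_v1",
    "corto-ai/handwritten-text",
]

parts = []
for name in SOURCES:
    ds = load_dataset(name, split="train")  # some sources use a different split name
    # In practice each source is first mapped to a shared (image, prompt, answer)
    # schema so the four datasets can be concatenated.
    parts.append(ds.shuffle(seed=42).select(range(500)))

train_mix = concatenate_datasets(parts).shuffle(seed=42)  # ~500 per task, ~2,000 total
```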
Results: Learning to Think
The training metrics tell a compelling story:
| Step Range | Avg Reward | Best Reward | Observations |
|---|---|---|---|
| 1-50 | 0.8-1.2 | 1.97 | Early learning, inconsistent |
| 51-150 | 1.2-1.6 | 2.25 | Stable improvement |
| 151-300 | 1.4-1.8 | 2.62 | Strong multi-task performance |
| 301-375 | 1.5-2.0 | 2.87 | Consistent high performance |
Example Outputs
Math Reasoning:
```
<REASONING>
In this parallelogram problem, I need to use:
1. Opposite sides are equal
2. The angle bisector creates a specific relationship
3. Using properties of triangles formed...
</REASONING>
<SOLUTION>
42
</SOLUTION>
```
LaTeX OCR:
```
<SOLUTION>
\frac{2}{3} < a^{2} \alpha^{2} \leq 1
</SOLUTION>
```
Invoice Extraction:
```
<SOLUTION>
Invoice No: 53553822
Date: 07/24/2012
Vendor: Leo Brown
Total: $247.50
Tax ID: 926-74-9803
</SOLUTION>
```
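Because every task emits the same tag structure, downstream parsing stays trivial. A minimal example of pulling out the reasoning trace and final answer (the dictionary keys are just for illustration):

```python
import re

def parse_output(text: str) -> dict:
    """Extract the reasoning trace and final answer from a Cernis-Thinking completion."""
    def grab(tag: str) -> str | None:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else None
    return {"reasoning": grab("REASONING"), "solution": grab("SOLUTION")}

result = parse_output("<REASONING>Opposite sides are equal...</REASONING>\n<SOLUTION>42</SOLUTION>")
print(result["solution"])  # "42"
```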
Try Cernis-Thinking
We're releasing everything under Apache 2.0:
- Model: cernis-intelligence/cernis-thinking
- Training Code: Available in model card
- License: Apache 2.0 (commercial-friendly)
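To run it locally, here is a minimal inference sketch, assuming the checkpoint loads with the standard Qwen2.5-VL classes in a recent transformers release; check the model card for the exact prompt format and recommended generation settings.

```python
# Minimal inference sketch -- assumes the released checkpoint keeps the
# Qwen2.5-VL architecture and processor; see the model card for specifics.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image

model_id = "cernis-intelligence/cernis-thinking"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("invoice.png")  # any document image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Extract the invoice number, date, vendor, and total."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```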
Why Open Source?
We believe the future of document understanding requires models that can:
- Reason about what they see, not just transcribe it
- Adapt to diverse document types without retraining
- Explain their outputs for trust and auditability
By open-sourcing Cernis-Thinking, we hope to inspire more research into RL-trained vision language models and practical document AI systems.