Introducing Cernis-Thinking: A Multi-Task Vision Language Model for Document Understanding

September 15, 2025 · Announcement

We're excited to release Cernis-Thinking, an open-source vision language model trained with reinforcement learning to understand and reason about documents. Built on Qwen2.5-VL-7B and trained using GRPO (Group Relative Policy Optimization), Cernis-Thinking can handle mathematical reasoning, LaTeX OCR, invoice extraction, and handwriting transcription — all in a single 7B parameter model.

Why Another Document Model?

We wanted to explore whether reinforcement learning could create a more versatile document understanding model that doesn't just extract text, but actually reasons about what it sees.

Traditional document models often struggle with:

  • Understanding context beyond pure OCR
  • Solving problems that require multi-step reasoning
  • Handling diverse document types without task-specific fine-tuning
  • Providing structured, parseable outputs

Cernis-Thinking addresses these challenges by learning from rewards rather than just mimicking training examples.

Key Approach: Reinforcement Learning with Smart Rewards

Unlike supervised fine-tuning, where models learn to imitate reference examples, we trained with GRPO, a reinforcement learning technique that teaches the model to maximize rewards. This lets us optimize directly for what matters: correct, well-structured outputs.
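
At a high level, GRPO samples a group of candidate completions for each prompt, scores each one with the reward functions described below, and normalizes every reward against its own group, so better-than-average completions get reinforced. A minimal sketch of that group-relative advantage step, assuming the standard normalization (illustrative, not our actual training loop):

import statistics

def group_relative_advantages(rewards, eps=1e-6):
    # GRPO normalizes each completion's reward within its group:
    # advantage_i = (r_i - mean(group)) / (std(group) + eps).
    # Completions that beat the group average get positive advantages
    # and are reinforced; below-average ones are discouraged.
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled answers to one document question,
# scored by the combined reward functions.
print(group_relative_advantages([1.0, 0.2, 0.8, 0.0]))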

Our Reward Functions

We designed three complementary reward functions:

1. Formatting Rewards

  • Rewards proper structure with <REASONING> and <SOLUTION> tags
  • Penalizes excessive artifacts
  • Makes outputs easily parseable downstream
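
A minimal sketch of what a formatting reward along these lines can look like; the tag names match our output format, but the exact scores are illustrative assumptions:

import re

def formatting_reward(completion: str) -> float:
    # Full reward for one well-formed <REASONING> block followed by
    # one <SOLUTION> block, with nothing but whitespace around them.
    pattern = re.compile(
        r"\s*<REASONING>.*?</REASONING>\s*<SOLUTION>.*?</SOLUTION>\s*",
        re.DOTALL,
    )
    if pattern.fullmatch(completion):
        return 1.0
    # Partial credit if the solution tags appear amid extra artifacts.
    if "<SOLUTION>" in completion and "</SOLUTION>" in completion:
        return 0.25
    return 0.0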

2. Task-Specific Correctness Rewards

  • Rewards for exact numeric matching
  • Rewards for correct LaTeX and handwriting recognition
  • Rewards for accurate invoice extraction
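
As a sketch, the exact-numeric-match case can be implemented by parsing the <SOLUTION> block and comparing values; the helper below is illustrative, not the released reward code:

import re

def numeric_correctness_reward(completion: str, target: str) -> float:
    # Pull the answer out of the <SOLUTION> block and compare it
    # numerically, so "42", "42.0", and " 42 " all count as correct.
    match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", completion, re.DOTALL)
    if match is None:
        return 0.0
    try:
        predicted = float(match.group(1).strip())
        return 1.0 if abs(predicted - float(target)) < 1e-6 else 0.0
    except ValueError:
        return 0.0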

3. ROUGE-Style Word Overlap (for OCR tasks)

  • Prevents wasted training on gibberish
  • Provides feedback even when outputs aren't perfect
  • Accelerates learning on text-heavy tasks
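
This overlap reward can be approximated as a ROUGE-1-style F1 over word sets, so partial transcriptions still earn partial credit while gibberish scores near zero. A simplified sketch (real ROUGE counts token multiplicity):

def word_overlap_reward(completion: str, reference: str) -> float:
    # Unigram precision/recall F1 over whitespace-separated tokens.
    pred, ref = completion.split(), reference.split()
    if not pred or not ref:
        return 0.0
    overlap = len(set(pred) & set(ref))
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)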

Training Details

Datasets (2,000 samples total)

We mixed four document understanding datasets:

Dataset                                | Samples | Task Type
AI4Math/MathVista                      | ~500    | Math word problems with images
unsloth/LaTeX_OCR                      | ~500    | Mathematical formula recognition
mychen76/invoices-and-receipts_ocr_v1  | ~500    | Invoice extraction
corto-ai/handwritten-text              | ~500    | Handwriting transcription
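
One way to assemble such a mixture with the Hugging Face datasets library; the split names and schema handling here are assumptions for illustration, not the exact training recipe:

from datasets import load_dataset, interleave_datasets

# Dataset names from the table above; "train[:500]" split syntax is
# standard, but each repo's actual split names may differ.
names = [
    "AI4Math/MathVista",
    "unsloth/LaTeX_OCR",
    "mychen76/invoices-and-receipts_ocr_v1",
    "corto-ai/handwritten-text",
]
# Interleaving requires a shared schema, so in practice each dataset's
# columns would first be mapped to common fields (image, prompt, answer).
parts = [load_dataset(name, split="train[:500]") for name in names]

# Interleave so each training batch sees a mix of all four tasks.
mixed = interleave_datasets(parts, stopping_strategy="all_exhausted")
print(len(mixed))  # ~2,000 samples total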

Example Outputs

Math Reasoning:

<REASONING>
In this parallelogram problem, I need to use:
1. Opposite sides are equal
2. The angle bisector creates a specific relationship
3. Using properties of triangles formed...
</REASONING>

<SOLUTION>
42
</SOLUTION>

LaTeX OCR:

<SOLUTION>
\frac{2}{3} < a^{2} \alpha^{2} \leq 1
</SOLUTION>

Invoice Extraction:

<SOLUTION>
Invoice No: 53553822
Date: 07/24/2012
Vendor: Leo Brown
Total: $247.50
Tax ID: 926-74-9803
</SOLUTION>
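
Because every task shares the same tag structure, downstream systems can parse these outputs with a few lines of Python; a minimal example:

import re

def parse_output(text: str) -> dict:
    # Extract the optional <REASONING> block and the <SOLUTION> payload.
    def get(tag):
        return re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    reasoning, solution = get("REASONING"), get("SOLUTION")
    return {
        "reasoning": reasoning.group(1).strip() if reasoning else None,
        "solution": solution.group(1).strip() if solution else None,
    }

print(parse_output("<SOLUTION>\n42\n</SOLUTION>")["solution"])  # "42"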

Why Open Source?

We believe the future of document understanding requires models that can:

  1. Reason about what they see, not just transcribe it
  2. Adapt to diverse document types without retraining
  3. Explain their outputs for trust and auditability

By open-sourcing Cernis-Thinking, we hope to inspire more research into RL-trained vision language models and practical document AI systems.