Cernis


September 8, 2025 · Announcement

Introducing Precis: A Fast, Tiny Document Intelligence Model

Earlier this year, OpenAI released GPT-5 and Anthropic launched Claude Sonnet 4.5 — both excellent at document summarization, but with significant costs. We were excited to see high-quality open-source alternatives like IBM's Granite 4.0 emerge — and saw an opportunity to build a specialized, privacy-preserving summarization model optimized for our document intelligence platform.

The result is Precis, a fine-tuned document intelligence model that's fast and runs locally. We're releasing it open source for anyone who needs local, on-premise processing.

Project Overview

Objective: Train a production-ready LLM to generate comprehensive 300-word summaries of documents, optimized for question-answering capability.

Model Selection: IBM Granite 4.0-H-Micro

  • Parameters: ~3.2B
  • Architecture: Transformer with shared MLP layers
  • Memory footprint: 6GB VRAM (4-bit quantization)
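As a back-of-envelope check on that footprint: 4-bit weights for ~3.2B parameters occupy about 1.6 GB, with the rest of the 6 GB budget going to activations, KV cache, and runtime overhead. The overhead figure in this sketch is an illustrative assumption chosen to match the number above, not a measurement:

```python
# Rough VRAM estimate for a ~3.2B-parameter model at 4-bit quantization.
# The 4.4 GB overhead term (activations, KV cache, CUDA runtime) is an
# illustrative assumption, not a measured value.
def estimate_vram_gb(params_billions: float, bits: int, overhead_gb: float = 4.4) -> float:
    weights_gb = params_billions * bits / 8  # billions of params x bytes per param
    return weights_gb + overhead_gb

print(round(estimate_vram_gb(3.2, 4), 1))  # 1.6 GB weights + overhead -> 6.0
```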

Dataset Engineering

Data Sources

We created a hybrid dataset combining two complementary sources:

Source 1: CNN/DailyMail

  • Pre-existing human-written summaries
  • News articles with professional highlights
  • Average length: ~750-word articles → ~60-word summaries

Source 2: ServiceNow RepLiQA

  • Synthetic non-factual documents
  • GPT-5 generated summaries (~300 words each)
  • Designed to test contextual understanding
  • Contains 5 Q&A pairs per document

Final Dataset Structure: 5,500 training examples total
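Concretely, both sources can be normalized into a single prompt/completion schema before shuffling. The field names and prompt wording below are illustrative, not the exact ones we used:

```python
# Normalize both sources into one SFT schema. The field names
# ("article", "highlights", "document", "summary") are illustrative.
def to_example(document: str, summary: str, source: str) -> dict:
    prompt = f"Summarize the following document in roughly 300 words.\n\n{document}"
    return {"prompt": prompt, "completion": summary, "source": source}

# Tiny stand-in records; in practice these come from the two datasets.
cnn_records = [{"article": "Officials announced...", "highlights": "Officials made an announcement."}]
repliqa_records = [{"document": "The fictional city of Veloria...", "summary": "Veloria is a fictional city..."}]

dataset = (
    [to_example(r["article"], r["highlights"], "cnn_dailymail") for r in cnn_records]
    + [to_example(r["document"], r["summary"], "repliqa") for r in repliqa_records]
)  # ~5,500 examples after sampling, in the real run
```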

Training Methodology

Supervised Fine-Tuning (SFT)

We chose SFT over Reinforcement Learning for several pragmatic reasons:

Why SFT?

  • Simpler implementation (no judge LLM required)
  • Deterministic training (no API failures)
  • Lower cost (we're on a shoestring budget)
  • Faster iteration cycles
  • Proven effectiveness for style transfer tasks
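A minimal SFT setup in this spirit can be sketched with Hugging Face TRL and LoRA adapters. Everything below — hyperparameters, target modules, and the model identifier — is an illustrative assumption, not our production configuration:

```python
# Sketch only: TRL supervised fine-tuning with LoRA adapters.
# All hyperparameters are illustrative assumptions, not Precis's actual values.
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules="all-linear",  # assumption; Granite's module layout may differ
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="precis-sft",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    bf16=True,
)

trainer = SFTTrainer(
    model="ibm-granite/granite-4.0-h-micro",  # assumed Hugging Face model id
    args=args,
    train_dataset=train_dataset,  # the ~5,500-example hybrid dataset
    peft_config=peft_config,
)
trainer.train()
```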

Evaluation Strategy

Benchmark Design

Primary Metric: Question-Answering Accuracy

  1. Generate summaries with trained model
  2. Use GPT-4 as judge to answer questions from summary only
  3. Compare answers to ground truth
  4. Calculate % questions answered correctly
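Step 4 reduces to a simple accuracy computation once the judge's answers are collected. Exact-match scoring after normalization, as sketched here, is a simplification; in practice the judge grades free-form answers:

```python
def qa_accuracy(judge_answers: list[str], ground_truth: list[str]) -> float:
    """Fraction of questions answered correctly from the summary alone.

    Case-insensitive exact match is a simplifying assumption for this sketch.
    """
    correct = sum(
        a.strip().lower() == g.strip().lower()
        for a, g in zip(judge_answers, ground_truth)
    )
    return correct / len(ground_truth)

print(qa_accuracy(["Paris", "1948", "blue"], ["paris", "1949", "Blue"]))  # 2/3 correct
```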

Secondary Metrics:

  • ROUGE-L score vs reference summaries
  • Average summary length
  • % summaries exceeding 350-word limit
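The secondary metrics are cheap to compute without an external library. Here is a minimal sketch; this ROUGE-L is sentence-level LCS F1, without the stemming or multi-reference handling of the official implementation:

```python
def _lcs_len(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    # Sentence-level ROUGE-L F1 on whitespace tokens (no stemming).
    cand, ref = candidate.split(), reference.split()
    lcs = _lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

def over_limit_rate(summaries: list[str], limit: int = 350) -> float:
    # Share of summaries exceeding the word limit.
    return sum(len(s.split()) > limit for s in summaries) / len(summaries)

print(rouge_l_f1("the model runs locally", "the model runs fast locally"))  # ~0.889
```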

Qualitative Assessment

Strengths:

  • Consistent 300-word output
  • Maintains key facts and details
  • Natural language flow
  • Fast inference (0.5s/summary)

Conclusion

We successfully trained a specialized document summarization model using modern efficient fine-tuning techniques.

This project demonstrates that state-of-the-art results are achievable with modern efficient training frameworks, smart architecture choices, and pragmatic methodology.

The model is ready for production deployment and serves as a strong baseline for future RL optimization.