Cernis

Introducing Precis: A Fast, Tiny Document Intelligence Model Built on IBM Granite

September 8, 2025

Recent releases such as OpenAI's GPT-4o and Anthropic's Claude Sonnet 4 are excellent at document summarization, but they come with significant costs and privacy concerns for sensitive documents. The arrival of high-quality open-source alternatives like IBM's Granite 4.0 gave us an opportunity to build a specialized, privacy-preserving summarization model optimized for our document intelligence platform.

The result is Precis, a fine-tuned document intelligence model that's fast, runs locally, and maintains document privacy. We're releasing it open source for anyone working with confidential documents who needs local, on-premise processing.

The model was trained with efficient supervised fine-tuning (SFT) and runs entirely on local hardware, with no calls to cloud-based APIs, so your documents never leave your infrastructure.

Project Overview

Objective: Train a production-ready LLM to generate comprehensive 300-word summaries of documents, optimized for question-answering capability.

Model Selection: IBM Granite 4.0-H-Micro

  • Parameters: ~3.2B
  • Architecture: Transformer with shared MLP layers
  • Memory footprint: 6GB VRAM (4-bit quantization)
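The memory figure can be sanity-checked with simple arithmetic. A rough sketch (the split of the 6 GB budget between weights and runtime overhead is our estimate, not a published breakdown):

```python
def quantized_weight_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight-storage size in GiB for a quantized model."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / (1024 ** 3)

# ~3.2B parameters at 4 bits -> roughly 1.5 GiB of raw weights.
# The rest of the ~6 GB footprint goes to activations, the KV cache,
# and dequantization buffers at inference time.
weights_gib = quantized_weight_gib(3.2e9, 4)
```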

Dataset Engineering

Data Sources

We created a hybrid dataset combining two complementary sources:

Source 1: CNN/DailyMail

  • Pre-existing human-written summaries
  • News articles with professional highlights
  • Average length: ~750-word articles → ~60-word summaries

Source 2: ServiceNow RepLiQA

  • Synthetic non-factual documents
  • GPT-5-generated summaries (~300 words each)
  • Designed to test contextual understanding
  • Contains 5 Q&A pairs per document

Final Dataset Structure: 5,500 training examples total
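Each training example pairs a source document with its target summary in a chat-style record. A minimal sketch of the conversion, assuming a standard `messages` schema; the prompt wording here is illustrative, not the exact prompt used for Precis:

```python
def to_sft_record(document: str, summary: str, target_words: int = 300) -> dict:
    """Wrap a (document, summary) pair as a chat-format SFT example.

    The system instruction below is a stand-in; the real training
    prompt may differ.
    """
    return {
        "messages": [
            {"role": "system",
             "content": f"Summarize the document in about {target_words} words, "
                        "preserving the facts needed to answer questions about it."},
            {"role": "user", "content": document},
            {"role": "assistant", "content": summary},
        ]
    }

record = to_sft_record("Acme Corp reported record revenue...",
                       "Acme Corp had a strong quarter...")
```

The assistant turn carries the target summary, so during fine-tuning the model learns to produce it conditioned on the instruction and document.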

Training Methodology

Supervised Fine-Tuning (SFT)

We chose SFT over Reinforcement Learning for several pragmatic reasons:

Why SFT?

  • Simpler implementation (no judge LLM required)
  • Deterministic training (no API failures)
  • Lower cost
  • Faster iteration cycles
  • Proven effectiveness for style transfer tasks
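Mechanically, SFT here means next-token cross-entropy computed only on the summary tokens: prompt positions are masked out of the loss. A minimal sketch of the label construction (the -100 ignore index follows the common PyTorch convention; tokenization details are omitted):

```python
IGNORE_INDEX = -100  # positions with this label contribute no loss

def build_sft_labels(prompt_ids: list[int],
                     response_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate prompt and response token ids, masking the prompt.

    The model sees the full sequence as input, but gradient flows
    only through predictions of the response (summary) tokens.
    """
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels
```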

Evaluation Strategy

Benchmark Design

Primary Metric: Question-Answering Accuracy

  1. Generate summaries with trained model
  2. Use GPT-4 as judge to answer questions from summary only
  3. Compare answers to ground truth
  4. Calculate % questions answered correctly
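The scoring step of that loop reduces to an accuracy computation. A sketch, with exact normalized string match standing in for the judge's free-form answer comparison (the real pipeline uses GPT-4 to grade answers):

```python
def qa_accuracy(judge_answers: list[str], gold_answers: list[str]) -> float:
    """Fraction of questions answered correctly from the summary alone.

    Exact normalized match is a simplification; an LLM judge would
    also accept paraphrased but equivalent answers.
    """
    assert len(judge_answers) == len(gold_answers)
    correct = sum(
        j.strip().lower() == g.strip().lower()
        for j, g in zip(judge_answers, gold_answers)
    )
    return correct / len(gold_answers)
```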

Secondary Metrics:

  • ROUGE-L score vs reference summaries
  • Average summary length
  • % summaries exceeding 350-word limit
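These secondary metrics are straightforward to compute. A pure-Python sketch of ROUGE-L (F1 over the longest common subsequence of tokens) and the over-length check; a production pipeline would more likely use a library such as `rouge-score`:

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length via dynamic programming."""
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0
        for j, y in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if x == y else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 on whitespace tokens (no stemming, a simplification)."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

def pct_over_limit(summaries: list[str], limit: int = 350) -> float:
    """Percentage of summaries exceeding the word limit."""
    over = sum(len(s.split()) > limit for s in summaries)
    return 100.0 * over / len(summaries)
```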

Qualitative Assessment

Strengths:

  • Consistent 300-word output
  • Maintains key facts and details
  • Natural language flow
  • Fast inference (0.5s/summary)

Deployment Preparation

Model Export Options

1. LoRA Adapters (150MB)

  • Smallest size
  • Requires base model
  • Fast upload to HuggingFace

2. Merged Float16 (6GB)

  • Production ready
  • vLLM optimized
  • Self-contained

3. Merged 4-bit (2GB)

  • Resource constrained deployment
  • Slightly slower inference
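Merging LoRA adapters into the base weights (option 2 above) folds the low-rank update into each layer: W ← W + (α/r)·B·A. A toy nested-list sketch of the per-layer arithmetic; a real export would do this in float16 across all adapted layers, e.g. via PEFT's `merge_and_unload`:

```python
def merge_lora(base: list[list[float]],
               lora_a: list[list[float]],
               lora_b: list[list[float]],
               alpha: float, rank: int) -> list[list[float]]:
    """Fold a LoRA update into a base weight matrix.

    base is d_out x d_in, lora_b is d_out x r, lora_a is r x d_in.
    Returns base + (alpha / rank) * (B @ A), leaving base untouched.
    """
    scale = alpha / rank
    merged = [row[:] for row in base]
    for i in range(len(base)):
        for j in range(len(base[0])):
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged[i][j] += scale * delta
    return merged
```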

Conclusion

We trained a specialized document summarization model using efficient supervised fine-tuning on a compact hybrid dataset.

The project demonstrates that strong summarization quality is achievable with modern efficient training frameworks, a small base model with smart architecture choices, and a pragmatic SFT-first methodology.

The model is ready for production deployment and serves as a strong baseline for future RL optimization.