Introducing Precis: A Fast, Tiny Document Intelligence Model Built on IBM Granite
Earlier this year, OpenAI released GPT-4o and Anthropic launched Claude Sonnet 4. Both are excellent at document summarization, but they carry significant costs and raise privacy concerns for sensitive documents. We were excited to see high-quality open-source alternatives like IBM's Granite 4.0 emerge, and we saw an opportunity to build a specialized, privacy-preserving summarization model optimized for our document intelligence platform.
The result is Precis, a fine-tuned document intelligence model that's fast, runs locally, and maintains document privacy. We're releasing it open source for anyone working with confidential documents who needs local, on-premise processing.
This model was trained using efficient supervised fine-tuning (SFT), completely separate from cloud-based APIs, ensuring your documents never leave your infrastructure.
Project Overview
Objective: Train a production-ready LLM to generate comprehensive 300-word summaries of documents, optimized for question-answering capability.
Model Selection: IBM Granite 4.0-H-Micro
- Parameters: ~3.2B
- Architecture: Transformer with shared MLP layers
- Memory footprint: 6GB VRAM (4-bit quantization)
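The memory figure can be sanity-checked with a quick back-of-the-envelope calculation. The helper below (our own illustration, not part of the project code) computes raw weight storage only; KV cache, activations, and runtime overhead account for the gap between the 4-bit weight size and the practical ~6GB VRAM footprint.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Raw weight storage in GB (decimal) at a given precision."""
    return n_params * bits_per_param / 8 / 1e9

# ~3.2B parameters:
fp16_gb = weight_memory_gb(3.2e9, 16)  # 6.4 GB in float16
q4_gb = weight_memory_gb(3.2e9, 4)     # 1.6 GB for 4-bit weights alone
```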
Dataset Engineering
Data Sources
We created a hybrid dataset combining two complementary sources:
Source 1: CNN/DailyMail
- Pre-existing human-written summaries
- News articles with professional highlights
- Average length: ~750 words → ~60-word summaries
Source 2: ServiceNow RepLiQA
- Synthetic non-factual documents
- GPT-5-generated summaries (~300 words each)
- Designed to test contextual understanding
- Contains 5 Q&A pairs per document
Final Dataset Structure: 5,500 training examples total
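Mixing the two sources requires normalizing them into one schema. A minimal sketch of what that could look like, assuming a chat-format SFT record (the prompt wording and field contents here are illustrative, not the actual pipeline):

```python
def to_sft_example(document: str, summary: str) -> dict:
    """Wrap a (document, summary) pair in a chat-style SFT record."""
    return {
        "messages": [
            {"role": "user",
             "content": "Summarize the following document in about 300 words.\n\n" + document},
            {"role": "assistant", "content": summary},
        ]
    }

# Toy illustration of merging both sources into one training set:
cnn_dm = [("news article text ...", "professional highlights ...")]
repliqa = [("synthetic document text ...", "~300-word reference summary ...")]
dataset = [to_sft_example(doc, summ) for doc, summ in cnn_dm + repliqa]
```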
Training Methodology
Supervised Fine-Tuning (SFT)
We chose SFT over Reinforcement Learning for several pragmatic reasons:
Why SFT?
- Simpler implementation (no judge LLM required)
- Deterministic training (no API failures)
- Lower cost
- Faster iteration cycles
- Proven effectiveness for style transfer tasks
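As a concrete sketch, SFT with LoRA adapters can be set up in a few lines using Hugging Face's TRL and PEFT libraries. Everything below is illustrative: the model id, hyperparameters, and output path are assumptions, not the actual training configuration.

```python
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Toy chat-format records; the real run uses the 5,500-example hybrid dataset.
train_records = [{
    "messages": [
        {"role": "user", "content": "Summarize the following document in about 300 words.\n\n..."},
        {"role": "assistant", "content": "A ~300-word reference summary ..."},
    ]
}]

trainer = SFTTrainer(
    model="ibm-granite/granite-4.0-h-micro",  # assumed HF model id, check exact name
    train_dataset=Dataset.from_list(train_records),
    args=SFTConfig(
        output_dir="precis-sft",
        num_train_epochs=2,               # illustrative schedule
        per_device_train_batch_size=4,
        learning_rate=2e-4,
    ),
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```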
Evaluation Strategy
Benchmark Design
Primary Metric: Question-Answering Accuracy
- Generate summaries with trained model
- Use GPT-4 as a judge to answer questions from the summary only
- Compare answers to ground truth
- Calculate % questions answered correctly
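The aggregation step of the benchmark can be sketched as follows. Normalized exact-match scoring is a simplification of what a judge LLM does, but it shows how the accuracy percentage is computed; the function names are ours, not the project's.

```python
import string

def normalize(answer: str) -> str:
    """Lowercase, trim, and strip punctuation before comparing answers."""
    table = str.maketrans("", "", string.punctuation)
    return answer.lower().strip().translate(table)

def qa_accuracy(judge_answers: list[str], ground_truth: list[str]) -> float:
    """Fraction of questions answered correctly from the summary."""
    hits = sum(normalize(p) == normalize(g)
               for p, g in zip(judge_answers, ground_truth))
    return hits / len(ground_truth)

# qa_accuracy(["Paris.", "1947", "blue"], ["paris", "1948", "Blue"]) -> 2/3
```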
Secondary Metrics:
- ROUGE-L score vs reference summaries
- Average summary length
- % of summaries exceeding the 350-word limit
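Both secondary metrics are cheap to compute. Below is a pure-Python sketch of ROUGE-L (the LCS-based F-measure) and the length-limit check; in practice one would likely reach for an off-the-shelf package such as rouge-score, so treat this as a reference implementation of the idea, not the project's evaluation code.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F-measure on whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

def pct_over_limit(summaries: list[str], limit: int = 350) -> float:
    """Percentage of summaries whose word count exceeds the limit."""
    return 100 * sum(len(s.split()) > limit for s in summaries) / len(summaries)
```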
Qualitative Assessment
Strengths:
- Consistent 300-word output
- Maintains key facts and details
- Natural language flow
- Fast inference (0.5s/summary)
Deployment Preparation
Model Export Options
1. LoRA Adapters (150MB)
- Smallest size
- Requires base model
- Fast upload to HuggingFace
2. Merged Float16 (6GB)
- Production ready
- vLLM optimized
- Self-contained
3. Merged 4-bit (2GB)
- Resource constrained deployment
- Slightly slower inference
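The first two export options map onto standard PEFT calls. A minimal sketch, assuming hypothetical local paths and the Granite model id (both placeholders, not the project's actual artifact names):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "ibm-granite/granite-4.0-h-micro"  # assumed HF model id

base = AutoModelForCausalLM.from_pretrained(BASE_ID)
model = PeftModel.from_pretrained(base, "precis-sft")  # trained LoRA adapter dir

# Option 1: save the small adapter alone (consumers must also fetch the base model)
model.save_pretrained("precis-lora")

# Option 2: merge the adapter into the base weights for a self-contained model
merged = model.merge_and_unload()
merged.save_pretrained("precis-merged-fp16")
AutoTokenizer.from_pretrained(BASE_ID).save_pretrained("precis-merged-fp16")
```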
Conclusion
We successfully trained a specialized document summarization model using efficient supervised fine-tuning.
This project demonstrates that strong results are achievable with modern training frameworks, smart architecture choices, and pragmatic methodology.
The model is ready for production deployment and serves as a strong baseline for future RL optimization.