
Knowledge Distillation

Compress large-model behavior into fast specialists.

We use strong teacher models, curated traces, synthetic data, and evaluation loops to train smaller models that reproduce the behavior you need without carrying the full cost and latency of frontier APIs.

Keep the quality that matters. Remove the inference bill that does not.

Frontier teachers · Small students · Evaluation · Cost reduction · Private deployment
10x lower inference cost target for repeated high-volume tasks
Millisecond-latency deployment for focused workflows
100% control over where the distilled model runs
The bottleneck

What this fixes

A large model is often used to solve a narrow repeated task: classify this message, extract these fields, score this risk, summarize this document type. Paying frontier-model prices for every repetition is expensive and hard to govern.

Our work

How Tabularis helps

We capture the teacher behavior that matters, generate and filter training examples, train a smaller student model, and verify it against task-specific evaluations. The final model can run in your VPC, on-premise, or at the edge.
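At the heart of that training step is a standard soft-target objective: the student is pushed toward the teacher's output distribution while still learning from ground-truth labels. A minimal sketch, assuming a PyTorch classification student and cached teacher logits; the function name and hyperparameters are illustrative, not our exact recipe.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL loss against the teacher with a hard-label loss."""
    # Soften both distributions; the KL term is scaled by T^2 so its
    # gradient magnitude stays comparable across temperatures.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)  # ground-truth supervision
    return alpha * kd + (1 - alpha) * ce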

Specific capabilities

Built for real production constraints

Teacher trace generation from frontier APIs, open-weight models, human experts, or existing business rules.
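In its simplest form, trace generation is a loop that records (input, teacher output) pairs for later training. A sketch under that assumption; call_teacher is a hypothetical stand-in for whichever teacher source applies.

import json

def collect_traces(inputs, call_teacher, out_path="traces.jsonl"):
    # Write one JSON record per example so traces can be filtered later.
    with open(out_path, "w") as f:
        for text in inputs:
            output = call_teacher(text)  # hypothetical teacher call
            f.write(json.dumps({"input": text, "target": output}) + "\n")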

Synthetic instruction and edge-case generation to cover rare inputs before production exposes them.
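One common pattern here is to have the teacher rewrite seed inputs into harder variants. A hedged sketch reusing the hypothetical call_teacher interface from above; the prompt text is illustrative.

EDGE_PROMPT = (
    "Rewrite the following input as a harder edge case "
    "(typos, mixed languages, unusual formatting), keeping the same meaning:\n{text}"
)

def synthesize_edge_cases(seeds, call_teacher, variants_per_seed=3):
    # Expand each seed into several rare-looking variants for training.
    cases = []
    for seed in seeds:
        for _ in range(variants_per_seed):
            cases.append(call_teacher(EDGE_PROMPT.format(text=seed)))
    return cases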

Student model training, fine-tuning, quantization, pruning, and latency optimization.
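As one example of the latency and memory levers involved, post-training dynamic quantization compresses a student's linear layers to 8-bit integer weights. A sketch assuming a trained PyTorch student module.

import torch

def quantize_student(student: torch.nn.Module) -> torch.nn.Module:
    # Quantize only Linear layers to int8 weights; activations stay float.
    return torch.quantization.quantize_dynamic(
        student, {torch.nn.Linear}, dtype=torch.qint8
    )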

Task-specific benchmarks that compare teacher, student, prompts, and baseline models.
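A benchmark of this kind can be as simple as running every candidate over the same held-out set and recording accuracy and latency. A sketch; predict_fns maps a candidate name (teacher, student, prompt baseline) to a callable, and all names are illustrative.

import time

def benchmark(predict_fns, examples):
    results = {}
    for name, predict in predict_fns.items():
        correct, start = 0, time.perf_counter()
        for text, expected in examples:
            correct += predict(text) == expected
        elapsed = time.perf_counter() - start
        results[name] = {
            "accuracy": correct / len(examples),
            "ms_per_example": 1000 * elapsed / len(examples),
        }
    return results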

Cost and latency modeling before deployment so the business case is visible.
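The business case often reduces to simple arithmetic over request volume, tokens per request, and unit price. A back-of-envelope sketch; the rates below are placeholder assumptions, not quotes.

def monthly_cost(requests_per_month, tokens_per_request, price_per_1k_tokens):
    return requests_per_month * tokens_per_request / 1000 * price_per_1k_tokens

teacher = monthly_cost(5_000_000, 800, 0.01)    # frontier API, assumed rate
student = monthly_cost(5_000_000, 800, 0.0005)  # self-hosted student, assumed rate
print(f"teacher ${teacher:,.0f}/mo vs student ${student:,.0f}/mo "
      f"({teacher / student:.0f}x reduction)")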

Packaging for batch jobs, APIs, streaming systems, local inference, and offline environments.
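For the API packaging target, the distilled model typically sits behind a thin HTTP service. A minimal sketch using FastAPI; classify is a hypothetical stand-in for the student's inference call.

from fastapi import FastAPI

app = FastAPI()

def classify(text: str) -> str:
    # Hypothetical: replace with the distilled student's inference call.
    return "placeholder-label"

@app.post("/classify")
def classify_endpoint(payload: dict):
    return {"label": classify(payload["text"])}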

Engagement model

From first dataset to deployed system

01

Select the behavior

We identify which teacher outputs are worth copying and build a benchmark that reflects production quality.

02

Train the student

We generate traces, clean them, train a compact model, and optimize for speed, memory, and cost.

03

Prove the savings

We compare quality, latency, and cost against the teacher model before moving the system into production.

Where it pays off

Concrete use cases

High-volume classification

Replace expensive API calls for millions of tickets, reviews, alerts, transactions, or messages.

Structured extraction

Distill document parsing and field extraction behavior into a controlled model with predictable output.

Private model deployment

Move repeated LLM behavior into your own infrastructure when cloud calls are too risky or expensive.

Next step

Bring one workflow, dataset, or model target.

In the first call we map the technical path, data requirements, and deployment constraints, and assess whether a focused pilot makes sense.