Model Comparison

Compare extraction quality across Schematron 3B, 8B, and Gemini 2.5 Flash


  • Schematron-3B: fast and efficient extraction
  • Schematron-8B: balanced performance and accuracy
  • Gemini 2.5 Flash: Google's smaller frontier model
How to Evaluate and Understand Schematron
A comprehensive guide to understanding the differences between models and how to use this demo

Understanding the Models

This demo compares three models: Schematron-3B, Schematron-8B, and Gemini 2.5 Flash. It's important to understand that Schematron and Gemini 2.5 Flash are fundamentally different types of models.

Schematron Models

Schematron models are purpose-built for structured data extraction. They do not take prompts—instead, they take HTML content and a JSON schema (which can be defined via Zod or Pydantic) and directly output structured data. This design makes them extremely efficient for extraction tasks.
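To make the input format concrete, here is a minimal sketch of an extraction request. The schema is plain JSON Schema (in practice it could be generated from a Pydantic or Zod model, as noted above); the field names and the request shape are illustrative, not the exact API payload.

```python
# A JSON Schema describing the product fields to extract. Field names here
# are illustrative; in practice this schema could be generated from a
# Pydantic or Zod model.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}

html = "<html><body><h1>Example Widget</h1><span class='price'>$19.99</span></body></html>"

# Schematron-style input: no prompt, just HTML content plus the schema.
# (This dict is a sketch of the inputs, not the provider's exact payload.)
request = {"html": html, "schema": product_schema}
```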

Gemini 2.5 Flash

In this demo, we're using Gemini without traditional prompting: we simply provide it with the JSON schema and HTML, and request structured output via a strict JSON response format. However, Gemini's accuracy can be significantly improved for specific tasks by adding carefully crafted prompts. There's much more flexibility with general-purpose models like Gemini.
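For comparison, a schema-constrained Gemini request without any prompt text might look roughly like the sketch below. The REST-style field names (`responseMimeType`, `responseSchema`) are my assumption of the current Gemini API shape; verify them against the official docs before relying on them.

```python
def build_gemini_request(html: str, json_schema: dict) -> dict:
    """Sketch of a prompt-free, schema-constrained Gemini request body.

    Field names follow the Gemini REST API as I understand it; treat them
    as an assumption and check the current API reference.
    """
    return {
        # Only the raw HTML is sent -- no instruction prompt, mirroring the demo.
        "contents": [{"parts": [{"text": html}]}],
        "generationConfig": {
            "responseMimeType": "application/json",
            "responseSchema": json_schema,
        },
    }

req = build_gemini_request("<html>...</html>", {"type": "object"})
```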

Despite the flexibility advantage of Gemini, you'll see that Schematron handles extraction tasks extremely intelligently and is an order of magnitude faster and cheaper. This is what makes it particularly powerful for large-scale extraction workloads.

Understanding Latency Measurements

The latency shown in this demo represents total round-trip latency from our server (a Vercel Next.js route) to the model provider and back.

  • This measurement is somewhat location-specific
  • The majority of latency comes from LLM latency, though network overhead and other factors are present
  • For extraction tasks, we need the entire response before we can use it, so we measure end-to-end latency
  • This includes both the pre-fill stage (ingesting the HTML and prompt) and the auto-regressive generation stage

Unlike chat applications, we don't care as much about time-to-first-token—what matters is total latency from request to complete response.
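The measurement described above can be sketched in a few lines: wrap the blocking model call in a timer, so the elapsed time covers prefill, full generation, and network overhead. The `call_model` argument here is a stand-in for a real provider request.

```python
import time

def measure_total_latency(call_model):
    """Measure end-to-end latency: from sending the request until the FULL
    response has arrived (prefill + complete auto-regressive generation).

    For extraction we need the entire JSON body before we can use it, so
    time-to-first-token alone would understate the relevant latency.
    """
    start = time.perf_counter()
    response = call_model()  # blocks until the complete response is received
    elapsed = time.perf_counter() - start
    return response, elapsed

# Stand-in for a real model call (e.g. an HTTP request to the provider).
resp, seconds = measure_total_latency(lambda: {"name": "Example Widget"})
```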

⚠️ Important Caveats About This Demo

  • Tokenizer differences: Gemini has a different tokenizer than Schematron, so input token counts can vary slightly between models
  • KV-cache optimizations: Since we're using Gemini 2.5 Flash through a serverless API, repeated identical requests may benefit from KV-cache hits and other optimizations. For real-world diverse scraping requests, the latency would likely be worse
  • Missing prompts: In real-world scraping with Gemini, you'd typically add a prompt that would increase the total token count beyond what's shown here
  • Latency optimization: Schematron's latency is currently optimized for throughput, not minimum latency. If you need lower-latency Schematron inference, there are many optimizations that can be made

When to Use Schematron

We recommend Schematron specifically for large-scale extraction tasks. This is where it truly shines, as its speed and cost advantages unlock use cases that simply aren't economically viable with other models.

💡 The combination of being significantly faster and cheaper means you can:

  • Process millions of pages cost-effectively
  • Build real-time extraction pipelines
  • Enable applications that were previously too expensive to operate
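A back-of-envelope cost model shows why per-token price dominates at scale. The prices below are hypothetical placeholders, not published rates; only the arithmetic is the point.

```python
# HYPOTHETICAL per-token prices for illustration only -- not published rates.
PRICE_PER_M_INPUT_TOKENS = 0.10   # $ per 1M input tokens (assumed)
PRICE_PER_M_OUTPUT_TOKENS = 0.30  # $ per 1M output tokens (assumed)

def extraction_cost(pages: int, input_tokens_per_page: int,
                    output_tokens_per_page: int) -> float:
    """Total cost in dollars for extracting `pages` pages."""
    input_cost = pages * input_tokens_per_page / 1e6 * PRICE_PER_M_INPUT_TOKENS
    output_cost = pages * output_tokens_per_page / 1e6 * PRICE_PER_M_OUTPUT_TOKENS
    return input_cost + output_cost

# 1M pages, ~8k HTML tokens in and ~200 structured tokens out per page:
cost = extraction_cost(1_000_000, 8_000, 200)  # = $860 at the assumed prices
```

At these assumed prices a million pages costs hundreds of dollars; multiply by 40-80x and the same workload becomes economically out of reach, which is the trade-off the section describes.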

Two Ways to Use Schematron

1. Serverless API (Recommended)

Our serverless API handles all the prompt templating for you. You only need to worry about providing your HTML content and JSON schema—we take care of the rest.

Easiest to use

2. Open Source

Both Schematron models are open source and available to try. However, using them directly requires managing your own prompt templating, which can be tricky to get right.

More control
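To illustrate what "managing your own prompt templating" means, here is a minimal, hypothetical template that interleaves the schema and HTML. The real Schematron chat template is defined by the model release and will differ; this only shows the kind of formatting you become responsible for when self-hosting.

```python
# HYPOTHETICAL template for illustration -- the actual Schematron template
# ships with the open-source model release and must be followed exactly.
TEMPLATE = (
    "### Schema\n{schema}\n\n"
    "### HTML\n{html}\n\n"
    "### Output JSON\n"
)

def render_prompt(schema_json: str, html: str) -> str:
    """Fill the template with a JSON schema string and raw HTML."""
    return TEMPLATE.format(schema=schema_json, html=html)

prompt = render_prompt('{"type": "object"}', "<html></html>")
```

Small deviations (missing delimiters, reordered sections, stray whitespace) can silently degrade extraction quality, which is why the serverless API handles templating for you.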

Learn More & Resources

Ready to start using Schematron? Check out these resources to learn more and get started:


💡 According to the official benchmarks, Schematron is 40-80x cheaper than GPT-5 while maintaining frontier-level extraction quality.