8 LLM Structured Output Libraries Ranked (2026): Instructor, BAML, XGrammar & More
Comparing the best libraries for getting reliable JSON from LLMs in 2026 — Instructor, BAML, XGrammar, Vercel AI SDK, PydanticAI, Outlines, Marvin, and Mirascope. Which one should your team use?
Getting reliable JSON out of an LLM used to mean prompt engineering, manual parsing, and retry logic cobbled together with duct tape. In 2026 that work is handled by structured output libraries, but there are now eight serious options, and choosing the wrong one can cost weeks of refactoring.
This guide ranks all eight by use case, compares the architectures that matter, and tells you exactly which one to reach for.
Why Structured Output Still Matters in 2026
Even with OpenAI's JSON Mode, Anthropic's output_config.format, and Gemini's response_schema, raw LLM JSON output can still fail in production. Schema violations under load, truncated arrays in long contexts, and provider-specific edge cases mean your application still needs a reliability layer.
Structured output libraries add that layer — either by constraining generation (so the model literally cannot produce invalid JSON) or by validating and retrying after generation. Understanding which approach a library uses determines whether it fits your stack.
> When even structured output fails: Complex nested schemas, very long outputs, and older provider APIs still produce broken JSON. Use AI JSONMedic to repair malformed JSON in those edge cases — it handles 95%+ of corruption patterns that structured output libraries and model-native JSON modes miss.
The Two Architectures
Before the rankings, understand the fundamental split:
- Post-generation validation (Instructor, BAML, PydanticAI, Vercel AI SDK, Marvin, Mirascope): the LLM generates tokens normally, then the library validates the output against your schema. If validation fails, the library retries, feeding the error back to the model as a correction prompt. Recovery rates are above 95% for schemas under 15 fields.
- Constrained decoding (XGrammar, Outlines): runs during token generation, masking tokens that would make the output invalid JSON or violate the schema. The model cannot emit a schema-violating token, so no retries are needed, although output can still be cut off if it hits the token limit. Requires access to the inference stack (self-hosted models).

If you're calling the OpenAI, Anthropic, or Gemini APIs through one of these libraries, you're in the post-generation camp (provider-native structured output modes may constrain decoding on the server, but that is outside the library's control). If you're running vLLM, SGLang, or TensorRT-LLM on your own hardware, constrained decoding is available and can be much faster, since it avoids retry round trips.
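The post-generation loop described above (generate, validate, retry with the error fed back) can be sketched with the standard library alone. This is a minimal illustration, not Instructor's or any other library's implementation: `call_llm` is a hypothetical stand-in for a provider call, `SCHEMA`, `validate`, and `extract_with_retries` are names invented for the example, and schema checking is reduced to required keys and Python types rather than full JSON Schema.

```python
from __future__ import annotations

import json
from typing import Callable

# Hypothetical schema for the example: required field name -> expected Python type.
SCHEMA: dict[str, type] = {"name": str, "score": int}


def validate(raw: str, schema: dict[str, type]) -> tuple[dict | None, str | None]:
    """Parse model output and check required fields and their types.

    Returns (data, None) on success or (None, error_message) on failure.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"Invalid JSON: {exc.msg} at position {exc.pos}"
    if not isinstance(data, dict):
        return None, "Expected a JSON object"
    for field, expected in schema.items():
        if field not in data:
            return None, f"Missing required field '{field}'"
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            return None, f"Field '{field}' should be {expected.__name__}"
    return data, None


def extract_with_retries(
    call_llm: Callable[[list[dict]], str],
    prompt: str,
    schema: dict[str, type],
    max_retries: int = 3,
) -> dict:
    """Generate, validate, and re-prompt with the validation error until valid."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_retries + 1):
        raw = call_llm(messages)
        data, error = validate(raw, schema)
        if error is None:
            return data
        # Feed the bad output and the error back to the model as a correction prompt.
        messages.append({"role": "assistant", "content": raw})
        messages.append({"role": "user", "content": f"{error}. Return only corrected JSON."})
    raise ValueError(f"No valid output after {max_retries} retries: {error}")


# Scripted fake model: the first reply has a wrong type, the second is valid.
responses = iter(['{"name": "John", "score": "94"}', '{"name": "John", "score": 94}'])
print(extract_with_retries(lambda messages: next(responses), "Extract: John scored 94", SCHEMA))
# {'name': 'John', 'score': 94}
```

Each failed attempt costs a full extra model call, which is where the retry latency mentioned for post-generation libraries comes from.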
1. Instructor — Best Overall for Python
GitHub stars: 11K+ | Monthly downloads: 3M+ | Language: Python (TypeScript port available)

Instructor wraps any LLM API (OpenAI, Anthropic, Gemini, Cohere, Ollama, and 10+ more) with a unified Pydantic-based interface. Define your schema as a Pydantic model, wrap your provider client with a helper such as instructor.from_anthropic(), and pass the model as response_model when you call the API. Instructor handles validation, retries with error feedback, and partial streaming.

```python
import instructor
from anthropic import Anthropic
from pydantic import BaseModel


class RepairResult(BaseModel):
    fixed_json: str
    error_count: int
    error_types: list[str]


client = instructor.from_anthropic(Anthropic())

result = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    response_model=RepairResult,
    messages=[{"role": "user", "content": "Analyze this JSON: {\"name\": \"test\", \"value\": }"}],
)
print(result.error_count)  # Typed, validated
```
Where Instructor wins: Multi-provider projects, teams already using Pydantic, fastest time-to-working-code (under an hour for a new endpoint), automatic retry recovery above 95% for typical schemas.
Where it falls short: The retry mechanism adds latency on failure. Deeply nested schemas with 30+ fields see lower recovery rates. No build-time schema validation.
2. BAML — Best for Teams Who Want Compile-Time Safety
Approach: Schema-first DSL | Languages: Python, TypeScript, Ruby (auto-generated clients)

BAML (Boundary AI Markup Language) takes a fundamentally different approach: you define schemas in .baml files using a dedicated DSL, then BAML auto-generates typed clients for your language. Think Prisma for LLMs.

```baml
// user_extraction.baml
class UserProfile {
  name string
  email string
  role "admin" | "user" | "viewer"
  metadata map<string, string>?
}

// Assumes a client named GPT4o is defined elsewhere in the project (e.g. clients.baml)
function ExtractUser(raw_text: string) -> UserProfile {
  client GPT4o
  prompt #"
    Extract the user profile from: {{ raw_text }}
    {{ ctx.output_format }}
  "#
}
```

```python
# Auto-generated Python client
from baml_client import b

profile = b.ExtractUser("John is an admin at john@example.com")
print(profile.role)  # Generated types let mypy/pyright check this before runtime
```
Where BAML wins: Type safety surfaced at build time (not runtime), better autocomplete in IDEs, schema changes caught before deployment, excellent for large teams with many schema types.
Where it falls short: Requires a build step in your CI/CD pipeline. More setup overhead for small projects. Custom DSL has a learning curve if your team is Pydantic-native.
3. XGrammar — Best for Self-Hosted Inference
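Approach: Constrained decoding | Integration: vLLM, SGLang, TensorRT-LLM (default backend)

XGrammar works during token generation, not after it. Using vocabulary partitioning, it masks tokens that would produce schema-invalid output, so the model cannot generate malformed JSON. Its authors report up to 100x faster grammar processing than earlier constrained-decoding engines, and because no retries are needed it also avoids the extra round trips of post-generation approaches.

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "score": {"type": "number", "minimum": 0, "maximum": 100},
    },
    "required": ["name", "score"],
}

# GuidedDecodingParams is the older vLLM API; newer releases renamed it
# (StructuredOutputsParams), so check the docs for your installed version.
params = SamplingParams(
    max_tokens=256,  # vLLM's default of 16 tokens would truncate the JSON
    guided_decoding=GuidedDecodingParams(json_schema=schema),
)
outputs = llm.generate("Rate this product", params)
print(outputs[0].outputs[0].text)  # JSON text constrained to the schema
```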
Where XGrammar wins: Self-hosted models in high-throughput production, no retry latency, guaranteed schema compliance, already the default in vLLM and SGLang.
Where it falls short: Requires access to the inference stack — not usable with OpenAI/Anthropic/Gemini APIs. Overkill for simple schemas where Instructor retry rates are already above 99%.
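The token masking that XGrammar (and Outlines) performs can be illustrated with a toy greedy decoder. This is a simplified sketch under explicit assumptions, not XGrammar's algorithm: the "schema" is just "an integer from 0 to 100", `VOCAB` is an eight-entry toy vocabulary, and `fake_logits` is a hypothetical stand-in for model scores that deliberately prefer invalid tokens. Before each pick, every token that cannot extend a valid prefix gets a logit of negative infinity, so those favorites are never chosen. A production engine compiles a full JSON Schema grammar and, through the vocabulary partitioning mentioned above, precomputes most of each mask instead of re-checking every token at every step.

```python
import math
from typing import Callable

# Toy vocabulary; real tokenizers have ~100K entries, many multi-character.
VOCAB = ["0", "1", "5", "9", "10", "abc", "}", "<eos>"]


def is_valid_prefix(text: str) -> bool:
    """True if text can still be extended to an integer in [0, 100]."""
    if text == "":
        return True
    if not text.isdigit():
        return False
    if len(text) > 1 and text[0] == "0":
        return False  # no leading zeros
    return int(text) <= 100


def is_complete(text: str) -> bool:
    """True if text is already a finished integer in [0, 100]."""
    return text != "" and is_valid_prefix(text)


def mask_logits(prefix: str, logits: list[float]) -> list[float]:
    """Set the logit of every token that would break the constraint to -inf."""
    masked = []
    for token, logit in zip(VOCAB, logits):
        allowed = is_complete(prefix) if token == "<eos>" else is_valid_prefix(prefix + token)
        masked.append(logit if allowed else -math.inf)
    return masked


def constrained_greedy_decode(logits_fn: Callable[[int], list[float]], max_steps: int = 5) -> str:
    """Greedy decoding with the mask applied before every token choice."""
    text = ""
    for step in range(max_steps):
        masked = mask_logits(text, logits_fn(step))
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if masked[best] == -math.inf or VOCAB[best] == "<eos>":
            break
        text += VOCAB[best]
    return text


def fake_logits(step: int) -> list[float]:
    # Hypothetical scores in VOCAB order; the "model" prefers invalid tokens first.
    table = {
        0: [0, 0, 0, 3, 0, 5, 4, 0],  # wants "abc", then "}", then "9"
        1: [0, 0, 2, 0, 0, 0, 5, 1],  # wants "}", then "5"
        2: [0, 0, 0, 5, 0, 0, 0, 1],  # wants "9" (would give 959), then "<eos>"
    }
    return table.get(step, [0.0] * len(VOCAB))


print(constrained_greedy_decode(fake_logits))  # 95
```

Unmasked greedy decoding would have emitted "abc" on the first step; with the mask, the output is guaranteed to parse as an integer in range.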
4. Vercel AI SDK — Best for TypeScript / Full-Stack
Language: TypeScript | Key functions: generateObject(), streamObject()
The Vercel AI SDK plays roughly the role for TypeScript that Instructor plays for Python, but it was built TypeScript-first. Define your output shape with a Zod schema, and generateObject() returns a fully typed, validated object.
```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const JsonRepairSchema = z.object({
  isValid: z.boolean(),
  errors: z.array(z.string()),
  repaired: z.string().optional(),
});

const { object } = await generateObject({
  model: openai("gpt-4o"),
  schema: JsonRepairSchema,
  prompt: "Analyze this JSON and list any errors: { name: 'test', }",
});

console.log(object.errors); // string[], fully typed
```
Where Vercel AI SDK wins: Next.js projects (it's built for this stack), streaming structured objects with streamObject(), React hooks for UI state (useObject()), edge runtime compatibility.
Where it falls short: TypeScript-only. Less provider coverage than Instructor. Streaming structured objects adds complexity for backend-only services.
5. PydanticAI — Best for Agentic Python Pipelines
Version: v1.85.1 (April 2026) | By: Pydantic team

PydanticAI brings type safety to agentic Python workflows. Beyond structured output, it adds tool registration via decorators (auto-generating JSON schemas from type hints), dependency injection for testable agents, and integration with Pydantic Logfire for observability.

```python
from pydantic import BaseModel
from pydantic_ai import Agent


class ParsedError(BaseModel):
    error_type: str
    line_number: int | None
    suggestion: str


agent = Agent(
    "anthropic:claude-sonnet-4-6",
    output_type=ParsedError,  # named result_type in earlier PydanticAI versions
    system_prompt="You are a JSON error analyst.",
)

# run_sync() is the synchronous wrapper; inside async code use `await agent.run(...)`
result = agent.run_sync("Parse error: Unexpected token } at line 42")
print(result.output.error_type)  # Validated ParsedError instance (result.data in earlier versions)
```
Where PydanticAI wins: Agentic pipelines where tools and structured output coexist, observability requirements (Logfire), teams already on Pydantic v2.
Where it falls short: More opinionated than Instructor — the agent pattern adds overhead if you just need simple extraction.
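PydanticAI's tool registration auto-generates JSON schemas from type hints, as described above. As a rough illustration of that idea (not PydanticAI's code, which relies on Pydantic for this), the standard library's `inspect` and `typing` modules are enough to map simple annotations to JSON Schema types. The `tool_schema` helper, the `PY_TO_JSON` mapping, and the `lookup_error_docs` tool are hypothetical; Optional types, unions, and nested models are not handled.

```python
import inspect
import typing

# Simple Python annotations -> JSON Schema type names (illustrative subset).
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> dict:
    """Build a JSON-Schema-style tool description from a function signature."""
    hints = typing.get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        annotation = hints[name]
        if typing.get_origin(annotation) is list:
            (item_type,) = typing.get_args(annotation)
            properties[name] = {"type": "array", "items": {"type": PY_TO_JSON[item_type]}}
        else:
            properties[name] = {"type": PY_TO_JSON[annotation]}
        # Parameters without a default value are required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def lookup_error_docs(error_code: str, max_results: int = 3) -> list[str]:
    """Hypothetical tool: look up documentation for a JSON parser error code."""
    return []


print(tool_schema(lookup_error_docs))
```

The docstring becomes the tool description the model sees, which is why typed signatures and clear docstrings matter when registering tools this way.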
6. Outlines — Best Open-Source Constrained Decoding
Approach: Constrained decoding | Framework: HuggingFace / transformers native

Outlines pioneered constrained decoding for open-source models. It supports JSON Schema, Pydantic, regex, and context-free grammars, and it runs directly on HuggingFace transformers models rather than only inside a serving engine such as vLLM. If you're running models via HuggingFace transformers, Outlines is the constrained decoding choice.

```python
import outlines

# Pre-1.0 Outlines API; Outlines 1.x replaced these entry points, so check your version.
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

schema = '{"type": "object", "properties": {"name": {"type": "string"}, "score": {"type": "integer"}}, "required": ["name", "score"]}'
generator = outlines.generate.json(model, schema)

result = generator("Extract: John scored 94 on the test")
print(result)  # Parsed output matching the schema (barring truncation at the token limit)
```
Where Outlines wins: HuggingFace-native inference, fine-grained grammar control, research and experimentation with local models.
Where it falls short: Not production-optimized for throughput. XGrammar has superseded it in the vLLM/SGLang ecosystem for raw performance.
7. Marvin — Best for Pythonic AI Functions
Stars: 5K+ | Philosophy: "AI should feel like calling a function"

Marvin takes the most minimal API approach: define a Python function with type hints, decorate it with @marvin.fn, and the library handles the LLM calls, prompting, and structured output automatically.

```python
import marvin
from pydantic import BaseModel


class JsonError(BaseModel):
    type: str
    message: str
    fixable: bool


@marvin.fn
def classify_json_error(error_message: str) -> JsonError:
    """Classify this JSON parsing error and determine if it's auto-fixable."""


result = classify_json_error("Unexpected end of JSON input at position 142")
print(result.fixable)  # A typed bool chosen by the LLM, e.g. True
```
Where Marvin wins: Rapid prototyping, small scripts, data extraction pipelines where the "AI function" abstraction is intuitive.
Where it falls short: Less control over prompting and retry behavior compared to Instructor. Smaller community.
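The "AI function" pattern behind @marvin.fn can be approximated with a small decorator: read the signature and docstring, build a prompt, call a model, and parse the reply into the annotated return type. This is a hypothetical sketch, not Marvin's implementation: `ai_fn`, the prompt wording, and the injected `fake_llm` are assumptions, and a dataclass stands in for the Pydantic model so the example needs only the standard library.

```python
import dataclasses
import functools
import inspect
import json
import typing


@dataclasses.dataclass
class JsonError:
    type: str
    message: str
    fixable: bool


def ai_fn(llm):
    """Decorator factory: turn a typed, docstring-only function into an LLM call."""
    def decorator(func):
        return_type = typing.get_type_hints(func)["return"]
        field_names = [f.name for f in dataclasses.fields(return_type)]

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind call arguments to parameter names so they can go into the prompt.
            bound = inspect.signature(func).bind(*args, **kwargs)
            prompt = (
                f"Task: {inspect.getdoc(func)}\n"
                f"Inputs: {json.dumps(dict(bound.arguments))}\n"
                f"Reply with a JSON object with keys: {', '.join(field_names)}"
            )
            data = json.loads(llm(prompt))
            return return_type(**{name: data[name] for name in field_names})

        return wrapper

    return decorator


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; always returns the same canned answer.
    return '{"type": "truncation", "message": "Input ended early", "fixable": true}'


@ai_fn(fake_llm)
def classify_json_error(error_message: str) -> JsonError:
    """Classify this JSON parsing error and determine if it's auto-fixable."""


result = classify_json_error("Unexpected end of JSON input at position 142")
print(result.fixable)  # True (from the canned fake_llm reply)
```

A real library would also validate field types and retry on failure; this sketch trusts whatever keys the model returns.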
8. Mirascope — Best Lightweight Alternative
Philosophy: "Pydantic-native, no magic"

Mirascope offers Instructor-like functionality with less internal complexity. If you want structured output without Instructor's full wrapper layer, or you're building a library on top and don't want Instructor's opinionated retry logic, Mirascope gives you fine-grained control.

```python
from mirascope.core import anthropic, prompt_template
from pydantic import BaseModel


class ExtractedSchema(BaseModel):
    fields: list[str]
    is_nested: bool
    depth: int


@anthropic.call("claude-sonnet-4-6", response_model=ExtractedSchema)
@prompt_template("Analyze the structure of this JSON schema: {schema}")
def analyze_schema(schema: str): ...


result = analyze_schema('{"type": "object", "properties": {"user": {"type": "object"}}}')
print(result.depth)  # A typed int chosen by the model, e.g. 2
```
Where Mirascope wins: Library authors, teams wanting explicit control over the retry/validation loop, minimal dependencies.
Where it falls short: Smaller ecosystem, fewer tutorials, less community momentum than Instructor.
Decision Matrix
| Library | Best For | Language | Approach | Provider Coverage |
|---|---|---|---|---|
| Instructor | Most Python projects | Python | Post-gen retry | 15+ providers |
| BAML | Large teams, compile-time safety | Python/TS/Ruby | Post-gen retry | Any via DSL |
| XGrammar | Self-hosted vLLM/SGLang | Python | Constrained decoding | Local models only |
| Vercel AI SDK | Next.js / TypeScript | TypeScript | Post-gen retry | OpenAI, Anthropic, Google, and more |
| PydanticAI | Agentic pipelines | Python | Post-gen retry | OpenAI, Anthropic, Gemini, and more |
| Outlines | HuggingFace research | Python | Constrained decoding | Local models only |
| Marvin | Rapid prototyping | Python | Post-gen retry | OpenAI, Anthropic |
| Mirascope | Library authors | Python | Post-gen retry | OpenAI, Anthropic, and more |
Quick Selection Guide
- Starting a new Python project? → Instructor. Largest ecosystem, fastest setup, handles 95%+ of cases.
- TypeScript / Next.js project? → Vercel AI SDK with Zod schemas.
- Running vLLM or SGLang on your own hardware? → XGrammar (already built in, zero extra setup).
- Building a multi-agent system with observability? → PydanticAI + Logfire.
- Need compile-time schema validation in CI/CD? → BAML.
- Research / HuggingFace transformers? → Outlines.

When Structured Output Libraries Aren't Enough
Structured output libraries handle the happy path. Three situations still produce broken JSON in 2026:
- Legacy provider APIs — older Claude 3.x and GPT-3.5 endpoints without native structured output support
- Very long outputs — 4,000+ token JSON arrays where constrained decoding can degrade or retry budgets are exhausted
- Third-party integrations — webhook payloads, API responses, and no-code tool outputs that arrive broken regardless of how you prompt
For these cases, AI JSONMedic's repair engine applies 14 progressive repair layers to recover JSON that structured output libraries cannot. It handles truncated arrays, mismatched brackets, single-quoted strings, Python True/False/None literals, and trailing commas — all client-side, no upload.
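Several of the repairs listed above (trailing commas, single-quoted strings, Python True/False/None literals, mismatched brackets, and brackets left open by truncation) can be sketched as one character-level pass that tracks whether it is inside a string. This is an illustrative sketch with deliberately limited scope, not AI JSONMedic's engine: `repair_json` and `_drop_trailing_comma` are hypothetical names, and it does not fix unquoted keys, comments, or a key-value pair cut off mid-way.

```python
import json

# Python literal -> JSON literal (only applied outside strings).
PY_LITERALS = {"True": "true", "False": "false", "None": "null"}


def _drop_trailing_comma(out: list[str]) -> None:
    """Remove a ',' that is followed only by whitespace at the end of out."""
    k = len(out) - 1
    while k >= 0 and out[k].isspace():
        k -= 1
    if k >= 0 and out[k] == ",":
        del out[k]


def repair_json(text: str) -> str:
    out: list[str] = []    # output chunks: single chars, words, or whole strings
    stack: list[str] = []  # closing brackets still owed, innermost last
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in "\"'":
            # Copy a string literal and always re-emit it with double quotes.
            quote, j, buf = ch, i + 1, []
            while j < n and text[j] != quote:
                if text[j] == "\\" and j + 1 < n:
                    pair = text[j:j + 2]
                    buf.append("'" if pair == "\\'" else pair)  # \' is not a JSON escape
                    j += 2
                elif text[j] == '"':
                    buf.append('\\"')  # only reachable inside single-quoted strings
                    j += 1
                else:
                    buf.append(text[j])
                    j += 1
            out.append('"' + "".join(buf) + '"')  # also closes a string cut off by truncation
            i = j + 1
        elif ch.isalpha():
            # Read a bare word and translate Python literals to JSON ones.
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            word = text[i:j]
            out.append(PY_LITERALS.get(word, word))
            i = j
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
            out.append(ch)
            i += 1
        elif ch in "}]":
            _drop_trailing_comma(out)
            if ch in stack:
                while stack[-1] != ch:
                    out.append(stack.pop())  # close inner brackets the input forgot
                out.append(stack.pop())
            # A closing bracket with no matching opener is dropped.
            i += 1
        else:
            out.append(ch)
            i += 1
    _drop_trailing_comma(out)    # e.g. input truncated right after "[1, 2,"
    out.extend(reversed(stack))  # close whatever truncation left open
    return "".join(out)


broken = "{'name': 'test', 'active': True, 'tags': ['a', 'b',],}"
print(json.loads(repair_json(broken)))
# {'name': 'test', 'active': True, 'tags': ['a', 'b']}
```

Tracking string state is the key design choice: a naive regex that replaces True or strips commas globally would also corrupt string values such as "True" or "a, b".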
If you're working through a Claude model migration (Sonnet 4 / Opus 4 retire June 15, 2026), see our Claude prefill deprecation guide for the exact migration to output_config.format. For a broader comparison of LLM JSON output approaches, see OpenAI JSON Mode vs Structured Outputs.
FAQ
What's the difference between JSON Mode and structured output libraries?
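JSON Mode (OpenAI) guarantees JSON-parseable output but doesn't enforce your specific schema shape. Schema-enforcing provider features, such as OpenAI's Structured Outputs and Anthropic's output_config.format, constrain output to a JSON Schema you supply, but only on supported models and only for the schema features each provider supports. Structured output libraries add schema validation, typed return values, and retry logic on top of these APIs, or bypass them entirely with constrained decoding.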
Does using a structured output library eliminate the need for JSON repair?
For well-scoped schemas under 15 fields with supported providers, yes — Instructor and similar libraries recover from nearly all failures. For very long outputs, complex nested schemas, legacy APIs, or third-party integrations, malformed JSON still occurs and a repair tool like AI JSONMedic handles it.
Which library works with the most LLM providers?
Instructor supports 15+ providers via its provider wrappers, making it the most portable choice. BAML supports any provider through its DSL client configuration. XGrammar and Outlines only work with self-hosted models.
Is BAML worth the build step overhead?
For teams with 5+ schema types and multiple contributors, yes. The compile-time type checking prevents a category of production bugs that Instructor only surfaces at runtime. For solo projects or simple schemas, Instructor's zero-setup approach wins.
Can I use structured output libraries with local Ollama models?
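Instructor supports Ollama through Ollama's OpenAI-compatible endpoint: instructor.from_openai(openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"), mode=instructor.Mode.JSON). Ollama ignores the api_key value, but the OpenAI client requires one. For better local model reliability, Outlines with constrained decoding prevents schema violations at generation time (output can still be truncated by the token limit), and it works natively with llama.cpp and HuggingFace models.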
What happens to my structured output code after Sonnet 4 / Opus 4 retire on June 15?
For Instructor and most libraries, the migration is a one-line model string change: claude-sonnet-4-20250514 → claude-sonnet-4-6. API compatibility is maintained across the 4.x generation. See our Claude deprecation guide for the full migration checklist.