By AI JSONMedic Team · June 8, 2026 · 13 min read
Tags: structured output, llm, json, python, typescript, instructor, baml

8 LLM Structured Output Libraries Ranked (2026): Instructor, BAML, XGrammar & More

Comparing the best libraries for getting reliable JSON from LLMs in 2026 — Instructor, BAML, XGrammar, Vercel AI SDK, PydanticAI, Outlines, Marvin, and Mirascope. Which one should your team use?

Getting reliable JSON out of an LLM used to mean prompt engineering, manual parsing, and retry logic cobbled together with duct tape. In 2026 that work is handled by structured output libraries — but now there are eight serious options, and choosing the wrong one costs weeks of refactoring.

This guide ranks all eight by use case, compares the architectures that matter, and tells you exactly which one to reach for.

Why Structured Output Still Matters in 2026

Even with OpenAI's JSON Mode, Anthropic's output_config.format, and Gemini's response_schema, raw LLM JSON output fails in production. Schema violations under load, truncated arrays in long contexts, and provider-specific edge cases mean your application still needs a reliability layer.
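The gap between "parses as JSON" and "matches your schema" is easy to see in a small sketch. The following uses only the standard library; `REQUIRED_FIELDS` and `classify_output` are hypothetical names chosen for illustration and are not part of any provider SDK.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"name": str, "score": int}

def classify_output(raw: str) -> str:
    """Return 'unparseable', 'schema_violation', or 'ok' for a raw model reply."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Truncated arrays and cut-off objects end up here.
        return "unparseable"
    if not isinstance(data, dict):
        return "schema_violation"
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            # Valid JSON, but not the shape the application expects.
            return "schema_violation"
    return "ok"

print(classify_output('{"name": "widget", "score": 87}'))
print(classify_output('{"name": "widget", "score": "high"}'))
print(classify_output('{"name": "widget", "scores": [1, 2,'))
```

The second reply would pass a plain JSON-mode check and still break the application, which is the case a reliability layer exists to catch.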

Structured output libraries add that layer — either by constraining generation (so the model literally cannot produce invalid JSON) or by validating and retrying after generation. Understanding which approach a library uses determines whether it fits your stack.

> When even structured output fails: Complex nested schemas, very long outputs, and older provider APIs still produce broken JSON. Use AI JSONMedic's JSON repair tool to repair malformed JSON in those edge cases — it handles 95%+ of corruption patterns that structured output libraries and model-native JSON modes miss.

The Two Architectures

Before the rankings, understand the fundamental split:

Post-generation validation (Instructor, BAML, PydanticAI, Vercel AI SDK, Marvin, Mirascope): the LLM generates tokens normally, then the library validates the output against your schema. If validation fails, the library retries with the error fed back to the model as a correction prompt. Recovery rates are above 95% for schemas under 15 fields.

Constrained decoding (XGrammar, Outlines): the library runs during token generation, masking tokens that would produce invalid JSON or violate the schema. The model cannot generate output that violates your schema, so no retries are needed. This approach requires access to the inference stack (self-hosted models).
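The validate-and-retry pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular library implements it: `validate`, `generate_with_retry`, and `fake_model` are hypothetical names, and the model call is a stand-in so the example runs offline.

```python
import json
from typing import Callable

def validate(raw: str) -> dict:
    """Parse the reply and check a hypothetical one-field schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    if not isinstance(data.get("error_count"), int):
        raise ValueError("error_count must be an integer")
    return data

def generate_with_retry(call_model: Callable[[str], str], prompt: str, max_retries: int = 2) -> dict:
    """Validate after generation; on failure, feed the error back and retry."""
    current_prompt = prompt
    for _ in range(max_retries + 1):
        raw = call_model(current_prompt)
        try:
            return validate(raw)
        except ValueError as exc:
            # The correction prompt carries the validation error back to the model.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was invalid: {exc}. "
                "Reply with corrected JSON only."
            )
    raise RuntimeError("retry budget exhausted")

# Stand-in for a real API call: fails once, then returns valid output.
replies = iter(['{"error_count": "one"}', '{"error_count": 1}'])

def fake_model(prompt: str) -> str:
    return next(replies)

print(generate_with_retry(fake_model, "Count the errors in this JSON."))
```

Each failed attempt costs one extra round trip, which is where the retry latency of post-generation libraries comes from.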

If you're calling OpenAI/Anthropic/Gemini APIs, you're in the post-generation camp. If you're running vLLM, SGLang, or TensorRT-LLM on your own hardware, constrained decoding is available and dramatically faster.

1. Instructor — Best Overall for Python

GitHub stars: 11K+ | Monthly downloads: 3M+ | Language: Python (TypeScript port available)

Instructor wraps any LLM API (OpenAI, Anthropic, Gemini, Cohere, Ollama, and 10+ more) with a unified Pydantic-based interface. Define your schema as a Pydantic model, wrap your provider client with a constructor such as instructor.from_anthropic() (older code uses instructor.patch()), and call the API as normal with a response_model argument. Instructor handles validation, retries with error feedback, and partial streaming.

import instructor
from anthropic import Anthropic
from pydantic import BaseModel

class RepairResult(BaseModel):
    fixed_json: str
    error_count: int
    error_types: list[str]

client = instructor.from_anthropic(Anthropic())

result = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    response_model=RepairResult,
    messages=[{"role": "user", "content": "Analyze this JSON: {\"name\": \"test\", \"value\": }"}],
)

print(result.error_count)  # Typed, validated

Where Instructor wins: Multi-provider projects, teams already using Pydantic, fastest time-to-working-code (under an hour for a new endpoint), automatic retry recovery above 95% for typical schemas.

Where it falls short: The retry mechanism adds latency on failure. Deeply nested schemas with 30+ fields see lower recovery rates. No build-time schema validation.

2. BAML — Best for Teams Who Want Compile-Time Safety

Approach: Schema-first DSL | Languages: Python, TypeScript, Ruby (auto-generated clients)

BAML (Boundary AI Markup Language) takes a fundamentally different approach: you define schemas in .baml files using a dedicated DSL, then BAML auto-generates typed clients for your language. Think Prisma for LLMs.

// user_extraction.baml
class UserProfile {
  name string
  email string
  role "admin" | "user" | "viewer"
  metadata map<string, string>?
}

function ExtractUser(raw_text: string) -> UserProfile {
  client GPT4o
  prompt #"
    Extract the user profile from: {{ raw_text }}
    {{ ctx.output_format }}
  "#
}

# Auto-generated Python client
from baml_client import b

profile = b.ExtractUser("John is an admin at [email protected]")
print(profile.role)  # Type-checked at compile time

Where BAML wins: Type safety surfaced at build time (not runtime), better autocomplete in IDEs, schema changes caught before deployment, excellent for large teams with many schema types.

Where it falls short: Requires a build step in your CI/CD pipeline. More setup overhead for small projects. Custom DSL has a learning curve if your team is Pydantic-native.
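To give a rough idea of what a schema-first generator buys you, here is a hand-written approximation of a typed class for the UserProfile schema. It is not BAML's actual generated code; `parse_user_profile` and the example email address are hypothetical, and only the standard library is used.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# The union of string literals from the schema becomes a Literal type.
Role = Literal["admin", "user", "viewer"]

@dataclass(frozen=True)
class UserProfile:
    name: str
    email: str
    role: Role
    metadata: Optional[dict[str, str]] = None

def parse_user_profile(data: dict) -> UserProfile:
    """Turn a raw dict into a UserProfile, rejecting values outside the schema."""
    for key in ("name", "email"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"{key} must be a string")
    if data.get("role") not in get_args(Role):
        raise ValueError(f"role must be one of {get_args(Role)}")
    return UserProfile(data["name"], data["email"], data["role"], data.get("metadata"))

profile = parse_user_profile({"name": "John", "email": "john@example.com", "role": "admin"})
print(profile.role)
```

Because `role` is a `Literal`, a static checker such as mypy or pyright can flag a comparison against a value like "superuser" before the code ever runs, which is the category of bug that build-time typing targets.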

3. XGrammar — Best for Self-Hosted Inference

Approach: Constrained decoding | Integration: vLLM, SGLang, TensorRT-LLM default backend

XGrammar works during token generation, not after it. Using vocabulary partitioning, it masks tokens that would produce schema-invalid output — the model literally cannot generate malformed JSON. It's up to 100x faster than post-generation retry approaches for complex schemas.

from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "score": {"type": "number", "minimum": 0, "maximum": 100}
    },
    "required": ["name", "score"]
}

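# GuidedDecodingParams takes the schema through its `json` argument.
# Newer vLLM releases rename this API, so check the version you have installed.
params = SamplingParams(
    guided_decoding=GuidedDecodingParams(json=schema)
)
outputs = llm.generate("Rate this product", params)
print(outputs[0].outputs[0].text)  # JSON text that conforms to the schema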
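To make the token-masking idea concrete, here is a toy sketch. It is not XGrammar's algorithm: the eight-token vocabulary, the hand-written `GRAMMAR` table, and the `biased_scores` stand-in model are all assumptions made for illustration. It only shows why a masked decoder cannot leave the grammar, even when the model prefers an invalid token.

```python
import json

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'oops', ',']

# Hypothetical grammar for the schema {"ok": <boolean>}, written as
# state -> {allowed token: next state}.
GRAMMAR = {
    "start": {'{': "key"},
    "key": {'"ok"': "colon"},
    "colon": {':': "value"},
    "value": {'true': "close", 'false': "close"},
    "close": {'}': "done"},
}

def constrained_decode(score_fn) -> str:
    """Greedy decoding where grammar-invalid tokens are masked out at each step."""
    state, output = "start", []
    while state != "done":
        allowed = GRAMMAR[state]
        scores = score_fn(output)  # one score per vocabulary entry
        # Masking: only tokens the grammar allows in this state stay candidates.
        candidates = [tok for tok in VOCAB if tok in allowed]
        best = max(candidates, key=lambda tok: scores[VOCAB.index(tok)])
        output.append(best)
        state = allowed[best]
    return "".join(output)

# Stand-in "model" that always prefers the invalid token 'oops'.
def biased_scores(prefix):
    return [1.0 if tok == 'oops' else 0.1 * i for i, tok in enumerate(VOCAB)]

text = constrained_decode(biased_scores)
print(text, json.loads(text))
```

Production engines compile the schema into a grammar automatically and precompute much of the mask, which is where the performance work goes; the guarantee itself comes from the masking step shown here.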

Where XGrammar wins: Self-hosted models in high-throughput production, no retry latency, guaranteed schema compliance, already the default in vLLM and SGLang.

Where it falls short: Requires access to the inference stack, so it is not usable with OpenAI/Anthropic/Gemini APIs. Overkill for simple schemas where Instructor's success rate after retries is already above 99%.

4. Vercel AI SDK — Best for TypeScript / Full-Stack

Language: TypeScript | Key functions: generateObject(), streamObject()

The Vercel AI SDK is what Instructor is to Python — but built TypeScript-first. Use Zod schemas to define your output shape, and generateObject() returns a fully typed object with validation.

import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const JsonRepairSchema = z.object({
  isValid: z.boolean(),
  errors: z.array(z.string()),
  repaired: z.string().optional(),
});

const { object } = await generateObject({
  model: openai("gpt-4o"),
  schema: JsonRepairSchema,
  prompt: "Analyze this JSON and list any errors: { name: 'test', }",
});

console.log(object.errors); // string[] — fully typed

Where Vercel AI SDK wins: Next.js projects (it's built for this stack), streaming structured objects with streamObject(), React hooks for UI state (useObject()), edge runtime compatibility.

Where it falls short: TypeScript-only. Less provider coverage than Instructor. Streaming structured objects adds complexity for backend-only services.

5. PydanticAI — Best for Agentic Python Pipelines

Version: v1.85.1 (April 2026) | By: Pydantic team

PydanticAI brings type safety to agentic Python workflows. Beyond structured output, it adds tool registration via decorators (auto-generating JSON schemas from type hints), dependency injection for testable agents, and integration with Pydantic Logfire for observability.

from pydantic_ai import Agent
from pydantic import BaseModel

class ParsedError(BaseModel):
    error_type: str
    line_number: int | None
    suggestion: str

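agent = Agent(
    "anthropic:claude-sonnet-4-6",
    output_type=ParsedError,  # older releases called this result_type
    system_prompt="You are a JSON error analyst.",
)

# run_sync works outside an event loop; inside async code use `await agent.run(...)`
result = agent.run_sync("Parse error: Unexpected token } at line 42")
print(result.output.error_type)  # Validated ParsedError instance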

Where PydanticAI wins: Agentic pipelines where tools and structured output coexist, observability requirements (Logfire), teams already on Pydantic v2.

Where it falls short: More opinionated than Instructor. The agent pattern adds overhead if you just need simple extraction.
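The idea of registering tools with a decorator and deriving a JSON schema from type hints can be sketched with the standard library. This is an illustration of the general technique, not PydanticAI's implementation; the `tool` decorator, the `TOOLS` registry, and `lookup_line` are hypothetical names.

```python
import inspect
from typing import Callable, get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

TOOLS: dict[str, dict] = {}

def tool(func: Callable) -> Callable:
    """Register func and derive a JSON Schema for its parameters from type hints."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    TOOLS[func.__name__] = {
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": {name: {"type": JSON_TYPES[hints[name]]} for name in params},
            # Parameters without a default value are required.
            "required": [
                name for name, p in params.items()
                if p.default is inspect.Parameter.empty
            ],
        },
    }
    return func

@tool
def lookup_line(path: str, line_number: int, strip: bool = True) -> str:
    """Return one line of a file."""
    with open(path) as handle:
        text = handle.readlines()[line_number - 1]
    return text.strip() if strip else text

print(TOOLS["lookup_line"]["parameters"])
```

The schema stays in sync with the function signature automatically, so changing a parameter type changes what the model is told without a second edit.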

6. Outlines — Best Open-Source Constrained Decoding

Approach: Constrained decoding | Framework: HuggingFace / transformers native

Outlines pioneered constrained decoding for open-source models. It supports JSON Schema, Pydantic, regex, and context-free grammars — and unlike XGrammar, it's not tied to vLLM. If you're running models via HuggingFace transformers, Outlines is the constrained decoding choice.

import outlines

# Written against the pre-1.0 Outlines API; later releases changed these
# entry points, so match this to the version you have installed.
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

schema = '{"type": "object", "properties": {"name": {"type": "string"}, "score": {"type": "integer"}}, "required": ["name", "score"]}'

generator = outlines.generate.json(model, schema)
result = generator("Extract: John scored 94 on the test")
print(result)  # Output is guaranteed to conform to the schema

Where Outlines wins: HuggingFace-native inference, fine-grained grammar control, research and experimentation with local models.

Where it falls short: Not production-optimized for throughput. XGrammar has superseded it in the vLLM/SGLang ecosystem for raw performance.

7. Marvin — Best for Pythonic AI Functions

Stars: 5K+ | Philosophy: "AI should feel like calling a function"

Marvin takes the most minimal API approach — define a Python function with type hints, decorate it with @marvin.fn, and the library handles the LLM calls, prompting, and structured output automatically.

import marvin
from pydantic import BaseModel

class JsonError(BaseModel):
    type: str
    message: str
    fixable: bool

@marvin.fn
def classify_json_error(error_message: str) -> JsonError:
    """Classify this JSON parsing error and determine if it's auto-fixable."""

result = classify_json_error("Unexpected end of JSON input at position 142")
print(result.fixable)  # bool, typed; the value itself comes from the model

Where Marvin wins: Rapid prototyping, small scripts, data extraction pipelines where the "AI function" abstraction is intuitive.

Where it falls short: Less control over prompting and retry behavior compared to Instructor. Smaller community.
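The "AI function" abstraction is less magical than it looks. The sketch below shows the general pattern under stated assumptions: the docstring becomes the instruction, the arguments become the input, and the return annotation decides how the reply is parsed. It is not Marvin's implementation; `ai_fn` and `fake_model` are hypothetical, and the model is a stand-in so the example runs offline.

```python
import functools
import inspect
import json
from dataclasses import dataclass
from typing import Callable, get_type_hints

@dataclass
class JsonError:
    type: str
    message: str
    fixable: bool

def ai_fn(call_model: Callable[[str], str]) -> Callable:
    """Build a decorator that answers calls with a model instead of a function body."""
    def decorator(func: Callable) -> Callable:
        return_type = get_type_hints(func)["return"]

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = inspect.signature(func).bind(*args, **kwargs)
            # The docstring is the instruction; the arguments are the input.
            prompt = (
                f"{inspect.getdoc(func)}\n"
                f"Input: {json.dumps(bound.arguments)}\n"
                "Reply with JSON."
            )
            # The return annotation decides how the reply is parsed.
            return return_type(**json.loads(call_model(prompt)))
        return wrapper
    return decorator

# Stand-in for a real model call.
def fake_model(prompt: str) -> str:
    return '{"type": "truncation", "message": "input ended early", "fixable": true}'

@ai_fn(fake_model)
def classify_json_error(error_message: str) -> JsonError:
    """Classify this JSON parsing error and determine if it's auto-fixable."""

result = classify_json_error("Unexpected end of JSON input at position 142")
print(result.fixable)
```

The trade-off noted above follows directly from this design: the prompt is assembled for you, so there is little room to tune it.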

8. Mirascope — Best Lightweight Alternative

Philosophy: "Pydantic-native, no magic"

Mirascope offers Instructor-like functionality with less internal complexity. If you want structured output without Instructor's full wrapper layer — or you're building a library on top and don't want Instructor's opinionated retry logic — Mirascope gives you fine-grained control.

from mirascope.core import anthropic, prompt_template
from pydantic import BaseModel

class ExtractedSchema(BaseModel):
    fields: list[str]
    is_nested: bool
    depth: int

@anthropic.call("claude-sonnet-4-6", response_model=ExtractedSchema)
@prompt_template("Analyze the structure of this JSON schema: {schema}")
def analyze_schema(schema: str): ...

result = analyze_schema('{"type": "object", "properties": {"user": {"type": "object"}}}')
print(result.depth)  # int, typed; the value itself comes from the model

Where Mirascope wins: Library authors, teams wanting explicit control over the retry/validation loop, minimal dependencies.

Where it falls short: Smaller ecosystem, fewer tutorials, less community momentum than Instructor.

Decision Matrix

Library	Best For	Language	Approach	Provider Coverage
Instructor	Most Python projects	Python	Post-gen retry	15+ providers
BAML	Large teams, compile-time safety	Python/TS/Ruby	Post-gen retry	Any via DSL
XGrammar	Self-hosted vLLM/SGLang	Python	Constrained decoding	Local models only
Vercel AI SDK	Next.js / TypeScript	TypeScript	Post-gen retry	OpenAI, Anthropic, Google
PydanticAI	Agentic pipelines	Python	Post-gen retry	OpenAI, Anthropic, Gemini
Outlines	HuggingFace research	Python	Constrained decoding	Local models only
Marvin	Rapid prototyping	Python	Post-gen retry	OpenAI, Anthropic
Mirascope	Library authors	Python	Post-gen retry	OpenAI, Anthropic

Quick Selection Guide

Starting a new Python project? → Instructor. Largest ecosystem, fastest setup, handles 95%+ of cases.
TypeScript / Next.js project? → Vercel AI SDK with Zod schemas.
Running vLLM or SGLang on your own hardware? → XGrammar (already built in, zero extra setup).
Building a multi-agent system with observability? → PydanticAI + Logfire.
Need compile-time schema validation in CI/CD? → BAML.
Research / HuggingFace transformers? → Outlines.
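For teams that prefer rules to prose, the guide can be written as a small function. The parameter names and the order in which the rules are checked are assumptions made for this sketch; the guide itself does not rank the questions against each other.

```python
def pick_library(
    language: str,
    self_hosted: bool = False,
    inference: str = "",
    agents: bool = False,
    build_time_checks: bool = False,
) -> str:
    """Encode the selection guide as ordered rules; the first match wins."""
    if self_hosted and inference in {"vllm", "sglang"}:
        return "XGrammar"
    if self_hosted and inference == "transformers":
        return "Outlines"
    if language == "typescript":
        return "Vercel AI SDK"
    if build_time_checks:
        return "BAML"
    if agents:
        return "PydanticAI"
    # Default for a new Python project.
    return "Instructor"

print(pick_library("python"))
print(pick_library("python", self_hosted=True, inference="vllm"))
```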

When Structured Output Libraries Aren't Enough

Structured output libraries handle the happy path. Three situations still produce broken JSON in 2026:

Legacy provider APIs — older Claude 3.x and GPT-3.5 endpoints without native structured output support
Very long outputs — 4,000+ token JSON arrays where constrained decoding can degrade or retry budgets are exhausted
Third-party integrations — webhook payloads, API responses, and no-code tool outputs that arrive broken regardless of how you prompt

For these cases, AI JSONMedic's repair engine applies 14 progressive repair layers to recover JSON that structured output libraries cannot. It handles truncated arrays, mismatched brackets, single-quoted strings, Python True/False/None literals, and trailing commas — all client-side, no upload.
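As a rough illustration of what layered repair means, the sketch below applies two of the patterns named above (truncated output and trailing commas) in order, stopping as soon as the text parses. It is not AI JSONMedic's engine; `close_open_brackets` and `repair` are hypothetical helpers, and the trailing-comma regex is naive enough to also touch a comma-then-bracket sequence inside a string value.

```python
import json
import re

def close_open_brackets(text: str) -> str:
    """Append closers for any brackets left open, ignoring brackets inside strings."""
    stack, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    if in_string:
        text += '"'  # close a string that was cut off mid-value
    return text + "".join(reversed(stack))

def repair(raw: str) -> object:
    """Apply progressively more invasive fixes until the text parses."""
    candidate = raw.strip()
    steps = [
        lambda s: s,                                 # layer 0: maybe it is already valid
        close_open_brackets,                         # layer 1: truncated output
        lambda s: re.sub(r",\s*([}\]])", r"\1", s),  # layer 2: trailing commas
    ]
    for step in steps:
        candidate = step(candidate)
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("could not repair input")

print(repair('{"a": [1, 2, 3'))
print(repair('{"a": 1, "b": [1, 2,],}'))
```

Ordering the layers from least to most invasive means already-valid input passes through untouched.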

If you're working through a Claude model migration (Sonnet 4 / Opus 4 retire June 15, 2026), see our Claude prefill deprecation guide for the exact migration to output_config.format. For a broader comparison of LLM JSON output approaches, see OpenAI JSON Mode vs Structured Outputs.

FAQ

What's the difference between JSON Mode and structured output libraries?

JSON Mode (OpenAI) or output_config.format (Anthropic) are provider-native features that guarantee JSON-parseable output but don't enforce your specific schema shape. Structured output libraries add schema validation, typed return values, and retry logic on top of these APIs — or bypass them entirely with constrained decoding.

Does using a structured output library eliminate the need for JSON repair?

For well-scoped schemas under 15 fields with supported providers, yes — Instructor and similar libraries recover from nearly all failures. For very long outputs, complex nested schemas, legacy APIs, or third-party integrations, malformed JSON still occurs and a repair tool like AI JSONMedic handles it.

Which library works with the most LLM providers?

Instructor supports 15+ providers via its provider wrappers, making it the most portable choice. BAML supports any provider through its DSL client configuration. XGrammar and Outlines only work with self-hosted models.

Is BAML worth the build step overhead?

For teams with 5+ schema types and multiple contributors, yes. The compile-time type checking prevents a category of production bugs that Instructor only surfaces at runtime. For solo projects or simple schemas, Instructor's zero-setup approach wins.

Can I use structured output libraries with local Ollama models?

Instructor supports Ollama via instructor.from_openai(openai.OpenAI(base_url="http://localhost:11434/v1")). For better local model reliability, Outlines with constrained decoding eliminates JSON failures entirely — it works natively with llama.cpp and HuggingFace models.

What happens to my structured output code after Sonnet 4 / Opus 4 retire on June 15?

For Instructor and most libraries, the migration is a one-line model string change: claude-sonnet-4-20250514 → claude-sonnet-4-6. API compatibility is maintained across the 4.x generation. See our Claude deprecation guide for the full migration checklist.
