Instructor: Structured Outputs for LLMs with Pydantic and Python

Summary
Instructor is a powerful Python library that simplifies extracting structured data from Large Language Models (LLMs). It integrates Pydantic for robust validation, type safety, and IDE support, eliminating the need for manual JSON parsing, error handling, and retries. This tool provides a streamlined and reliable way to get structured outputs from any LLM.
Introduction
Instructor is a Python library for extracting structured data from Large Language Models (LLMs). You describe the output you want as a Pydantic model, and Instructor returns a validated, typed instance of that model, with full IDE support. This removes the JSON parsing, error handling, and retry logic you would otherwise write by hand when integrating an LLM into an application.
Installation
Getting started with Instructor is straightforward. You can install it using pip:
pip install instructor
For other package managers, you can use:
uv add instructor
poetry add instructor
Examples
Instructor allows you to define your desired output structure using Pydantic models and then extract that structure directly from natural language. Here's a basic example:
import instructor
from pydantic import BaseModel

# Define what you want
class User(BaseModel):
    name: str
    age: int

# Extract it from natural language
client = instructor.from_provider("openai/gpt-4o-mini")
user = client.chat.completions.create(
    response_model=User,  # the reply is parsed and validated into this model
    messages=[{"role": "user", "content": "John is 25 years old"}],
)
print(repr(user))  # User(name='John', age=25)
Instructor also supports advanced features like automatic retries for failed validations, streaming partial objects, and extracting complex nested data structures, making it suitable for production environments.
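To make "complex nested data structures" concrete, here is a minimal sketch of a nested Pydantic model validated against a JSON string, with no LLM call involved. The Customer and Address models and the sample JSON are hypothetical, invented for illustration; the assumption is that a model like this would be passed as response_model in the same way as User above.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Address(BaseModel):
    city: str
    country: str


class Customer(BaseModel):
    name: str
    age: int = Field(ge=0)  # reject negative ages
    addresses: List[Address]  # nested models are validated recursively


# Hypothetical JSON of the kind an LLM might return for a Customer.
raw = '{"name": "John", "age": 25, "addresses": [{"city": "Paris", "country": "France"}]}'
customer = Customer.model_validate_json(raw)
print(customer.addresses[0].city)  # Paris

# A reply that violates the schema raises ValidationError instead of passing through.
bad = '{"name": "John", "age": -3, "addresses": []}'
try:
    Customer.model_validate_json(bad)
except ValidationError as exc:
    print(len(exc.errors()), "validation error(s)")
```

A validation error of this kind is what a retry mechanism can feed back to the model so it can correct its answer.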
Why Use Instructor?
Instructor addresses common challenges in working with LLMs by offering several key advantages:
- Simplified LLM Interactions: It abstracts away the complexity of writing intricate JSON schemas, handling validation errors, managing retries, and parsing unstructured responses.
- Pydantic Integration: By building on Pydantic, Instructor provides out-of-the-box type safety, data validation, and enhanced developer experience with IDE support.
- Provider Agnostic: Use the same simple API across various LLM providers, including OpenAI, Anthropic, Google, and local models like Ollama.
- Production-Ready Features: Includes automatic retries with error feedback for validation failures and streaming support for partial object generation, ensuring robust applications.
- Battle-Tested: Trusted by over 100,000 developers and companies, with millions of monthly downloads and thousands of GitHub stars, proving its reliability in real-world scenarios.
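The idea behind "automatic retries with error feedback" can be sketched with the standard library alone. This is not Instructor's implementation: parse_user, extract_with_retries, and fake_model are hypothetical names, and the fake model stands in for a real LLM call so the loop can run offline.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


def parse_user(raw: str) -> User:
    """Parse and validate a JSON string; raise ValueError on any problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str):
        raise ValueError("field 'name' must be a string")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("field 'age' must be an integer")
    return User(name=name, age=age)


def extract_with_retries(call_model, prompt: str, max_retries: int = 2) -> User:
    """Call the model, validate the reply, and re-ask with the error on failure."""
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(max_retries + 1):
        raw = call_model(messages)
        try:
            return parse_user(raw)
        except ValueError as exc:
            if attempt == max_retries:
                raise  # out of retries: surface the last validation error
            # Feed the failed reply and the validation error back to the model.
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {"role": "user", "content": f"Fix this error and reply with JSON only: {exc}"}
            )
    raise RuntimeError("max_retries must be non-negative")


def fake_model(messages):
    """Hypothetical stand-in for an LLM: wrong at first, correct after feedback."""
    if len(messages) == 1:
        return '{"name": "John", "age": "twenty-five"}'
    return '{"name": "John", "age": 25}'


user = extract_with_retries(fake_model, "John is 25 years old")
print(user)  # User(name='John', age=25)
```

The first reply fails validation because age is a string; the second call sees the error message in the conversation and returns a valid object. A library that handles this for you saves writing and maintaining the loop, the error messages, and the per-field checks.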
Compared to alternatives:
- vs Raw JSON mode: Instructor offers automatic validation, retries, streaming, and nested object support without manual schema writing.
- vs LangChain/LlamaIndex: Instructor is a lighter, faster, and more focused solution specifically for structured extraction.
- vs Custom solutions: It's a battle-tested library that handles edge cases and provides a robust foundation for your AI applications.
Links
Explore Instructor further with these official resources: