Tool Profile: Pydantic AI
A deep-dive into PydanticAI — the type-safe, Python-native agent framework from the Pydantic team. Covers its validation-first philosophy, async agent API, dependency injection system, and a direct comparison with LangGraph for a structured research task.
The RAG for Agents article ended with a concrete retrieval tool and a note about type safety: once your agent is pulling structured data from external sources, the gap between "it works in tests" and "it works in production" often comes down to whether the LLM's output is what your application code actually expects. PydanticAI is built specifically to close that gap.
Important: This profile builds on Tool Use Patterns, which covers function calling mechanics, and the Agent Frameworks Landscape overview. The comparisons below assume familiarity with both.
PydanticAI was created by the team behind Pydantic, the validation library that FastAPI is built on and that became a default for validated data handling in Python web development. The stated goal is to bring the same discipline to agent development: if FastAPI made HTTP handlers type-safe and self-documenting, PydanticAI should do the same for LLM tool calls and structured outputs.
Philosophy: Validation-First, No Magic
Most agent frameworks introduce a DSL. LangChain has LCEL and chains. LangGraph has state graphs and reducers. These abstractions provide real benefits — but they also hide execution flow, produce opaque stack traces, and require learning a new mental model before you can do anything useful.
PydanticAI's answer is to add almost no abstraction at all. An agent's logic is ordinary Python: if statements, for loops, await expressions. Tool calls are registered functions with type hints. State is a dataclass you define. There is no chain to understand, no graph to visualize, no proprietary execution runtime to debug.
The cost of this minimalism is that you write more plumbing yourself. The benefit is that every line of agent code is readable, testable, and debuggable with standard Python tooling. When something goes wrong, the stack trace points to your code — not into the framework internals.
"AI is just code." — the implicit mantra of the PydanticAI docs
This philosophy is not unique — it is also evident in OpenAI Swarm's approach. What distinguishes PydanticAI is that it pairs this minimalism with a rigorous validation layer: Pydantic V2 sits at every boundary where Python code meets LLM output, enforcing schema contracts and retrying automatically when the model returns something malformed.
The Core API
Agent and Result Types
The Agent class is generic. You declare the shape of its output at instantiation:
from pydantic import BaseModel
from pydantic_ai import Agent

class ResearchSummary(BaseModel):
    topic: str
    key_findings: list[str]
    confidence: float  # 0.0 – 1.0
    sources_consulted: int

agent = Agent(
    "openai:gpt-4o",
    result_type=ResearchSummary,
    instructions="You are a research assistant. Summarize findings concisely.",
)
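When you call agent.run(prompt), PydanticAI instructs the model to produce JSON matching ResearchSummary's schema, validates the response, and returns a typed RunResult[ResearchSummary]. The .data attribute is a real ResearchSummary instance (not a dictionary, not a string), and your IDE knows its type. These names track the release the examples were written against: later PydanticAI releases rename result_type to output_type and result.data to result.output, so check the version you have installed.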
result = await agent.run("What are the main approaches to agent memory?")
print(result.data.confidence) # float, not Any
print(result.data.key_findings[0]) # str, not Any
If the model returns invalid JSON or a field fails validation, PydanticAI sends the Pydantic error message back to the model and asks for a correction. This automatic retry loop runs up to a configurable limit before raising. In practice, GPT-4o and Claude 3.5 Sonnet rarely need more than one retry on a well-defined schema.
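The retry loop described above can be sketched in plain Python. This is a standard-library illustration of the idea only, not PydanticAI's implementation: the Summary dataclass, validate, run_with_retries, and fake_model are all hypothetical names, and the hand-written checks stand in for Pydantic's validation.

```python
import json
from dataclasses import dataclass


@dataclass
class Summary:
    topic: str
    confidence: float


def validate(raw: str) -> Summary:
    """Parse raw JSON and check field types; raise ValueError with a readable message."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(payload.get("topic"), str):
        raise ValueError("field 'topic' must be a string")
    confidence = payload.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("field 'confidence' must be a number")
    return Summary(topic=payload["topic"], confidence=float(confidence))


def run_with_retries(call_model, prompt: str, max_retries: int = 2) -> Summary:
    """Call the model, feeding each validation error back until the output validates."""
    message = prompt
    last_error = None
    for _attempt in range(max_retries + 1):
        raw = call_model(message)
        try:
            return validate(raw)
        except ValueError as exc:
            last_error = exc
            # The error text becomes part of the next request, as in the loop described above.
            message = f"{prompt}\n\nYour previous reply was rejected: {exc}. Please correct it."
    raise RuntimeError(f"output still invalid after {max_retries} retries: {last_error}")


# Hypothetical stand-in for an LLM: fails validation once, then returns valid JSON.
replies = iter(['{"topic": "memory"}', '{"topic": "memory", "confidence": 0.8}'])
seen_prompts = []


def fake_model(message: str) -> str:
    seen_prompts.append(message)
    return next(replies)


summary = run_with_retries(fake_model, "Summarize agent memory research.")
```

The second request carries the validation error back to the model, which is what lets a capable model correct a missing or mistyped field on the next attempt.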
Agents as Async Python Functions
agent.run() is a coroutine. This is not a convenience wrapper — it is the primary interface, designed for production environments where concurrency matters:
import asyncio

async def process_topics(topics: list[str]) -> list[ResearchSummary]:
    # Fan out: run all agents concurrently
    results = await asyncio.gather(*[agent.run(t) for t in topics])
    return [r.data for r in results]
There is also agent.run_sync() for scripts and REPLs, but in a web service or background worker you use run() directly. No thread-pool tricks, no synchronous wrappers leaking into your async code.
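An unbounded fan-out can run into provider rate limits once the topic list grows. One common way to cap the number of in-flight calls is a semaphore; the sketch below assumes a hypothetical fake_run coroutine in place of agent.run so that it executes without network access, and run_bounded is an illustrative helper, not part of PydanticAI.

```python
import asyncio


async def run_bounded(worker, items: list[str], limit: int = 3) -> list:
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: str):
        async with semaphore:
            return await worker(item)

    # gather preserves input order; with return_exceptions=True, failures are
    # returned in place instead of cancelling the whole batch.
    return await asyncio.gather(*(guarded(i) for i in items), return_exceptions=True)


# Hypothetical stand-in for agent.run; tracks how many calls overlap.
in_flight = 0
peak = 0


async def fake_run(topic: str) -> str:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    if topic == "bad":
        raise ValueError("model call failed")
    return f"summary of {topic}"


results = asyncio.run(run_bounded(fake_run, ["a", "b", "bad", "c", "d"], limit=2))
```

Returning exceptions in place means one failed topic does not discard the summaries that succeeded; the caller filters the list afterwards.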
Tools and the @agent.tool Decorator
Tools are registered with a decorator. The framework inspects the function's type hints and docstring to generate the JSON Schema the model sees — you do not write it manually:
import httpx
from pydantic_ai import RunContext

# Deps is the dependency dataclass defined in the Dependency Injection section below
@agent.tool
async def web_search(ctx: RunContext[Deps], query: str) -> list[str]:
    """Search the web and return a list of result snippets."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.search.example.com/search",
            params={"q": query, "api_key": ctx.deps.search_api_key},
        )
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
        return [r["snippet"] for r in response.json()["results"]]
The ctx: RunContext[Deps] argument gives the tool access to injected dependencies (covered next). If a tool does not need the agent context — a pure utility function — use @agent.tool_plain instead:
@agent.tool_plain
def calculate_word_count(text: str) -> int:
    """Count the number of words in a text string."""
    return len(text.split())
The distinction keeps your dependency graph explicit: tool_plain functions receive no run context, so they can be tested as ordinary functions without any agent machinery, while tool functions declare through their ctx parameter that they rely on injected runtime state.
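To make the "type hints and docstring become the schema" step concrete, here is a deliberately simplified sketch using only the standard library. It is not how PydanticAI generates schemas (the framework relies on Pydantic and handles far more types); tool_schema and the _JSON_TYPES mapping are hypothetical, and the min_length parameter is added purely to show an optional argument.

```python
import inspect
from typing import get_type_hints

# Simplified mapping; unknown or missing annotations fall back to "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> dict:
    """Build a minimal JSON-Schema-style tool description from a function signature."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default are the ones the model must supply.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def calculate_word_count(text: str, min_length: int = 1) -> int:
    """Count the words in a text string that have at least min_length characters."""
    return sum(1 for word in text.split() if len(word) >= min_length)


schema = tool_schema(calculate_word_count)
```

The takeaway is that the function signature is the single source of truth: renaming a parameter or changing its annotation changes what the model is shown, with no separate schema file to keep in sync.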
Dependency Injection
This is PydanticAI's most underappreciated feature, and the one that most clearly reflects its FastAPI lineage.
In most frameworks, services that tools need — database connections, API keys, HTTP clients — end up as module-level globals or closures. Both approaches make unit testing painful: you either patch globals or reconstruct closures just to swap in a mock.
PydanticAI uses explicit dependency injection. You define a deps_type — typically a dataclass — that holds everything the agent needs at runtime:
from dataclasses import dataclass

import asyncpg
import httpx

@dataclass
class Deps:
    search_api_key: str
    http_client: httpx.AsyncClient
    db_pool: asyncpg.Pool  # or any other resource

agent = Agent(
    "openai:gpt-4o",
    deps_type=Deps,
    result_type=ResearchSummary,
)
When running the agent, you pass a Deps instance:
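import os

# Assumes `pool` is an asyncpg pool created elsewhere and that this runs inside an async function
async with httpx.AsyncClient() as client:
    deps = Deps(
        search_api_key=os.environ["SEARCH_KEY"],
        http_client=client,
        db_pool=pool,
    )
    # Run while the HTTP client is still open
    result = await agent.run("Summarize recent work on agent memory", deps=deps)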
For tests, swap in mocks without touching the agent definition:
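# MockHTTPClient and MockPool are test doubles you supply yourself.
# To avoid real LLM calls as well, the model can be swapped for
# pydantic_ai.models.test.TestModel via agent.override(model=...).
async def test_agent_uses_search():
    mock_deps = Deps(
        search_api_key="test",
        http_client=MockHTTPClient(fixture="search_results.json"),
        db_pool=MockPool(),
    )
    result = await agent.run("test query", deps=mock_deps)
    assert len(result.data.key_findings) > 0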
This pattern — define the interface, inject the implementation — is standard in well-tested software. PydanticAI brings it to agent code where it has historically been absent.
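The same idea also makes individual tool bodies testable without running an agent at all. The sketch below uses only the standard library: SearchDeps, search_tool, and fake_search are hypothetical names, and a SimpleNamespace stands in for the run context, assuming the tool reads nothing from it except .deps.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class SearchDeps:
    # The tool depends only on this callable, so tests can pass any async function.
    search_fn: Callable[[str], Awaitable[list[str]]]
    max_results: int = 3


async def search_tool(ctx, query: str) -> list[str]:
    """Tool body: read dependencies from ctx.deps, never from module globals."""
    snippets = await ctx.deps.search_fn(query)
    return snippets[: ctx.deps.max_results]


async def fake_search(query: str) -> list[str]:
    # Deterministic test double standing in for a real search API.
    return [f"{query} result {i}" for i in range(10)]


# SimpleNamespace mimics the one attribute of the run context that the tool reads.
ctx = SimpleNamespace(deps=SearchDeps(search_fn=fake_search, max_results=2))
snippets = asyncio.run(search_tool(ctx, "agent memory"))
```

Because the tool never touches a global, swapping the real search client for fake_search requires no patching, only a different SearchDeps instance.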
Model Agnosticism
Switching models is a one-line change:
# Development: fast and cheap
agent = Agent("openai:gpt-4o-mini", result_type=ResearchSummary)
# Production: high quality
agent = Agent("openai:gpt-4o", result_type=ResearchSummary)
# Anthropic alternative
agent = Agent("anthropic:claude-3-5-sonnet-latest", result_type=ResearchSummary)
# Local model via Ollama
agent = Agent("ollama:llama3.1", result_type=ResearchSummary)
The model string is the only thing that changes. Your tools, result type, and dependency injection wiring remain identical. PydanticAI handles provider-specific API formatting, tool schema serialization, and retry behavior internally.
For Azure deployments, a Provider object handles authentication separately from the model identifier, keeping credentials out of the model string.
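Since the model is just a string, one way to keep the choice out of the code is to read it from configuration. This is a minimal sketch under assumptions: the AGENT_MODEL variable name and the resolve_model helper are hypothetical, and the check only verifies the provider:model shape used in the examples above, not whether the provider actually exists.

```python
import os

DEFAULT_MODEL = "openai:gpt-4o-mini"  # cheap default for development


def resolve_model(env=None) -> str:
    """Return the model string from AGENT_MODEL, validating its provider:model shape."""
    env = os.environ if env is None else env
    model = env.get("AGENT_MODEL", DEFAULT_MODEL)
    provider, sep, name = model.partition(":")
    if not (provider and sep and name):
        raise ValueError(f"expected 'provider:model', got {model!r}")
    return model


# Passing a dict instead of os.environ keeps the helper easy to test.
dev_model = resolve_model({})
prod_model = resolve_model({"AGENT_MODEL": "openai:gpt-4o"})
```

The resolved string would then be passed as the first argument to Agent(...), leaving tools and dependency wiring untouched between environments.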
Head-to-Head: PydanticAI vs. LangGraph
The clearest way to understand PydanticAI's trade-offs is to implement the same task in both frameworks. The task: given a topic, search the web and return a structured research summary — the kind of retrieval-plus-synthesis step at the heart of many real agents.
The task contract
# Shared output model (used in both implementations)
class ResearchSummary(BaseModel):
    topic: str
    key_findings: list[str]  # 3–5 bullet points
    confidence: float
PydanticAI implementation
from collections.abc import Awaitable, Callable
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext

@dataclass
class Deps:
    # Injected async search function: takes a query, returns result snippets
    search_fn: Callable[[str], Awaitable[list[str]]]

agent = Agent(
    "openai:gpt-4o",
    deps_type=Deps,
    result_type=ResearchSummary,
    instructions=(
        "You are a research assistant. Use the search tool to find information, "
        "then return a structured summary with 3-5 key findings."
    ),
)

@agent.tool
async def search(ctx: RunContext[Deps], query: str) -> list[str]:
    """Search the web and return result snippets."""
    return await ctx.deps.search_fn(query)

# Run (my_search is your own async search function)
result = await agent.run("Recent advances in agent memory architectures", deps=Deps(search_fn=my_search))
summary: ResearchSummary = result.data  # typed, validated
Total: ~25 lines. The control flow is: agent decides to call search, gets snippets back, synthesizes them into a ResearchSummary. If the output fails validation, the framework retries automatically.
LangGraph implementation
from typing import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

class ResearchState(TypedDict):
    topic: str
    search_results: list[str]
    summary: ResearchSummary | None

async def search_node(state: ResearchState) -> dict:
    # Same async my_search function as in the PydanticAI version
    results = await my_search(state["topic"])
    return {"search_results": results}

def summarize_node(state: ResearchState) -> dict:
    llm = ChatOpenAI(model="gpt-4o").with_structured_output(ResearchSummary)
    summary = llm.invoke(
        f"Summarize findings on '{state['topic']}' from: {state['search_results']}"
    )
    return {"summary": summary}

# Build graph
graph = StateGraph(ResearchState)
graph.add_node("search", search_node)
graph.add_node("summarize", summarize_node)
graph.add_edge("search", "summarize")
graph.add_edge("summarize", END)
graph.set_entry_point("search")
app = graph.compile()

# ainvoke, because search_node is a coroutine
result = await app.ainvoke({"topic": "Recent advances in agent memory architectures", "search_results": [], "summary": None})
summary: ResearchSummary = result["summary"]  # dict access, not attribute
Total: ~30 lines, and this is the simple version: no conditional branching, no checkpointing, no human-in-the-loop.
Reading the comparison
Neither implementation is wrong. They reflect fundamentally different mental models:
| Dimension | PydanticAI | LangGraph |
|---|---|---|
| Control flow | Agent decides tool sequence autonomously | Developer defines nodes and edges explicitly |
| State | Implicit in the run context | Explicit TypedDict, flows through graph |
| Output validation | Automatic — retries on schema failure | Manual — .with_structured_output() does not retry |
| Observability | Logfire integration built in | LangSmith integration built in |
| Multi-step branching | Requires hand-coded Python logic | First-class via conditional edges |
| Best for | Tasks where the LLM should determine its own steps | Tasks where the execution path must be predictable and auditable |
The key trade-off: PydanticAI trusts the LLM to decide how many tool calls to make and in what order. LangGraph trusts the developer to pre-specify the execution graph. For a research agent with variable retrieval needs, PydanticAI's autonomous approach fits naturally. For a multi-step approval workflow where every state transition needs to be logged and potentially paused, LangGraph's explicit graph is the right choice.
Neither framework is a superset of the other. The Agent Frameworks Landscape overview covers this trade-off at a higher level — if you are undecided, that is the right starting point.
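The "hand-coded Python logic" entry in the table is worth seeing once. Where LangGraph would express a low-confidence fallback as a conditional edge, the PydanticAI style is an ordinary loop around the run call. The sketch below is illustrative only: research_with_fallback, the 0.7 threshold, and fake_agent (a stand-in for agent.run that returns the summary directly) are all assumptions.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Summary:
    topic: str
    confidence: float


async def research_with_fallback(run_agent, topic: str, threshold: float = 0.7, max_passes: int = 3) -> Summary:
    """Re-run with a broader prompt while confidence stays below the threshold."""
    summary = await run_agent(topic)
    passes = 1
    while summary.confidence < threshold and passes < max_passes:
        prompt = f"{topic} (broaden the search; previous confidence {summary.confidence:.2f})"
        summary = await run_agent(prompt)
        passes += 1
    return summary


# Hypothetical stand-in for the agent: reported confidence rises with each call.
calls = []


async def fake_agent(prompt: str) -> Summary:
    calls.append(prompt)
    return Summary(topic="agent memory", confidence=0.4 + 0.2 * len(calls))


final = asyncio.run(research_with_fallback(fake_agent, "agent memory"))
```

The branching is easy to read and to unit test, but nothing records or persists the intermediate state between passes; that bookkeeping is exactly what an explicit graph with checkpointing provides.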
Observability
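PydanticAI integrates with Pydantic Logfire out of the box. Once logfire.configure() has run, LLM calls, tool invocations, and validation retries are recorded as structured spans (depending on the release, you may also need to enable the PydanticAI instrumentation explicitly, for example with logfire.instrument_pydantic_ai()):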
import logfire
logfire.configure() # reads LOGFIRE_TOKEN from env
# All subsequent agent.run() calls are traced automatically
result = await agent.run("my prompt", deps=deps)
Each span captures: model used, token count, tool arguments, tool return values, validation errors and retries, and total latency. No manual instrumentation required.
For teams already using OpenTelemetry, Logfire exports standard OTEL spans, so traces can flow into Datadog, Honeycomb, or any compatible backend without lock-in.
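For readers unfamiliar with tracing, it may help to see what a span amounts to. The following is a standard-library toy, not Logfire's or OpenTelemetry's API: the span context manager and the spans list are hypothetical, and a real tracer would export records to a backend rather than append them to a list.

```python
import time
from contextlib import contextmanager

spans: list[dict] = []  # in-memory sink standing in for a tracing backend


@contextmanager
def span(name: str, **attributes):
    """Record the name, attributes, outcome, and latency of the wrapped block."""
    record = {"name": name, **attributes}
    start = time.perf_counter()
    try:
        yield record
        record["status"] = "ok"
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        # Runs on both success and failure, so every span gets a latency.
        record["latency_s"] = time.perf_counter() - start
        spans.append(record)


with span("tool:web_search", query="agent memory") as current:
    current["result_count"] = 3  # attributes can be attached while the span is open
```

An instrumented framework wraps each model call, tool call, and retry in the equivalent of this block, which is why no manual instrumentation is needed in application code.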
When to Use PydanticAI
- You are building on top of FastAPI or already use Pydantic V2: the same type-hint-driven model carries over, and your existing Pydantic models can serve directly as agent output types.
- Your agent must return structured data that downstream code will consume programmatically (APIs, databases, UI components).
- Testability is a priority — the dependency injection system makes agents unit-testable without mocking framework internals.
- You want to start simple and add complexity only when needed, rather than inheriting a framework's full abstraction stack from day one.
When Not to Use PydanticAI
- You need complex, multi-step workflows with explicit branching, human approval queues, or long-running checkpointed state — LangGraph is better suited here.
- Your team is heavily invested in the LangChain ecosystem and relies on its pre-built integrations (document loaders, retrievers, memory stores) — switching introduces migration cost without proportional gain.
- The target LLM has unreliable structured output support. PydanticAI's retry loop can recover from occasional failures, but if the model consistently fails to produce valid JSON, every run becomes expensive.
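The cost concern in the last point can be put in rough numbers. The sketch below assumes, as a simplification, that each attempt produces valid output independently with the same probability; expected_calls and failure_rate are illustrative helpers, and the probabilities are made-up inputs rather than measurements of any model.

```python
def expected_calls(p_valid: float, max_retries: int) -> float:
    """Expected model calls per run when each attempt validates with probability p_valid."""
    if not 0.0 < p_valid <= 1.0:
        raise ValueError("p_valid must be in (0, 1]")
    attempts = max_retries + 1
    # Attempt k+1 happens only if the previous k attempts all failed.
    return sum((1.0 - p_valid) ** k for k in range(attempts))


def failure_rate(p_valid: float, max_retries: int) -> float:
    """Probability that every attempt fails and the run raises."""
    return (1.0 - p_valid) ** (max_retries + 1)


reliable = expected_calls(0.95, max_retries=2)  # 1 + 0.05 + 0.0025
flaky = expected_calls(0.5, max_retries=2)      # 1 + 0.5 + 0.25
flaky_failures = failure_rate(0.5, max_retries=2)
```

Under these assumptions a model that validates 95% of the time averages about 1.05 calls per run, while one that validates half the time averages 1.75 calls and still fails one run in eight after two retries.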
Resources
- GitHub — pydantic/pydantic-ai
- Documentation
- Introducing PydanticAI (official blog post)
- Pydantic Logfire — observability companion