
Tool Profile: PydanticAI

A deep-dive into PydanticAI — the type-safe, Python-native agent framework from the Pydantic team. Covers its validation-first philosophy, async agent API, dependency injection system, and a direct comparison with LangGraph for a structured research task.

AgentEngineering Editorial · April 26, 2026 · 10 min read


The RAG for Agents article ended with a concrete retrieval tool and a note about type safety: once your agent is pulling structured data from external sources, the gap between "it works in tests" and "it works in production" often comes down to whether the LLM's output is what your application code actually expects. PydanticAI is built specifically to close that gap.

Important

This profile builds on Tool Use Patterns, which covers function calling mechanics, and the Agent Frameworks Landscape overview. The comparisons below assume familiarity with both.

PydanticAI was created by the team behind Pydantic, the validation library that FastAPI builds on and that became the default for validated data handling in Python web development. The stated goal is to bring the same discipline to agent development: if FastAPI made HTTP handlers type-safe and self-documenting, PydanticAI should do the same for LLM tool calls and structured outputs.

Philosophy: Validation-First, No Magic

Most agent frameworks introduce a DSL. LangChain has LCEL and chains. LangGraph has state graphs and reducers. These abstractions provide real benefits — but they also hide execution flow, produce opaque stack traces, and require learning a new mental model before you can do anything useful.

PydanticAI's answer is to add almost no abstraction at all. An agent's logic is ordinary Python: if statements, for loops, await expressions. Tool calls are registered functions with type hints. State is a dataclass you define. There is no chain to understand, no graph to visualize, no proprietary execution runtime to debug.

The cost of this minimalism is that you write more plumbing yourself. The benefit is that every line of agent code is readable, testable, and debuggable with standard Python tooling. When something goes wrong, the stack trace points to your code — not into the framework internals.

"AI is just code." — the implicit mantra of the PydanticAI docs

This philosophy is not unique — it is also evident in OpenAI Swarm's approach. What distinguishes PydanticAI is that it pairs this minimalism with a rigorous validation layer: Pydantic V2 sits at every boundary where Python code meets LLM output, enforcing schema contracts and retrying automatically when the model returns something malformed.

The Core API

Agent and Result Types

The Agent class is generic. You declare the shape of its output at instantiation:

from pydantic import BaseModel
from pydantic_ai import Agent

class ResearchSummary(BaseModel):
    topic: str
    key_findings: list[str]
    confidence: float  # 0.0 – 1.0
    sources_consulted: int

agent = Agent(
    "openai:gpt-4o",
    result_type=ResearchSummary,
    instructions="You are a research assistant. Summarize findings concisely.",
)

When you call agent.run(prompt), PydanticAI instructs the model to produce JSON matching ResearchSummary's schema, validates the response, and returns a typed RunResult[ResearchSummary]. The .data attribute is a real ResearchSummary instance, not a dictionary or a string, and your IDE knows its type. Later PydanticAI releases rename result_type to output_type and .data to .output; the behavior described here is the same under either spelling.

result = await agent.run("What are the main approaches to agent memory?")
print(result.data.confidence)       # float, not Any
print(result.data.key_findings[0])  # str, not Any

If the model returns invalid JSON or a field fails validation, PydanticAI sends the Pydantic error message back to the model and asks for a correction. This automatic retry loop runs up to a configurable limit before raising. In practice, GPT-4o and Claude 3.5 Sonnet rarely need more than one retry on a well-defined schema.
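The retry behavior described above can be pictured as a small loop. The sketch below is not PydanticAI's implementation: it uses only the standard library, a hand-written validator in place of Pydantic, and a hypothetical `make_fake_model` helper standing in for the LLM. All names in it are illustrative assumptions.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Summary:
    topic: str
    confidence: float


def parse_summary(raw: str) -> Summary:
    """Parse and validate raw model text, raising ValueError on any problem."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    topic = payload.get("topic")
    confidence = payload.get("confidence")
    if not isinstance(topic, str):
        raise ValueError("topic must be a string")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return Summary(topic=topic, confidence=float(confidence))


def run_with_retries(model: Callable[[str], str], prompt: str, max_retries: int = 2) -> Summary:
    """Call the model, feeding each validation error back until the output parses."""
    message = prompt
    last_error = None
    for _ in range(max_retries + 1):
        raw = model(message)
        try:
            return parse_summary(raw)
        except ValueError as exc:
            last_error = exc
            # The error text becomes part of the next request, as a correction hint
            message = f"{prompt}\n\nYour previous answer was rejected: {exc}. Please fix it."
    raise RuntimeError(f"no valid output after {max_retries} retries: {last_error}")


def make_fake_model() -> Callable[[str], str]:
    """Hypothetical model: first reply breaks the schema, second reply is valid."""
    replies = iter([
        '{"topic": "memory", "confidence": 1.7}',
        '{"topic": "memory", "confidence": 0.7}',
    ])
    return lambda message: next(replies)
```

Calling `run_with_retries(make_fake_model(), "Summarize agent memory")` rejects the first reply because the confidence is out of range and accepts the second. The point of the design is that the validation error itself is the correction prompt, so a tighter schema gives the model a more specific hint.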

Agents as Async Python Functions

agent.run() is a coroutine. This is not a convenience wrapper — it is the primary interface, designed for production environments where concurrency matters:

import asyncio

async def process_topics(topics: list[str]) -> list[ResearchSummary]:
    # Fan out: run all agents concurrently
    results = await asyncio.gather(*[agent.run(t) for t in topics])
    return [r.data for r in results]

There is also agent.run_sync() for scripts and REPLs, but in a web service or background worker you use run() directly. No thread-pool tricks, no synchronous wrappers leaking into your async code.
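An unbounded gather can run into provider rate limits when the topic list is long. One common way to cap the fan-out, sketched here with the standard library only, is an asyncio.Semaphore. The `fake_summarize` coroutine is a hypothetical stand-in for `agent.run`, and `run_bounded` is an illustrative helper, not part of PydanticAI.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_bounded(
    worker: Callable[[str], Awaitable[str]],
    topics: list[str],
    max_concurrency: int = 3,
) -> list[str]:
    """Run worker over topics concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(topic: str) -> str:
        async with semaphore:
            return await worker(topic)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(t) for t in topics))


async def fake_summarize(topic: str) -> str:
    # Hypothetical stand-in for `agent.run(topic)`
    await asyncio.sleep(0.01)
    return f"summary of {topic}"
```

From a script this would be driven with `asyncio.run(run_bounded(fake_summarize, topics, max_concurrency=2))`; inside a web handler you would simply await it.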

Tools and the `@agent.tool` Decorator

Tools are registered with a decorator. The framework inspects the function's type hints and docstring to generate the JSON Schema the model sees — you do not write it manually:

from pydantic_ai import RunContext

# Deps is the dependency container defined in the next section
@agent.tool
async def web_search(ctx: RunContext[Deps], query: str) -> list[str]:
    """Search the web and return a list of result snippets."""
    # Reuse the injected HTTP client instead of opening a new one per call
    response = await ctx.deps.http_client.get(
        "https://api.search.example.com/search",
        params={"q": query, "api_key": ctx.deps.search_api_key},
    )
    response.raise_for_status()  # surface HTTP errors before parsing the body
    return [r["snippet"] for r in response.json()["results"]]

The ctx: RunContext[Deps] argument gives the tool access to injected dependencies (covered next). If a tool does not need the agent context — a pure utility function — use @agent.tool_plain instead:

@agent.tool_plain
def calculate_word_count(text: str) -> int:
    """Count the number of words in a text string."""
    return len(text.split())

The distinction keeps your dependency graph explicit: tool_plain functions receive no run context, so they can be tested as ordinary Python functions without any agent machinery. tool functions receive the context and can reach injected runtime state.
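To make the "type hints and docstring become the schema" idea concrete, here is a deliberately simplified sketch using only inspect and typing. It is not how PydanticAI generates schemas (the framework relies on Pydantic and covers far more types); `tool_schema` and the extra `min_length` parameter are assumptions made for illustration.

```python
import inspect
from typing import Any, get_type_hints

# Deliberately small mapping; a real framework handles many more types
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> dict[str, Any]:
    """Build a JSON-Schema-like description of a function's parameters."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        json_type = _JSON_TYPES.get(hints.get(name))
        properties[name] = {"type": json_type} if json_type else {}
        # Parameters without a default must be supplied by the model
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def calculate_word_count(text: str, min_length: int = 1) -> int:
    """Count the words in a text string that have at least min_length characters."""
    return sum(1 for word in text.split() if len(word) >= min_length)
```

Calling `tool_schema(calculate_word_count)` yields a description in which `text` is a required string and `min_length` an optional integer, which is roughly the shape of information a model needs to call the tool.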

Dependency Injection

This is PydanticAI's most underappreciated feature, and the one that most clearly reflects its FastAPI lineage.

In most frameworks, services that tools need — database connections, API keys, HTTP clients — end up as module-level globals or closures. Both approaches make unit testing painful: you either patch globals or reconstruct closures just to swap in a mock.

PydanticAI uses explicit dependency injection. You define a deps_type — typically a dataclass — that holds everything the agent needs at runtime:

from dataclasses import dataclass

import asyncpg
import httpx

@dataclass
class Deps:
    search_api_key: str
    http_client: httpx.AsyncClient
    db_pool: asyncpg.Pool  # or any other resource

agent = Agent(
    "openai:gpt-4o",
    deps_type=Deps,
    result_type=ResearchSummary,
)

When running the agent, you pass a Deps instance:

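import os

async def main(pool: asyncpg.Pool) -> None:
    # `pool` is an asyncpg pool created elsewhere, e.g. at application startup
    async with httpx.AsyncClient() as client:
        deps = Deps(
            search_api_key=os.environ["SEARCH_KEY"],
            http_client=client,
            db_pool=pool,
        )
        result = await agent.run("Summarize recent work on agent memory", deps=deps)
        print(result.data.topic)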

For tests, swap in mocks without touching the agent definition:

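from pydantic_ai.models.test import TestModel

async def test_agent_uses_search():
    # MockHTTPClient and MockPool are placeholder test doubles you write yourself
    mock_deps = Deps(
        search_api_key="test",
        http_client=MockHTTPClient(fixture="search_results.json"),
        db_pool=MockPool(),
    )
    # TestModel stands in for the LLM so the test makes no real API call
    with agent.override(model=TestModel()):
        result = await agent.run("test query", deps=mock_deps)
    assert isinstance(result.data, ResearchSummary)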

This pattern — define the interface, inject the implementation — is standard in well-tested software. PydanticAI brings it to agent code where it has historically been absent.
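The same idea can be exercised with nothing but the standard library. The sketch below tests a tool body in isolation by handing it a fake context; `SearchClient`, `FakeContext`, and `FakeSearchClient` are hypothetical names introduced for this example, and this `Deps` is a cut-down version of the one above, not PydanticAI types.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


class SearchClient(Protocol):
    """The interface the tool relies on; any object with this method will do."""

    async def search(self, query: str) -> list[str]: ...


@dataclass
class Deps:
    search_client: SearchClient


@dataclass
class FakeContext:
    """Minimal stand-in for a run context: it only carries `deps`."""

    deps: Deps


class FakeSearchClient:
    """Test double that records queries and returns canned snippets."""

    def __init__(self, snippets: list[str]) -> None:
        self.snippets = snippets
        self.queries: list[str] = []

    async def search(self, query: str) -> list[str]:
        self.queries.append(query)
        return self.snippets


async def web_search(ctx: FakeContext, query: str) -> list[str]:
    """Tool body: normalizes the query and keeps the top three snippets."""
    results = await ctx.deps.search_client.search(query.strip())
    return results[:3]
```

Because the tool reaches the outside world only through `ctx.deps`, a test can assert both on what it returned and on what it asked the dependency to do, with no patching of globals.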

Model Agnosticism

Switching models is a one-line change:

# Development: fast and cheap
agent = Agent("openai:gpt-4o-mini", result_type=ResearchSummary)

# Production: high quality
agent = Agent("openai:gpt-4o", result_type=ResearchSummary)

# Anthropic alternative
agent = Agent("anthropic:claude-3-5-sonnet-latest", result_type=ResearchSummary)

# Local model via Ollama
agent = Agent("ollama:llama3.1", result_type=ResearchSummary)

The model string is the only thing that changes. Your tools, result type, and dependency injection wiring remain identical. PydanticAI handles provider-specific API formatting, tool schema serialization, and retry behavior internally.

For Azure deployments, a Provider object handles authentication separately from the model identifier, keeping credentials out of the model string.
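Since the model string is the only moving part, it is a natural candidate for configuration. A minimal sketch, assuming an environment variable named AGENT_MODEL (a name invented for this example) and the "provider:model" form used in the strings above:

```python
import os

DEFAULT_MODEL = "openai:gpt-4o-mini"


def resolve_model(env=None) -> str:
    """Return the model string from AGENT_MODEL, falling back to a cheap default."""
    env = os.environ if env is None else env
    model = env.get("AGENT_MODEL", "").strip() or DEFAULT_MODEL
    # Catch a bare model name early rather than at the first agent run
    if ":" not in model:
        raise ValueError(f"expected 'provider:model', got {model!r}")
    return model
```

The result would be passed as the first argument to Agent, for example `Agent(resolve_model(), result_type=ResearchSummary)`, so development and production differ only in their environment.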

Head-to-Head: PydanticAI vs. LangGraph

The clearest way to understand PydanticAI's trade-offs is to implement the same task in both frameworks. The task: given a topic, search the web and return a structured research summary — the kind of retrieval-plus-synthesis step at the heart of many real agents.

The task contract

# Shared output model (used in both implementations)
class ResearchSummary(BaseModel):
    topic: str
    key_findings: list[str]  # 3–5 bullet points
    confidence: float

PydanticAI implementation

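from collections.abc import Awaitable, Callable
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext

@dataclass
class Deps:
    # Injected async search function: takes a query, returns result snippets
    search_fn: Callable[[str], Awaitable[list[str]]]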

agent = Agent(
    "openai:gpt-4o",
    deps_type=Deps,
    result_type=ResearchSummary,
    instructions=(
        "You are a research assistant. Use the search tool to find information, "
        "then return a structured summary with 3-5 key findings."
    ),
)

@agent.tool
async def search(ctx: RunContext[Deps], query: str) -> list[str]:
    """Search the web and return result snippets."""
    return await ctx.deps.search_fn(query)

# Run (inside an async function; my_search is your own async search callable)
result = await agent.run("Recent advances in agent memory architectures", deps=Deps(search_fn=my_search))
summary: ResearchSummary = result.data  # typed, validated

Total: ~25 lines. The control flow is: agent decides to call search, gets snippets back, synthesizes them into a ResearchSummary. If the output fails validation, the framework retries automatically.

LangGraph implementation

from typing import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

class ResearchState(TypedDict):
    topic: str
    search_results: list[str]
    summary: ResearchSummary | None

def search_node(state: ResearchState) -> dict:
    results = my_search(state["topic"])
    return {"search_results": results}

def summarize_node(state: ResearchState) -> dict:
    llm = ChatOpenAI(model="gpt-4o").with_structured_output(ResearchSummary)
    summary = llm.invoke(
        f"Summarize findings on '{state['topic']}' from: {state['search_results']}"
    )
    return {"summary": summary}

# Build graph
graph = StateGraph(ResearchState)
graph.add_node("search", search_node)
graph.add_node("summarize", summarize_node)
graph.add_edge("search", "summarize")
graph.add_edge("summarize", END)
graph.set_entry_point("search")

app = graph.compile()
result = app.invoke({"topic": "Recent advances in agent memory architectures", "search_results": [], "summary": None})
summary: ResearchSummary = result["summary"]  # dict access, not attribute

Total: ~35 lines, and this is the simple version — no conditional branching, no checkpointing, no human-in-the-loop.

Reading the comparison

Neither implementation is wrong. They reflect fundamentally different mental models:

Dimension	PydanticAI	LangGraph
Control flow	Agent decides tool sequence autonomously	Developer defines nodes and edges explicitly
State	Implicit in the run context	Explicit `TypedDict`, flows through graph
Output validation	Automatic: retries on schema failure	Manual: `.with_structured_output()` does not re-prompt on failure by itself
Observability	Logfire integration built in	LangSmith integration built in
Multi-step branching	Requires hand-coded Python logic	First-class via conditional edges
Best for	Tasks where the LLM should determine its own steps	Tasks where the execution path must be predictable and auditable

The key trade-off: PydanticAI trusts the LLM to decide how many tool calls to make and in what order. LangGraph trusts the developer to pre-specify the execution graph. For a research agent with variable retrieval needs, PydanticAI's autonomous approach fits naturally. For a multi-step approval workflow where every state transition needs to be logged and potentially paused, LangGraph's explicit graph is the right choice.
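Stripped of both frameworks, the two mental models reduce to who chooses the next step. The plain-Python sketch below is an illustration only: `fake_decide` is a hypothetical policy standing in for the LLM, and neither function reflects how PydanticAI or LangGraph is implemented.

```python
from typing import Callable, Optional

Search = Callable[[str], list[str]]
Decide = Callable[[list[str]], Optional[str]]


def fixed_pipeline(topic: str, search: Search) -> list[str]:
    """Developer-defined path: always exactly one search on the topic."""
    return search(topic)


def autonomous_loop(search: Search, decide: Decide, max_steps: int = 5) -> list[str]:
    """Model-defined path: `decide` sees the findings so far and returns
    the next query, or None to stop."""
    findings: list[str] = []
    for _ in range(max_steps):
        query = decide(findings)
        if query is None:
            break
        findings.extend(search(query))
    return findings


def fake_search(query: str) -> list[str]:
    # Hypothetical search backend returning one snippet per query
    return [f"snippet about {query}"]


def fake_decide(findings: list[str]) -> Optional[str]:
    # Hypothetical policy standing in for the LLM: two searches, then stop
    planned = ["agent memory", "vector stores"]
    return planned[len(findings)] if len(findings) < len(planned) else None
```

The fixed pipeline is trivially auditable because its path never changes. The loop adapts to what it finds, and the `max_steps` cap is the only hard guarantee about how long it runs.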

Neither framework is a superset of the other. The Agent Frameworks Landscape overview covers this trade-off at a higher level — if you are undecided, that is the right starting point.

Observability

PydanticAI integrates with Pydantic Logfire with minimal setup. After logfire.configure(), enabling the PydanticAI instrumentation records every LLM call, tool invocation, and validation retry as a structured span:

import logfire

logfire.configure()  # reads LOGFIRE_TOKEN from env
logfire.instrument_pydantic_ai()  # needed on recent releases to emit agent spans

# All subsequent agent.run() calls are traced automatically
result = await agent.run("my prompt", deps=deps)

Each span captures: model used, token count, tool arguments, tool return values, validation errors and retries, and total latency. No manual instrumentation required.

For teams already using OpenTelemetry, Logfire exports standard OTEL spans, so traces can flow into Datadog, Honeycomb, or any compatible backend without lock-in.

When to Use PydanticAI

You are building on top of FastAPI or already use Pydantic V2: the same models, type hints, and validation errors carry over, so there is little new to learn.
Your agent must return structured data that downstream code will consume programmatically (APIs, databases, UI components).
Testability is a priority — the dependency injection system makes agents unit-testable without mocking framework internals.
You want to start simple and add complexity only when needed, rather than inheriting a framework's full abstraction stack from day one.

When Not to Use PydanticAI

You need complex, multi-step workflows with explicit branching, human approval queues, or long-running checkpointed state — LangGraph is better suited here.
Your team is heavily invested in the LangChain ecosystem and relies on its pre-built integrations (document loaders, retrievers, memory stores) — switching introduces migration cost without proportional gain.
The target LLM has unreliable structured output support. PydanticAI's retry loop can recover from occasional failures, but if the model consistently fails to produce valid JSON, every run becomes expensive.
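How expensive "every run becomes expensive" gets can be estimated with a short calculation. The sketch assumes each attempt fails validation independently with the same probability, which is a simplification; `expected_calls` is an illustrative helper, not a PydanticAI function.

```python
def expected_calls(failure_rate: float, max_retries: int) -> float:
    """Expected number of model calls per run under independent failures."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0.0 and 1.0")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    # Attempt k (0-based) happens only if the previous k attempts all failed
    return sum(failure_rate ** k for k in range(max_retries + 1))
```

Under this model, a 5% failure rate with two retries costs about 1.05 calls per run, while a 60% failure rate costs about 1.96, nearly doubling spend before counting the runs that still fail after the final retry.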

Resources

GitHub — pydantic/pydantic-ai
Documentation
Introducing PydanticAI (official blog post)
Pydantic Logfire — observability companion