Agentic Tool Calling: Structured JSON Outputs and Validation with Pydantic

LLMs are notoriously bad at following strict formatting instructions. If you ask a model to "return a JSON object," you’ll often get a chatty preamble like "Sure, here is the JSON you requested:" followed by malformed syntax. When you're building agentic workflows where an LLM needs to trigger a database query or call an API, this behavior isn't just annoying—it breaks your pipeline.

After building several agentic systems, I’ve found that the only way to make tool calling production-ready is by forcing the LLM to adhere to a strict schema. Pydantic is the industry standard for this, acting as the interface between the chaotic output of an LLM and the rigid requirements of your backend logic.

Why Schema Enforcement Matters

When an agent decides to use a tool, it generates arguments based on a function signature. If the LLM hallucinates a field or gets the data type wrong, your downstream service will crash.

By using Pydantic models, you gain three critical advantages:

Type Safety: You catch invalid data before it hits your database.
Self-Documentation: The LLM uses the Pydantic schema to "understand" what parameters are required.
Structured Validation: You can define custom validators to ensure, for example, that an email field actually contains an "@" symbol or that a date is in the future.

Practical Implementation: The "Tool-First" Approach

I use a pattern where I define the Pydantic model first, then pass it to the LLM's tool-calling interface (like those provided by OpenAI, Anthropic, or LangChain).

from pydantic import BaseModel, Field, validator
from typing import Optional

# Define the expected structure for our tool
class WeatherSearch(BaseModel):
    city: str = Field(..., description="The city name, e.g., 'San Francisco'")
    unit: str = Field("celsius", description="Temperature unit: 'celsius' or 'fahrenheit'")
    days: int = Field(1, ge=1, le=7, description="Number of days to forecast")

    # Custom validation logic to ensure the city isn't just whitespace
    @validator('city')
    def city_must_not_be_empty(cls, v):
        if not v.strip():
            raise ValueError("City name cannot be empty")
        return v.title()

# In your agent logic, you convert this to a JSON schema
# The LLM sees this schema and is forced to output JSON that matches it
tool_schema = WeatherSearch.model_json_schema()

Architectural Trade-offs

When you force an LLM to output structured data, you introduce a bit of latency. The model has to "think" in terms of syntax, which can sometimes lead to longer generation times.

I’ve observed that smaller, faster models (like GPT-4o-mini or Claude Haiku) are actually better at this than larger models. They are less prone to "chatty" behavior and follow system instructions more rigidly. If you are building a high-throughput agent, stick to these smaller models for tool calling; you’ll save money and get more predictable JSON.

Debugging Tips for Production

Even with Pydantic, things will go wrong. Here is how I handle failures:

The "Retry" Loop: If Pydantic throws a validation error, don't just log it and quit. Pass the error message back to the LLM as a system message. Tell it: "The previous output failed validation because [error]. Please correct it." This usually fixes the issue in one extra turn.
Log the Raw Output: Always store the raw string returned by the LLM before it hits your parser. If your Pydantic model fails, you need to see if the LLM hallucinated a key or if the JSON was truncated.
Temperature Matters: Keep your temperature setting low (0.0 to 0.2) for tool-calling agents. High temperature increases the likelihood of the LLM generating "creative" but invalid JSON syntax.

By treating the LLM output as untrusted user input—no matter how confident the model seems—you build systems that are significantly more resilient. Pydantic isn't just a library here; it's the guardrail that keeps your agent from going off the tracks.

Why Schema Enforcement Matters

Practical Implementation: The "Tool-First" Approach

Architectural Trade-offs

Debugging Tips for Production

Aditya Shenvi