Text-to-SQL Pipelines: Avoiding Hallucinations in Database Queries
LLMs are great at writing code, but they are notoriously bad at writing SQL for complex, production-grade schemas. I’ve spent the last six months building internal data-querying tools, and I’ve learned the hard way that if you simply feed a schema to GPT-4 or Claude and ask for a query, you’re going to get hallucinations—column names that don't exist, joins on the wrong keys, and non-existent aggregate functions.
The problem isn't the model's reasoning; it's the lack of context and the ambiguity of natural language. Here is how I’ve been structuring my pipelines to minimize these errors and keep my database safe.
The Semantic Layer Strategy
Instead of letting the LLM look at raw DDL (Data Definition Language) exports, I build a semantic layer. If you expose 200 tables to an LLM, it will get distracted. I create a "context file" that is essentially a condensed version of the schema.
I strip out all the noise—no triggers, no indexes, no constraints. I only include table names, relevant column names, and, most importantly, a description field for every column. If a column is named status_code, the LLM needs to know that 1 means "Active" and 0 means "Archived."
Implementation: The Guarded Query Builder
I prefer using a Pydantic-based approach to enforce structure. I don't let the LLM return raw strings. I force it to return a JSON object that I can validate before it ever touches my database.
from pydantic import BaseModel, Field
from typing import List
# We define the structure we expect back from the LLM
class SQLQuery(BaseModel):
reasoning: str = Field(description="Step-by-step logic for the query")
sql: str = Field(description="The executable SQL query")
columns_used: List[str] = Field(description="List of columns used to verify existence")
def generate_safe_query(user_prompt: str, schema_context: str):
# Use a system prompt that explicitly forbids joins on non-existent keys
system_prompt = f"""
You are a SQL expert. Use the following schema context:
{schema_context}
If the user asks for data not present in the schema, return an empty query string.
Always use qualified column names (table.column).
Only use columns explicitly listed in the context.
"""
# In practice, I use instructor or LangChain's structured output
# to ensure the LLM returns the SQLQuery model above.
return client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}],
response_model=SQLQuery
)
Operational Trade-offs
The biggest trade-off I’ve faced is latency versus accuracy. Providing a massive schema context increases tokens and cost, but it lowers the hallucination rate.
My current workflow involves a two-step retrieval process:
- Schema Retrieval: I use a vector store (like Pinecone or pgvector) to find only the tables relevant to the user’s question. If the user asks about "monthly revenue," I only inject the
invoicesandpaymentstables into the prompt. - Validation: Before execution, I run a "dry run" using
EXPLAINin PostgreSQL. If the query fails to parse or references a column that doesn't exist in my actual metadata store, I don't execute it. I feed the error back to the LLM and ask for a correction.
Debugging Tips for Production
If you’re seeing weird behavior, stop looking at the SQL and start looking at the prompt.
- The "Case Sensitivity" Trap: I’ve had models fail because they assumed
user_idwasUser_ID. I now normalize all schema names to lowercase before they hit the prompt. - The "Limit" Problem: LLMs love to dump data. I force a
LIMIT 100in the system prompt to prevent a user from accidentally querying 50 million rows and blowing up the memory of my application server. - Human-in-the-loop: For anything involving
UPDATEorDELETEstatements, I never let the LLM execute directly. I generate the SQL, display it to the user in the UI, and require a manual "Confirm" click before the transaction starts.
Building these pipelines is less about the model's raw intelligence and more about how much guardrail you place around its input and output. Keep the schema tight, validate the JSON, and always use an EXPLAIN dry-run. That’s how you move from a prototype to a reliable production tool.
Aditya Shenvi
AI Engineer & Full-Stack Architect. Passionate about building intelligent systems, elegant UIs, and scaling web infrastructure. Open to exciting engineering opportunities in April 2026 and beyond.