Building an AST-Based Vulnerability Scanner with LLM Security Analysis

Static analysis tools often fall into two camps: pattern-based linters that miss context, or heavy-weight scanners that generate endless noise. When I started building a custom vulnerability scanner for my team, I realized the sweet spot wasn't just matching regex—it was understanding the code's structure through Abstract Syntax Trees (AST) and offloading the nuanced, "is this actually exploitable?" logic to an LLM.

The Architectural Approach

The core idea is simple: don't send the entire codebase to an LLM. That’s expensive, slow, and prone to hallucinations. Instead, I use a two-pass system.

The AST Filter: I use tree-sitter to parse the source code. I define queries to extract only the sensitive parts—think database queries, file system operations, or authentication checks.
The Contextual Analysis: I send only those specific code snippets to a local or API-based LLM, providing it with the surrounding function signature and imports to give it the necessary context for a security verdict.

This reduces token usage by 90% and keeps the analysis focused on high-risk nodes.

Implementing the AST Extractor

I prefer Python for this because the tree-sitter bindings are robust and easy to debug. Here is a simplified version of how I extract dangerous sinks from a Python file.

from tree_sitter import Language, Parser

# Load your grammar (python.so must be compiled)
PY_LANGUAGE = Language('build/my-languages.so', 'python')
parser = Parser()
parser.set_language(PY_LANGUAGE)

def extract_dangerous_calls(source_code: str):
    tree = parser.parse(bytes(source_code, "utf8"))
    
    # S-expression query to find SQL execution patterns
    query = PY_LANGUAGE.query("""
        (call 
            function: (attribute 
                attribute: (identifier) @func_name (#eq? @func_name "execute"))
            arguments: (argument_list (string) @sql_query))
    """)
    
    captures = query.captures(tree.root_node)
    results = []
    
    for node, name in captures:
        if name == 'sql_query':
            results.append(node.text.decode('utf8'))
            
    return results

# Usage example:
# If the code has cursor.execute("SELECT * FROM users WHERE id = " + user_input)
# The scanner flags the string concatenation for LLM inspection.

Integrating the LLM Analysis

Once I have the suspicious snippet, I pass it to a model with a specific system prompt. I've found that giving the LLM a role—like "Senior Security Auditor"—and providing a JSON schema for the output works best for automation.

Don't just ask "Is this safe?" Instead, use a prompt like:

"Analyze this SQL query construction for SQL Injection. If the query uses string formatting or concatenation with untrusted input, return 'VULNERABLE'. Otherwise, return 'SECURE'."

Operational Trade-offs

I’ve learned the hard way that you cannot trust the LLM with the final decision on production deploys. I treat the LLM output as a "High-Confidence Signal" rather than a source of truth.

False Positives: If the LLM flags something, I cross-reference it against the AST node's line number in my CI/CD pipeline.
Latency: Running the AST parser takes milliseconds, but the LLM call takes seconds. I run the AST pass on every commit, but the LLM pass only on Pull Requests to keep the developer loop tight.
Context Window: Even if you only send snippets, ensure you include the function definition line. If the LLM doesn't see the function arguments, it can't know if the input is sanitized or coming straight from a request object.

Debugging Tips

When building this, keep an eye on how tree-sitter handles malformed code. If a developer pushes a syntax error, the AST parser might return an empty tree, causing your scanner to skip the file entirely. I added a pre-check: if tree.root_node.has_error is true, I log a warning to the developer to fix their syntax before the security scan runs.

Also, always version your LLM prompts. A prompt that works for GPT-4o might fail miserably on a smaller, fine-tuned Llama 3 model. I store my prompts in a prompts/ directory within the repo, treating them as code that needs review.

The Architectural Approach

Implementing the AST Extractor

Integrating the LLM Analysis

Operational Trade-offs

Debugging Tips

Aditya Shenvi