OWASP September 30, 2024

OWASP Top 10 Coverage in SAST: What the Numbers Miss

"Full OWASP Top 10 coverage" means something different to every vendor. Here's how to evaluate whether a scanner's coverage claim translates to actual detection.

What "coverage" means and why it's ambiguous

Every SAST vendor lists OWASP Top 10 coverage in their feature matrix. The check mark appears next to each category. This tells you almost nothing about whether the tool will actually detect a vulnerable instance of that pattern in your codebase.

"Coverage" in this context is a category-level claim. It means the tool has at least one rule that addresses at least one manifestation of that vulnerability class. It says nothing about which languages are covered, which frameworks, which code patterns, whether the rule uses shallow pattern matching or deep taint analysis, what the false negative rate is, or whether the detection mechanism would catch anything more sophisticated than the textbook example.

This matters because security teams and engineering teams make procurement decisions based on coverage claims. A tool claiming "full OWASP Top 10 coverage" that detects CWE-89 only via string matching on cursor.execute in Python 3 with no taint tracking will miss SQL injection in Go, Java, Ruby, and any Python pattern that builds the query through intermediate variables. The coverage claim is technically true; the actual detection is a small fraction of the vulnerable surface.

The OWASP Top 10 is a risk taxonomy, not a test suite

Understanding what the OWASP Top 10 actually is changes how you evaluate coverage claims. The Top 10 is a risk taxonomy: a prioritized list of the most critical web application security risks, compiled from contributed application testing data, exploitability and impact scoring derived from CVE data, and a practitioner survey. Each category aggregates multiple CWEs.

A01:2021-Broken Access Control, for example, maps to dozens of CWEs including CWE-22 (path traversal), CWE-639 (insecure direct object reference), CWE-862 (missing authorization), CWE-863 (incorrect authorization), CWE-284 (improper access control), and others. A vendor claiming "A01 coverage" might have rules for CWE-22 and nothing else. Path traversal is real and worth catching. But insecure direct object reference, which accounts for a substantial portion of real-world A01 incidents, requires understanding the application's authorization model, and static analysis alone generally cannot recover that model from source code.

A03:2021-Injection covers SQL injection (CWE-89), cross-site scripting (CWE-79), command injection (CWE-78), LDAP injection (CWE-90), XPath injection (CWE-643), template injection (commonly filed under CWE-94), and others. A scanner with excellent CWE-89 and CWE-79 coverage but no template injection rules can still claim "A03 coverage." For teams running Python Flask with Jinja2 templates, that gap is material.
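One way to make the category-versus-CWE distinction concrete is to score a tool against the CWEs inside a category rather than against the category itself. The sketch below is illustrative only: the mapping is abbreviated to the CWEs named in this article (not the full OWASP list), and `tool_rules` describes a made-up scanner.

```python
# Hypothetical, abbreviated mapping: only CWEs named in this article,
# not the complete OWASP mapping for each category.
CATEGORY_CWES = {
    "A01:2021": {"CWE-22", "CWE-284", "CWE-639", "CWE-862", "CWE-863"},
    "A03:2021": {"CWE-78", "CWE-79", "CWE-89", "CWE-90", "CWE-94"},
}


def cwe_coverage(category: str, tool_cwes: set[str]) -> tuple[float, list[str]]:
    """Return (fraction of listed CWEs covered, sorted list of missing CWEs)."""
    expected = CATEGORY_CWES[category]
    covered = expected & tool_cwes
    missing = sorted(expected - tool_cwes)
    return len(covered) / len(expected), missing


# A made-up tool that ships rules for path traversal, SQLi, and XSS only.
tool_rules = {"CWE-22", "CWE-79", "CWE-89"}

for category in CATEGORY_CWES:
    ratio, missing = cwe_coverage(category, tool_rules)
    print(f"{category}: {ratio:.0%} of listed CWEs, missing {missing}")
```

This hypothetical tool earns a check mark for both categories in a feature matrix while covering one of five listed A01 CWEs and two of five listed A03 CWEs.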

Depth of detection: pattern matching vs. taint analysis

For injection-class vulnerabilities (A03 and parts of A01), the depth of detection matters more than rule count. Two tools can both claim CWE-89 coverage. One uses AST-level pattern matching to find calls to cursor.execute() with string arguments. The other uses interprocedural dataflow analysis to trace taint from HTTP request parameters through function call chains to database sink calls, across file and module boundaries.

The pattern-matching tool will catch obvious cases like:

cursor.execute("SELECT * FROM users WHERE id = " + user_id)

It will miss:

def build_query(filter_val):
    return f"SELECT * FROM orders WHERE status = '{filter_val}'"

# ... in a route handler ...
query = build_query(request.args.get('status'))
db.cursor().execute(query)

The taint analysis tool traces the dataflow from request.args.get('status') through build_query() to cursor.execute() and flags the full path. The difference in real-world recall between these two detection depths is significant — particularly in production codebases where injection vulnerabilities rarely manifest as textbook single-line examples.
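To see why the shallow approach misses the second case, here is a minimal sketch of an AST-level pattern rule written with Python's standard `ast` module. It is not any vendor's actual rule; the function name `flags_execute_concat` and the matching logic are assumptions chosen to mirror the description above.

```python
import ast


def flags_execute_concat(source: str) -> list[int]:
    """Return line numbers where .execute() receives a string built inline."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            # Inline concatenation/formatting (BinOp) or an f-string (JoinedStr).
            and isinstance(node.args[0], (ast.BinOp, ast.JoinedStr))
        ):
            hits.append(node.lineno)
    return hits


OBVIOUS = 'cursor.execute("SELECT * FROM users WHERE id = " + user_id)\n'

INDIRECT = '''
def build_query(filter_val):
    return f"SELECT * FROM orders WHERE status = '{filter_val}'"

query = build_query(request.args.get('status'))
db.cursor().execute(query)
'''

print(flags_execute_concat(OBVIOUS))   # [1]
print(flags_execute_concat(INDIRECT))  # []
```

The rule only inspects the shape of the argument at the call site. In the second snippet that argument is a plain variable name, so nothing matches even though the query is built from request input one call away.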

When evaluating coverage, ask whether the tool does interprocedural taint tracking for injection categories. Ask for the taint source list (what inputs the engine treats as tainted by default) and the sink list (what call patterns terminate a taint flow). The specificity of those answers tells you more about practical detection capability than the feature matrix checkbox.
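As a rough illustration of what source and sink lists do, the toy below tracks taint through one level of function calls. It is a deliberately simplified sketch, not how a production engine works: `SOURCES`, `SINKS`, and every function name are hypothetical, it only handles straight-line module-level code, and it treats a function as propagating taint whenever a return expression mentions a parameter.

```python
import ast

# Hypothetical definitions; a real engine ships far larger lists.
SOURCES = {"request.args.get", "request.form.get"}  # calls that yield tainted data
SINKS = {"execute"}                                 # method names that end a flow


def returns_param(fn: ast.FunctionDef) -> bool:
    """Crude function summary: does any return expression mention a parameter?"""
    params = {a.arg for a in fn.args.args}
    for node in ast.walk(fn):
        if isinstance(node, ast.Return) and node.value is not None:
            names = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            if names & params:
                return True
    return False


def is_tainted(expr, tainted, propagators) -> bool:
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    if isinstance(expr, ast.Call):
        if ast.unparse(expr.func) in SOURCES:
            return True
        if isinstance(expr.func, ast.Name) and expr.func.id in propagators:
            return any(is_tainted(a, tainted, propagators) for a in expr.args)
        return False  # unknown calls are assumed clean (a toy simplification)
    # Any other expression is tainted if one of its sub-expressions is.
    return any(
        is_tainted(child, tainted, propagators)
        for child in ast.iter_child_nodes(expr)
        if isinstance(child, ast.expr)
    )


def find_flows(source: str) -> list[int]:
    """Return line numbers of sink calls whose first argument is tainted."""
    tree = ast.parse(source)
    propagators = {
        n.name for n in ast.walk(tree)
        if isinstance(n, ast.FunctionDef) and returns_param(n)
    }
    tainted, findings = set(), []
    for stmt in tree.body:  # module-level statements, in order
        if isinstance(stmt, ast.Assign) and is_tainted(stmt.value, tainted, propagators):
            tainted |= {t.id for t in stmt.targets if isinstance(t, ast.Name)}
        for node in ast.walk(stmt):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in SINKS
                and node.args
                # Only the query string matters; bound parameters are safe.
                and is_tainted(node.args[0], tainted, propagators)
            ):
                findings.append(node.lineno)
    return findings


SNIPPET = '''def build_query(filter_val):
    return f"SELECT * FROM orders WHERE status = '{filter_val}'"

query = build_query(request.args.get('status'))
db.cursor().execute(query)
'''

print(find_flows(SNIPPET))  # reports the line of the execute() call
```

Even in this toy, the result depends entirely on what is in `SOURCES` and `SINKS`. Remove `request.args.get` from the source list and the flow disappears, which is why the vendor's actual lists are worth asking for.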

Language-framework specificity

OWASP Top 10 coverage is also language-framework dependent in ways that aggregate coverage claims obscure. A tool with excellent Python/Django injection coverage (rules that know Django's ORM safe query API, recognize parameterized query patterns, and account for Django's template auto-escaping) may have no specific rules for Go's database/sql package or for Java's JDBC raw query patterns.

For engineering teams with polyglot codebases, this is a coverage gap that the vendor's matrix won't show. The matrix says "Python: CWE-89 covered, Go: CWE-89 covered" — both true, but the Python coverage may be deep taint analysis and the Go coverage may be a single pattern match on db.QueryRow( with no taint tracking.

A concrete evaluation method is to take three representative code snippets from your codebase for each language: one obvious vulnerable pattern, one vulnerable pattern using an intermediate variable, and one safe pattern using parameterized queries. Run them through the scanner. The detection rate on the two vulnerable patterns and the false positive rate on the safe pattern give you a direct measurement of effective coverage for that language-CWE combination, independent of marketing claims.
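The bookkeeping for this test is small enough to script. The sketch below assumes you record, for each snippet, whether it is actually vulnerable and whether the scanner flagged it; the `Case` type, the `score` function, and the sample results are all made up for illustration, and `score` assumes at least one vulnerable and one safe snippet per set.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    vulnerable: bool  # ground truth for the snippet
    flagged: bool     # what the scanner reported


def score(cases: list[Case]) -> dict[str, float]:
    """Detection rate on vulnerable snippets, false positive rate on safe ones."""
    vuln = [c for c in cases if c.vulnerable]
    safe = [c for c in cases if not c.vulnerable]
    return {
        "detection_rate": sum(c.flagged for c in vuln) / len(vuln),
        "false_positive_rate": sum(c.flagged for c in safe) / len(safe),
    }


# Made-up results for one language-CWE pair (Go, CWE-89).
go_sqli = [
    Case("obvious concatenation", vulnerable=True, flagged=True),
    Case("intermediate variable", vulnerable=True, flagged=False),
    Case("parameterized query", vulnerable=False, flagged=False),
]

print(score(go_sqli))  # {'detection_rate': 0.5, 'false_positive_rate': 0.0}
```

Three snippets per pair is a small sample, so treat the numbers as a smoke test. A 50% detection rate here still tells you the intermediate-variable case is not handled for that language.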

What "10/10 OWASP categories covered" doesn't tell you

A05:2021-Security Misconfiguration and A09:2021-Security Logging and Monitoring Failures are OWASP Top 10 categories that SAST tools routinely claim coverage for and routinely provide minimal detection on.

Security misconfiguration is primarily a deployment-time issue: disabled TLS verification, debug mode enabled in production, default credentials, open S3 bucket policies. SAST can catch some of these patterns in code. A rule that flags verify=False in Python requests calls is legitimate and useful. But the majority of A05 findings come from infrastructure configuration that exists outside the application source code, making SAST an inherently limited tool for this category.
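For reference, the requests library exposes certificate checking through the `verify` keyword argument, and a code-level rule for it can be very small. The following is a minimal sketch using Python's `ast` module; the function name and the `REQUESTS_METHODS` set are assumptions, and the rule matches on method name only, so it would also flag an unrelated object with a `.get(..., verify=False)` call.

```python
import ast

# Method names treated as HTTP calls (an assumption; receiver is not checked).
REQUESTS_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "request"}


def find_disabled_tls_verification(source: str) -> list[int]:
    """Return line numbers of HTTP-style calls that pass verify=False."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr not in REQUESTS_METHODS:
            continue
        for kw in node.keywords:
            if (
                kw.arg == "verify"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is False
            ):
                hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''import requests
requests.get("https://internal.example", verify=False)
requests.get("https://internal.example", timeout=5)
'''

print(find_disabled_tls_verification(SAMPLE))  # [2]
```

A rule like this says nothing about a load balancer terminating TLS with a weak policy or a bucket left public, which is the larger share of the category.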

Security logging failures are similarly difficult for static analysis. Whether an application adequately logs authentication events, failed authorization checks, and sensitive data access is a question about what the code doesn't do — absence of logging calls in the right places — rather than the presence of a vulnerable pattern. Some SAST tools have rules for "function X has no corresponding audit log call," but these are highly specific and prone to false positives without deep knowledge of the logging framework.
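A sketch of such an absence rule shows where the false positives come from. Everything here is hypothetical: the naming convention in `AUDITED_PREFIXES`, the set of logging method names, and the sample code are assumptions chosen for illustration.

```python
import ast

AUDITED_PREFIXES = ("login", "delete_", "grant_")  # hypothetical naming convention
LOG_METHODS = {"debug", "info", "warning", "error", "critical", "exception"}


def functions_missing_logging(source: str) -> list[str]:
    """Return names of security-relevant functions with no direct logging call."""
    missing = []
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, ast.FunctionDef):
            continue
        if not fn.name.startswith(AUDITED_PREFIXES):
            continue
        has_log = any(
            isinstance(n, ast.Call)
            and isinstance(n.func, ast.Attribute)
            and n.func.attr in LOG_METHODS
            for n in ast.walk(fn)
        )
        if not has_log:
            missing.append(fn.name)
    return sorted(missing)


SAMPLE = '''
def login(user, password):
    logger.info("login attempt for %s", user)
    return check(user, password)

def delete_account(user):
    db.remove(user)
'''

print(functions_missing_logging(SAMPLE))  # ['delete_account']
```

If `delete_account` were wrapped by a decorator or called a helper that writes the audit record, this rule would still flag it. Without knowledge of how the codebase actually logs, the rule cannot tell a real gap from an indirection.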

We're not saying SAST coverage for these categories is worthless. We're saying that an honest evaluation of SAST's coverage posture treats A01 through A03 (where taint analysis provides real detection depth, at least for dataflow-shaped CWEs such as path traversal and injection) differently from A05, A07 (Identification and Authentication Failures), and A09 (where coverage claims are thinner and should be supplemented with other controls such as DAST, infrastructure scanning, and manual review).

A better framing for coverage evaluation

Instead of asking "does this tool cover the OWASP Top 10," ask these questions:

For injection categories (A03, parts of A01): Does the tool do interprocedural taint analysis for CWE-89 and CWE-79? What are the taint source and sink definitions for my primary languages? Can I add custom sources and sinks for internal APIs?

For A02 (Cryptographic Failures): Does the tool detect hardcoded keys and passwords (CWE-321 and CWE-259; the broader hardcoded-credentials CWE-798 is formally mapped to A07 in the 2021 list) and weak cipher usage (CWE-327)? Does secret detection cover environment variable injection patterns or only string literals? Can it detect when keys are logged?

For A08 (Software and Data Integrity Failures): Does the tool have prototype pollution (CWE-1321) detection for JavaScript? Can it detect unsafe deserialization of untrusted data (CWE-502) in Java and Python?

The specificity of these questions will reveal whether coverage is shallow (a rule exists) or deep (the rule uses analysis techniques appropriate to the vulnerability class). A tool that can answer all of these with specific, technical responses is demonstrating coverage in the sense that matters — detection in practice, not just category presence in the feature matrix.

Coverage percentage as a headline metric is a marketing artifact. Detection depth per vulnerability class per language-framework combination is the operational measure. The two are not the same, and confusing them produces security programs that are more confident in their coverage than the evidence supports.