Custom Rules December 16, 2024

Writing Custom SAST Rules in YAML: A Practical Guide

Built-in rules cover the OWASP Top 10. Custom rules cover the things your codebase does that no one else does. Here's the pattern syntax you need to know.

Why built-in rules aren't enough

A SAST tool's built-in rule library covers the vulnerability classes that appear across codebases broadly: SQL injection (CWE-89), cross-site scripting (CWE-79), path traversal (CWE-22), server-side request forgery (CWE-918), insecure deserialization, hardcoded credentials (CWE-798), and the rest of the OWASP Top 10 family. These rules are written against common patterns in common frameworks — Django ORM, Flask route handlers, Express.js middleware, Spring MVC, and so on.

Your codebase isn't the average codebase. It has internal libraries with their own API surfaces. It has custom authentication middleware that wraps the framework's built-in auth. It has data access patterns specific to your domain — maybe a graph database client, a message queue consumer pattern, or a custom serialization layer that has its own injection risk profile. Built-in rules know nothing about any of this.

Custom rules are how you extend the scanner's knowledge to your specific system. This guide walks through the YAML rule DSL that Gritcadence uses, which is closely derived from the Semgrep rule schema, the de facto open standard for SAST rule authoring. It covers how to write rules that match your codebase's specific patterns, how to track taint through your internal APIs, and how to test that the rules behave correctly before deploying them to the PR pipeline.

Rule anatomy: the minimal structure

A valid rule has five required fields: id, message, severity, languages, and at least one pattern key. Here's the minimal structure:

rules:
  - id: grcd-internal-sql-raw
    message: |
      Direct string interpolation into db.execute() call.
      Use parameterized queries: db.execute(query, (param,))
    severity: HIGH
    languages: [python]
    pattern-either:
      - pattern: db.execute($QUERY + ...)
      - pattern: db.execute(f"... {$VAR} ...")

The id should be stable and unique — it appears in SARIF output and in suppression file entries, so changing it after deployment orphans existing suppressions. The message is what appears in the PR comment; it should include both the vulnerability description and a remediation path. severity controls whether the finding blocks a merge (HIGH) or appears as an annotation (MEDIUM/LOW).
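
As a rough illustration of the required-field rule above, here is a small Python check one might run before committing a rule. It assumes the rule has already been parsed into a dict (for example by a YAML loader). The function name missing_fields and the list of accepted pattern keys are assumptions made for this sketch, not part of the Gritcadence tooling.

```python
# Hypothetical pre-commit check. The four named fields come from the rule
# anatomy described above; the accepted pattern keys are an assumption.
REQUIRED_FIELDS = ("id", "message", "severity", "languages")
PATTERN_KEYS = ("pattern", "patterns", "pattern-either", "pattern-sources")


def missing_fields(rule: dict) -> list[str]:
    """Return the names of required fields that a parsed rule lacks."""
    missing = [name for name in REQUIRED_FIELDS if not rule.get(name)]
    if not any(key in rule for key in PATTERN_KEYS):
        missing.append("pattern key")
    return missing


rule = {
    "id": "grcd-internal-sql-raw",
    "message": "Direct string interpolation into db.execute() call.",
    "severity": "HIGH",
    "languages": ["python"],
}
print(missing_fields(rule))  # ['pattern key'] because no pattern key is set
```

A check like this catches a half-written rule before the scanner ever loads it, which keeps rule errors out of the PR pipeline.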

Pattern matching and AST-level matching

The pattern key uses AST-level matching, not string matching. This distinction is critical. String-level matching would flag db.execute(query + user_input) but miss db.execute( query + user_input ) because of whitespace. AST matching operates on the abstract syntax tree parsed from the code, so whitespace, comments, and minor syntactic variations are normalized before matching. The pattern db.execute($QUERY + ...) matches any call to db.execute where the first argument involves concatenation, regardless of formatting.
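
To make the whitespace point concrete, here is a minimal sketch that uses Python's standard ast module. It is not how the scanner is implemented; it only shows that once code is parsed, three differently formatted calls look identical. The function name concat_execute_lines is made up for this example.

```python
import ast


def concat_execute_lines(source: str) -> list[int]:
    """Line numbers of db.execute(<a> + <b>) calls, found on the parsed AST."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_db_execute = (
            isinstance(func, ast.Attribute)
            and func.attr == "execute"
            and isinstance(func.value, ast.Name)
            and func.value.id == "db"
        )
        if not is_db_execute or not node.args:
            continue
        first = node.args[0]
        # Concatenation shows up as a BinOp node with an Add operator.
        if isinstance(first, ast.BinOp) and isinstance(first.op, ast.Add):
            hits.append(node.lineno)
    return sorted(hits)


compact = "db.execute(query + user_input)\n"
spaced = "db.execute(  query   +   user_input  )\n"
wrapped = "db.execute(\n    query  # base SQL\n    + user_input\n)\n"

for sample in (compact, spaced, wrapped):
    print(concat_execute_lines(sample))  # [1] for each formatting variant
```

A regular expression tuned to the first sample would need extra work to handle the other two; the tree-based check needs none.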

Metavariables (prefixed with $) bind to specific AST nodes in the match. $QUERY will bind to whatever expression appears as the first argument. You can then reference that binding in pattern-not or metavariable-pattern conditions:

patterns:
  - pattern: cursor.execute($QUERY)
  - pattern-not: cursor.execute("...")   # literal string: safe
  - metavariable-pattern:
      metavariable: $QUERY
      patterns:
        - pattern-not: SQL_SAFE_QUERY    # named constant: safe

The pattern-not condition excludes matches where the argument is a string literal, meaning hardcoded SQL with nothing interpolated into it. The metavariable-pattern condition excludes matches where the bound argument is a name you have designated as safe by convention (here, the constant SQL_SAFE_QUERY). These conditions improve precision without reducing recall for the actual vulnerable pattern.
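
The binding-then-filter logic can be sketched in plain Python. This is an illustration of the idea only, again using the standard ast module rather than the rule engine. The function name unsafe_execute_bindings and its safe_names parameter are assumptions for the example.

```python
import ast


def unsafe_execute_bindings(source: str, safe_names=("SQL_SAFE_QUERY",)):
    """Return (line, expression text) for each cursor.execute(X) not excluded."""
    bindings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (
            isinstance(func, ast.Attribute)
            and func.attr == "execute"
            and isinstance(func.value, ast.Name)
            and func.value.id == "cursor"
        ):
            continue
        if len(node.args) != 1 or node.keywords:
            continue  # the pattern has exactly one argument
        query = node.args[0]  # plays the role of the $QUERY binding
        if isinstance(query, ast.Constant) and isinstance(query.value, str):
            continue  # string literal, like the pattern-not exclusion
        if isinstance(query, ast.Name) and query.id in safe_names:
            continue  # designated safe constant
        bindings.append((node.lineno, ast.unparse(query)))
    return sorted(bindings)


sample = (
    'cursor.execute("SELECT 1")\n'
    "cursor.execute(SQL_SAFE_QUERY)\n"
    "cursor.execute(build_query(user_id))\n"
    "cursor.execute(sql)\n"
)
print(unsafe_execute_bindings(sample))
# [(3, 'build_query(user_id)'), (4, 'sql')]
```

The first two calls are filtered out by the exclusions; the last two remain because nothing is known about where their argument came from.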

Taint tracking: sources and sinks

Pattern matching finds structural patterns in code. Taint tracking finds dataflow paths between untrusted data sources and sensitive sinks. For injection vulnerabilities, you need taint tracking — a pattern match alone won't tell you whether the argument to db.execute() came from user input or from a config file.

Taint rules declare sources, sinks, and optionally propagators and sanitizers:

rules:
  - id: grcd-ssrf-http-client
    message: |
      SSRF risk: HTTP request URL derives from user-controlled input.
      Validate against an allowlist before making outbound requests.
      CWE-918.
    severity: HIGH
    languages: [python]
    mode: taint
    pattern-sources:
      - pattern-either:
          - pattern: request.args.get(...)
          - pattern: request.form.get(...)
          - pattern: request.json.get(...)
    pattern-sinks:
      - pattern-either:
          - pattern: requests.get($URL, ...)
          - pattern: requests.post($URL, ...)
          - pattern: httpx.get($URL, ...)
    pattern-sanitizers:
      - pattern: validate_url_allowlist($URL)

The engine traces dataflow from every declared source to every declared sink. If a value reachable from a source reaches a sink without passing through a sanitizer, it's flagged. The sanitizer declaration teaches the engine about your validate_url_allowlist function — it won't flag calls where the URL has passed through your validator.

This is the mechanism that eliminates the sanitizer blindness false positives discussed in other articles: register your internal sanitization functions as pattern-sanitizers, and the taint engine correctly models your security controls.
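
For intuition, here is a toy taint tracker in Python. It is a deliberately small sketch, not a description of the real engine: it only follows top-level assignments in straight-line code and ignores functions, branches, and aliasing. The source, sink, and sanitizer names mirror the rule above; the helper names are made up for this example.

```python
import ast

SOURCES = {"request.args.get", "request.form.get", "request.json.get"}
SINKS = {"requests.get", "requests.post", "httpx.get"}
SANITIZERS = {"validate_url_allowlist"}


def dotted_name(node: ast.AST) -> str:
    """Render Name/Attribute chains such as request.args.get as a string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{dotted_name(node.value)}.{node.attr}"
    return ""


def is_tainted(expr: ast.AST, tainted: set[str]) -> bool:
    """True if expr is a source call, a tainted variable, or built from one."""
    if isinstance(expr, ast.Call):
        name = dotted_name(expr.func)
        if name in SANITIZERS:
            return False  # sanitizer output is treated as clean
        if name in SOURCES:
            return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(expr))


def find_tainted_sinks(source: str) -> list[int]:
    """Line numbers of sink calls whose first argument is tainted."""
    tainted: set[str] = set()
    findings = []
    for stmt in ast.parse(source).body:  # top-level statements, in order
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and dotted_name(node.func) in SINKS:
                if node.args and is_tainted(node.args[0], tainted):
                    findings.append(node.lineno)
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if is_tainted(stmt.value, tainted):
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings


fixture = (
    "url = request.args.get('target')\n"
    "requests.get(url)\n"
    "safe_url = validate_url_allowlist(url)\n"
    "requests.get(safe_url)\n"
)
print(find_tainted_sinks(fixture))  # [2]
```

Removing validate_url_allowlist from SANITIZERS makes line 4 show up as well, which is the false positive that registering your sanitizer avoids.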

Testing your rules before deployment

A rule that fires on real vulnerabilities is valuable. A rule that fires on safe code destroys trust. Testing rules before deploying them to the PR pipeline is not optional.

The test harness uses annotated test fixtures — code files with inline comments that declare the expected finding behavior:

# grcd_test_ssrf.py

import requests
from flask import request
from validators import validate_url_allowlist

url = request.args.get('target')
# ruleid: grcd-ssrf-http-client
requests.get(url)  # should flag: tainted URL reaches requests.get

url = request.args.get('target')
safe_url = validate_url_allowlist(url)
# ok: grcd-ssrf-http-client
requests.get(safe_url)  # should NOT flag: sanitizer in path

# ok: grcd-ssrf-http-client
requests.get("https://api.internal.example.com/health")  # literal: safe

The # ruleid: annotation tells the test runner that the line immediately below it should produce a finding for that rule ID, so it sits directly above the sink call rather than above the assignment that introduces the taint. The # ok: annotation tells it that the line below should not produce a finding. Running the test suite against this fixture verifies both that the rule fires when it should (recall) and that it stays silent when it shouldn't fire (precision).
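
The comparison a test runner performs can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual harness: it assumes each annotation applies to the line directly below it and that the scanner's findings are available as (rule ID, line number) pairs. The names expected_findings and check are invented for this example.

```python
import re

# Only lines that start with the annotation count; trailing comments do not.
ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w-]+)")


def expected_findings(fixture: str) -> tuple[set, set]:
    """Return (must_flag, must_not_flag) as sets of (rule_id, line_number)."""
    must_flag, must_not_flag = set(), set()
    for number, line in enumerate(fixture.splitlines(), start=1):
        match = ANNOTATION.match(line.strip())
        if match:
            kind, rule_id = match.groups()
            target = (rule_id, number + 1)  # assumed: applies to the next line
            (must_flag if kind == "ruleid" else must_not_flag).add(target)
    return must_flag, must_not_flag


def check(fixture: str, actual: set) -> list[str]:
    """Compare annotated expectations with the findings the scanner reported."""
    must_flag, must_not_flag = expected_findings(fixture)
    errors = [f"missed {r} at line {n}" for r, n in sorted(must_flag - actual)]
    errors += [f"unexpected {r} at line {n}" for r, n in sorted(must_not_flag & actual)]
    return errors


fixture = (
    "url = request.args.get('target')\n"
    "# ruleid: grcd-ssrf-http-client\n"
    "requests.get(url)\n"
    "# ok: grcd-ssrf-http-client\n"
    'requests.get("https://api.internal.example.com/health")\n'
)
print(check(fixture, {("grcd-ssrf-http-client", 3)}))  # []
print(check(fixture, set()))  # ['missed grcd-ssrf-http-client at line 3']
```

An empty list means the rule met every expectation in the fixture; a "missed" entry is a recall failure and an "unexpected" entry is a precision failure.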

The test suite should be committed alongside the rule definition and run in CI. Rules that fail their own tests don't get deployed. This sounds obvious but is frequently skipped in practice — teams write the rule, deploy it, and discover the false positive rate empirically in production. The test harness makes the feedback loop much tighter.

Rule organization and the suppression file relationship

Custom rules accumulate over time. A codebase that's been running a SAST tool for a year may have 20-40 custom rules covering internal libraries, specific data access patterns, and business-logic security constraints. Organization matters for maintainability.

A practical structure groups rules by security category: rules/injection/, rules/auth/, rules/crypto/, rules/secrets/. Within each category, each rule lives in its own YAML file named after its rule ID. Each rule file includes a comment header with the author, creation date, last-tested date, and a CWE or internal threat model reference. The CWE reference (CWE-89 for SQLi, CWE-79 for XSS, CWE-918 for SSRF) connects the rule to the public vulnerability taxonomy, which helps when onboarding new engineers and when auditing rule coverage against the OWASP Top 10 or MITRE ATT&CK mappings.

The relationship between custom rules and the suppression file needs explicit policy. When an engineer suppresses a custom rule finding, the suppression entry should include the reason the specific instance is not exploitable — not "not applicable" but "URL is validated against the INTERNAL_API_ALLOWLIST constant before reaching this call." That specificity makes quarterly suppression audits meaningful: you can check whether the allowlist is still in place, whether the code has changed, whether the suppression is still accurate.
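
A policy like this can be backed by a small automated check. The sketch below is hypothetical: the article does not describe the suppression file format, so the Suppression fields, the list of generic phrases, and the five-word minimum are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Assumed examples of reasons that say nothing about exploitability.
GENERIC_REASONS = {"not applicable", "n/a", "false positive", "wontfix"}


@dataclass
class Suppression:
    rule_id: str
    path: str
    reason: str


def needs_review(entry: Suppression, min_words: int = 5) -> bool:
    """Flag suppressions whose reason is generic or too short to audit."""
    reason = entry.reason.strip().lower().rstrip(".")
    return reason in GENERIC_REASONS or len(reason.split()) < min_words


vague = Suppression("grcd-ssrf-http-client", "svc/fetch.py", "Not applicable.")
specific = Suppression(
    "grcd-ssrf-http-client",
    "svc/fetch.py",
    "URL is validated against the INTERNAL_API_ALLOWLIST constant "
    "before reaching this call.",
)
print(needs_review(vague), needs_review(specific))  # True False
```

A word-count threshold is a crude proxy for specificity, but it is enough to send the emptiest suppressions back to their authors before the quarterly audit.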

Where custom rules return the most value

Not every codebase feature needs a custom rule. Custom rules pay off most in two places: patterns where built-in rules produce many false positives because they don't know about your specific framework, and internal API surfaces that handle security-sensitive operations.

The internal API category is often overlooked: your authentication service client, your encryption utility library, your audit logging functions. If a developer calls the authentication service's bypass method in a code path where it shouldn't be called, or calls the encryption function with an insecure key size, those are security issues that no generic rule will catch, because the generic rule doesn't know your internal library exists. Custom rules for internal sensitive APIs are where SAST coverage genuinely extends beyond what any off-the-shelf tool can provide.