Add taint analysis (analysis level 3) via CodeQL #29

Open
sinha108 wants to merge 4 commits into main from codeql-taint-analysis


@sinha108
Contributor

Motivation and Context

This PR adds analysis level 3: inter-procedural taint analysis that tracks data flow from user-controlled sources to security-sensitive sinks and reports each path as a typed, severity-rated vulnerability.

The implementation delegates detection to CodeQL's codeql/python-all built-in security models rather than manually enumerating APIs, giving broad coverage (SQL injection, command injection, path traversal, XSS, SSRF, SSTI, unsafe deserialization, and 12 more vulnerability classes) without maintaining fragile pattern lists.

How Has This Been Tested?

  • Unit tests (no CodeQL required): schema validation, query generation, configuration loading and merging, the three config modes, disabled_builtin_sinks filtering, and validate_config. All of them run in CI with no external dependencies.
  • Integration tests (require the CodeQL CLI): 9 purpose-built vulnerable fixture applications (sql_injection_app, command_injection_app, path_traversal_app, xss_app, flask_app, sanitizer_app, ssti_app, deserialization_app, ssrf_app), each containing known-vulnerable code. Tests assert that the expected vulnerability types are detected, flow counts are correct, severity values are accurate, and sanitized paths are not reported.
  • Regression tests: existing CLI tests pass unchanged.

Breaking Changes

None. All new flags have safe defaults (--taint-defaults on, no --taint-config required), and existing invocations at levels 1 and 2 are unaffected. The AnalysisOptions dataclass gains two new optional fields, taint_config and taint_use_defaults, both with default values.
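A minimal sketch of how the new flags could map onto AnalysisOptions without breaking existing callers. It uses argparse's BooleanOptionalAction (Python 3.9+) as a stand-in for whatever CLI framework the project actually uses. Only taint_config, taint_use_defaults, --taint-config and --taint-defaults / --no-taint-defaults come from this PR; analysis_level, --analysis-level and parse_options are hypothetical.

```python
import argparse
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AnalysisOptions:
    analysis_level: int = 1               # hypothetical pre-existing field
    taint_config: Optional[Path] = None   # new: YAML/JSON taint config file
    taint_use_defaults: bool = True       # new: mirrors --taint-defaults / --no-taint-defaults


def parse_options(argv: list[str]) -> AnalysisOptions:
    """Hypothetical CLI wiring; the real project may use a different framework."""
    parser = argparse.ArgumentParser(prog="analyzer")  # hypothetical program name
    parser.add_argument("--analysis-level", type=int, default=1, choices=[1, 2, 3])
    parser.add_argument("--taint-config", type=Path, default=None)
    # BooleanOptionalAction registers both --taint-defaults and --no-taint-defaults.
    parser.add_argument("--taint-defaults", action=argparse.BooleanOptionalAction, default=True)
    args = parser.parse_args(argv)
    return AnalysisOptions(
        analysis_level=args.analysis_level,
        taint_config=args.taint_config,
        taint_use_defaults=args.taint_defaults,
    )
```

Because both new fields have defaults, a level-2 invocation such as parse_options(["--analysis-level", "2"]) produces the same options object it would have before, with taint_config=None and taint_use_defaults=True.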

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the Codellm-Devkit Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Architecture. The implementation has three layers:

  1. Config layer (taint_config_defaults.py, taint_config_loader.py): default sources/sanitizers, YAML/JSON file loading, name-based merge, disabled_builtin_sinks suppression, and validate_config warnings surfaced at load time.
  2. Query generation layer (taint_query_generator.py): dynamically generates a CodeQL DataFlow::ConfigSig / TaintTracking::Global<Config> query from the active configuration. Built-in sinks are driven by the BUILTIN_SINKS table; user-defined sinks and sources are appended as additional predicates.
  3. Analysis layer (codeql_analysis.py): executes the generated query, parses results into PyTaintFlow / PyTaintAnalysisResult Pydantic models, and resolves source/sink locations against the symbol table when available.
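A rough sketch of the query generation layer, assuming BUILTIN_SINKS maps a sink key to a CodeQL import and sink class, and that a user-defined sink names the first argument of module.function(...). The table entries (SqlInjection::Sink, CommandInjection::Sink and their Customizations modules), UserSink, generate_query and the query @id are assumptions about codeql/python-all naming, not the PR's code; sanitizers and user-defined sources are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence

# Hypothetical stand-in for BUILTIN_SINKS: key -> (CodeQL import, sink class).
# Module and class names are assumptions; verify them against codeql/python-all.
BUILTIN_SINKS: dict[str, tuple[str, str]] = {
    "sql_injection": (
        "semmle.python.security.dataflow.SqlInjectionCustomizations",
        "SqlInjection::Sink",
    ),
    "command_injection": (
        "semmle.python.security.dataflow.CommandInjectionCustomizations",
        "CommandInjection::Sink",
    ),
}

BASE_IMPORTS = [
    "python",
    "semmle.python.dataflow.new.DataFlow",
    "semmle.python.dataflow.new.TaintTracking",
    "semmle.python.dataflow.new.RemoteFlowSources",
    "semmle.python.ApiGraphs",
]


@dataclass(frozen=True)
class UserSink:
    """Hypothetical user-defined sink: the first argument of module.function(...)."""
    module: str
    function: str


def generate_query(
    disabled_builtin_sinks: Iterable[str] = (),
    user_sinks: Sequence[UserSink] = (),
) -> str:
    """Render a TaintTracking::Global<Config> path-problem query as CodeQL text."""
    disabled = set(disabled_builtin_sinks)
    active = [spec for key, spec in BUILTIN_SINKS.items() if key not in disabled]

    # One isSink disjunct per enabled built-in sink class...
    disjuncts = [f"sink instanceof {cls}" for _, cls in active]
    # ...plus one per user-defined sink, matched through API graphs.
    disjuncts += [
        f'sink = API::moduleImport("{s.module}").getMember("{s.function}").getACall().getArg(0)'
        for s in user_sinks
    ]
    if not disjuncts:
        raise ValueError("no sinks enabled: the query would report nothing")

    imports = BASE_IMPORTS + sorted({path for path, _ in active})
    lines = [
        "/**",
        " * @kind path-problem",
        " * @id python/cldk-taint",  # hypothetical query id
        " */",
        *(f"import {m}" for m in imports),
        "",
        "module Config implements DataFlow::ConfigSig {",
        "  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }",
        "",
        "  predicate isSink(DataFlow::Node sink) {",
        "    " + "\n    or\n    ".join(disjuncts),
        "  }",
        "}",
        "",
        "module Flow = TaintTracking::Global<Config>;",
        "import Flow::PathGraph",
        "",
        "from Flow::PathNode source, Flow::PathNode sink",
        "where Flow::flowPath(source, sink)",
        'select sink.getNode(), source, sink, "Tainted data from $@ reaches this sink.", source.getNode(), "this source"',
    ]
    return "\n".join(lines) + "\n"
```

In this sketch a disabled built-in sink drops out of both the import list and the isSink disjunction, so the generated query only pulls in the models it actually uses.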

Config modes. Users choose among three modes by combining --taint-config with --taint-defaults / --no-taint-defaults:

  • Defaults only (no --taint-config): covers most projects out of the box.
  • Union: --taint-config file.yaml extends the defaults with project-specific sources/sinks.
  • Custom only: --taint-config file.yaml --no-taint-defaults replaces all defaults, useful for scoped audits.
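The three modes can be sketched as a single resolver over plain dictionaries. Everything here is hypothetical: the config shape (sources / sanitizers / sinks lists of entries keyed by name, plus disabled_builtin_sinks), DEFAULT_CONFIG, merge_by_name and resolve_config are illustrations, and rejecting --no-taint-defaults without --taint-config is an assumption about behavior the PR does not describe.

```python
from copy import deepcopy
from typing import Any, Optional

Config = dict[str, Any]
SECTIONS = ("sources", "sanitizers", "sinks")

# Hypothetical defaults; the PR keeps its real defaults in taint_config_defaults.py.
DEFAULT_CONFIG: Config = {
    "sources": [{"name": "flask_request", "kind": "remote"}],
    "sanitizers": [{"name": "html_escape", "function": "html.escape"}],
    "sinks": [],
    "disabled_builtin_sinks": [],
}


def merge_by_name(base: list[dict], extra: list[dict]) -> list[dict]:
    """Union two entry lists; an extra entry replaces a base entry with the same name."""
    merged = {entry["name"]: entry for entry in base}
    for entry in extra:
        merged[entry["name"]] = entry  # keeps the default's position, swaps its body
    return [deepcopy(entry) for entry in merged.values()]


def resolve_config(user: Optional[Config], use_defaults: bool = True) -> Config:
    """Pick defaults-only, union, or custom-only mode from the two CLI inputs."""
    if user is None:
        if not use_defaults:
            # Assumption: custom-only mode without a config file is rejected.
            raise ValueError("--no-taint-defaults requires --taint-config")
        return deepcopy(DEFAULT_CONFIG)  # defaults only

    if not use_defaults:  # custom only: the user file replaces every default
        return {key: deepcopy(user.get(key, [])) for key in (*SECTIONS, "disabled_builtin_sinks")}

    # Union: name-based merge per section, set union for suppressed built-in sinks.
    result: Config = {key: merge_by_name(DEFAULT_CONFIG[key], user.get(key, [])) for key in SECTIONS}
    result["disabled_builtin_sinks"] = sorted(
        set(DEFAULT_CONFIG["disabled_builtin_sinks"]) | set(user.get("disabled_builtin_sinks", []))
    )
    return result
```

In union mode a user entry whose name matches a default replaces it in place, which is one plausible reading of the name-based merge described for the config layer.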

sinha108 added 4 commits May 15, 2026 12:40
…xtures.

Signed-off-by: Saurabh Sinha <sinha108@gmail.com>
…models; add related test fixtures and unit tests.

Signed-off-by: Saurabh Sinha <sinha108@gmail.com>
…iltin_sinks, three-mode config control, and validation

Signed-off-by: Saurabh Sinha <sinha108@gmail.com>
Signed-off-by: Saurabh Sinha <sinha108@gmail.com>
sinha108 requested a review from rahlk May 20, 2026 21:25
