Add taint analysis (analysis level 3) via CodeQL by sinha108 · Pull Request #29 · codellm-devkit/codeanalyzer-python

sinha108 · 2026-05-20T21:25:33Z

Motivation and Context

This PR adds analysis level 3: inter-procedural taint analysis that tracks data flow from user-controlled sources to security-sensitive sinks and reports each path as a typed, severity-rated vulnerability.

The implementation delegates detection to CodeQL's codeql/python-all built-in security models rather than manually enumerating APIs, giving broad coverage (SQL injection, command injection, path traversal, XSS, SSRF, SSTI, unsafe deserialization, and 12 more vulnerability classes) without maintaining fragile pattern lists.

How Has This Been Tested?

Unit tests (no CodeQL required): schema validation, query generation, configuration loading and merging, the three config modes, disabled_builtin_sinks filtering, and validate_config. All of them run in CI with no external dependencies.
Integration tests (require the CodeQL CLI): nine purpose-built fixture applications (sql_injection_app, command_injection_app, path_traversal_app, xss_app, flask_app, sanitizer_app, ssti_app, deserialization_app, ssrf_app), each containing known-vulnerable code. The tests assert that the expected vulnerability types are detected, that flow counts and severity values are correct, and that sanitized paths are not reported.
Regression tests: existing CLI tests pass unchanged.
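The split between CodeQL-free unit tests and CLI-dependent integration tests can be enforced with a skip guard. The following is a minimal sketch using only the standard library's unittest; the helper codeql_available and the test class are hypothetical names, and the PR's actual suite may use a different mechanism (for example, pytest markers).

```python
import shutil
import unittest


def codeql_available() -> bool:
    """Hypothetical helper: True if a `codeql` executable is on PATH."""
    return shutil.which("codeql") is not None


# Skip (rather than fail) CodeQL-dependent tests when the CLI is missing,
# so CI without CodeQL still runs every unit test.
requires_codeql = unittest.skipUnless(codeql_available(), "CodeQL CLI not found on PATH")


@requires_codeql
class SqlInjectionAppIntegrationTest(unittest.TestCase):
    def test_reports_sql_injection(self):
        # Placeholder: a real test would run level-3 analysis on the
        # sql_injection_app fixture and assert on the reported flows.
        self.skipTest("illustrative placeholder; analysis call omitted")
```

Either way, the integration test is reported as skipped instead of failed, which keeps CI green on runners without CodeQL.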

Breaking Changes

None. All new flags have safe defaults (--taint-defaults is on, and --taint-config is optional), and existing invocations at levels 1 and 2 are unaffected. The AnalysisOptions dataclass gains two new optional fields, taint_config and taint_use_defaults, both of which have default values.
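Why the new fields are non-breaking can be shown with a trimmed, hypothetical view of the dataclass. The field types and default values below are assumptions inferred from the flag defaults described above (--taint-defaults on, no --taint-config required); the real AnalysisOptions has other fields that are omitted here.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AnalysisOptions:
    """Hypothetical, trimmed sketch; the real class has more fields."""

    # Assumed type: path to a YAML/JSON taint config (--taint-config), or None.
    taint_config: Optional[Path] = None
    # Assumed to mirror --taint-defaults / --no-taint-defaults (on by default).
    taint_use_defaults: bool = True


# Existing call sites that never mention the new fields still construct fine.
legacy = AnalysisOptions()
# "Custom only" mode: a user config with the built-in defaults switched off.
scoped = AnalysisOptions(taint_config=Path("taint.yaml"), taint_use_defaults=False)
```

Because both fields have defaults, any caller written before this change keeps working without modification.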

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the Codellm-Devkit Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

Architecture. The implementation has three layers:

Config layer (taint_config_defaults.py, taint_config_loader.py): default sources/sanitizers, YAML/JSON file loading, name-based merge, disabled_builtin_sinks suppression, and validate_config warnings surfaced at load time.
Query generation layer (taint_query_generator.py): dynamically generates a CodeQL DataFlow::ConfigSig / TaintTracking::Global<Config> query from the active configuration. Built-in sinks are driven by the BUILTIN_SINKS table; user-defined sinks and sources are appended as additional predicates.
Analysis layer (codeql_analysis.py): executes the generated query, parses results into PyTaintFlow / PyTaintAnalysisResult Pydantic models, and resolves source/sink locations against the symbol table when available.
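To make the query-generation layer more concrete, here is a minimal sketch of rendering a DataFlow::ConfigSig / TaintTracking::Global<Config> query from a sink table. The table shape, the render_query name, and its disabled and extra_sinks parameters are assumptions for illustration, not the PR's actual BUILTIN_SINKS or generator API. The QL fragments use the SqlExecution, SystemCommandExecution, and FileSystemAccess concepts from CodeQL's semmle.python.Concepts library, and RemoteFlowSource as a stand-in source.

```python
# Hypothetical stand-in for the PR's BUILTIN_SINKS table:
# sink name -> QL condition that selects the sink node.
BUILTIN_SINKS = {
    "sql_injection": "sink = any(SqlExecution e).getSql()",
    "command_injection": "sink = any(SystemCommandExecution e).getCommand()",
    "path_traversal": "sink = any(FileSystemAccess a).getAPathArgument()",
}

HEADER = """/**
 * @kind path-problem
 * @id py/generated-taint
 */
import python
import semmle.python.Concepts
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources
"""


def render_query(disabled=(), extra_sinks=()):
    """Render a taint-tracking query; `disabled` drops built-in sinks by name."""
    clauses = [ql for name, ql in BUILTIN_SINKS.items() if name not in disabled]
    clauses.extend(extra_sinks)  # user-defined sink conditions, appended verbatim
    if not clauses:
        raise ValueError("no sinks enabled")
    # Each enabled sink becomes one disjunct of the isSink predicate.
    sink_body = "\n    or ".join(f"({c})" for c in clauses)
    return f"""{HEADER}
module Config implements DataFlow::ConfigSig {{
  predicate isSource(DataFlow::Node source) {{ source instanceof RemoteFlowSource }}

  predicate isSink(DataFlow::Node sink) {{
    {sink_body}
  }}
}}

module Flow = TaintTracking::Global<Config>;

import Flow::PathGraph

from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink.getNode(), source, sink, "Tainted data reaches a sensitive sink."
"""
```

For example, render_query(disabled={"path_traversal"}) yields a query whose isSink predicate omits the FileSystemAccess clause. This mirrors how a disabled_builtin_sinks entry could suppress one built-in category while the remaining categories stay active.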

Config modes. Users choose among three modes by combining --taint-config with --taint-defaults / --no-taint-defaults:

Defaults only (no --taint-config): covers most projects out of the box.
Union: --taint-config file.yaml extends the defaults with project-specific sources/sinks.
Custom only: --taint-config file.yaml --no-taint-defaults replaces all defaults, useful for scoped audits.
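The three modes, together with the name-based merge from the config layer, can be sketched as follows. The entry shape (a dict with "name" and "ql" keys) and the function names merge_by_name and resolve_entries are hypothetical. Rejecting --no-taint-defaults without a --taint-config is this sketch's own choice; the PR does not describe how it handles that case.

```python
from typing import Dict, List, Optional

Entry = Dict[str, str]  # assumed shape, e.g. {"name": ..., "ql": ...}


def merge_by_name(base: List[Entry], extra: List[Entry]) -> List[Entry]:
    """Union two lists; an entry in `extra` replaces a same-named one in `base`."""
    merged = {e["name"]: e for e in base}
    for e in extra:
        merged[e["name"]] = e  # override keeps the original position
    return list(merged.values())


def resolve_entries(
    defaults: List[Entry], user: Optional[List[Entry]], use_defaults: bool
) -> List[Entry]:
    """Select the active sources or sinks for the three config modes."""
    if user is None:
        if not use_defaults:
            # Assumption: with no config and no defaults, nothing would be
            # analyzed, so this sketch treats the combination as an error.
            raise ValueError("--no-taint-defaults requires --taint-config")
        return list(defaults)  # defaults only
    if use_defaults:
        return merge_by_name(defaults, user)  # union
    return list(user)  # custom only
```

Merging by name, rather than simply concatenating the two lists, lets a project config override a default entry without producing a duplicate predicate.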


sinha108 added 4 commits May 15, 2026 12:40

7e03cfc Implementation of taint analysis with CodeQL, along with tests and fixtures.
08ee3c9 Expand taint analysis to use all applicable CodeQL built-in security models; add related test fixtures and unit tests.
509a541 Improve taint analysis extensibility: fix merge bugs, add disabled_builtin_sinks, three-mode config control, and validation
d0d1568 Add test case with taint config in json format; add user guide

Each commit: Signed-off-by: Saurabh Sinha <sinha108@gmail.com>

sinha108 requested a review from rahlk May 20, 2026 21:25

sinha108 wants to merge 4 commits into main from codeql-taint-analysis
