Add taint analysis (analysis level 3) via CodeQL #29
Open
sinha108 wants to merge 4 commits into
…xtures. Signed-off-by: Saurabh Sinha <sinha108@gmail.com>
…models; add related test fixtures and unit tests. Signed-off-by: Saurabh Sinha <sinha108@gmail.com>
…iltin_sinks, three-mode config control, and validation Signed-off-by: Saurabh Sinha <sinha108@gmail.com>
Signed-off-by: Saurabh Sinha <sinha108@gmail.com>
Motivation and Context
This PR adds analysis level 3: inter-procedural taint analysis that tracks data flow from user-controlled sources to security-sensitive sinks and reports each path as a typed, severity-rated vulnerability.
The implementation delegates detection to CodeQL's codeql/python-all built-in security models rather than manually enumerating APIs, giving broad coverage (SQL injection, command injection, path traversal, XSS, SSRF, SSTI, unsafe deserialization, and 12 more vulnerability classes) without maintaining fragile pattern lists.
How Has This Been Tested?
- … disabled_builtin_sinks filtering, and validate_config; all run in CI without any external dependency.
- … sql_injection_app, command_injection_app, path_traversal_app, xss_app, flask_app, sanitizer_app, ssti_app, deserialization_app, ssrf_app), each with known-vulnerable code. Tests assert that expected vulnerability types are detected, flow counts are correct, severity values are accurate, and sanitized paths are not reported.
Breaking Changes
None. All new flags have safe defaults (--taint-defaults on, no --taint-config required), and existing invocations at levels 1 and 2 are unaffected. The AnalysisOptions dataclass has two new optional fields (taint_config and taint_use_defaults), both with defaults.
Types of changes
Checklist
Additional context
Architecture. The implementation has three layers:
- … taint_config_defaults.py, taint_config_loader.py): default sources/sanitizers, YAML/JSON file loading, name-based merge, disabled_builtin_sinks suppression, and validate_config warnings surfaced at load time.
- … taint_query_generator.py): dynamically generates a CodeQL DataFlow::ConfigSig / TaintTracking::Global<Config> query from the active configuration. Built-in sinks are driven by the BUILTIN_SINKS table; user-defined sinks and sources are appended as additional predicates.
- … codeql_analysis.py): executes the generated query, parses results into PyTaintFlow / PyTaintAnalysisResult Pydantic models, and resolves source/sink locations against the symbol table when available.
Config modes. Users have three options controlled by --taint-defaults / --no-taint-defaults:
- … --taint-config): covers most projects out of the box.
- --taint-config file.yaml extends the defaults with project-specific sources/sinks.
- --taint-config file.yaml --no-taint-defaults replaces all defaults, useful for scoped audits.
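For reviewers, the three config modes and the name-based merge can be pictured with a minimal sketch. This is not the code in taint_config_loader.py. The function names build_active_config and merge_by_name, the table contents, and the entry shape ({"name": ..., "pattern": ...}) are hypothetical. The sketch also assumes that a user entry with the same name as a default replaces it, and that built-in sinks stay active in every mode unless listed in disabled_builtin_sinks. The PR description does not confirm either behavior.

```python
# Illustrative sketch only; all table contents and helper names are assumptions.
DEFAULT_CONFIG = {
    "sources": [{"name": "flask_request", "pattern": "flask.request"}],
    "sanitizers": [{"name": "html_escape", "pattern": "html.escape"}],
    "sinks": [],
}
# Hypothetical keys standing in for the real BUILTIN_SINKS table.
BUILTIN_SINKS = ("sql_injection", "command_injection", "xss")


def merge_by_name(base, extra):
    """Name-based merge: an entry in `extra` replaces a `base` entry of the same name."""
    merged = {entry["name"]: entry for entry in base}
    for entry in extra:
        merged[entry["name"]] = entry
    return list(merged.values())


def build_active_config(user_config=None, use_defaults=True):
    """Combine defaults and a user config according to the selected mode."""
    user_config = user_config or {}
    base = DEFAULT_CONFIG if use_defaults else {}
    active = {
        key: merge_by_name(base.get(key, []), user_config.get(key, []))
        for key in ("sources", "sanitizers", "sinks")
    }
    # Assumption: built-in sinks are suppressed only by explicit listing.
    disabled = set(user_config.get("disabled_builtin_sinks", []))
    active["builtin_sinks"] = [name for name in BUILTIN_SINKS if name not in disabled]
    return active


def validate_config(user_config):
    """Return warnings for disabled_builtin_sinks entries that match no built-in sink."""
    return [
        f"unknown built-in sink: {name}"
        for name in user_config.get("disabled_builtin_sinks", [])
        if name not in BUILTIN_SINKS
    ]


project = {
    "sources": [{"name": "cli_args", "pattern": "sys.argv"}],
    "disabled_builtin_sinks": ["xss"],
}
defaults_only = build_active_config()                        # no --taint-config
extended = build_active_config(project)                      # --taint-config file.yaml
replaced = build_active_config(project, use_defaults=False)  # plus --no-taint-defaults
```

In this sketch, extended keeps the default source and adds cli_args, while replaced contains only the project's own entries, which is the shape a scoped audit would want.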