Context
Surfaced in Copilot's 2026-04-16 review of #531.
kg_microbe/query_utils/organism_queries.py::get_media_composition() builds its SQL by interpolating a list of medium IDs directly into an IN ('...') clause. Upstream inputs are currently trusted, so this is not presently exploitable — but it is a brittle pattern that:
- becomes an injection vector the moment the function is reused against unvalidated input
- silently breaks if upstream data ever contains single quotes or other SQL metacharacters
Suggested fix
Parameterize the query. Two DuckDB-friendly options:
- pass the list as a Python list bound to a
? placeholder with UNNEST on the SQL side, or
- load the IDs into a temp table and
JOIN against it.
File involved
kg_microbe/query_utils/organism_queries.py (get_media_composition)
References
- PR #531
- Copilot review at commit
1de973d, 2026-04-16T23:15Z
Context
Surfaced in Copilot's 2026-04-16 review of #531.
kg_microbe/query_utils/organism_queries.py::get_media_composition()builds its SQL by interpolating a list of medium IDs directly into anIN ('...')clause. Upstream inputs are currently trusted, so this is not presently exploitable — but it is a brittle pattern that:Suggested fix
Parameterize the query. Two DuckDB-friendly options:
?placeholder withUNNESTon the SQL side, orJOINagainst it.File involved
kg_microbe/query_utils/organism_queries.py(get_media_composition)References
1de973d, 2026-04-16T23:15Z