Skip to content

Parameterize the IN (...) list in get_media_composition() to remove SQL-injection shape #540

@turbomam

Description

@turbomam

Context

Surfaced in Copilot's 2026-04-16 review of #531.

kg_microbe/query_utils/organism_queries.py::get_media_composition() builds its SQL by interpolating a list of medium IDs directly into an IN ('...') clause. Upstream inputs are currently trusted, so this is not presently exploitable — but it is a brittle pattern that:

  • becomes an injection vector the moment the function is reused against unvalidated input
  • silently breaks if upstream data ever contains single quotes or other SQL metacharacters

Suggested fix

Parameterize the query. Two DuckDB-friendly options:

  • pass the list as a Python list bound to a ? placeholder with UNNEST on the SQL side, or
  • load the IDs into a temp table and JOIN against it.

File involved

  • kg_microbe/query_utils/organism_queries.py (get_media_composition)

References

  • PR #531
  • Copilot review at commit 1de973d, 2026-04-16T23:15Z

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions