Normalize OWASP cheat sheet references #865

Open
Bornunique911 wants to merge 10 commits into OWASP:main from Bornunique911:review/issue-471-cheatsheet-references

Bornunique911 (Contributor) commented Apr 6, 2026

Summary

This PR addresses part of issue #471 by normalizing OWASP cheat sheet references used by the OWASP resource mapping flow.

This is the third upstream PR in the stacked #471 review series.

Problem Fixed

Cheat sheet references were inconsistent, which made later mapping and analysis less reliable.

Solution

Normalized cheat sheet references and updated parser expectations/tests.

Tests

./venv/bin/python -m pytest application/tests/cheatsheets_parser_test.py -q

Why this is split out

The full #471 work is too large to review effectively as one PR.

This PR isolates one OWASP resource family so the parser/data model can be reviewed independently before the later Kubernetes, cheat sheet, backend analysis, and frontend changes.

Bornunique911 (Contributor, Author) commented:

Before: [image]

After: [image]

Bornunique911 (Contributor, Author) commented:

Requesting reviews and feedback on this feature from @northdpole, @Pa04rth, and @robvanderveer.

Bornunique911 force-pushed the review/issue-471-cheatsheet-references branch from ef1baee to 901c9a0 (April 11, 2026 07:58)
Bornunique911 force-pushed the review/issue-471-cheatsheet-references branch 2 times, most recently from 10585d9 to 38051c9 (April 22, 2026 21:00)
Bornunique911 (Contributor, Author) commented:

The fourth upstream PR in the stacked #471 review series is #877.

Bornunique911 force-pushed the review/issue-471-cheatsheet-references branch 2 times, most recently from 386a106 to 20b0aab (April 30, 2026 19:18)
Bornunique911 force-pushed the review/issue-471-cheatsheet-references branch from bb7b759 to b08e8e5 (May 7, 2026 09:37)
Bornunique911 force-pushed the review/issue-471-cheatsheet-references branch from b08e8e5 to 2b18fb2 (June 2, 2026 17:33)
coderabbitai Bot commented Jun 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: ad43feb4-6a25-41e9-8a77-460d06e6a8b1

📥 Commits

Reviewing files that changed from the base of the PR and between 2b18fb2 and b3bab92.

📒 Files selected for processing (7)
  • application/tests/owasp_aisvs_parser_test.py
  • application/tests/owasp_api_top10_2023_parser_test.py
  • application/tests/owasp_kubernetes_top10_2022_parser_test.py
  • application/tests/owasp_kubernetes_top10_2025_parser_test.py
  • application/tests/owasp_llm_top10_2025_parser_test.py
  • application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
  • application/utils/external_project_parsers/parsers/cheatsheets_parser.py
✅ Files skipped from review due to trivial changes (1)
  • application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
🚧 Files skipped from review as they are similar to previous changes (4)
  • application/tests/owasp_aisvs_parser_test.py
  • application/tests/owasp_kubernetes_top10_2022_parser_test.py
  • application/tests/owasp_api_top10_2023_parser_test.py
  • application/tests/owasp_kubernetes_top10_2025_parser_test.py

Summary by CodeRabbit

  • New Features

    • Added support for importing and linking multiple OWASP security standards including Top 10 2025, API Security Top 10 2023, Kubernetes Top Ten 2022 and 2025, Top 10 for LLM and Gen AI Apps 2025, and AI Security Verification Standard.
    • Enhanced cheatsheet functionality with supplemental data and automatic deduplication.
    • Expanded CLI with new import flags for additional OWASP datasets.
  • Tests

    • Added comprehensive test coverage for all new security standard parsers.

Walkthrough

Adds bundled JSON datasets and parser implementations for multiple OWASP standards (AISVS, API Top10 2023, Kubernetes Top10 2022/2025, LLM Top10 2025, Top10 2025), enhances the Cheatsheets parser with supplemental data, updates unit tests for each parser, and adds CLI flags to import the datasets.

Changes

OWASP External Project Parser Expansion

| Layer / File(s) | Summary |
| --- | --- |
| **Parser data files**: `application/utils/external_project_parsers/data/owasp_*.json` | Bundled JSON datasets for AISVS 1.0 (14 entries), API Top10 2023 (10), Cheatsheets supplement (multiple items), Kubernetes Top10 2022 (10) and 2025 (10 with fallback ids), LLM Top10 2025 (10), and Top10 2025 (10). |
| **Standard parser implementations**: `application/utils/external_project_parsers/parsers/owasp_*.py` | New parser classes (OwaspAisvs, OwaspApiTop10_2023, OwaspKubernetesTop10_2022, OwaspKubernetesTop10_2025, OwaspLlmTop10_2025, OwaspTop10_2025) that load JSON, construct defs.Standard entries, resolve CREs via cache.get_CREs(external_id=...), attach defs.Link items, and return ParseResult with gap analysis and embeddings disabled. |
| **Kubernetes 2025 fallback**: `application/utils/.../owasp_kubernetes_top10_2025.py` | Parser implements fallback: when a primary 2025 entry links to no CREs, resolve CREs from the referenced 2022 fallback section_ids. |
| **Cheatsheets parser enhancements and tests**: `application/utils/.../parsers/cheatsheets_parser.py`, `application/tests/cheatsheets_parser_test.py` | Adds official_cheatsheet_url() helper, wraps repo clone in try/except with supplemental JSON fallback, registers supplemental cheatsheets with CRE linking and per-failure warnings, and deduplicates entries by (section, hyperlink). Tests updated to match new hyperlink and selection behavior. |
| **Parser validation tests**: `application/tests/owasp_*_parser_test.py` | Unit tests for AISVS, API Top10 2023, Kubernetes Top10 2022/2025 (including fallback case), LLM Top10 2025, and Top10 2025. Each test creates a Flask test app, initializes DB schema, seeds CRE records, runs parser.parse(...), and asserts expected entry counts and CRE link mappings. |
| **CLI integration**: `cre.py` | Adds boolean --owasp-* flags to enable selective importing of the new OWASP datasets. |
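
The Kubernetes 2025 fallback row in the table above can be illustrated with a small, self-contained sketch. This is not the PR's code: the `fallback_section_ids` field name, the `cre_index` dict (a stand-in for `cache.get_CREs(external_id=...)`), and all sample ids are assumptions made for illustration.

```python
from typing import Dict, List


def resolve_cre_ids(
    entry: dict,
    entries_2022: Dict[str, dict],
    cre_index: Dict[str, str],
) -> List[str]:
    """Return resolvable CRE ids for a 2025 entry, falling back to 2022 sections.

    cre_index is a hypothetical stand-in for cache.get_CREs(external_id=...):
    it maps a CRE external id to a document; unknown ids are simply absent.
    """
    # First try the CRE ids listed directly on the 2025 entry.
    linked = [cid for cid in entry.get("cre_ids", []) if cid in cre_index]
    if linked:
        return linked

    # Nothing resolved: reuse the CRE ids of the referenced 2022 sections.
    # "fallback_section_ids" is an assumed field name, not taken from the PR.
    for section_id in entry.get("fallback_section_ids", []):
        fallback = entries_2022.get(section_id, {})
        for cid in fallback.get("cre_ids", []):
            if cid in cre_index and cid not in linked:
                linked.append(cid)
    return linked


# Made-up sample data, for illustration only.
entries_2022 = {"K01": {"cre_ids": ["111-111", "222-222"]}}
cre_index = {"111-111": "CRE A", "222-222": "CRE B"}
entry_2025 = {
    "section_id": "K01",
    "cre_ids": ["999-999"],  # not present in cre_index, so it cannot be linked
    "fallback_section_ids": ["K01"],
}

print(resolve_cre_ids(entry_2025, entries_2022, cre_index))
```

In this sketch the fallback is only consulted when none of the primary ids resolve, which matches the "links none" condition in the table; whether a partial match should also trigger the fallback is a design choice the summary does not settle.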

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Normalize OWASP cheat sheet references' accurately describes the primary focus of the PR: normalizing cheat sheet references in OWASP parsers and data. |
| Description check | ✅ Passed | The description is related to the changeset, explaining the problem (inconsistent cheat sheet references), solution (normalization), and context (part of issue #471 and a stacked review series). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 10

🧹 Nitpick comments (2)
application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py (1)

19-47: 🏗️ Heavy lift

Consider extracting shared parse logic into a base class.

This parse implementation is identical to owasp_aisvs.py and the other standard parsers (API Top 10, Kubernetes 2022, Top 10 2025) — only name and data_file differ. A small base class holding the load-JSON / build-Standard / link-CRE loop, with subclasses supplying just name/data_file, would remove ~5 copies and keep behavior consistent as the logic evolves.

♻️ Sketch of a shared base parser
class _JsonStandardParser(ParserInterface):
    name: str
    data_file: Path

    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):
        with self.data_file.open("r", encoding="utf-8") as handle:
            raw_entries = json.load(handle)
        entries = []
        for entry in raw_entries:
            standard = defs.Standard(
                name=self.name,
                sectionID=entry["section_id"],
                section=entry["section"],
                hyperlink=entry["hyperlink"],
            )
            for cre_id in entry.get("cre_ids", []):
                cres = cache.get_CREs(external_id=cre_id)
                if not cres:
                    continue
                standard.add_link(
                    defs.Link(
                        ltype=defs.LinkTypes.LinkedTo,
                        document=cres[0].shallow_copy(),
                    )
                )
            entries.append(standard)
        return ParseResult(
            results={self.name: entries},
            calculate_gap_analysis=False,
            calculate_embeddings=False,
        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py`
around lines 19 - 47, Create a shared base parser class (e.g.,
_JsonStandardParser) that implements the parse method and encapsulates the JSON
load + build-Standard + link-CRE loop currently duplicated in
owasp_llm_top10_2025.py: move the logic that opens self.data_file, iterates
raw_entries, constructs defs.Standard (using name, section_id, section,
hyperlink), calls cache.get_CREs and standard.add_link with
defs.Link(defs.LinkTypes.LinkedTo, cres[0].shallow_copy()), and returns the
ParseResult into the base class; then have the existing parser classes simply
subclass _JsonStandardParser and define their name and data_file attributes so
their parse methods can be removed.
application/utils/external_project_parsers/parsers/owasp_top10_2025.py (1)

13-47: 🏗️ Heavy lift

Consider consolidating the duplicated JSON-driven parsers.

This module is essentially identical to owasp_api_top10_2023.py (and, per the PR stack, the AISVS / Kubernetes 2022 / LLM 2025 parsers) — only name and data_file differ. A single data-driven base parser parameterized by name + data_file would remove ~5x duplication and keep linking/ParseResult behavior in one place.

♻️ Sketch of a shared base parser
class _JsonStandardParser(ParserInterface):
    name: str
    data_file: Path

    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):
        with self.data_file.open("r", encoding="utf-8") as handle:
            raw_entries = json.load(handle)
        entries = []
        for entry in raw_entries:
            standard = defs.Standard(
                name=self.name,
                sectionID=entry["section_id"],
                section=entry["section"],
                hyperlink=entry["hyperlink"],
            )
            for cre_id in entry.get("cre_ids", []):
                cres = cache.get_CREs(external_id=cre_id)
                if not cres:
                    continue
                standard.add_link(
                    defs.Link(ltype=defs.LinkTypes.LinkedTo, document=cres[0].shallow_copy())
                )
            entries.append(standard)
        return ParseResult(
            results={self.name: entries},
            calculate_gap_analysis=False,
            calculate_embeddings=False,
        )


class OwaspTop10_2025(_JsonStandardParser):
    name = "OWASP Top 10 2025"
    data_file = Path(__file__).resolve().parent.parent / "data" / "owasp_top10_2025.json"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/owasp_top10_2025.py`
around lines 13 - 47, Consolidate duplicated JSON-driven parsers by introducing
a shared base class (e.g. _JsonStandardParser inheriting ParserInterface) that
implements parse(...) using self.name and self.data_file, move the JSON loading,
Standard creation, CRE linking (using cache.get_CREs and
defs.Link/defs.LinkTypes.LinkedTo) and ParseResult construction into that base,
then change OwaspTop10_2025 to subclass the base and only set name = "OWASP Top
10 2025" and data_file = Path(...)/"owasp_top10_2025.json"; ensure existing
symbols used in callers (OwaspTop10_2025.parse behavior, defs.Standard,
ParseResult) remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@application/tests/owasp_aisvs_parser_test.py`:
- Around line 51-62: The list comprehensions use the ambiguous loop variable
name `l` causing a linter warning; update the comprehensions in the test (the
expressions `[l.document.id for l in entries[0].links]` and `[l.document.id for
l in entries[-1].links]`) to a clearer name like `link` (e.g. `[link.document.id
for link in entries[0].links]`) so the variables remain descriptive while
preserving the same behavior and references to `entries`, `.links`, and
`.document.id`.

In `@application/tests/owasp_api_top10_2023_parser_test.py`:
- Around line 39-43: The test uses the ambiguous loop variable name `l` in two
list comprehensions ([l.document.id for l in entries[0].links] and
[l.document.id for l in entries[-1].links]) which triggers Ruff E741; rename `l`
to a clearer name like `link` in both occurrences to match the sibling test
(owasp_top10_2025_parser_test.py) and satisfy the linter while keeping the
expressions behavior unchanged.

In `@application/tests/owasp_kubernetes_top10_2022_parser_test.py`:
- Around line 41-45: The list comprehensions in the test use the ambiguous loop
variable name "l" which Ruff flags; update both comprehensions to use a
descriptive name like "link" (e.g., [link.document.id for link in
entries[0].links] and [link.document.id for link in entries[-1].links]) so the
variables referenced around entries and links/document.id are clear and the lint
error (Ruff E741) is resolved.

In `@application/tests/owasp_kubernetes_top10_2025_parser_test.py`:
- Around line 45-52: The test uses an ambiguous single-letter loop variable `l`
in list comprehensions ([l.document.id for l in entries[0].links] and
[l.document.id for l in entries[-1].links]); rename `l` to a clearer name like
`link` in both comprehensions so they become [link.document.id for link in
entries[0].links] and [link.document.id for link in entries[-1].links] (update
occurrences in owasp_kubernetes_top10_2025_parser_test.py where `entries` and
`.links` are inspected).

In `@application/tests/owasp_llm_top10_2025_parser_test.py`:
- Around line 40-45: Rename the ambiguous loop variable `l` to `link` in the
three list comprehensions that build document id lists from entries[*].links
(i.e., change `[l.document.id for l in entries[0].links]`, `[l.document.id for l
in entries[4].links]`, and `[l.document.id for l in entries[-1].links]` to use
`link` instead of `l`) so the comprehensions read `[link.document.id for link in
entries[...] .links]`, preserving the existing assertions and behavior.

In `@application/utils/external_project_parsers/data/owasp_aisvs_1_0.json`:
- Line 5: The hyperlink value for the JSON key "hyperlink" uses a /tree/ URL;
update that string to the canonical /blob/ form (use
/blob/main/0x10-C01-Training-Data-Governance.md) so the OWASP markdown file link
does not rely on GitHub redirects and matches the project's OWASP reference
normalization; locate the "hyperlink" entry in owasp_aisvs_1_0.json and change
the path accordingly.

In
`@application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json`:
- Around line 50-55: The entry with "section_id": "K09" currently has the same
cre_ids as "K01" which is likely incorrect; locate the JSON object where
"section_id": "K09" and either replace the duplicated cre_ids value ["233-748",
"486-813"] with the correct CRE id array for K09 or, if the duplication was
intentional, add a clarifying comment in your PR message confirming that K09
intentionally maps to the same CREs as K01; ensure you update the "cre_ids"
field for the K09 object (or document the intent) so the mapping is unambiguous.

In `@application/utils/external_project_parsers/parsers/cheatsheets_parser.py`:
- Around line 132-136: The deduplicate_entries method currently replaces earlier
defs.Standard objects when a (entry.section, entry.hyperlink) key repeats;
change it to merge duplicates instead: in deduplicate_entries iterate entries,
use the (section, hyperlink) key to find existing entries and, when found, union
the relevant list/collection fields (at minimum entry.links) into the existing
defs.Standard object (avoiding duplicates), and merge any other supplemental
fields that should be combined (e.g., tags or metadata) rather than overwriting;
return the values of this merged map. Ensure you keep the original defs.Standard
objects for the first-seen entry and update them in-place to preserve other
fields.
- Around line 116-129: The loop currently swallows all exceptions from
cs.add_link (for cre in cres) which hides broken CRE payloads or Link contract
changes; change the bare except to catch exceptions but log them with context
(include cre_id and cre identifiers and the exception stack/str) and set a local
failure flag when any add_link fails; after processing cres, only append cs to
standard_entries if cs.links is non-empty and no add_link failures occurred (use
symbols: cs.add_link, cache.get_CREs, defs.Link,
defs.LinkTypes.AutomaticallyLinkedTo) so parse failures are visible and faulty
cheatsheets aren’t treated as successful.
- Around line 54-67: The try currently wraps both git.clone(...) and
register_cheatsheets(...), so move the try/except to only cover git.clone
(git.clone), logging the clone-specific warning via self.logger.warning on
failure, and ensure that register_cheatsheets(repo=..., cache=...,
cheatsheets_path=..., repo_path=...) is called outside that try block only when
clone succeeds; also ensure cheatsheets is initialized (e.g., empty list) before
extending with self.register_supplemental_cheatsheets(cache=...) so that a
parsing/runtime error inside register_cheatsheets is not swallowed and will
surface rather than silently falling back to supplemental data.
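
The first cheatsheets_parser.py comment above (merge duplicates instead of replacing them) could look roughly like the following. The `Entry` dataclass is a hypothetical stand-in for `defs.Standard`, with links reduced to plain strings; only the `(section, hyperlink)` key comes from the review itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entry:
    """Hypothetical stand-in for defs.Standard; links are plain CRE id strings."""

    section: str
    hyperlink: str
    links: List[str] = field(default_factory=list)


def deduplicate_entries(entries: List[Entry]) -> List[Entry]:
    """Merge entries sharing (section, hyperlink), keeping the first-seen object."""
    merged: Dict[Tuple[str, str], Entry] = {}
    for entry in entries:
        key = (entry.section, entry.hyperlink)
        existing = merged.get(key)
        if existing is None:
            # First time this key is seen: keep the original object.
            merged[key] = entry
            continue
        # Duplicate key: union the links into the first-seen entry, in place.
        for link in entry.links:
            if link not in existing.links:
                existing.links.append(link)
    return list(merged.values())


# Made-up sample data, for illustration only.
sample = [
    Entry("Logging", "https://example.org/logging", ["111-111"]),
    Entry("Logging", "https://example.org/logging", ["111-111", "222-222"]),
    Entry("Secrets", "https://example.org/secrets", ["333-333"]),
]
result = deduplicate_entries(sample)
print([(e.section, e.links) for e in result])
```

Because the first-seen object is updated in place, any fields not merged here keep the values from that first entry, which is the behavior the comment asks for.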

---

Nitpick comments:
In `@application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py`:
- Around line 19-47: Create a shared base parser class (e.g.,
_JsonStandardParser) that implements the parse method and encapsulates the JSON
load + build-Standard + link-CRE loop currently duplicated in
owasp_llm_top10_2025.py: move the logic that opens self.data_file, iterates
raw_entries, constructs defs.Standard (using name, section_id, section,
hyperlink), calls cache.get_CREs and standard.add_link with
defs.Link(defs.LinkTypes.LinkedTo, cres[0].shallow_copy()), and returns the
ParseResult into the base class; then have the existing parser classes simply
subclass _JsonStandardParser and define their name and data_file attributes so
their parse methods can be removed.

In `@application/utils/external_project_parsers/parsers/owasp_top10_2025.py`:
- Around line 13-47: Consolidate duplicated JSON-driven parsers by introducing a
shared base class (e.g. _JsonStandardParser inheriting ParserInterface) that
implements parse(...) using self.name and self.data_file, move the JSON loading,
Standard creation, CRE linking (using cache.get_CREs and
defs.Link/defs.LinkTypes.LinkedTo) and ParseResult construction into that base,
then change OwaspTop10_2025 to subclass the base and only set name = "OWASP Top
10 2025" and data_file = Path(...)/"owasp_top10_2025.json"; ensure existing
symbols used in callers (OwaspTop10_2025.parse behavior, defs.Standard,
ParseResult) remain unchanged.
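
The clone-scoping comment on cheatsheets_parser.py (lines 54-67) in the prompt above can be sketched as follows. The three callables are hypothetical stand-ins for `git.clone`, `register_cheatsheets`, and `register_supplemental_cheatsheets`; their signatures are simplified assumptions, not the project's real ones.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


def collect_cheatsheets(
    clone_repo: Callable[[], str],
    register_cheatsheets: Callable[[str], List[str]],
    register_supplemental: Callable[[], List[str]],
) -> List[str]:
    """Guard only the clone step; let parsing errors surface to the caller."""
    cheatsheets: List[str] = []
    repo_path: Optional[str] = None
    try:
        # Only the clone is wrapped, so only clone failures are downgraded.
        repo_path = clone_repo()
    except Exception as exc:  # real code would catch the clone helper's own error type
        logger.warning("Clone failed, using supplemental data only: %s", exc)

    if repo_path is not None:
        # Deliberately outside the try block: a bug here should not be swallowed.
        cheatsheets.extend(register_cheatsheets(repo_path))

    # Supplemental entries are always added, whether or not the clone succeeded.
    cheatsheets.extend(register_supplemental())
    return cheatsheets
```

With this shape, a network failure degrades to supplemental data with a warning, while an exception raised during registration still propagates and fails loudly.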

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 1cbdca8b-5833-42a4-97b8-1f4c5c4727a6

📥 Commits

Reviewing files that changed from the base of the PR and between e93ce92 and 2b18fb2.

📒 Files selected for processing (22)
  • application/tests/cheatsheets_parser_test.py
  • application/tests/owasp_aisvs_parser_test.py
  • application/tests/owasp_api_top10_2023_parser_test.py
  • application/tests/owasp_kubernetes_top10_2022_parser_test.py
  • application/tests/owasp_kubernetes_top10_2025_parser_test.py
  • application/tests/owasp_llm_top10_2025_parser_test.py
  • application/tests/owasp_top10_2025_parser_test.py
  • application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
  • application/utils/external_project_parsers/data/owasp_api_top10_2023.json
  • application/utils/external_project_parsers/data/owasp_cheatsheets_supplement.json
  • application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json
  • application/utils/external_project_parsers/data/owasp_kubernetes_top10_2025.json
  • application/utils/external_project_parsers/data/owasp_llm_top10_2025.json
  • application/utils/external_project_parsers/data/owasp_top10_2025.json
  • application/utils/external_project_parsers/parsers/cheatsheets_parser.py
  • application/utils/external_project_parsers/parsers/owasp_aisvs.py
  • application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py
  • application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2022.py
  • application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2025.py
  • application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py
  • application/utils/external_project_parsers/parsers/owasp_top10_2025.py
  • cre.py

Comment thread application/tests/owasp_aisvs_parser_test.py Outdated
Comment thread application/tests/owasp_api_top10_2023_parser_test.py Outdated
Comment thread application/tests/owasp_kubernetes_top10_2022_parser_test.py Outdated
Comment thread application/tests/owasp_kubernetes_top10_2025_parser_test.py
Comment thread application/tests/owasp_llm_top10_2025_parser_test.py Outdated
Comment thread application/utils/external_project_parsers/data/owasp_aisvs_1_0.json Outdated