Skip to content

Add folder summary for source report#278

Draft
JustinWonjaePark wants to merge 5 commits into
mainfrom
dev_folder-summary
Draft

Add folder summary for source report#278
JustinWonjaePark wants to merge 5 commits into
mainfrom
dev_folder-summary

Conversation

@JustinWonjaePark

@JustinWonjaePark JustinWonjaePark commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
  • New Features
    • Added folder-level summary rows for source reports by merging compatible file-based scan results within the same folder.
    • Added --no_merge option to keep source report output file-based when folder summary is not needed.
    • Added a hidden SRC_FL_Source_before_merge sheet in XLSX reports to preserve the original pre-merge source rows.

Summary by CodeRabbit

  • New Features
    • Added --no_merge CLI option to disable automatic folder-based merging of source file paths in generated reports.
    • When merging is enabled, compatible rows are consolidated within each folder and an additional “before merge” worksheet is included but hidden in the final .xlsx.
  • Documentation
    • Updated source scanner CLI help text to document the new --no_merge behavior.
  • Chores
    • Added defusedxml to project dependencies.

@JustinWonjaePark JustinWonjaePark self-assigned this Jun 16, 2026
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7d2028bf-531b-43cf-8873-361622cce131

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a --no_merge CLI flag to fosslight_source that controls folder-based merging of scan results. When enabled (default), the tool creates a hidden "before merge" worksheet in XLSX outputs, merges compatible scan rows within each folder tree, and hides the pre-merge sheet by modifying the embedded workbook.xml in the zip. Includes imports for XLSX manipulation, the merge algorithm with OSS normalization and compatibility checks, helper functions for worksheet management, function signature updates, and integration through the main CLI flow.

Changes

Merge-by-folder feature

Layer / File(s) Summary
CLI flag, help, and constant setup
src/fosslight_source/_help.py, src/fosslight_source/cli.py, pyproject.toml
Adds --no_merge argument and help text, PRE_MERGE_SHEET_NAME constant, and adds defusedxml dependency for XML parsing in worksheet hiding.
Imports for ZIP and XML handling
src/fosslight_source/cli.py
Adds imports for tempfile, zipfile, defusedxml, and xml.etree.ElementTree to support XLSX worksheet manipulation.
Worksheet manipulation helpers
src/fosslight_source/cli.py
Implements functions to collect SourceItem printable rows, attach pre-merge external sheets onto ScannerItem, and hide named worksheets in .xlsx by rewriting workbook.xml within the zip archive.
Folder merge algorithm
src/fosslight_source/cli.py
Implements merge_results_by_folder() with OSS name/version/license normalization and compatibility checks, representative row selection by shortest path, top-occurrence download location and copyright aggregation, and recursive folder-tree merging with depth guards.
Function signature updates
src/fosslight_source/cli.py
Updates run_scanners() to accept merge_by_folder parameter (default True) and extends create_report_file() with merge_by_folder and optional kb_reference_result for conditional pre-merge sheet and KB fallback.
CLI main and report generation integration
src/fosslight_source/cli.py
Computes merge_by_folder in main(), passes it through run_scanners(), updates KB reference sheet selection with optional fallback, adds pre-merge sheet creation with merge_results_by_folder() call in create_report_file(), and updates the call site to pass new arguments.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.53% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add folder summary for source report' accurately describes the main change: introducing folder-level summary functionality that aggregates file-based scan results within folders.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev_folder-summary

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@JustinWonjaePark JustinWonjaePark force-pushed the dev_folder-summary branch 2 times, most recently from a6f1d0d to 3d4abc6 Compare June 16, 2026 05:59
@JustinWonjaePark JustinWonjaePark marked this pull request as ready for review June 16, 2026 07:43

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
src/fosslight_source/cli.py (3)

94-94: ⚡ Quick win

Consider using defusedxml for safer XML parsing.

While the XML being parsed comes from XLSX files generated by this tool, users could potentially process third-party XLSX files. xml.etree.ElementTree.fromstring is vulnerable to XML External Entity (XXE) attacks. Consider using defusedxml.ElementTree as a safer alternative.

-import xml.etree.ElementTree as ET
+import defusedxml.ElementTree as ET

This would require adding defusedxml to dependencies.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fosslight_source/cli.py` at line 94, The code at the ET.fromstring call
is vulnerable to XXE attacks when processing third-party XLSX files. Replace the
import statement to use defusedxml.ElementTree instead of xml.etree.ElementTree,
then update the ET.fromstring call to use the safer defusedxml variant.
Additionally, add defusedxml to the project's dependencies in the appropriate
configuration file (such as requirements.txt, setup.py, or pyproject.toml) to
ensure the required package is available.

Source: Linters/SAST tools


224-224: ⚡ Quick win

Fix implicit Optional type hint.

kb_reference_result: list = None violates PEP 484. When a parameter defaults to None, the type hint should explicitly include None.

-    run_kb_msg: str = "", merge_by_folder: bool = False, kb_reference_result: list = None
+    run_kb_msg: str = "", merge_by_folder: bool = False, kb_reference_result: list | None = None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fosslight_source/cli.py` at line 224, The parameter kb_reference_result
has a default value of None but the type hint only specifies list, which
violates PEP 484. Update the type hint for kb_reference_result to explicitly
indicate that None is a valid type by using either Optional[list] from the
typing module or the union syntax list | None (for Python 3.10+), ensuring the
type annotation matches the actual default value.

Source: Linters/SAST tools


503-503: ⚡ Quick win

Rename license to avoid shadowing Python builtin.

The variable name license shadows Python's built-in license function.

-    return tuple(sorted([license.strip() for license in scan_item.licenses if license and license.strip()]))
+    return tuple(sorted([lic.strip() for lic in scan_item.licenses if lic and lic.strip()]))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fosslight_source/cli.py` at line 503, The variable name `license` in the
list comprehension at line 503 shadows Python's built-in `license` function.
Rename this loop variable to a non-shadowing name such as `lic`, `license_item`,
or `item` throughout the list comprehension that iterates over
`scan_item.licenses`, ensuring all three occurrences of the variable name in the
comprehension (the loop variable, and the two conditional checks) are updated
consistently.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/fosslight_source/cli.py`:
- Line 94: The code at the ET.fromstring call is vulnerable to XXE attacks when
processing third-party XLSX files. Replace the import statement to use
defusedxml.ElementTree instead of xml.etree.ElementTree, then update the
ET.fromstring call to use the safer defusedxml variant. Additionally, add
defusedxml to the project's dependencies in the appropriate configuration file
(such as requirements.txt, setup.py, or pyproject.toml) to ensure the required
package is available.
- Line 224: The parameter kb_reference_result has a default value of None but
the type hint only specifies list, which violates PEP 484. Update the type hint
for kb_reference_result to explicitly indicate that None is a valid type by
using either Optional[list] from the typing module or the union syntax list |
None (for Python 3.10+), ensuring the type annotation matches the actual default
value.
- Line 503: The variable name `license` in the list comprehension at line 503
shadows Python's built-in `license` function. Rename this loop variable to a
non-shadowing name such as `lic`, `license_item`, or `item` throughout the list
comprehension that iterates over `scan_item.licenses`, ensuring all three
occurrences of the variable name in the comprehension (the loop variable, and
the two conditional checks) are updated consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9f898ad6-59ec-46d3-a38d-bb384ed195f3

📥 Commits

Reviewing files that changed from the base of the PR and between 36755c8 and 3d4abc6.

📒 Files selected for processing (2)
  • src/fosslight_source/_help.py
  • src/fosslight_source/cli.py

Signed-off-by: Wonjae Park <j.wonjae.park@gmail.com>
Signed-off-by: Wonjae Park <j.wonjae.park@gmail.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/fosslight_source/cli.py (1)

94-94: 💤 Low value

Consider using defusedxml for XML parsing as defense-in-depth.

While the workbook.xml is parsed from an xlsx file generated by this tool, using defusedxml.ElementTree.fromstring instead of xml.etree.ElementTree.fromstring provides protection against XML entity expansion attacks as a defense-in-depth measure.

Suggested change
-import xml.etree.ElementTree as ET
+import defusedxml.ElementTree as ET
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fosslight_source/cli.py` at line 94, Replace the standard
xml.etree.ElementTree with defusedxml for secure XML parsing. Change the
ET.fromstring call that parses workbook_xml to use
defusedxml.ElementTree.fromstring instead. This requires importing
defusedxml.ElementTree (or importing just ElementTree from defusedxml) and using
that safe alternative for the XML parsing operation to protect against XML
entity expansion attacks.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fosslight_source/cli.py`:
- Around line 502-503: The loop variable `license` in the list comprehension
within the _get_merge_licenses function shadows Python's built-in license()
function. Rename the loop variable from `license` to a different name (such as
`lic` or `item`) throughout the list comprehension to avoid shadowing the
built-in, ensuring all occurrences in the `[license.strip() for license in
scan_item.licenses if license and license.strip()]` expression are updated
consistently.
- Line 224: The type annotation for the kb_reference_result parameter uses an
implicit Optional by setting a default value of None without explicitly
declaring the type as Optional, which violates PEP 484. Import Optional from the
typing module (if not already imported) and change the kb_reference_result
parameter annotation from list to Optional[list] to make the parameter's
nullable nature explicit in the type hint.

---

Nitpick comments:
In `@src/fosslight_source/cli.py`:
- Line 94: Replace the standard xml.etree.ElementTree with defusedxml for secure
XML parsing. Change the ET.fromstring call that parses workbook_xml to use
defusedxml.ElementTree.fromstring instead. This requires importing
defusedxml.ElementTree (or importing just ElementTree from defusedxml) and using
that safe alternative for the XML parsing operation to protect against XML
entity expansion attacks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 64e72122-61e1-49fc-b7c3-7b7afbe7d676

📥 Commits

Reviewing files that changed from the base of the PR and between 3d4abc6 and 89e5adf.

📒 Files selected for processing (2)
  • src/fosslight_source/_help.py
  • src/fosslight_source/cli.py
✅ Files skipped from review due to trivial changes (1)
  • src/fosslight_source/_help.py

Comment thread src/fosslight_source/cli.py Outdated
Comment thread src/fosslight_source/cli.py Outdated
@JustinWonjaePark JustinWonjaePark marked this pull request as draft June 17, 2026 06:50
- Use defusedxml for safer XML parsing
- Fix implicit Optional type hint in kb_reference_result
- Rename 'license' to 'lic' to avoid shadowing built-in function

Signed-off-by: Wonjae Park <j.wonjae.park@gmail.com>
Signed-off-by: Wonjae Park <j.wonjae.park@gmail.com>
@JustinWonjaePark JustinWonjaePark marked this pull request as ready for review June 17, 2026 10:22
@JustinWonjaePark JustinWonjaePark added the enhancement [PR/Issue] New feature or request label Jun 17, 2026
@JustinWonjaePark JustinWonjaePark marked this pull request as draft June 18, 2026 04:32
…arate excluded files

Signed-off-by: Wonjae Park <j.wonjae.park@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement [PR/Issue] New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant