The current fix using rsplit() is better than the original split(), but it can still fail in edge cases where directory names repeat patterns.
Consider a file path like /abc/abcd/abc/abcd/file.py where root_module = "abc":
- Current approach:
rsplit(f"/abc/", 1)splits on the last/abc/, producingabcd/file.py - Result:
abc.abcd.fileinstead of the correctabc.abcd.abc.abcd.file
Instead of relying on string parsing with the module name, get the actual filesystem location of the root module and remove it from the file path:
def get_module_path_from_file_path(file_path: str, root_module: str) -> str | None:
import importlib.util
# Validate inputs
if (
not file_path.endswith(".py")
or not os.path.exists(file_path)
or f"/{root_module}/" not in file_path
):
return None
# Get the actual location of the root module
try:
spec = importlib.util.find_spec(root_module)
if spec is None or spec.origin is None:
return None
module_dir = os.path.dirname(spec.origin)
# Go up one level if it's an __init__.py
if os.path.basename(spec.origin) == "__init__.py":
module_dir = os.path.dirname(module_dir)
except (ImportError, AttributeError, ValueError):
return None
# Get relative path from module directory
try:
relative_path = os.path.relpath(file_path, module_dir)
except ValueError:
# Paths on different drives on Windows
return None
# Convert to module path
if relative_path.endswith(".py"):
relative_path = relative_path[:-3]
module_path = root_module + "." + relative_path.replace(os.sep, ".")
return module_path- No string parsing vulnerability - Works regardless of repeated naming patterns
- Uses actual filesystem paths - More reliable and maintainable
- Handles edge cases - Works with init.py files and package structures
- More explicit - The intent is clearer to future maintainers
This approach is immune to all naming collision edge cases!