Skip to content

Latest commit

 

History

History
59 lines (47 loc) · 2.16 KB

File metadata and controls

59 lines (47 loc) · 2.16 KB

Suggestion for PR #137

The current fix using rsplit() is better than the original split(), but it can still fail in edge cases where directory names repeat patterns.

The Problem

Consider a file path like /abc/abcd/abc/abcd/file.py where root_module = "abc":

  • Current approach: rsplit(f"/abc/", 1) splits on the last /abc/, producing abcd/file.py
  • Result: abc.abcd.file instead of the correct abc.abcd.abc.abcd.file

More Robust Solution

Instead of relying on string parsing with the module name, get the actual filesystem location of the root module and remove it from the file path:

def get_module_path_from_file_path(file_path: str, root_module: str) -> str | None:
    import importlib.util
    
    # Validate inputs
    if (
        not file_path.endswith(".py")
        or not os.path.exists(file_path)
        or f"/{root_module}/" not in file_path
    ):
        return None
    
    # Get the actual location of the root module
    try:
        spec = importlib.util.find_spec(root_module)
        if spec is None or spec.origin is None:
            return None
        
        module_dir = os.path.dirname(spec.origin)
        # Go up one level if it's an __init__.py
        if os.path.basename(spec.origin) == "__init__.py":
            module_dir = os.path.dirname(module_dir)
    except (ImportError, AttributeError, ValueError):
        return None
    
    # Get relative path from module directory
    try:
        relative_path = os.path.relpath(file_path, module_dir)
    except ValueError:
        # Paths on different drives on Windows
        return None
    
    # Convert to module path
    if relative_path.endswith(".py"):
        relative_path = relative_path[:-3]
    module_path = root_module + "." + relative_path.replace(os.sep, ".")
    
    return module_path

Benefits

  • No string parsing vulnerability - Works regardless of repeated naming patterns
  • Uses actual filesystem paths - More reliable and maintainable
  • Handles edge cases - Works with init.py files and package structures
  • More explicit - The intent is clearer to future maintainers

This approach is immune to all naming collision edge cases!