VMBackup: Fix Python 3.12+ compatibility for Azure Linux 4.0 #2177

Open
jonathanbrenes wants to merge 1 commit into Azure:master from jonathanbrenes:fix/vmbackup-azl4-compat

Conversation

@jonathanbrenes

Problem

The VMSnapshot (VMBackup) extension crashes on Azure Linux 4.0 (Python 3.14) due to four Python APIs that were removed in recent Python versions. The extension fails at import time, causing Azure Backup jobs to fail with error 400039 after hours of retries.

Removed APIs used by VMBackup

| API | Removed in | Usage |
|---|---|---|
| distutils.version.LooseVersion | Python 3.12 | Version string comparisons |
| imp module | Python 3.12 | Fallback module loader for waagent |
| platform.dist() | Python 3.8 | Distro detection in telemetry |
| platform.linux_distribution() | Python 3.8 | Distro detection for patching |

Changes

WaagentLib.py

  • Add try/except for LooseVersion import with a minimal shim class providing comparison operators (__lt__, __gt__, __eq__, __le__, __ge__) via regex-based version parsing.

Why a shim instead of packaging.version.Version? PEP 632 recommends the third-party packaging library as the replacement, but it is not suitable here: (1) it is not in the stdlib and may be absent on minimal VM installs, (2) it enforces strict PEP 440 parsing and rejects the loose version strings this codebase passes (e.g. kernel versions, agent versions), and (3) adding a new dependency to a VM extension that must run on every Linux distro from RHEL 7 (Python 2.7) through AZL4 (Python 3.14) is too risky. The inline shim is ~20 lines, zero-dependency, and drop-in compatible.

WAAgentUtil.py

  • Add try/except around the import imp fallback so it doesn't crash on Python 3.12+ when the primary importlib.util path fails.
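
A sketch of the guarded fallback pattern described above, assuming a loader shaped like the one in the crash traceback (importlib first, `imp` second). The helper name `load_module_from_path` and the demo module are hypothetical and not taken from WAAgentUtil.py.

```python
import os
import tempfile


def load_module_from_path(name, path):
    """Load a module from a file path (hypothetical helper name)."""
    try:
        import importlib.util

        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    except Exception as primary_error:
        try:
            import imp  # removed in Python 3.12
        except ImportError:
            # No legacy loader to fall back to: surface the original failure
            # instead of a misleading "No module named 'imp'".
            raise primary_error
        return imp.load_source(name, path)


# Small self-contained demo: write a throwaway module and load it.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_waagent.py")
with open(demo_path, "w") as handle:
    handle.write("VALUE = 42\n")

demo = load_module_from_path("demo_waagent", demo_path)
```

The point of the inner try/except is that on Python 3.12+ the original error stays visible rather than being masked by a second ModuleNotFoundError.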

HandlerUtil.py

  • Replace platform.dist() in get_dist_info() with /etc/os-release parsing (NAME + VERSION fields).
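  • Fix invalid escape sequence: '\/' in replace('\/', '/') is not a recognized escape, so Python keeps the backslash and the call already replaced \/ with /; however, the literal raises a SyntaxWarning on Python 3.12+ (a DeprecationWarning since 3.6). Changed to the equivalent, warning-free replace('\\/', '/').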
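
A quick check of how Python reads the two literals from the escape-sequence bullet; this is a sketch, not code from the PR. Python leaves an unrecognized escape such as \/ in the string as a backslash followed by a slash, so both spellings denote the same two-character string and differ only in the warning emitted at compile time. `eval` is used here only so the legacy literal is compiled at runtime and the snippet itself stays warning-free; the sample string is made up.

```python
import warnings

# Compile the legacy literal '\/' at runtime and record any warning it raises.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = eval(r"'\/'")

# The spelling the PR switches to.
fixed = "\\/"

# Hypothetical input containing an escaped slash.
sample = "Azure Linux 4.0 \\/ test"
legacy_result = sample.replace(legacy, "/")
fixed_result = sample.replace(fixed, "/")

# Names of the warning categories seen while compiling the legacy literal.
warning_names = [w.category.__name__ for w in caught]
```

Inspecting `warning_names` on a given interpreter shows which warning category, if any, the legacy spelling produces there.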

patch/__init__.py

  • Add /etc/os-release ID parsing to DistInfo() fallback for distros where platform.linux_distribution() is gone.
  • Add 'azurelinux' mapping in GetMyPatching() → AzureLinuxPatching.
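
The /etc/os-release handling used by both get_dist_info() and the DistInfo() fallback could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the sample file contents (inferred from the test output in this PR, not copied from a real AZL4 image), the use of VERSION_ID for the version element, and the dictionary-based mapping.

```python
# Assumed sample contents; a real /etc/os-release may differ.
SAMPLE_OS_RELEASE = """NAME="Azure Linux"
VERSION="4.0 (Cloud Variant Alpha2)"
ID=azurelinux
VERSION_ID="4.0"
"""

# Hypothetical lookup table: os-release ID -> patching class name.
PATCHING_BY_ID = {"azurelinux": "AzureLinuxPatching"}


def parse_os_release(text):
    """Parse KEY=value lines, stripping one pair of surrounding quotes."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields[key.strip()] = value
    return fields


def read_os_release(path="/etc/os-release"):
    """Return the parsed fields, or an empty dict if the file is unreadable."""
    try:
        with open(path) as handle:
            return parse_os_release(handle.read())
    except (IOError, OSError):
        return {}


def dist_name(fields):
    """NAME + VERSION, joined the way get_dist_info() appears to report it."""
    return "%s-%s" % (fields.get("NAME", ""), fields.get("VERSION", ""))


def dist_info(fields):
    """[ID, VERSION_ID] pair in the shape DistInfo() returns in the tests."""
    return [fields.get("ID", ""), fields.get("VERSION_ID", "")]


def patching_class_name(fields, default="AbstractPatching"):
    # 'AbstractPatching' is a placeholder default, not a name from the PR.
    return PATCHING_BY_ID.get(fields.get("ID", "").lower(), default)


fields = parse_os_release(SAMPLE_OS_RELEASE)
```

Parsing from a string keeps the logic testable on any machine; only `read_os_release` touches the filesystem.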

patch/AzureLinuxPatching.py (new)

  • Dedicated patching class for Azure Linux 4.0.
  • All binary paths set to /usr/bin/ (AZL4 uses merged /usr).
  • Uses dnf for package installation.
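
As a rough illustration of the shape of such a class (not the file added in this PR): the three attribute names below match those printed in the test output in this conversation, while the missing base class, the `BIN_DIR` constant, and the `install_command` helper are assumptions.

```python
class AzureLinuxPatching(object):
    """Sketch only; the real class may inherit from a shared base and
    define more paths than shown here."""

    BIN_DIR = "/usr/bin/"  # AZL4 uses a merged /usr layout

    def __init__(self):
        self.cryptsetup_path = self.BIN_DIR + "cryptsetup"
        self.getenforce_path = self.BIN_DIR + "getenforce"
        self.setenforce_path = self.BIN_DIR + "setenforce"

    def install_command(self, packages):
        """Build a non-interactive dnf invocation (hypothetical helper)."""
        return ["dnf", "install", "-y"] + list(packages)


patching = AzureLinuxPatching()
command = patching.install_command(["cryptsetup"])
```

Returning the command as an argument list rather than a shell string avoids quoting issues if it is later passed to `subprocess`.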

Testing

End-to-end Azure Backup on AZL4

Full backup job completed successfully (26 min, all phases: pre-snapshot, snapshot, post-snapshot).

Backward compatibility

All patches are no-ops on older Python versions (guarded by try/except with fallback to original imports). Verified across 10 distros:

| Distro | Python | Result |
|---|---|---|
| RHEL 8.10 | 3.6.8 | PASS |
| RHEL 9.7 | 3.9.25 | PASS |
| Oracle Linux 9.7 | 3.9.25 | PASS |
| SLES 15 SP6 | 3.6.15 | PASS |
| SLES 16.0 | 3.13.13 | PASS |
| Debian 12 | 3.11.2 | PASS |
| Ubuntu 20.04 | 3.8.10 | PASS |
| Ubuntu 22.04 | 3.10.12 | PASS |
| Ubuntu 24.04 | 3.12.3 | PASS |
| Azure Linux 4.0 | 3.14.3 | PASS |

The VMSnapshot extension crashes on Azure Linux 4.0 (Python 3.14) due to
four Python APIs removed in recent versions. The extension fails at import
time, causing Azure Backup jobs to fail with error 400039 after hours of
retries.

Root cause — removed Python APIs used by VMBackup:
  - distutils.version.LooseVersion (removed in 3.12)
  - imp module (removed in 3.12)
  - platform.dist() (removed in 3.8)
  - platform.linux_distribution() (removed in 3.8)

Changes:

WaagentLib.py:
  - Add try/except for LooseVersion import with a minimal shim class
    that provides comparison operators (__lt__, __gt__, __eq__, __le__,
    __ge__) using regex-based version parsing.
    Note: Python never added a stdlib replacement for LooseVersion.
    PEP 632 recommends the third-party `packaging` library, but it is
    not suitable here: (1) it is not in the stdlib and may be absent on
    minimal installs, (2) it enforces strict PEP 440 parsing and rejects
    the loose version strings this codebase passes, and (3) adding a
    dependency to a VM extension that must run on every Linux distro from
    RHEL 7 (Python 2.7) to AZL4 (Python 3.14) is fragile. The inline
    shim is zero-dependency and drop-in compatible.

WAAgentUtil.py:
  - Add try/except around the `import imp` fallback so it does not crash
    on Python 3.12+ when the primary importlib path fails

HandlerUtil.py:
  - Replace platform.dist() in get_dist_info() with /etc/os-release
    parsing (NAME + VERSION fields)
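  - Fix invalid escape sequence: '\/' in replace('\/', '/') is not a
    recognized escape, so Python keeps the backslash and the call
    already replaced \/ with /, but the literal raises a SyntaxWarning
    on Python 3.12+; changed to the equivalent, warning-free
    replace('\\/', '/')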

patch/__init__.py:
  - Add /etc/os-release ID parsing to DistInfo() fallback
  - Add 'azurelinux' mapping in GetMyPatching() -> AzureLinuxPatching

patch/AzureLinuxPatching.py (new):
  - Dedicated patching class for Azure Linux 4.0
  - All binary paths set to /usr/bin/ (AZL4 merged /usr)
  - Uses dnf for package installation

Tested:
  - End-to-end Azure Backup on AZL4 VM: PASS (26 min, all phases)
  - Backward compat across 10 distros (Python 3.6-3.14): all PASS
    RHEL 8/9, Oracle Linux 9, SLES 15/16, Debian 12,
    Ubuntu 20.04/22.04/24.04, Azure Linux 4.0

jonathanbrenes requested a review from a team as a code owner May 23, 2026 01:04
@jonathanbrenes
Author

Evidence: Extension is broken on Azure Linux 4.0

Production failure

On an unpatched Azure Linux 4.0 VM (Python 3.14.3), the VMSnapshot extension cannot start at all. Every enable attempt exits with code 1, causing Azure Backup jobs to fail with error 400039 after ~9 hours of retries.

Extension log (shell.log) — 7 consecutive failures:

Thu May 21 05:04:31 PM UTC 2026- 1 returned from handle.py
Thu May 21 05:06:01 PM UTC 2026- 1 returned from handle.py
Thu May 21 08:04:37 PM UTC 2026- 1 returned from handle.py
Thu May 21 11:24:42 PM UTC 2026- 1 returned from handle.py
Fri May 22 12:49:51 AM UTC 2026- 1 returned from handle.py
Fri May 22 02:36:42 AM UTC 2026- 1 returned from handle.py
Fri May 22 03:08:32 AM UTC 2026- 1 returned from handle.py

Crash traceback — cascading ModuleNotFoundError:

File "WAAgentUtil.py", line 66
    spec.loader.exec_module(waagent)
  File "WaagentLib.py", line 59
    from distutils.version import LooseVersion
ModuleNotFoundError: No module named 'distutils'

During handling of the above exception:
  File "WAAgentUtil.py", line 69
    import imp
ModuleNotFoundError: No module named 'imp'

The import chain handle.py → mounts.py → DiskUtil.py → HandlerUtil.py → WAAgentUtil.py → WaagentLib.py fails at the very first import, so no code in the extension ever executes.

All 4 removed APIs confirmed broken

Python version: 3.14.3
distutils.version.LooseVersion: FAIL - No module named 'distutils'
import imp:                     FAIL - No module named 'imp'
platform.linux_distribution():  FAIL - module 'platform' has no attribute 'linux_distribution'
platform.dist():                FAIL - module 'platform' has no attribute 'dist'
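
A probe along the following lines could produce a report like the one above. The script actually used is not shown in this PR, so the structure, the `probe` function name, and the message wording are assumptions.

```python
import importlib
import platform
import sys


def probe():
    """Check each removed API and return a label -> status mapping."""
    results = {}

    # Module-level removals: try the import and, where given, the attribute.
    for label, module_name, attr in [
        ("distutils.version.LooseVersion", "distutils.version", "LooseVersion"),
        ("import imp", "imp", None),
    ]:
        try:
            module = importlib.import_module(module_name)
            if attr is not None:
                getattr(module, attr)
            results[label] = "OK"
        except Exception as exc:
            results[label] = "FAIL - %s" % exc

    # Function-level removals from the platform module.
    for func in ("linux_distribution", "dist"):
        label = "platform.%s()" % func
        if hasattr(platform, func):
            results[label] = "OK"
        else:
            results[label] = (
                "FAIL - module 'platform' has no attribute '%s'" % func
            )
    return results


results = probe()
print("Python version: %s" % sys.version.split()[0])
for label in sorted(results):
    print("%s: %s" % (label, results[label]))
```

Running the same probe on each distro in the compatibility matrix gives a quick view of which code paths the try/except guards will take there.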

After patching — extension works

Unit tests on AZL4 VM (8/8 pass):

=== Test 1: Import chain ===
  Utils.HandlerUtil: OK
=== Test 2: DistInfo ===
  DistInfo() = ['azurelinux', '4.0']
=== Test 3: GetMyPatching ===
  Patching class = AzureLinuxPatching
=== Test 4: Binary paths ===
  cryptsetup_path = /usr/bin/cryptsetup -> EXISTS
  getenforce_path = /usr/bin/getenforce -> EXISTS
  setenforce_path = /usr/bin/setenforce -> EXISTS
=== Test 5: get_dist_info ===
  get_dist_info() = ('Azure Linux-4.0 (Cloud Variant Alpha2)', '6.18.5-...')
=== Test 7: LooseVersion shim ===
  LooseVersion: OK (comparisons work)
=== Test 8: Full handle.py import chain ===
  mounts.Mounts: OK
=== ALL TESTS PASSED ===

End-to-end backup — PASS:

| Metric | Unpatched | Patched |
|---|---|---|
| Duration | ~9 hours → FAILED | 26 minutes |
| Backup size | 0 MB | 93 MB |
| Exit code | 1 (all attempts) | 0 |
| Error | 400039 GuestAgentSnapshotTaskStatusError | None |

Backward compatibility — 10 distros, all PASS

| Distro | Python | Result |
|---|---|---|
| RHEL 8.10 | 3.6.8 | PASS |
| RHEL 9.7 | 3.9.25 | PASS |
| Oracle Linux 9.7 | 3.9.25 | PASS |
| SLES 15 SP6 | 3.6.15 | PASS |
| SLES 16.0 | 3.13.13 | PASS |
| Debian 12 | 3.11.2 | PASS |
| Ubuntu 20.04 | 3.8.10 | PASS |
| Ubuntu 22.04 | 3.10.12 | PASS |
| Ubuntu 24.04 | 3.12.3 | PASS |
| Azure Linux 4.0 | 3.14.3 | PASS |

All patches are guarded by try/except — on older Python versions where the deprecated APIs still exist, the original code path runs unchanged (zero behavior change).
