feat: add quarot transform support and related changes, and asym for … #1682
wenhuach21 wants to merge 2 commits into main from …
Conversation
…activations. only for test.
Pull request overview
This PR introduces a new W4A4 preset scheme and adds experimental QuaRot-style (Hadamard/rotation) support for Llama models, wiring the configuration through CLI → compression → export/load → inference/eval paths.
Changes:
- Add `W4A4` preset scheme and enable it across formats and CPU tests.
- Add `llama_quarot` placement strategy with offline weight rotation + online activation transforms (hooks + wrapper monkey-patches).
- Extend CLI/config plumbing for `--hadamard_config` and activation symmetry overrides; adjust evaluation to keep in-memory models for `fake` so runtime hooks persist.
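QuaRot-style rotation works because an orthogonal matrix cancels between a rotated weight (folded offline) and a rotated activation (applied online by a hook). A minimal numpy sketch of that identity; the `hadamard` helper and variable names here are illustrative, not the PR's actual API:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction: n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(H.shape[0])  # orthonormal: H @ H.T == I

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))  # layer weight
x = rng.standard_normal(4)       # incoming activation

H = hadamard(4)
W_rot = W @ H    # offline weight rotation (baked into the checkpoint)
x_rot = H.T @ x  # online activation transform (runtime pre-hook)

# The rotations cancel: W_rot @ x_rot == W @ H @ H.T @ x == W @ x.
print(np.allclose(W_rot @ x_rot, W @ x))  # True
```

The point of rotating before quantization is that the Hadamard transform spreads outlier channels across all dimensions, which makes low-bit (e.g. A4) activation quantization far less lossy while leaving the full-precision output mathematically unchanged.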
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/test_cpu/schemes/test_scheme.py | Adds a CPU test covering W4A4 quantize-and-save + reload. |
| auto_round/schemes.py | Introduces the W4A4 preset and registers it in PRESET_SCHEMES. |
| auto_round/inference/convert_model.py | Simplifies HadamardConfig construction when registering input hooks during model conversion. |
| auto_round/inference/backend.py | Makes Hadamard config handling robust to dict vs object; adds placement strategy check. |
| auto_round/formats.py | Advertises W4A4 support for AutoGPTQ + AutoRound formats. |
| auto_round/experimental/utils.py | Adds llama_quarot shorthand and relaxes scheme gating when using that placement strategy. |
| auto_round/experimental/transform/patch_modules.py | Adds wrapper forward monkey-patches to apply selective online activation transforms. |
| auto_round/experimental/transform/llama_quarot.py | New implementation for offline Llama QuaRot weight rotation + online transforms. |
| auto_round/experimental/transform/hadamard_config.py | Extends HadamardConfig with placement_strategy and QuaRot-specific options. |
| auto_round/experimental/transform/apply.py | Routes placement_strategy == llama_quarot to the new offline+online transform flow. |
| auto_round/eval/evaluation.py | Forces in-memory evaluation for fake format so runtime hooks remain attached. |
| auto_round/compressors/base.py | Normalizes/validates hadamard config once, passes target device, and applies QuaRot layer-config overrides. |
| auto_round/main.py | Adds --hadamard_config, --act_sym/--act_asym, and argument resolution helpers. |
Comments suppressed due to low confidence (1)
auto_round/formats.py:1071
`AutoRoundFormat.support_schemes` now includes `W4A4`, but the default `format="auto_round"` export path selects the GPTQ backend for symmetric int schemes, and GPTQ inference ignores activation-quantization fields (`act_bits`, etc.). As a result, users can export `W4A4` successfully but won't actually get 4-bit activation quantization at load/inference time. Consider either (1) disallowing `act_bits < 16` for `auto_round` exports (consistent with the existing error message later in `__init__`), or (2) adding an activation-quant-capable export/inference backend before advertising `W4A4` support here.
```python
class AutoRoundFormat(OutputFormat):
    support_schemes = [
        "W4A16",
        "W4A4",
        "W4A16_MIXED",
        "W2A16",
        "W3A16",
        "W8A16",
```
```python
input_tensor = args[0]
transformed_input = llama_quarot_online_transform(module, input_tensor)
if len(args) == 1:
    return transformed_input
```
In `_build_online_hook`, the forward pre-hook returns a bare Tensor when the module has a single positional input (`len(args) == 1`). For `register_forward_pre_hook`, the hook must return `None` or a tuple of positional args; returning a Tensor will cause PyTorch to treat it as an iterable of inputs when splatting (`*input`), breaking the forward call. Return `(transformed_input,)` instead (and keep the tuple path for multi-arg cases).
```diff
-return transformed_input
+return (transformed_input,)
```
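The tuple-return contract can be demonstrated in isolation. A small sketch of a correct pre-hook; the doubling transform here is a hypothetical stand-in for the online Hadamard transform, not the PR's code:

```python
import torch
import torch.nn as nn

def pre_hook(module, args):
    # Stand-in transform: scale the first positional input.
    transformed = args[0] * 2.0
    # Must return None or a tuple of positional args, never a bare Tensor:
    # PyTorch splats the return value back into forward(*args).
    return (transformed,) + args[1:]

layer = nn.Identity()
layer.register_forward_pre_hook(pre_hook)
out = layer(torch.ones(3))
print(out)  # tensor([2., 2., 2.])
```

Had the hook returned `transformed` directly, PyTorch would iterate over the Tensor's first dimension when unpacking the replacement inputs, producing a shape-dependent failure rather than an immediate error.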
```diff
 class AutoGPTQFormat(OutputFormat):
-    support_schemes = ["W4A16", "W2A16", "W3A16", "W8A16", "BF16", "W2A16G64", "W2A16G32", "W4A16_MIXED"]
+    support_schemes = ["W4A16", "W4A4", "W2A16", "W3A16", "W8A16", "BF16", "W2A16G64", "W2A16G32", "W4A16_MIXED"]
```
Adding `W4A4` to `AutoGPTQFormat.support_schemes` is misleading because the GPTQ export/inference path only uses `bits`/`group_size`/`sym` when creating inference layers (activation-quantization fields like `act_bits`/`act_group_size`/`act_sym` are ignored in `convert_model._create_quant_layer` for GPTQ backends). This means a model exported as `auto_gptq` from `W4A4` will effectively behave like weight-only `W4A16` at inference. Either remove `W4A4` from this list or add a hard check that `scheme.act_bits` is 16 for this format (and route users to `fake` if they want activation quantization).
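The hard check suggested above could look something like the following. This is a hypothetical sketch, not auto_round's actual validation code; `check_weight_only_export` and `SchemeError` are invented names, and the scheme is modeled as a plain dict for illustration:

```python
# Hypothetical guard: reject activation quantization for a weight-only
# export format instead of silently dropping the act_* fields.
class SchemeError(ValueError):
    pass

def check_weight_only_export(scheme: dict, fmt: str = "auto_gptq") -> None:
    act_bits = scheme.get("act_bits", 16)
    if act_bits < 16:
        raise SchemeError(
            f"{fmt} inference is weight-only; act_bits={act_bits} would be "
            f"silently ignored. Use format='fake' for activation quantization."
        )

check_weight_only_export({"bits": 4, "act_bits": 16})  # passes: weight-only
try:
    check_weight_only_export({"bits": 4, "act_bits": 4})  # W4A4: rejected
except SchemeError as e:
    print("rejected:", e)
```

Failing loudly at export time is preferable here because the silent fallback to `W4A16` behavior would only surface as an unexplained accuracy gap between calibration and deployment.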