Add per-factor eigenvalue correction for Distributed Shampoo by runame · Pull Request #263 · facebookresearch/optimizers

runame · 2026-05-07T08:46:17Z

Summary

Add PerFactorEigenvalueCorrectedShampooPreconditionerList and its KL variant. Like EigendecomposedShampoo, these store eigenvectors and eigenvalues per factor matrix, but they recompute the eigenvalues every iteration as diag(Q^T M Q) instead of taking them from the eigendecomposition. Eigenvectors are still updated via amortized eigendecomposition.
Add PerFactorEigenvalueCorrectedShampooPreconditionerConfig and PerFactorEigenvalueCorrectedKLShampooPreconditionerConfig to shampoo_types.py and export them from distributed_shampoo/__init__.py.
Add tests for both variants, including the KL variant with epsilon=1.0 for clean expected values.
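The per-step eigenvalue recomputation described in the summary can be illustrated with a small NumPy sketch. This is not the PR's code: the actual implementation lives in PyTorch inside the preconditioner list classes, and the function name `per_factor_eigenvalues` and the toy matrices below are assumptions made for illustration only.

```python
import numpy as np


def per_factor_eigenvalues(factor_matrix, eigenvectors):
    """Return diag(Q^T M Q) without forming the full Q^T M Q product."""
    # Entry j is q_j^T M q_j, where q_j is the j-th column of Q.
    return np.einsum("ij,ij->j", eigenvectors, factor_matrix @ eigenvectors)


rng = np.random.default_rng(0)

# Factor matrix at the time of the last (amortized) eigendecomposition.
a = rng.standard_normal((4, 4))
m_old = a @ a.T
_, q = np.linalg.eigh(m_old)  # cached eigenvectors Q

# The factor matrix keeps accumulating while Q stays stale.
g = rng.standard_normal((4, 3))
m_new = m_old + g @ g.T

# Corrected eigenvalues for the current M in the stale eigenbasis.
corrected = per_factor_eigenvalues(m_new, q)
```

When Q is the exact eigenbasis of M, this reduces to the ordinary eigenvalues; between eigendecompositions it gives the diagonal of M expressed in the cached basis.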

Stack

This PR is part of a stack adding per-factor eigenvalue correction to Distributed Shampoo:

Refactor: extract shared EigendecompositionBasedShampooKroneckerFactorsUnwrapped base class #261 — extract shared base class
Refactor: eliminate _compute_outer_product_list via _transform_grad_for_outer_product hook #262 — add _transform_grad_for_outer_product hook (KL refactor)
This PR — per-factor eigenvalue correction (implementation + tests)
Eigenvalue EMA over per-step outer products

Test plan

New tests for PerFactorEigenvalueCorrectedShampooPreconditionerList pass
New tests for PerFactorEigenvalueCorrectedKLShampooPreconditionerList pass
Existing tests still pass (distributed_shampoo/tests/, distributed_shampoo/preconditioner/tests/)
mypy clean (make type-check)
ruff clean

Generated with Claude Code

…rsUnwrapped base class

Consolidate duplicated eigendecomposition logic from EigendecomposedShampooKroneckerFactorsUnwrapped and EigenvalueCorrectedShampooKroneckerFactorsUnwrapped into a shared base class. The base class provides _perform_eigendecomposition and _amortized_computation, with subclass behavior controlled via hasattr checks on field presence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…or_outer_product hook

Inline the outer product loop into BaseShampooPreconditionerList._update_factor_matrices and introduce _transform_grad_for_outer_product as the single extension point. The base returns grad unchanged; KL-Shampoo subclasses override it to precondition the gradient. This eliminates _compute_outer_product_list from all three classes that defined it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
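The single-extension-point design in this commit can be sketched as a template method. The classes below are hypothetical, heavily simplified stand-ins: they handle one 2-D gradient, omit the beta2 moving average, and the subclass only scales the gradient as a placeholder for the real KL preconditioning. Only the hook name `_transform_grad_for_outer_product` comes from the commit message; the real method signatures may differ.

```python
import numpy as np


class BasePreconditionerSketch:
    """Toy stand-in for the base preconditioner list (hypothetical)."""

    def __init__(self, rows, cols):
        # One factor matrix per gradient dimension.
        self.left = np.zeros((rows, rows))
        self.right = np.zeros((cols, cols))

    def _transform_grad_for_outer_product(self, grad):
        # Base behavior described in the commit: return grad unchanged.
        return grad

    def update_factor_matrices(self, grad):
        # The outer product loop lives here once; subclasses only
        # customize the gradient that feeds it.
        grad = self._transform_grad_for_outer_product(grad)
        self.left += grad @ grad.T
        self.right += grad.T @ grad


class TransformedGradSketch(BasePreconditionerSketch):
    """Placeholder override; a KL variant would precondition grad here."""

    def _transform_grad_for_outer_product(self, grad):
        return 0.5 * grad


grad = np.arange(6.0).reshape(2, 3)
base = BasePreconditionerSketch(2, 3)
base.update_factor_matrices(grad)
variant = TransformedGradSketch(2, 3)
variant.update_factor_matrices(grad)
```

With this shape, a subclass never needs its own copy of the accumulation loop, which is the duplication the commit says it removes.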

Introduce PerFactorEigenvalueCorrectedShampoo, which stores m+n eigenvalues per block (one per factor dimension) computed directly as diag(Q^T M Q), where Q are cached eigenvectors and M is the already-accumulated factor matrix. This is more memory-efficient than EShampoo/SOAP's m*n eigenvalues while still providing eigenvalue correction.

New classes:
- PerFactorEigenvalueCorrectedShampooKroneckerFactorsUnwrapped
- PerFactorEigenvalueCorrectedShampooPreconditionerList
- PerFactorEigenvalueCorrectedKLShampooPreconditionerList (KL variant)
- PerFactorEigenvalueCorrectedShampooPreconditionerConfig
- PerFactorEigenvalueCorrectedKLShampooPreconditionerConfig

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
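To make the m+n versus m*n comparison concrete, here is a small arithmetic sketch. The helper name `eigenvalue_storage` and the example block shape are illustrative assumptions; the counts cover eigenvalues only, not eigenvectors or factor matrices.

```python
def eigenvalue_storage(m: int, n: int) -> tuple[int, int]:
    """Return (per-factor count, per-entry count) for an m x n block."""
    per_factor = m + n  # one eigenvalue per factor dimension
    per_entry = m * n   # one eigenvalue per parameter entry
    return per_factor, per_entry


per_factor, per_entry = eigenvalue_storage(1024, 4096)
print(per_factor, per_entry)  # 5120 4194304
```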

Test the combined PerFactor+KL variant which recomputes eigenvalues every step and preconditions gradients before outer products. Uses beta2=0 and epsilon=1.0 to get clean expected values, leveraging the perturb_before_computation happy path where KL is effectively a no-op when eigenvalues are equal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
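One plausible reading of why equal eigenvalues make the preconditioning step trivial is sketched below. It assumes zero-initialized eigenvalues, epsilon added to them before the inverse root is taken, and a hypothetical helper `inverse_root_from_eig`; the root exponent is also an assumption, and none of this is taken from the PR's test code.

```python
import numpy as np


def inverse_root_from_eig(eigenvectors, eigenvalues, epsilon, root):
    """Build Q diag((lambda + epsilon)^(-1/root)) Q^T."""
    scaled = (eigenvalues + epsilon) ** (-1.0 / root)
    # Scaling the columns of Q by `scaled` equals Q @ diag(scaled).
    return (eigenvectors * scaled) @ eigenvectors.T


rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # any orthogonal Q
eigenvalues = np.zeros(3)                          # equal eigenvalues

precond = inverse_root_from_eig(q, eigenvalues, epsilon=1.0, root=2)
grad = rng.standard_normal((3, 2))
transformed = precond @ grad
```

With all perturbed eigenvalues equal to 1, the preconditioner collapses to Q Q^T, which is the identity, so the gradient passes through unchanged regardless of Q or of the root exponent.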

meta-codesync · 2026-05-12T20:01:12Z

@hjmshi has imported this pull request. If you are a Meta employee, you can view this in D104875458.

runame and others added 6 commits May 7, 2026 09:34

Add full inverse_exponent_override docstrings to PerFactor config cla…

84e7807

…sses Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Export PerFactor config classes from distributed_shampoo __init__.py

e1bdea8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

runame mentioned this pull request May 7, 2026

Use eigenvalue EMA over per-step outer products for per-factor correction #264

Open

3 tasks

meta-cla Bot added the CLA Signed label May 7, 2026

runame wants to merge 6 commits into facebookresearch:main from runame:pr3/per-factor-eigenvalue-correction
