Add per-factor eigenvalue correction for Distributed Shampoo #263

Open
runame wants to merge 6 commits into facebookresearch:main from runame:pr3/per-factor-eigenvalue-correction

runame commented May 7, 2026

Summary

  • Add PerFactorEigenvalueCorrectedShampooPreconditionerList and its KL variant. Like EigendecomposedShampoo, these store eigenvectors and eigenvalues per factor matrix, but they recompute the eigenvalues every iteration as diag(Q^T M Q) rather than taking them from the eigendecomposition. The eigenvectors Q are still updated via amortized eigendecomposition.
  • Add PerFactorEigenvalueCorrectedShampooPreconditionerConfig and PerFactorEigenvalueCorrectedKLShampooPreconditionerConfig to shampoo_types.py and export them from distributed_shampoo/__init__.py.
  • Add tests for both variants, including the KL variant with epsilon=1.0 for clean expected values.
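
To illustrate the diag(Q^T M Q) correction from the first bullet, here is a minimal standard-library sketch on a 2x2 factor matrix. It is not the code added in this PR: the helper names (`matmul`, `transpose`, `rotation`, `corrected_eigenvalues`) and the example values are assumptions made for illustration, and the real implementation operates on tensors.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def corrected_eigenvalues(q, m):
    """Return diag(Q^T M Q), one value per column of Q."""
    qtmq = matmul(matmul(transpose(q), m), q)
    return [qtmq[i][i] for i in range(len(qtmq))]


def rotation(theta):
    """2x2 orthogonal matrix; its columns stand in for cached eigenvectors."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


# Hypothetical factor matrix M = Q D Q^T with eigenvalues 4 and 1.
q_true = rotation(math.radians(30))
m = matmul(matmul(q_true, [[4.0, 0.0], [0.0, 1.0]]), transpose(q_true))

# With exact eigenvectors, diag(Q^T M Q) reproduces the eigenvalues of M.
exact = corrected_eigenvalues(q_true, m)

# With slightly stale eigenvectors the values are only approximate, but they
# still sum to trace(M) because Q is orthogonal.
q_stale = rotation(math.radians(35))
stale = corrected_eigenvalues(q_stale, m)
```

In this toy setting the corrected values track the current M even when Q lags behind, which is the situation that arises between amortized eigendecompositions.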

Stack

This PR is part of a stack adding per-factor eigenvalue correction to Distributed Shampoo:

  1. Refactor: extract shared EigendecompositionBasedShampooKroneckerFactorsUnwrapped base class #261 — extract shared base class
  2. Refactor: eliminate _compute_outer_product_list via _transform_grad_for_outer_product hook #262 — add _transform_grad_for_outer_product hook (KL refactor)
  3. This PR — per-factor eigenvalue correction (implementation + tests)
  4. Eigenvalue EMA over per-step outer products

Test plan

  • New tests for PerFactorEigenvalueCorrectedShampooPreconditionerList pass
  • New tests for PerFactorEigenvalueCorrectedKLShampooPreconditionerList pass
  • Existing tests still pass (distributed_shampoo/tests/, distributed_shampoo/preconditioner/tests/)
  • mypy clean (make type-check)
  • ruff clean

Generated with Claude Code

runame and others added 6 commits May 7, 2026 09:34
…rsUnwrapped base class

Consolidate duplicated eigendecomposition logic from EigendecomposedShampooKroneckerFactorsUnwrapped
and EigenvalueCorrectedShampooKroneckerFactorsUnwrapped into a shared base class. The base class
provides _perform_eigendecomposition and _amortized_computation, with subclass behavior controlled
via hasattr checks on field presence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…or_outer_product hook

Inline the outer product loop into BaseShampooPreconditionerList._update_factor_matrices
and introduce _transform_grad_for_outer_product as the single extension point. The base
returns grad unchanged; KL-Shampoo subclasses override it to precondition the gradient.
This eliminates _compute_outer_product_list from all three classes that defined it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
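
The hook pattern described in the commit above can be sketched roughly as follows. This is a toy under stated assumptions, not the library code: the class names `BasePreconditionerSketch` and `ScaledGradSketch` are hypothetical, gradients are plain Python lists rather than tensors, and the scaling in the subclass is only a placeholder for whatever preconditioning the KL-Shampoo subclasses apply. Only the method names `_update_factor_matrices` and `_transform_grad_for_outer_product` come from the commit message.

```python
class BasePreconditionerSketch:
    """Toy stand-in for a preconditioner list; not the library class."""

    def __init__(self, dim):
        self.factor_matrix = [[0.0] * dim for _ in range(dim)]

    def _transform_grad_for_outer_product(self, grad):
        # Base behavior described in the commit: return the gradient unchanged.
        return grad

    def _update_factor_matrices(self, grad):
        # Single inlined loop: transform first, then accumulate g g^T.
        g = self._transform_grad_for_outer_product(grad)
        for i, gi in enumerate(g):
            for j, gj in enumerate(g):
                self.factor_matrix[i][j] += gi * gj


class ScaledGradSketch(BasePreconditionerSketch):
    """Hypothetical subclass that overrides only the hook."""

    def __init__(self, dim, scale):
        super().__init__(dim)
        self.scale = scale

    def _transform_grad_for_outer_product(self, grad):
        # Placeholder transform standing in for KL-style preconditioning.
        return [self.scale * g for g in grad]


base = BasePreconditionerSketch(dim=2)
base._update_factor_matrices([1.0, 2.0])

scaled = ScaledGradSketch(dim=2, scale=0.5)
scaled._update_factor_matrices([1.0, 2.0])
```

The point of the design is that subclasses change what enters the outer product without duplicating the accumulation loop.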
Introduce PerFactorEigenvalueCorrectedShampoo, which stores m+n eigenvalues
per block (one per factor dimension) computed directly as diag(Q^T M Q), where
Q are cached eigenvectors and M is the already-accumulated factor matrix. This
is more memory-efficient than EShampoo/SOAP's m*n eigenvalues while still
providing eigenvalue correction.

New classes:
- PerFactorEigenvalueCorrectedShampooKroneckerFactorsUnwrapped
- PerFactorEigenvalueCorrectedShampooPreconditionerList
- PerFactorEigenvalueCorrectedKLShampooPreconditionerList (KL variant)
- PerFactorEigenvalueCorrectedShampooPreconditionerConfig
- PerFactorEigenvalueCorrectedKLShampooPreconditionerConfig

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
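
To make the m+n versus m*n comparison in the commit above concrete, a small arithmetic sketch follows. The function name `eigenvalue_counts` and the block shape 1024 x 4096 are assumptions chosen for illustration; the counts ignore eigenvector storage and any other optimizer state.

```python
def eigenvalue_counts(m, n):
    """Number of eigenvalues stored for one m x n block under each scheme."""
    return {"per_factor": m + n, "full_grid": m * n}


# Hypothetical block shape; real block sizes depend on the model and config.
counts = eigenvalue_counts(1024, 4096)
ratio = counts["full_grid"] / counts["per_factor"]
```

For this shape the per-factor scheme keeps 5,120 eigenvalues where an m*n grid would keep 4,194,304.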
Test the combined PerFactor+KL variant which recomputes eigenvalues
every step and preconditions gradients before outer products. Uses
beta2=0 and epsilon=1.0 to get clean expected values, leveraging the
perturb_before_computation happy path where KL is effectively a no-op
when eigenvalues are equal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
meta-cla Bot added the CLA Signed label May 7, 2026

meta-codesync Bot commented May 12, 2026

@hjmshi has imported this pull request. If you are a Meta employee, you can view this in D104875458.
