Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025)
Updated Sep 26, 2025 · Python
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).
Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
Yet another random morning idea, to be tried quickly and the architecture shared if it works: allowing the transformer to pause for any amount of time on any token.
Dynamic weight generation for recursive transformers via input-conditioned LoRA modulation
(NeurIPS 2019 MicroNet Challenge, 3rd place) Open-source code for "SIPA: A simple framework for efficient networks"
Temporalmesh-transformer: an architecture that fuses dynamic graph topology, token-level adaptive compute, and temporal semantic decay into a single unified model. The authors claim it is the first to combine all three.
The ARL Hierarchical MultiScale Framework (ARL-HMS) is a software library for developing multiscale models on heterogeneous high-performance computing systems.
Frozen KV Context for Mixture-of-Recursions on a Modernized BERT
Volumetric language model with Triangle Cross-Scan State Modelling, without attention, with touches ("smells") of Neural Turing Machines (NTM) and Differentiable Neural Computers (DNC).
PyTorch benchmark for CTM-style adaptive computation, sparse-retrieval failure analysis, adaptive halting, and attention-supervised recovery.
Model implementation for "Adaptive computation as a new mechanism of dynamic human attention"
Recursive Convergent Inference: dynamic MoE with convergence-gated stopping. Unexpected finding: model-relative complexity diverges from human difficulty labels.
Lightweight PyTorch implementation of Mixture-of-Recursions with Expert-Choice & Token-Choice routing | Runs on your laptop!
Mixture-of-Recursions on a Modernized BERT (Prototype)
Curated papers on dynamic neural networks, pruning, growing architectures, sparse training, and adaptive computation.