vllm-project/llm-compressor : Quantization toolkit for LLM deployment with vLLM
- Add iMatrix weighted MSE observer and IMatrixGatherer : importance-weighted quantization, improves perplexity (PPL) across RTN, GPTQ, and AWQ
- Add norm calibration context for unit-offset RMSNorm (Gemma/Qwen3Next) : fixes AWQ/SmoothQuant on Gemma models
- Add MoE calibration module for GlmMoeDsa (GLM-5) : packed 3D tensor handling for MoE architectures
- Fix topological ordering in FX graph cleanup : fixes erase_node crash in Granite4 GPTQ
- Handle packed weights in Granite4 to_3d_expert (W4A16)
- Fix SmoothQuant regex for DeepSeek/GLM-5
- Add SmoothQuant mapping for GLM-5
- Add AWQ mapping for GLM-5
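
The unit-offset RMSNorm item above can be illustrated with a small sketch. Gemma-style norms scale by `1 + weight` rather than `weight`, so folding per-channel smoothing scales (as AWQ and SmoothQuant do) by dividing the stored weight directly gives the wrong gain. The function names below are hypothetical and this is not llm-compressor's implementation; it only shows the arithmetic, assuming the smoothing step wants the norm output divided by `scales`.

```python
import math

# Hypothetical helpers for illustration only; not llm-compressor's API.

def rmsnorm(x, weight, unit_offset=False, eps=1e-6):
    """RMSNorm over a single vector; unit_offset=True uses (1 + weight) as gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    gain = [(1.0 + w) if unit_offset else w for w in weight]
    return [v / rms * g for v, g in zip(x, gain)]

def fold_smoothing_scales(weight, scales, unit_offset=False):
    """Return new norm weights so that the norm output is divided by `scales`."""
    if unit_offset:
        # Effective gain is (1 + w); divide that, then convert back to stored form.
        return [(1.0 + w) / s - 1.0 for w, s in zip(weight, scales)]
    return [w / s for w, s in zip(weight, scales)]

x = [0.5, -1.0, 2.0, 0.25]
weight = [0.1, -0.2, 0.3, 0.0]
scales = [2.0, 2.0, 4.0, 0.5]

target = [y / s for y, s in zip(rmsnorm(x, weight, unit_offset=True), scales)]
folded = rmsnorm(x, fold_smoothing_scales(weight, scales, unit_offset=True), unit_offset=True)
# Dividing the stored weight directly ignores the +1 offset and misses the target.
naive = rmsnorm(x, [w / s for w, s in zip(weight, scales)], unit_offset=True)
```

Here `folded` matches `target` up to floating-point error, while `naive` does not whenever a scale differs from 1.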
vllm-project/compressed-tensors : Safetensors extension for sparse and quantized tensor storage
- Support N-dimensional tensors in pack/unpack_int32 : fixes 3D MoE expert weight packing
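
A minimal sketch of what N-dimensional packing means, using nested Python lists in place of tensors. It assumes values are packed along the last dimension only, with signed values shifted to unsigned before packing; the function names, the bit layout, and the use of plain Python integers instead of an int32 dtype are all illustrative assumptions, not the compressed-tensors implementation.

```python
# Hypothetical helpers for illustration only; not the compressed-tensors API.

def pack_last_dim(values, num_bits=4):
    """Pack signed ints along the last dimension into 32-bit words.

    Leading dimensions (e.g. the expert axis of a 3D MoE weight) are
    left untouched by recursing until a flat row is reached.
    """
    if values and isinstance(values[0], list):
        return [pack_last_dim(v, num_bits) for v in values]
    per_word = 32 // num_bits
    offset = 1 << (num_bits - 1)   # shift signed range to unsigned
    mask = (1 << num_bits) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for i, v in enumerate(values[start:start + per_word]):
            word |= ((v + offset) & mask) << (i * num_bits)
        words.append(word)
    return words

def unpack_last_dim(words, length, num_bits=4):
    """Inverse of pack_last_dim; `length` is the original last-dim size."""
    if words and isinstance(words[0], list):
        return [unpack_last_dim(w, length, num_bits) for w in words]
    per_word = 32 // num_bits
    offset = 1 << (num_bits - 1)
    mask = (1 << num_bits) - 1
    out = []
    for word in words:
        for i in range(per_word):
            out.append(((word >> (i * num_bits)) & mask) - offset)
    return out[:length]  # drop padding from the final partial word

# A toy [experts=2][rows=2][cols=10] int4 tensor.
experts = [[[(e * 20 + r * 10 + c) % 16 - 8 for c in range(10)]
            for r in range(2)] for e in range(2)]
packed = pack_last_dim(experts)
restored = unpack_last_dim(packed, length=10)
```

Because only the last axis is reshaped, the same routine handles 2D linear weights and 3D expert weights without special cases.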
