AMAP-ML

DreamX Team @ Amap (Alibaba)

We are the DreamX team at Amap (Alibaba), focusing on delivering AI products and conducting cutting-edge research in large language models, reinforcement learning, multimodal understanding, generative AI (image/video), world models, efficient inference, generative recommendation, and intelligent mobility. Our work has been published at top-tier venues including ICLR, CVPR, ICCV, ACL, EMNLP, and AAAI.

We are always looking for talented interns and full-time researchers with strong coding skills and research experience. Please email us at cxxgtxy@gmail.com if you are interested.

🔥 News

2026.04.10 💻 We open-sourced -- Let Skills Evolve Collectively with Agentic Evolver.
2026.03.23 💻 We open-sourced Omni-WorldBench -- A Comprehensive Benchmark for Evaluating Interactive Response Capabilities of World Models.
2026.03.11 💻 We open-sourced RL3DEdit -- Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing.
2026.03.01 🎉 FE2E is accepted by CVPR 2026 -- Beyond Generation: Advancing Image Editing Priors for Depth and Normal Estimation.
2026.02.27 🎉 Eevee is accepted by Findings of CVPR 2026 -- Towards Close-up High-resolution Video-based Virtual Try-on.
2026.02.06 💻 We open-sourced MobilityBench -- A Scalable Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios.
2026.02.06 🎉 SpatialGenEval is accepted by ICLR 2026 -- Benchmarking Spatial Intelligence of Text-to-Image Models.
2026.02.06 🎉 Tree-GRPO is accepted by ICLR 2026 -- Tree Search for LLM Agent Reinforcement Learning.
2026.02.06 🎉 S2-Guidance is accepted by ICLR 2026 -- Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models.
2026.02.05 🎉 MathForge is accepted by ICLR 2026 -- Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation.
2026.02.04 💻 We open-sourced Code2World -- A GUI World Model via Renderable Code Generation.
2026.02.04 🎉 GPG is accepted by ICLR 2026 -- A Simple and Strong Reinforcement Learning Baseline for Model Reasoning.
2026.02.04 🎉 NarrLV is accepted by ICLR 2026 -- A Comprehensive Narrative-Centric Evaluation for Long Video Generation Models.
2026.02.04 🎉 EPG is accepted by ICLR 2026 -- Advancing End-To-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training.
2026.02.04 🎉 is accepted by AAAI 2026.
2026.02.02 🎉 VMBench is accepted by ICCV 2025 -- A Benchmark for Perception-Aligned Video Motion Generation.
2026.01.31 🎉 SocioReasoner is accepted by ICLR 2026 -- Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
2026.01.07 💻 We open-sourced Thinking-with-Map -- Reinforced Parallel Map-Augmented Agent for Geolocalization.
2025.06.20 💻 We open-sourced FluxText -- A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing.
2025.05.21 💻 We open-sourced UniVG-R1 -- Reasoning Guided Universal Visual Grounding with Reinforcement Learning.
2025.04.07 💻 We open-sourced RealQA -- Realistic Image Quality and Aesthetic Scoring with Multimodal LLM.

📚 Research Areas

🧠 LLM Reasoning & Reinforcement Learning

Repository	Description	Venue
Tree-GRPO	Adopts tree-search rollouts in place of independent chain-based rollouts for LLM agent RL, achieving superior performance with only a quarter of the rollout budget.	ICLR 2026
GPG	A minimalist RL approach (Group Policy Gradient) that directly optimizes the original RL objective, eliminating critic/reference models and KL constraints while outperforming GRPO.	ICLR 2026
MathForge	Proposes difficulty-aware GRPO and multi-aspect question reformulation to boost math reasoning by targeting harder questions from both algorithmic and data perspectives.	ICLR 2026
HS-STaR	A hierarchical sampling framework that identifies boundary-level problems and dynamically reallocates sampling budget toward high-utility problems for self-taught reasoners.	EMNLP 2025
Pos2Distill	A position-to-position knowledge distillation framework that transfers knowledge from advantageous positions to mitigate position bias in LLMs.	EMNLP 2025
AutoDrive-R2	Incentivizes reasoning and self-reflection in VLA models for autonomous driving trajectory prediction via rule-based RL.	ICLR 2026

🎨 Image Generation & Editing

Repository	Description	Venue
FluxText	A text editing framework for multi-line scene text in complex visual scenarios, featuring a Condition Injection LoRA module and a regional text perceptual loss.	-
S2-Guidance	Leverages stochastic block-dropping to construct sub-networks for training-free guidance, surpassing CFG on text-to-image and text-to-video generation.	ICLR 2026
EPG	Advancing end-to-end pixel-space generative modeling via self-supervised pre-training.	ICLR 2026
Omni-Effects	A unified framework for prompt-guided and spatially controllable composite visual effects generation, using LoRA-MoE and spatial-aware prompts.	AAAI 2026
SCALAR	Scale-wise controllable visual autoregressive learning for image generation.	AAAI 2026
USP	Unified self-supervised pretraining via masked latent modeling in VAE space, significantly improving diffusion model convergence and generation quality.	ICCV 2025
SpatialGenEval	A benchmark with 1,230 information-dense prompts and 12,300 multi-choice questions to evaluate complex spatial intelligence in text-to-image models.	ICLR 2026
SCAR	Semantic context matters: improving conditioning for autoregressive image generation with enhanced semantic guidance.	CVPR 2026
RL3DEdit	An RL-based single-pass 3D scene editing framework that uses VGGT as a geometry-aware reward model and GRPO to anchor 2D editing priors onto the 3D consistency manifold.	CVPR 2026

🎬 Video Generation & Understanding

Repository	Description	Venue
NarrLV	The first benchmark to comprehensively evaluate narrative expression capabilities of long video generation models, inspired by film narrative theory.	ICLR 2026
VMBench	A perception-aligned video motion benchmark whose human-aligned metrics achieve a 35.3% improvement in Spearman's correlation over baselines.	ICCV 2025
Eevee	A high-resolution dataset and benchmark for video-based virtual try-on, supporting both full-shot and close-up garment detail views.	Findings of CVPR 2026
FE2E	Beyond Generation: Advancing image editing priors for depth and normal estimation.	CVPR 2026

👁️ Multimodal & Vision-Language Models

Repository	Description	Venue
UniVG-R1	Reasoning guided universal visual grounding with reinforcement learning.	CVPR 2026
SocioReasoner	A vision-language reasoning framework for urban socio-semantic segmentation that simulates human annotation via cross-modal recognition and multi-stage RL-based reasoning.	ICLR 2026
RealQA	A 14,715-image user-generated content (UGC) dataset with 10 fine-grained attributes for realistic image quality and aesthetic scoring, together with a next-token-prediction approach that achieves SOTA on 5 public IQA/IAA benchmarks.	TMM
Q-Hawkeye	A GRPO-based framework for reliable visual policy optimization in image quality assessment with uncertainty-aware dynamic optimization and perception-aware optimization.	CVPR 2026
Code2World	A VLM-based GUI world model that predicts dynamic transitions via renderable code generation, boosting Gemini-2.5-Flash by +9.5% on AndroidWorld navigation.	-

🌍 World Models & Interactive AI

Repository	Description	Venue
Omni-WorldBench	A comprehensive benchmark specifically designed to evaluate the interactive response capabilities of world models across diverse scenarios.	arXiv 2026

🗺️ Maps, Mobility & Spatial Intelligence

Repository	Description	Venue
Thinking-with-Map	A map-augmented agent that conducts reasoning with real-world maps for geolocalization, trained via reinforcement learning.	arXiv 2026
MobilityBench	A scalable benchmark for evaluating route-planning agents in real-world mobility scenarios.	arXiv 2026
DSFNet	Disentangled scenario factorization for multi-scenario route ranking, accompanied by the first large-scale public MSDR dataset; deployed in AMap to serve online traffic.	WWW 2025
AR-MAP	Are Autoregressive Large Language Models Implicit Teachers for Diffusion Large Language Models? A framework for transferring alignment knowledge from AR-LLMs to diffusion LLMs.	ACL 2026

🔍 Object Detection & Segmentation

Repository	Description	Venue
UPRE	Zero-shot domain adaptation for object detection via unified prompt and representation enhancement.	ICCV 2025
