AMAP-ML

DreamX Team @ Amap (Alibaba)

We are the DreamX team at Amap (Alibaba), focusing on delivering AI products and cutting-edge research in large language models, reinforcement learning, multimodal understanding, generative AI (image/video), world models, efficient inference, generative recommendation and intelligent mobility. Our work has been published at top-tier venues including ICLR, CVPR, ICCV, ACL, EMNLP, and AAAI.

We are always looking for talented interns and full-time researchers with strong coding skills and research experience. Please email us at cxxgtxy@gmail.com if you are interested.


🔥 News

  • 2026.04.10 💻 We open-sourced SkillClaw -- Let Skills Evolve Collectively with Agentic Evolver.
  • 2026.03.23 💻 We open-sourced Omni-WorldBench -- A Comprehensive Benchmark for Evaluating Interactive Response Capabilities of World Models.
  • 2026.03.11 💻 We open-sourced RL3DEdit -- Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing.
  • 2026.03.01 🎉 FE2E is accepted by CVPR 2026 -- Beyond Generation: Advancing Image Editing Priors for Depth and Normal Estimation.
  • 2026.02.27 🎉 Eevee is accepted by Findings of CVPR 2026 -- Towards Close-up High-resolution Video-based Virtual Try-on.
  • 2026.02.06 💻 We open-sourced MobilityBench -- A Scalable Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios.
  • 2026.02.06 🎉 SpatialGenEval is accepted by ICLR 2026 -- Benchmarking Spatial Intelligence of Text-to-Image Models.
  • 2026.02.06 🎉 Tree-GRPO is accepted by ICLR 2026 -- Tree Search for LLM Agent Reinforcement Learning.
  • 2026.02.06 🎉 S2-Guidance is accepted by ICLR 2026 -- Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models.
  • 2026.02.05 🎉 MathForge is accepted by ICLR 2026 -- Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation.
  • 2026.02.04 💻 We open-sourced Code2World -- A GUI World Model via Renderable Code Generation.
  • 2026.02.04 🎉 GPG is accepted by ICLR 2026 -- A Simple and Strong Reinforcement Learning Baseline for Model Reasoning.
  • 2026.02.04 🎉 NarrLV is accepted by ICLR 2026 -- A Comprehensive Narrative-Centric Evaluation for Long Video Generation Models.
  • 2026.02.04 🎉 EPG is accepted by ICLR 2026 -- Advancing End-To-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training.
  • 2026.02.04 🎉 Omni-Effects is accepted by AAAI 2026.
  • 2026.02.02 🎉 VMBench is accepted by ICCV 2025 -- A Benchmark for Perception-Aligned Video Motion Generation.
  • 2026.01.31 🎉 SocioReasoner is accepted by ICLR 2026 -- Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
  • 2026.01.07 💻 We open-sourced Thinking-with-Map -- Reinforced Parallel Map-Augmented Agent for Geolocalization.
  • 2025.06.20 💻 We open-sourced FluxText -- A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing.
  • 2025.05.21 💻 We open-sourced UniVG-R1 -- Reasoning Guided Universal Visual Grounding with Reinforcement Learning.
  • 2025.04.07 💻 We open-sourced RealQA -- Realistic Image Quality and Aesthetic Scoring with Multimodal LLM.

📚 Research Areas

🧠 LLM Reasoning & Reinforcement Learning

| Repository | Description | Venue |
| --- | --- | --- |
| Tree-GRPO | Adopts tree-search rollouts in place of independent chain-based rollouts for LLM agent RL, achieving superior performance with only a quarter of the rollout budget. | ICLR 2026 |
| GPG | A minimalist RL approach (Group Policy Gradient) that directly optimizes the original RL objective, eliminating critic/reference models and KL constraints while outperforming GRPO. | ICLR 2026 |
| MathForge | Proposes difficulty-aware GRPO and multi-aspect question reformulation to boost math reasoning by targeting harder questions from both algorithmic and data perspectives. | ICLR 2026 |
| HS-STaR | A hierarchical sampling framework that identifies boundary-level problems and dynamically reallocates sampling budget toward high-utility problems for self-taught reasoners. | EMNLP 2025 |
| Pos2Distill | A position-to-position knowledge distillation framework that transfers knowledge from advantageous positions to mitigate position bias in LLMs. | EMNLP 2025 |
| AutoDrive-R2 | Incentivizes reasoning and self-reflection in VLA models for autonomous driving trajectory prediction via rule-based RL. | ICLR 2026 |
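
Several entries above build on or compare against GRPO. As background only, here is a minimal sketch of the group-relative advantage normalization commonly described for GRPO: rewards for a group of rollouts sampled from the same prompt are standardized within that group. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration; this is not the implementation used by any repository listed here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts for the same prompt.

    Hypothetical helper for illustration; `eps` avoids division by zero
    when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt with binary (rule-based) rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Rollouts that score above the group mean get a positive advantage and those below get a negative one, so no separate critic is needed to provide a baseline.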

🎨 Image Generation & Editing

| Repository | Description | Venue |
| --- | --- | --- |
| FluxText | A novel text editing framework for multi-line scene text in complex visual scenarios, with Condition Injection LoRA module and regional text perceptual loss. | - |
| S2-Guidance | Leverages stochastic block-dropping to construct sub-networks for training-free guidance, surpassing CFG on text-to-image and text-to-video generation. | ICLR 2026 |
| EPG | Advancing end-to-end pixel-space generative modeling via self-supervised pre-training. | ICLR 2026 |
| Omni-Effects | A unified framework for prompt-guided and spatially controllable composite visual effects generation, using LoRA-MoE and spatial-aware prompts. | AAAI 2026 |
| SCALAR | Scale-wise controllable visual autoregressive learning for image generation. | AAAI 2026 |
| USP | Unified self-supervised pretraining via masked latent modeling in VAE space, significantly improving diffusion model convergence and generation quality. | ICCV 2025 |
| SpatialGenEval | A benchmark with 1,230 information-dense prompts and 12,300 multi-choice questions to evaluate complex spatial intelligence in text-to-image models. | ICLR 2026 |
| SCAR | Semantic context matters: improving conditioning for autoregressive image generation with enhanced semantic guidance. | CVPR 2026 |
| RL3DEdit | An RL-based single-pass 3D scene editing framework using VGGT as geometry-aware reward model and GRPO to anchor 2D editing priors onto the 3D consistency manifold. | CVPR 2026 |

🎬 Video Generation & Understanding

| Repository | Description | Venue |
| --- | --- | --- |
| NarrLV | The first benchmark to comprehensively evaluate narrative expression capabilities of long video generation models, inspired by film narrative theory. | ICLR 2026 |
| VMBench | A perception-aligned video motion benchmark with human-aligned metrics achieving 35.3% improvement in Spearman's correlation over baselines. | ICCV 2025 |
| Eevee | A high-resolution dataset and benchmark for video-based virtual try-on, supporting both full-shot and close-up garment detail views. | Findings of CVPR 2026 |
| FE2E | Beyond Generation: Advancing image editing priors for depth and normal estimation. | CVPR 2026 |

👁️ Multimodal & Vision-Language Models

| Repository | Description | Venue |
| --- | --- | --- |
| UniVG-R1 | Reasoning guided universal visual grounding with reinforcement learning. | CVPR 2026 |
| SocioReasoner | A vision-language reasoning framework for urban socio-semantic segmentation that simulates human annotation via cross-modal recognition and multi-stage RL-based reasoning. | ICLR 2026 |
| RealQA | A 14,715-image UGC dataset with 10 fine-grained attributes for realistic image quality and aesthetic scoring; achieves SOTA on 5 public IQA/IAA benchmarks using next-token prediction. | TMM |
| Q-Hawkeye | A GRPO-based framework for reliable visual policy optimization in image quality assessment with uncertainty-aware dynamic optimization and perception-aware optimization. | CVPR 2026 |
| Code2World | A VLM-based GUI world model that predicts dynamic transitions via renderable code generation, boosting Gemini-2.5-Flash by +9.5% on AndroidWorld navigation. | - |

🌍 World Models & Interactive AI

| Repository | Description | Venue |
| --- | --- | --- |
| Omni-WorldBench | A comprehensive benchmark specifically designed to evaluate the interactive response capabilities of world models across diverse scenarios. | arXiv 2026 |

🗺️ Maps, Mobility & Spatial Intelligence

| Repository | Description | Venue |
| --- | --- | --- |
| Thinking-with-Map | A map-augmented agent that conducts reasoning with real-world maps for geolocalization, trained via reinforcement learning. | arXiv 2026 |
| MobilityBench | A scalable benchmark for evaluating route-planning agents in real-world mobility scenarios. | arXiv 2026 |
| DSFNet | Disentangled scenario factorization for multi-scenario route ranking with the first large-scale public MSDR dataset; deployed in AMap for online traffic. | WWW 2025 |
| AR-MAP | Are Autoregressive Large Language Models Implicit Teachers for Diffusion Large Language Models? A framework for transferring alignment knowledge from AR-LLMs to Diffusion Models. | ACL 2026 |

🔍 Object Detection & Segmentation

| Repository | Description | Venue |
| --- | --- | --- |
| UPRE | Zero-shot domain adaptation for object detection via unified prompt and representation enhancement. | ICCV 2025 |

Pinned

  1. FluxText (Public)

    Implementation of "FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing"

    Python · 443 stars · 31 forks

  2. Tree-GRPO (Public)

    [ICLR 2026] Tree Search for LLM Agent Reinforcement Learning

    Python · 328 stars · 27 forks

  3. Code2World (Public)

    Code2World: A GUI World Model via Renderable Code Generation

    Python · 307 stars · 16 forks

  4. FE2E (Public)

    [CVPR 2026] Beyond Generation: Advancing Image Editing Priors for Depth and Normal Estimation

    Python · 225 stars · 8 forks

  5. RL3DEdit (Public)

    HTML · 193 stars · 7 forks

  6. GPG (Public)

    [ICLR 2026] GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning

    Python · 180 stars · 5 forks
