GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation

🎉 Accepted to ICLR 2026 🎉

Weijia Dou¹, Xu Zhang², Yi Bin^1*, Jian Liu³, Bo Peng², Guoqing Wang³, Yang Yang³, Heng Tao Shen¹ (*Corresponding author)

¹Tongji University ²Tianjin University ³University of Electronic Science and Technology of China

This is the official repository for GeoPurify. Our work tackles a key challenge in open-vocabulary 3D segmentation: the noisy and fragmented results produced when lifting features from 2D Vision-Language Models (VLMs) to 3D space.

GeoPurify introduces a framework that learns to purify these semantically-rich but geometrically-inconsistent 3D features. By distilling robust, class-agnostic geometric priors from a 3D self-supervised model, it effectively reconciles 2D semantics with 3D structure—all without needing any 3D semantic labels for its training.

Our key novelty in a sentence: GeoPurify achieves state-of-the-art open-vocabulary 3D segmentation with only ~1.5% of training data by learning to purify noisy 2D VLM features using distilled 3D geometric priors.

📝 Contents

GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation

🧠 Method Overview

Our method explicitly decouples semantics and geometry into a two-stage pipeline:

Stage 1: Training (Geometric Distillation) A sparse 3D Student Affinity Network (φ_S) is trained to comprehend 3D structure. It learns geometric relationships directly from the point cloud by using contrastive distillation to mimic the embeddings of a powerful, frozen 3D SSL teacher (φ_T, e.g., Sonata). Crucially, this training phase requires no 3D semantic labels.
Stage 2: Inference (Geometry-Guided Pooling) A frozen generalist 2D VLM (Ψ_2D, e.g., X-Decoder) generates initial 3D features by projecting rich semantic content from multi-view images. Because these features are geometrically inconsistent, our pre-trained student network applies a geometry-aware pooling operation, using its learned affinities to iteratively refine and denoise the initial features. This process yields a final representation that is both semantically rich and geometrically coherent.

✨ Key Features

⚡ Unrivaled Data Efficiency: Achieves or surpasses SOTA performance on major benchmarks (ScanNetV2, Matterport3D) while training on only ~1.5% of the data, eliminating the need for large-scale 3D annotations.
🎓 Novel Geometric Distillation: Introduces a teacher-student framework that distills purely geometric affinities from a 3D self-supervised model. This learns a class-agnostic prior to correct structural inconsistencies in 2D-lifted features.
🌍 Strong Generalization: The decoupled architecture provides robust zero-shot performance on long-tail benchmarks and excels in cross-dataset generalization, unlike methods that learn entangled geo-semantic representations.
🎯 Simple & Effective Purification: At inference, a lightweight Geometry-Guided Pooling module uses the learned affinities to denoise features, producing coherent and accurate segmentation maps.

🛠️ Installation

For detailed setup instructions, please see the Installation Guide.

🚀 Usage

Data Preparation

Input: Multi-view RGB-D images + 3D point clouds.
Datasets supported: ScanNetV2, Matterport3D, and ScanNet200.
Follow preprocessing scripts in scripts/preprocess.

Training GeoPurify

Run training with the curated subset (~1.5% of data):

sh run/train.sh --exp_dir=out/scannet --config=config/geopurify_scannet.yaml

Inference

Apply trained model for open-vocabulary 3D segmentation. Pretrained checkpoints are provided under:

Matterport3D: result/matterport/model
ScanNetV2: result/scannet/model

sh run/val.sh --exp_dir=out/scannet --config=config/geopurify_scannet.yaml --ckpt_name=geopurify.pth

📊 Evaluation

Datasets

ScanNetV2: 1,500 RGB-D scans.
Matterport3D: 90 large-scale indoor scenes.
ScanNet200: Long-tail benchmark emphasizing rare categories.

Metrics

mIoU (mean Intersection-over-Union)
mAcc (mean Accuracy)
Foreground-mIoU / Foreground-mAcc (excluding wall/floor/ceiling).

Results

ScanNetV2 (∼1.5% data): 55.1 mIoU / 72.5 mAcc
Matterport3D: 40.2 mIoU / 62.4 mAcc.
ScanNet200 (long-tail): 11.9 f-mIoU / 22.8 f-mAcc

📦 Checkpoints

Pretrained checkpoints are available on Google Drive: 🔗 Download Here

Usage

Matterport3D checkpoint: checkpoint/result/matterport/model/geopurify.pth
ScanNetV2 checkpoint: checkpoint/result/scannet/model/geopurify.pth

📚 Citation

If you find this work useful, please cite:

@misc{dou2025geopurifydataefficientgeometricdistillation,
      title={GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation},
      author={Weijia Dou and Xu Zhang and Yi Bin and Jian Liu and Bo Peng and Guoqing Wang and Yang Yang and Heng Tao Shen},
      year={2025},
      eprint={2510.02186},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.02186},
}

🙏 Acknowledgements

We thank the authors of Sonata, X-Decoder, and XMask3D for their excellent open-source contributions.

📜 License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
assets		assets
config		config
dataset		dataset
docs		docs
models		models
run		run
third_party		third_party
util		util
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
matterport_evaluation.txt		matterport_evaluation.txt
matterport_train.txt		matterport_train.txt
scannet_train.txt		scannet_train.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation

🎉 Accepted to ICLR 2026 🎉

📝 Contents

🧠 Method Overview

✨ Key Features

🛠️ Installation

🚀 Usage

Data Preparation

Training GeoPurify

Inference

📊 Evaluation

Datasets

Metrics

Results

📦 Checkpoints

Usage

📚 Citation

🙏 Acknowledgements

📜 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation

🎉 Accepted to ICLR 2026 🎉

📝 Contents

🧠 Method Overview

✨ Key Features

🛠️ Installation

🚀 Usage

Data Preparation

Training GeoPurify

Inference

📊 Evaluation

Datasets

Metrics

Results

📦 Checkpoints

Usage

📚 Citation

🙏 Acknowledgements

📜 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages