DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes

Brain and Artificial Intelligence Lab, Northwestern Polytechnical University
Department of Computer Vision (VIS), Baidu Inc.
X-NS Group, Beijing Institute of Technology
State Key Lab of CAD & CG, Zhejiang University

Video (with voiceover)

Abstract

Novel-view synthesis (NVS) approaches play a critical role in vast scene reconstruction. However, these methods rely heavily on dense image inputs and prolonged training times, making them unsuitable where computational resources are limited. Additionally, few-shot methods often struggle with poor reconstruction quality in vast environments. This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction of sparse-view vast scenes. Our approach divides the scene into regions that are processed independently by drones with sparse image inputs. Using a feed-forward Gaussian model, we predict high-quality Gaussian primitives, followed by a global alignment algorithm that ensures geometric consistency. Synthetic views and depth priors are incorporated to further enhance training, while a distillation-based model aggregation mechanism enables efficient reconstruction. Our method achieves high-quality large-scale scene reconstruction and novel-view synthesis with significantly reduced training time, outperforming existing approaches in both speed and scalability. We demonstrate the effectiveness of our framework on vast aerial scenes, achieving high-quality results within minutes.

Our proposed DGTR rapidly reconstructs sparse-view vast scenes in a distributed manner. Compared with standard centralized 3DGS training, it achieves better visual appearance and geometric accuracy at a faster speed.


Method

DGTR (Ours) overview: Given \(M\) individual devices (drones), we aim to perform sparse-view vast scene reconstruction at high speed in a multi-device collaborative manner. The pipeline consists of three steps: 1) each device explores a non-overlapping region and initializes Gaussians using an off-the-shelf feed-forward Gaussian method together with a global alignment strategy; 2) each device performs sparse-view scene reconstruction starting from the initialized Gaussians; 3) each device uploads its well-trained Gaussian model to the central server, which aggregates the models in a distillation manner.
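The sketch below illustrates this three-step data flow (per-device feed-forward initialization and global alignment, local sparse-view training, and server-side aggregation) in plain Python. It is a minimal, self-contained illustration under stated assumptions only: the function names (feed_forward_init, global_alignment, train_local, distill_aggregate) and the array-based Gaussian representation are hypothetical stand-ins rather than the released DGTR API, and the aggregation step here simply concatenates the non-overlapping regions instead of performing the paper's distillation.

# Minimal orchestration sketch of the three-step pipeline described above.
# All function names are hypothetical placeholders, not the authors' code;
# they only illustrate the data flow between devices and the central server.
import numpy as np

def feed_forward_init(images):
    # Step 1a (per device): predict coarse Gaussian primitives from sparse views.
    # Stand-in: random primitives [x, y, z, scale, r, g, b, opacity]; a real
    # system would use an off-the-shelf feed-forward Gaussian network.
    n = 1024 * len(images)
    return np.random.default_rng(0).normal(size=(n, 8)).astype(np.float32)

def global_alignment(gaussians, pose):
    # Step 1b: transform device-local Gaussians into the shared world frame so
    # that per-region predictions are geometrically consistent.
    aligned = gaussians.copy()
    xyz_h = np.concatenate([aligned[:, :3],
                            np.ones((len(aligned), 1), np.float32)], axis=1)
    aligned[:, :3] = (xyz_h @ pose.T)[:, :3]
    return aligned

def train_local(gaussians, images, steps=100):
    # Step 2 (per device): refine the initialized Gaussians from sparse views
    # (plus synthetic views / depth priors in the full method). The update here
    # is a dummy placeholder for gradient-based optimization, kept runnable.
    for _ in range(steps):
        gaussians[:, 3:] *= 0.999
    return gaussians

def distill_aggregate(local_models):
    # Step 3 (central server): merge the uploaded per-region models into one
    # global model. The paper uses distillation; this stand-in simply
    # concatenates the non-overlapping regions.
    return np.concatenate(local_models, axis=0)

if __name__ == "__main__":
    # Fake inputs: M devices, each with a few sparse views and a known region pose.
    M = 3
    views = [[np.zeros((64, 64, 3), np.float32)] * 4 for _ in range(M)]
    poses = [np.eye(4, dtype=np.float32) for _ in range(M)]

    local_models = []
    for m in range(M):                        # runs on each drone, in parallel in practice
        g = feed_forward_init(views[m])       # step 1a
        g = global_alignment(g, poses[m])     # step 1b
        g = train_local(g, views[m])          # step 2
        local_models.append(g)

    scene = distill_aggregate(local_models)   # step 3, on the central server
    print("global model:", scene.shape)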

Results

Quantitative results of novel-view synthesis on the Mill19 and UrbanScene3D datasets. ↑: higher is better; ↓: lower is better. Red, orange, and yellow respectively denote the best, second-best, and third-best results under the sparse-view setting. Underlining denotes the best result among all methods. † denotes that half of the test images are included in the training set. ‡ denotes that all dense images are used as the training set.
Qualitative results on the Mill19 and UrbanScene3D datasets.

BibTeX

@article{li2025DGTR,
      title={DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes}, 
      author={Hao Li and Yuanyuan Gao and Haosong Peng and Chenming Wu and Weicai Ye and Yufeng Zhan and Chen Zhao and Dingwen Zhang and Jingdong Wang and Junwei Han},
      year={2025},
      eprint={2501.xxxx},
}