Splatter-360: Generalizable 360$^{\circ}$ Gaussian Splatting for Wide-baseline Panoramic Images

1Tsinghua University
2Baidu VIS
3Zhejiang University
*Equal contribution. †Corresponding author

Abstract

Wide-baseline panoramic images are frequently used in applications like VR and simulations to minimize capturing labor costs and storage needs. However, synthesizing novel views from these panoramic images in real time remains a significant challenge, especially due to panoramic imagery's high resolution and inherent distortions. Although existing 3D Gaussian splatting (3DGS) methods can produce photo-realistic views under narrow baselines, they often overfit the training views when dealing with wide-baseline panoramic images due to the difficulty in learning precise geometry from sparse 360° views. This paper presents Splatter-360, a novel end-to-end generalizable 3DGS framework designed to handle wide-baseline panoramic images. Unlike previous approaches, Splatter-360 performs multi-view matching directly in the spherical domain by constructing a spherical cost volume through a spherical sweep algorithm, enhancing the network's depth perception and geometry estimation. Additionally, we introduce a 3D-aware bi-projection encoder to mitigate the distortions inherent in panoramic images and integrate cross-view attention to improve feature interactions across multiple viewpoints. This enables robust 3D-aware feature representations and real-time rendering capabilities. Experimental results on the HM3D [27] and Replica [32] datasets demonstrate that Splatter-360 significantly outperforms state-of-the-art NeRF and 3DGS methods (e.g., PanoGRF, MVSplat, DepthSplat, and HiSplat) in both synthesis quality and generalization performance for wide-baseline panoramic images.

Method


Our Splatter-360 processes 360° panoramic images using a bi-projection encoder that extracts features from both equirectangular projection (ERP) and cube-map projection (CP) through multi-view transformers. These features are used for spherical cost volume construction, and multi-view matching is performed between the reference and source views in spherical space. Next, a refinement U-Net is applied to enhance the spherical cost volume, yielding refined cost volumes and more accurate spherical depth estimations. These refined outputs are then fed into the Gaussian decoder, which produces pixel-aligned Gaussian primitives for synthesizing novel views.
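To make the spherical-sweep step above concrete, here is a minimal sketch of how a spherical cost volume can be built between a reference and a source panorama: for each radial depth hypothesis, reference ERP pixels are back-projected along their spherical rays, transformed into the source frame, re-projected onto the source sphere, and the warped source features are correlated with the reference features. This is an illustrative assumption-laden toy (single source view, nearest-neighbor warping, dot-product matching cost), not the paper's implementation, and the function name and pose convention (reference-to-source rigid transform) are hypothetical.

```python
import numpy as np

def spherical_sweep_cost_volume(ref_feat, src_feat, src_pose, depths):
    """Toy spherical-sweep cost volume (hypothetical sketch).

    ref_feat, src_feat: (C, H, W) ERP feature maps.
    src_pose: (4, 4) reference-to-source rigid transform.
    depths: (D,) radial depth hypotheses.
    Returns a (D, H, W) correlation cost volume.
    """
    C, H, W = ref_feat.shape
    # ERP pixel grid -> spherical angles (longitude, latitude).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    lon = (u + 0.5) / W * 2 * np.pi - np.pi          # [-pi, pi)
    lat = np.pi / 2 - (v + 0.5) / H * np.pi          # [+pi/2, -pi/2]
    # Unit ray directions on the reference sphere.
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=0)  # (3, H, W)
    cost = np.empty((len(depths), H, W), dtype=ref_feat.dtype)
    for i, d in enumerate(depths):
        # Back-project at radius d and move points into the source frame.
        pts = dirs * d
        pts_src = src_pose[:3, :3] @ pts.reshape(3, -1) + src_pose[:3, 3:4]
        # Re-project onto the source sphere -> source ERP coordinates.
        x, y, z = pts_src
        r = np.linalg.norm(pts_src, axis=0) + 1e-8
        lon_s = np.arctan2(x, z)
        lat_s = np.arcsin(np.clip(y / r, -1.0, 1.0))
        us = ((lon_s + np.pi) / (2 * np.pi) * W).astype(int) % W
        vs = np.clip(((np.pi / 2 - lat_s) / np.pi * H).astype(int), 0, H - 1)
        warped = src_feat[:, vs, us].reshape(C, H, W)    # nearest-neighbor warp
        cost[i] = (ref_feat * warped).sum(axis=0) / C    # dot-product matching
    return cost
```

In the full pipeline this raw volume would then be refined by the U-Net before depth regression; the sketch only shows the sweep-and-match geometry.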



Video

Novel view rendering with wide-baseline inputs on HM3D and Replica.

More results


Novel view synthesis results on HM3D with wide-baseline inputs.



Novel view synthesis results on Replica with wide-baseline inputs.


Novel perspective view depth results on Replica with wide-baseline inputs.


Quantitative Results with three-view inputs.


Quantitative Results with narrow-baseline inputs.

BibTeX

@article{chen2024splatter360,
      title={Splatter-360: Generalizable 360$^{\circ}$ Gaussian Splatting for Wide-baseline Panoramic Images},
      author={Zheng Chen and Chenming Wu and Zhelun Shen and Chen Zhao and Errui Ding and Song-Hai Zhang},
      year={2024},
}