Paper: 2024, Vol. 42, Issue (1): 129-137
Cite this article:
LUO Jingfeng, YUAN Dongli, ZHANG Lan, QU Yaohong, SU Shihong. Unsupervised 3D reconstruction method based on multi-view propagation[J]. Journal of Northwestern Polytechnical University, 2024, 42(1): 129-137

Unsupervised 3D reconstruction method based on multi-view propagation
LUO Jingfeng¹, YUAN Dongli¹, ZHANG Lan², QU Yaohong¹, SU Shihong¹
1. School of Automation, Northwestern Polytechnical University, Xi'an 710072, China;
2. School of Cyber Engineering, Xidian University, Xi'an 710071, China
Abstract:
In this paper, an end-to-end deep learning framework that reconstructs 3D models by computing depth maps from multiple views is proposed. Specifically, an unsupervised 3D reconstruction method based on multi-view propagation is introduced to address two problems: the large GPU memory consumption of most current methods, which regularize a 3D cost volume with 3D convolutions and regress the initial depth map from it, and the difficulty of obtaining ground-truth depth maps for supervised methods when acquisition equipment is limited. Inspired by the Patchmatch algorithm, the method divides the depth range into n layers and obtains depth hypotheses through multi-view propagation. Moreover, a multi-metric loss function built from the photometric consistency, structural similarity and depth smoothness between multiple views serves as the supervisory signal for learning depth prediction in the network. Experimental results show that the proposed method achieves highly competitive performance and generalization on the DTU, Tanks & Temples and our self-made datasets; specifically, it is at least 1.7 times faster and uses 75% less memory than methods that rely on 3D cost volume regularization.
Key words:    multi-view propagation    unsupervised    3D reconstruction    Patchmatch algorithm    multi-metric loss function   
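
The abstract names two concrete mechanisms: Patchmatch-style depth hypotheses obtained by dividing the depth range into n layers and propagating estimates between neighbours, and a multi-metric loss built from photometric consistency, structural similarity (SSIM) and depth smoothness. The sketch below is a minimal NumPy illustration of both ideas, not the authors' implementation; the function names, the uniform SSIM window, and the loss weights are assumptions chosen for readability.

import numpy as np
from scipy.ndimage import uniform_filter

def propagate_hypotheses(depth, n, d_min, d_max):
    # Per-pixel depth hypotheses: n layers sampled uniformly in inverse
    # depth over [d_min, d_max], plus Patchmatch-style candidates
    # propagated from each pixel's left and upper neighbours.
    h, w = depth.shape
    layers = 1.0 / np.linspace(1.0 / d_max, 1.0 / d_min, n)
    sampled = np.broadcast_to(layers, (h, w, n))
    left = np.roll(depth, 1, axis=1)[..., None]
    up = np.roll(depth, 1, axis=0)[..., None]
    return np.concatenate([sampled, left, up], axis=-1)  # (h, w, n + 2)

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2, win=7):
    # Mean structural similarity between two single-channel images in
    # [0, 1], using a uniform window instead of a Gaussian for brevity.
    mu_x, mu_y = uniform_filter(x, win), uniform_filter(y, win)
    var_x = uniform_filter(x * x, win) - mu_x ** 2
    var_y = uniform_filter(y * y, win) - mu_y ** 2
    cov = uniform_filter(x * y, win) - mu_x * mu_y
    s = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) \
        / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return s.mean()

def smoothness(depth, image):
    # Edge-aware first-order smoothness: depth gradients are penalized
    # less where the reference image itself has strong gradients.
    wx = np.exp(-np.abs(np.diff(image, axis=1)))
    wy = np.exp(-np.abs(np.diff(image, axis=0)))
    return (np.abs(np.diff(depth, axis=1)) * wx).mean() \
         + (np.abs(np.diff(depth, axis=0)) * wy).mean()

def multi_metric_loss(ref_img, warped_srcs, depth,
                      w_photo=0.8, w_ssim=0.16, w_smooth=0.04):
    # Combine photometric consistency (L1), structural similarity and
    # depth smoothness over all source views warped into the reference
    # view; the weights are illustrative, not taken from the paper.
    photo = np.mean([np.abs(ref_img - s).mean() for s in warped_srcs])
    struct = np.mean([1.0 - ssim(ref_img, s) for s in warped_srcs])
    return (w_photo * photo + w_ssim * struct
            + w_smooth * smoothness(depth, ref_img))

In the paper's actual pipeline the warped source views would come from differentiably warping the source images into the reference view using the hypothesized depths and camera parameters; that step is omitted here for brevity.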
Received: 2023-03-17     Revised:
DOI: 10.1051/jnwpu/20244210129
Foundation items: Supported by the National Natural Science Foundation of China (61473229) and the Aeronautical Science Foundation of China (20181353013)
Corresponding author: YUAN Dongli (1966-), associate professor, e-mail: yuandongli@nwpu.edu.cn
Biography: LUO Jingfeng (1999-), master's candidate
