SLAM Survey (2): Visual SLAM
SLAM comprises two main tasks: localization and mapping. In mobile robotics and autonomous driving this is a crucial problem: to move precisely, a robot must have a map of its environment, yet to build that map it must know its own position.
This series of articles is organized into four parts:
The first part introduced Lidar SLAM, including Lidar sensors, open-source Lidar SLAM systems, deep learning in Lidar SLAM, and challenges and future directions.
The second part focuses on visual SLAM, including camera sensors and open-source visual SLAM systems of different map densities.
The third part covers visual-inertial odometry (VIO) SLAM, deep learning in visual SLAM, and future directions.
The fourth part discusses the fusion of Lidar and vision.
Abstract
With the development of CPUs and GPUs, graphics processing capability has become increasingly powerful, while camera sensors have become cheaper, lighter, and more versatile. Visual SLAM has developed rapidly over the past decade. Compared with Lidar-based systems, visual SLAM built on cameras also makes the overall system cheaper and smaller. Today, visual SLAM systems can run on micro PCs and embedded devices, and even on mobile devices such as smartphones [1], [2].
Visual SLAM typically involves processing the data of sensors such as cameras and inertial measurement units (IMUs), a front end performing visual odometry or visual-inertial odometry, a back end performing optimization and loop closure, and map construction [3]. Relocalization is another essential module for stable and accurate visual SLAM [4]. In the visual odometry stage, besides feature-based and template-matching methods for determining camera motion, there is another class of methods that relies on the Fourier-Mellin transform [5]. [6] and [7] give examples of its use with a ground-facing camera in environments without distinct visual features.
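The sketch below shows the core building block of such Fourier-Mellin-style odometry: estimating the image-plane translation between two consecutive ground-facing frames by phase correlation. It is a minimal, assumption-level illustration, not code from the cited systems; the file names are placeholders, and turning a pixel shift into a metric displacement would additionally require the camera height and intrinsics (the full Fourier-Mellin transform also resamples the spectrum in log-polar coordinates to recover rotation and scale).

```python
# Minimal sketch: translation between two ground-facing frames via phase correlation.
# Assumes OpenCV and two grayscale frames "prev.png" / "curr.png" on disk.
import cv2
import numpy as np

prev_img = cv2.imread("prev.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
curr_img = cv2.imread("curr.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# A Hanning window suppresses boundary effects in the FFT-based correlation.
window = cv2.createHanningWindow(prev_img.shape[::-1], cv2.CV_32F)
(shift_x, shift_y), response = cv2.phaseCorrelate(prev_img, curr_img, window)
print(f"pixel shift: ({shift_x:.2f}, {shift_y:.2f}), confidence: {response:.3f}")
```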
Visual Sensors
The most commonly used sensor in visual SLAM is the camera. Cameras can be divided into monocular cameras, stereo cameras, RGB-D cameras, event cameras, and others.
Monocular camera: visual SLAM based on a monocular camera suffers from a scale problem between the estimated trajectory/map and their true size. In other words, a monocular camera cannot recover true depth, which is the well-known scale uncertainty [8]. Monocular SLAM also requires an explicit initialization step and suffers from drift.
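The following minimal numeric illustration (a pinhole model with made-up numbers, assumed here purely for illustration) shows why the scale is unobservable: scaling the scene and the camera translation by the same factor leaves every projected pixel unchanged, so image measurements alone cannot fix the metric scale.

```python
# Monocular scale ambiguity: (s*X, s*t) projects to exactly the same pixels as (X, t).
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])          # assumed intrinsics
X = np.array([1.0, 0.5, 4.0])            # a 3D point in the world frame
t = np.array([0.2, 0.0, 0.0])            # camera translation (R = I for brevity)

def project(point, trans):
    p = K @ (point - trans)
    return p[:2] / p[2]

for s in (1.0, 10.0):                     # scale the world and the translation together
    print(s, project(s * X, s * t))       # identical pixel coordinates for both scales
```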
Stereo camera: a stereo camera is a combination of two monocular cameras with a known baseline (the distance between the two cameras). Depth can then be obtained through calibration, rectification, matching, and computation, but this process consumes considerable computing resources.
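After rectification and matching, the triangulation step itself is simple: depth follows from Z = f·b/d, where f is the focal length in pixels, b the baseline, and d the disparity. The numbers in the sketch below are illustrative assumptions.

```python
# Stereo depth from disparity: Z = f * b / d (illustrative values).
focal_px = 700.0     # focal length in pixels (from calibration)
baseline_m = 0.12    # distance between the two cameras in meters
disparity_px = 21.0  # horizontal pixel offset of a matched point

depth_m = focal_px * baseline_m / disparity_px
print(f"depth = {depth_m:.2f} m")   # 4.00 m
```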
RGB-D camera: an RGB-D camera is also called a depth camera because it directly outputs a per-pixel depth value. Depth cameras can be realized with stereo, structured-light, or time-of-flight (ToF) technology. In structured light, an infrared laser projects a pattern with known structure onto the object surface, and an infrared camera then captures how the pattern is deformed by the varying surface depth. ToF measures the flight time of the emitted light to compute distance.
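The ToF principle mentioned above reduces to one formula: the sensor measures the round-trip time of emitted light, so distance = c·t/2. The timing value below is an illustrative assumption.

```python
# Time-of-flight distance from a measured round-trip time.
SPEED_OF_LIGHT = 299_792_458.0   # m/s
round_trip_s = 20e-9             # assumed 20 ns round-trip time

distance_m = SPEED_OF_LIGHT * round_trip_s / 2.0
print(f"distance = {distance_m:.2f} m")   # about 3 m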
Event camera: as explained in [9], instead of capturing images at a fixed rate, an event camera asynchronously measures per-pixel brightness changes. Event cameras offer a very high dynamic range (140 dB versus 60 dB), high temporal resolution (on the order of microseconds), low power consumption, and are not affected by motion blur. Event cameras therefore outperform conventional cameras at high speed and in high-dynamic-range conditions. Examples of event cameras are the dynamic vision sensor [10], the dynamic line sensor [11], the dynamic and active-pixel vision sensor [12], and the asynchronous time-based image sensor [13].
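The sketch below illustrates the standard log-intensity contrast model from the event-camera literature (an assumption-level simulation, not code from any cited sensor): a pixel fires an event whenever its log brightness changes by more than a contrast threshold C since the last event at that pixel.

```python
# Minimal event-generation model: event fired when |delta log I| > C at a pixel.
import numpy as np

def events_between(prev_frame, curr_frame, C=0.15):
    """Return (row, col, polarity) for pixels whose log intensity changed by more than C."""
    log_prev = np.log(prev_frame.astype(np.float64) + 1e-3)
    log_curr = np.log(curr_frame.astype(np.float64) + 1e-3)
    diff = log_curr - log_prev
    rows, cols = np.nonzero(np.abs(diff) > C)
    polarity = np.sign(diff[rows, cols])          # +1 brighter, -1 darker
    return [(int(r), int(c), int(p)) for r, c, p in zip(rows, cols, polarity)]

# Toy 2x2 frames: only the top-left pixel brightens enough to trigger an event.
prev = np.array([[100, 100], [100, 100]], dtype=np.uint8)
curr = np.array([[140, 102], [100, 99]], dtype=np.uint8)
print(events_between(prev, curr))   # [(0, 0, 1)]
```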
Visual Sensor Products and Companies
· Microsoft: Kinect v1 (structured light), Kinect v2 (ToF), Azure Kinect (with microphone and IMU).
· Intel: 200 series, 300 series, and the D400 series of modules: D415 (active IR stereo, rolling shutter), D435 (active IR stereo, global shutter), D435i (D435 with an IMU).
· Stereolabs ZED: ZED stereo camera (depth up to 20 m).
· MYNTAI: D1000 series (depth cameras), D1200 (for smartphones), S1030 series (standard stereo camera).
· Occipital Structure: Structure Sensor (for iPad).
· Samsung: Gen-2 and Gen-3 dynamic vision sensors and event-based vision solutions [65].
Other depth cameras include, but are not limited to, Leap Motion, Orbbec Astra, Pico Zense, DUO, Xtion, Camboard, IMI, Humanplus, PERCIPIO.XYZ, and PrimeSense. Other event cameras include, but are not limited to, iniVation, AIT (Austrian Institute of Technology), SiliconEye, Prophesee, CelePixel, and Dilusense.
Image-based SLAM methods can be divided into direct methods and feature-based methods. Direct methods build semi-dense or dense maps, while feature-based methods build sparse maps of feature points.
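As a concrete illustration of the feature-based front end, the sketch below detects ORB keypoints in two frames and matches their descriptors with OpenCV. It is a minimal sketch under assumptions (placeholder file names, default parameters), not code from any particular system; a real pipeline would follow this with geometric verification and pose estimation.

```python
# Minimal feature-based front end: ORB detection and brute-force Hamming matching.
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)   # placeholder frames
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Cross-checked brute-force matching rejects asymmetric matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} cross-checked matches")
```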
Sparse Visual SLAM
· MonoSLAM (monocular): the first real-time monocular SLAM system, based on the EKF [14].
· PTAM (monocular): the first SLAM system to run tracking and mapping in parallel. It was the first to adopt bundle adjustment for optimization and the concept of keyframes [15]. A later version added simple and effective relocalization [16].
· ORB-SLAM (monocular): uses three threads: tracking, local mapping, and loop closing [17]. ORB-SLAM v2 [18] supports monocular, stereo, and RGB-D cameras. CubemapSLAM [19] is a monocular fisheye SLAM system based on ORB-SLAM. Visual-Inertial ORB-SLAM [20] explains the initialization of the IMU and the joint optimization with visual information. (A two-view geometry sketch of this kind of feature-based pipeline follows this list.)
· ProSLAM (stereo): a lightweight visual SLAM system that is easy to understand [21].
· ENFT-sfm (monocular): a feature tracking method that can efficiently match feature point correspondences within one or across multiple video sequences [22]. An updated version, ENFT-SLAM, runs at large scale.
· OpenVSLAM (all camera types) [23]: based on an indirect SLAM algorithm with sparse features. The advantage of OpenVSLAM is that it supports perspective, fisheye, and equirectangular models, and even arbitrary user-defined camera models.
· TagSLAM: realizes SLAM with AprilTag fiducial markers [24]. It also provides a front end to the GTSAM factor-graph optimizer, so that a wide range of experiments can be designed. Other similar work includes, but is not limited to, UcoSLAM [25].
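The sketch below (referenced from the ORB-SLAM entry above) illustrates the two-view geometry that such sparse, feature-based systems build on: recovering the relative camera rotation and translation from matched points via the essential matrix. It is an assumption-level sketch with synthetic correspondences, not code from any of the listed systems; for a monocular camera the translation is recovered only as a direction, with the metric scale lost.

```python
# Two-view relative pose from matched points via the essential matrix (synthetic data).
import cv2
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])          # assumed intrinsics

# Synthetic scene: random 3D points seen from two poses (identity, then a small
# translation along x). In a real system pts1/pts2 come from feature matching.
rng = np.random.default_rng(0)
pts3d = np.column_stack([rng.uniform(-2, 2, 80),
                         rng.uniform(-1.5, 1.5, 80),
                         rng.uniform(4, 8, 80)])
t_true = np.array([0.3, 0.0, 0.0])

def project(points, trans):
    cam = points - trans                 # R = I for the second view
    pix = (K @ cam.T).T
    return pix[:, :2] / pix[:, 2:3]

pts1 = project(pts3d, np.zeros(3))
pts2 = project(pts3d, t_true)

E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
print("recovered rotation:\n", np.round(R, 3))      # close to identity
print("recovered unit translation:", t.ravel())     # direction along x; scale is lost
```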
Semi-Dense Visual SLAM
· LSD-SLAM (monocular): proposes a novel direct tracking method that operates on the Lie algebra of similarity transforms, sim(3), using direct image alignment [26]. [27] extends it to stereo cameras. (A sketch of the photometric residual minimized by such direct methods follows this list.)
· SVO (monocular): semi-direct visual odometry [28]. It uses sparse model-based image alignment to achieve high speed. An updated version extends it to multiple cameras, including fisheye and catadioptric lenses. CNN-SVO [29] is a variant of SVO with depth prediction from a single-image depth prediction network.
· DSO (monocular) [30]: a new work by the authors of LSD-SLAM. It builds a visual odometry from a direct method and a sparse formulation, without detecting or describing feature points.
· EVO (event camera) [31]: an event-based visual odometry algorithm. It is unaffected by motion blur and runs well under challenging high-dynamic-range conditions with strong illumination changes. Other event-camera-based semi-dense SLAM systems can be found in [32], and other event-camera-based VO (visual odometry) systems in [33].
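As referenced from the LSD-SLAM entry above, the sketch below shows, at an assumption level, the photometric residual that direct methods minimize: a reference pixel is warped into the current frame using its inverse depth and a candidate pose, and intensities are compared directly instead of matching descriptors. This is an illustration, not code from LSD-SLAM or DSO; real systems interpolate, use pixel patches, and sum the residuals inside a robust optimization.

```python
# Photometric residual for a single reference pixel under a candidate pose (R, t).
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])          # assumed intrinsics
K_inv = np.linalg.inv(K)

def photometric_residual(ref_img, cur_img, u, v, inv_depth, R, t):
    """Intensity difference for reference pixel (u, v) warped by pose (R, t)."""
    point_ref = K_inv @ np.array([u, v, 1.0]) / inv_depth    # back-project to 3D
    point_cur = R @ point_ref + t                             # rigid-body transform
    proj = K @ point_cur
    u2, v2 = proj[:2] / proj[2]                               # reproject into current frame
    # Nearest-neighbour lookup for brevity; real systems interpolate over patches.
    return float(cur_img[int(round(v2)), int(round(u2))]) - float(ref_img[v, u])

# Toy example: constant images and identity motion give a zero residual.
ref = np.full((480, 640), 120, dtype=np.uint8)
cur = np.full((480, 640), 120, dtype=np.uint8)
print(photometric_residual(ref, cur, 320, 240, inv_depth=0.25, R=np.eye(3), t=np.zeros(3)))
```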
Dense Visual SLAM
· DTAM (monocular): reconstructs 3D models in real time by minimizing an energy function with global spatial regularization within a novel non-convex optimization framework; this is known as the direct method [34].
· MLM SLAM (monocular): reconstructs dense 3D models online without a graphics processing unit (GPU). The key contributions are a multi-resolution depth estimation and a spatial smoothing process.
· KinectFusion (RGB-D): the first 3D reconstruction system built on a depth camera [35]. (The back-projection step shared by these RGB-D systems is sketched after this list.)
· DVO (RGB-D): proposes a dense visual SLAM method with an entropy-based similarity measure for keyframe selection and for loop closure detection based on the g2o framework [36].
· RGBD-SLAM-V2 (RGB-D): reconstructs accurate dense 3D models using an RGB-D camera [37].
· Kintinuous (RGB-D): a visual SLAM system with real-time globally consistent point and mesh reconstruction [38].
· RTAB-MAP (RGB-D): supports simultaneous localization and mapping, but is hard to use as a basis for developing higher-level algorithms [39]. A later version supports both visual and Lidar SLAM [40].
· DynamicFusion (RGB-D): the first dense SLAM system capable of reconstructing non-rigidly deforming scenes in real time, built on KinectFusion [41]. VolumeDeform [42] also achieves real-time non-rigid reconstruction but is not open source. Similar work can be found in Fusion4D [43].
· ElasticFusion (RGB-D): a real-time dense visual SLAM system capable of capturing comprehensive, dense, globally consistent maps of room-scale environments explored with an RGB-D camera [44].
· InfiniTAM (RGB-D): a real-time 3D reconstruction system with CPU support on Linux, iOS, and Android platforms [45].
· BundleFusion (RGB-D): supports robust tracking that recovers from severe tracking failures and re-estimates the 3D model in real time to ensure global consistency [46].
· KO-Fusion (RGB-D) [47]: proposes a dense RGB-D SLAM system tightly coupled with the kinematic and odometric measurements of a wheeled robot.
· SOFT-SLAM (stereo) [48]: can create dense maps, with large-scale loop closing and pose estimation based on SOFT [49]. Other works include, but are not limited to, SLAMRecon, RKD-SLAM [50], and RGB-D SLAM [51].
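As referenced from the KinectFusion entry above, the sketch below shows the one step all of these RGB-D systems share before their respective fusion stages: back-projecting a depth image into a 3D point cloud with the camera intrinsics. Fusing these clouds over time into a TSDF volume or surfel map is where the systems differ. The intrinsics and the synthetic depth values are illustrative assumptions.

```python
# Back-project a depth image into a point cloud using pinhole intrinsics.
import numpy as np

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5            # assumed RGB-D intrinsics
depth_m = np.full((480, 640), 2.0, dtype=np.float32)   # synthetic depth: a wall at 2 m

v, u = np.indices(depth_m.shape)                        # per-pixel row/column indices
z = depth_m
x = (u - cx) * z / fx
y = (v - cy) * z / fy
points = np.stack([x, y, z], axis=-1).reshape(-1, 3)    # one 3D point per pixel
print(points.shape, points[0])                          # (307200, 3) and the top-left point
```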
References
[1] Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D Tardos. Orb-slam: a versatile and accurate monocular slam system. IEEE transactions on robotics, 31(5):1147–1163, 2015.
[2] Tong Qin, Peiliang Li, and Shaojie Shen. Vins-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018.
[3] Xiang Gao, Tao Zhang, Yi Liu, and Qinrui Yan. 14 Lectures on Visual SLAM: From Theory to Practice. Publishing House of Electronics Industry, 2017.
[4] Takafumi Taketomi, Hideaki Uchiyama, and Sei Ikeda. Visual slam algorithms: A survey from 2010 to 2016. IPSJ Transactions on Computer Vision and Applications, 9(1):16, 2017.
[5] B Srinivasa Reddy and Biswanath N Chatterji. An fft-based technique for translation, rotation, and scale-invariant image registration. IEEE transactions on image processing, 5(8):1266–1271, 1996.
[6] Tim Kazik and Ali Haydar Göktoğan. Visual odometry based on the fourier-mellin transform for a rover using a monocular ground-facing camera. In 2011 IEEE International Conference on Mechatronics, pages 469–474. IEEE, 2011.
[7] Merwan Birem, Richard Kleihorst, and Norddin El-Ghouti. Visual odometry based on the fourier transform using a monocular groundfacing camera. Journal of Real-Time Image Processing, 14(3):637–646, 2018.
[8] Haomin Liu, Guofeng Zhang, and Hujun Bao. A survey of monocular simultaneous localization and mapping. Journal of Computer-Aided Design & Computer Graphics, 28(6):855–868, 2016.
[9] Guillermo Gallego, Tobi Delbruck, Garrick Orchard, Chiara Bartolozzi, and Davide Scaramuzza. Event-based vision: A survey. 2019.
[10] Patrick Lichtsteiner, Christoph Posch, and Tobi Delbruck. A 128×128 120 dB 15 µs latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits, 43(2):566–576, 2008.
[11] Christoph Posch, Michael Hofstatter, Daniel Matolin, Guy Vanstraelen, Peter Schon, Nikolaus Donath, and Martin Litzenberger. A dual-line optical transient sensor with on-chip precision time-stamp generation. In 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, pages 500–618. IEEE, 2007.
[12] Christian Brandli, Raphael Berner, Minhao Yang, Shih-Chii Liu, and Tobi Delbruck. A 240× 180 130 db 3 µs latency global shutter spatiotemporal vision sensor. IEEE Journal of Solid-State Circuits, 49(10):2333–2341, 2014.
[13] Christoph Posch, Daniel Matolin, and Rainer Wohlgenannt. A qvga 143 db dynamic range frame-free pwm image sensor with lossless pixel-level video compression and time-domain cds. IEEE Journal of Solid-State Circuits, 46(1):259–275, 2010.
[14] Andrew J Davison, Ian D Reid, Nicholas D Molton, and Olivier Stasse. Monoslam: Real-time single camera slam. IEEE Transactions on Pattern Analysis & Machine Intelligence, (6):1052–1067, 2007.
[15] Georg Klein and David Murray. Parallel tracking and mapping for small ar workspaces. In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, pages 1– 10. IEEE Computer Society, 2007.
[16] Georg Klein and David Murray. Improving the agility of keyframebased slam. In European Conference on Computer Vision, pages 802– 815. Springer, 2008.
[17] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary R Bradski. Orb: An efficient alternative to sift or surf. In ICCV, volume 11, page 2. Citeseer, 2011.
[18] Raul Mur-Artal and Juan D Tardós. Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017.
[19] Yahui Wang, Shaojun Cai, Shi-Jie Li, Yun Liu, Yangyan Guo, Tao Li, and Ming-Ming Cheng. Cubemapslam: A piecewise-pinhole monocular fisheye slam system. In Asian Conference on Computer Vision, pages 34–49. Springer, 2018.
[20] Raúl Mur-Artal and Juan D Tardós. Visual-inertial monocular slam with map reuse. IEEE Robotics and Automation Letters, 2(2):796–803, 2017.
[21] D. Schlegel, M. Colosi, and G. Grisetti. ProSLAM: Graph SLAM from a Programmer’s Perspective. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–9, 2018.
[22] Guofeng Zhang, Haomin Liu, Zilong Dong, Jiaya Jia, Tien-Tsin Wong, and Hujun Bao. Efficient non-consecutive feature tracking for robust structure-from-motion. IEEE Transactions on Image Processing, 25(12):5957–5970, 2016.
[23] Shinya Sumikura, Mikiya Shibuya, and Ken Sakurada. Openvslam: a versatile visual slam framework, 2019.
[24] Bernd Pfrommer and Kostas Daniilidis. Tagslam: Robust slam with fiducial markers. arXiv preprint arXiv:1910.00679, 2019.
[25] Rafael Munoz-Salinas and Rafael Medina-Carnicer. Ucoslam: Simultaneous localization and mapping by fusion of keypoints and squared planar markers. arXiv preprint arXiv:1902.03729, 2019.
[26] Jakob Engel, Thomas Schöps, and Daniel Cremers. Lsd-slam: Large-scale direct monocular slam. In European conference on computer vision, pages 834–849. Springer, 2014.
[27] Jakob Engel, Jörg Stückler, and Daniel Cremers. Large-scale direct slam with stereo cameras. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1935–1942. IEEE, 2015.
[28] Christian Forster, Zichao Zhang, Michael Gassner, Manuel Werlberger, and Davide Scaramuzza. Svo: Semidirect visual odometry for monocular and multicamera systems. IEEE Transactions on Robotics, 33(2):249–265, 2016.
[29] Shing Yan Loo, Ali Jahani Amiri, Syamsiah Mashohor, Sai Hong Tang, and Hong Zhang. Cnn-svo: Improving the mapping in semi-direct visual odometry using single-image depth prediction. arXiv preprint arXiv:1810.01011, 2018.
[30] Jakob Engel, Vladlen Koltun, and Daniel Cremers. Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3):611–625, 2017.
[31] Henri Rebecq, Timo Horstschäfer, Guillermo Gallego, and Davide Scaramuzza. Evo: A geometric approach to event-based 6-dof parallel tracking and mapping in real time. IEEE Robotics and Automation Letters, 2(2):593–600, 2016.
[32] Yi Zhou, Guillermo Gallego, Henri Rebecq, Laurent Kneip, Hongdong Li, and Davide Scaramuzza. Semi-dense 3d reconstruction with a stereo event camera. In Proceedings of the European Conference on Computer Vision (ECCV), pages 235–251, 2018.
[33] David Weikersdorfer, Raoul Hoffmann, and Jörg Conradt. Simultaneous localization and mapping for event-based vision systems. In International Conference on Computer Vision Systems, pages 133–142. Springer, 2013.
[34] Javier Civera, Andrew J Davison, and JM Martinez Montiel. Inverse depth parametrization for monocular slam. IEEE transactions on robotics, 24(5):932–945, 2008.
[35] Richard A Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew W Fitzgibbon. Kinectfusion: Realtime dense surface mapping and tracking. In ISMAR, volume 11, pages 127–136, 2011.
[36] Frank Steinbrücker, Jürgen Sturm, and Daniel Cremers. Real-time visual odometry from dense rgb-d images. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pages 719–722. IEEE, 2011.
[37] Christian Kerl, Jürgen Sturm, and Daniel Cremers. Robust odometry estimation for rgb-d cameras. In 2013 IEEE International Conference on Robotics and Automation, pages 3748–3754. IEEE, 2013.
[38] Thomas Whelan, Michael Kaess, Maurice Fallon, Hordur Johannsson, John J Leonard, and John McDonald. Kintinuous: Spatially extended kinectfusion. 2012.
[39] Mathieu Labbé and François Michaud. Online global loop closure detection for large-scale multi-session graph-based slam. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2661–2666. IEEE, 2014.
[40] Mathieu Labbé and François Michaud. Rtab-map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. Journal of Field Robotics, 36(2):416–446, 2019.
[41] Richard A Newcombe, Dieter Fox, and Steven M Seitz. Dynamicfusion: Reconstruction and tracking of non-rigid scenes in real-time. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 343–352, 2015.
[42] Matthias Innmann, Michael Zollhöfer, Matthias Nießner, Christian Theobalt, and Marc Stamminger. Volumedeform: Real-time volumetric non-rigid reconstruction. In European Conference on Computer Vision, pages 362–379. Springer, 2016.
[43] Mingsong Dou, Sameh Khamis, Yury Degtyarev, Philip Davidson, Sean Ryan Fanello, Adarsh Kowdle, Sergio Orts Escolano, Christoph Rhemann, David Kim, Jonathan Taylor, et al. Fusion4d: Real-time performance capture of challenging scenes. ACM Transactions on Graphics (TOG), 35(4):114, 2016.
[44] Thomas Whelan, Stefan Leutenegger, R Salas-Moreno, Ben Glocker, and Andrew Davison. Elasticfusion: Dense slam without a pose graph. Robotics: Science and Systems, 2015.
[45] V A Prisacariu, O Kähler, S Golodetz, M Sapienza, T Cavallari, P H S Torr, and D W Murray. InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure. arXiv preprint arXiv:1708.00783, 2017.
[46] Angela Dai, Matthias Nießner, Michael Zollhöfer, Shahram Izadi, and Christian Theobalt. Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface re-integration. ACM Transactions on Graphics (TOG), 2017.
[47] Charlie Houseago, Michael Bloesch, and Stefan Leutenegger. Kofusion: Dense visual slam with tightly-coupled kinematic and odometric tracking. In 2019 International Conference on Robotics and Automation (ICRA), pages 4054–4060. IEEE, 2019.
[48] Igor Cvišić, Josip Ćesić, Ivan Marković, and Ivan Petrović. Soft-slam: Computationally efficient stereo visual slam for autonomous uavs. Journal of Field Robotics, 2017.
[49] Igor Cvišić and Ivan Petrović. Stereo odometry based on careful feature selection and tracking. In 2015 European Conference on Mobile Robots (ECMR), pages 1–6. IEEE, 2015.
[50] Haomin Liu, Chen Li, Guojun Chen, Guofeng Zhang, Michael Kaess, and Hujun Bao. Robust keyframe-based dense slam with an rgb-d camera. arXiv preprint arXiv:1711.05166, 2017.
[51] Weichen Dai, Yu Zhang, Ping Li, and Zheng Fang. Rgb-d slam in dynamic environments using points correlations. arXiv preprint arXiv:1811.03217, 2018.