Page 136 in Joint Austrian Computer Vision and Robotics Workshop 2020
ferent object masks. This would enable the network to learn a larger number of object views than a human demonstrator can show in a reasonable time.
Our method works well for rigid objects, where the number of unique views is limited. However, for deformable objects the number of unique views increases dramatically, and in those cases the efficiency of our method drops significantly.
5. Conclusion
In this paper we highlight open problems of a standard object detector when applied to slightly and highly deformable objects. We specifically trained the YOLOv3 detector to cope with these cases. To reduce the time-consuming effort of image annotation, we proposed an automated method for synthesizing the training images. The idea is to show objects on a simple background in short videos, and to use a few annotations together with augmentation of the training data to obtain better performance. While this works well for rigid objects, with an AP of 87.38%, we show that for slightly deformable objects like scissors and headphones the detection performance drops significantly, to 49.54%. The drop is, as expected, even more drastic for highly deformable objects like a chain or earphones, down to an AP of 26.58%.
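The synthesis idea described above can be illustrated with a minimal cut-and-paste sketch: a segmented object crop is composited onto a background at a random position with simple augmentations, and the bounding-box label falls out of the mask for free. This is only an illustrative sketch; the function name, augmentation choices, and array layout are our assumptions, not the paper's actual pipeline.

```python
import numpy as np

def synthesize_image(background, obj, mask, rng):
    """Paste a segmented object onto a background at a random position.

    background: (H, W, 3) uint8 image
    obj:        (h, w, 3) uint8 object crop
    mask:       (h, w)    bool object mask
    Returns the composite image and the (x, y, w, h) box label.
    """
    H, W, _ = background.shape

    # Simple augmentations: random 90-degree rotation and horizontal flip.
    k = int(rng.integers(0, 4))
    obj, mask = np.rot90(obj, k), np.rot90(mask, k)
    if rng.random() < 0.5:
        obj, mask = obj[:, ::-1], mask[:, ::-1]
    h, w = mask.shape

    # Random top-left corner such that the object fits fully in the image.
    x = int(rng.integers(0, W - w + 1))
    y = int(rng.integers(0, H - h + 1))

    # Composite only the masked pixels; the box label comes from the mask.
    out = background.copy()
    region = out[y:y + h, x:x + w]
    region[mask] = obj[mask]
    return out, (x, y, w, h)
```

Because the annotation is derived from the mask rather than drawn by hand, an arbitrary number of training images can be generated from one segmented view.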
Using the example of a chain, we show that it is possible to pose the detection of a deformable object as the detection of its elementary rigid element: a link. To further tackle this problem, modelling of deformable objects could be used for synthetic data generation.
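The link-based formulation above implies a small post-processing step: individual link detections are merged into a single box approximating the whole chain. A hedged sketch, assuming axis-aligned (x1, y1, x2, y2) boxes and a simple union rule (both are our illustrative assumptions):

```python
def merge_link_boxes(link_boxes):
    """Merge per-link detections (x1, y1, x2, y2) into one chain box.

    The deformable chain is detected via its rigid links; the union
    of the link boxes approximates the extent of the whole chain.
    Returns None if no links were detected.
    """
    if not link_boxes:
        return None
    x1 = min(b[0] for b in link_boxes)
    y1 = min(b[1] for b in link_boxes)
    x2 = max(b[2] for b in link_boxes)
    y2 = max(b[3] for b in link_boxes)
    return (x1, y1, x2, y2)
```

In practice one would likely also cluster the links spatially before merging, so that two separate chains in one image do not collapse into a single box.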
Acknowledgment
This research is partially supported by the Vienna Science and Technology Fund (WWTF), project RALLI (ICT15-045), and Festo AG & Co. KG.
- Title: Joint Austrian Computer Vision and Robotics Workshop 2020
- Publisher: Graz University of Technology
- Place: Graz
- Date: 2020
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-85125-752-6
- Dimensions: 21.0 x 29.7 cm
- Pages: 188
- Categories: Computer Science, Technology