Page 136 in Joint Austrian Computer Vision and Robotics Workshop 2020
ferent object masks. This would enable the network to learn a larger number of object views than a human demonstrator can show in a reasonable time.
Our method works well for rigid objects, where the number of unique views is limited. However, for deformable objects the number of unique views increases dramatically, and in those cases the efficiency of our method drops significantly.
5. Conclusion
In this paper we highlight open problems of a standard object detector when applied to slightly and highly deformable objects. We specifically trained the YOLOv3 detector to cope with these cases. To reduce the time-consuming effort of image annotation, we proposed an automated method for synthesizing the training images. The idea is to show objects on a simple background in short videos, and to use a few annotations together with augmentation of the training data to obtain better performance. While this works well for rigid objects, with an AP of 87.38%, we show that for slightly deformable objects like scissors and headphones the detection performance drops significantly, to 49.54%. The drop is, as expected, even more drastic for highly deformable objects like a chain or earphones, down to an AP of 26.58%.
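The synthesis idea described above can be illustrated with a minimal cut-and-paste sketch: a segmented object crop is composited onto a background at a random position with simple augmentations, and the bounding-box label falls out of the mask for free. This is only an illustrative sketch; the function name, augmentation choices, and array layout are our assumptions, not the paper's actual pipeline.

```python
import numpy as np

def synthesize_image(background, obj, mask, rng):
    """Paste a segmented object onto a background at a random position.

    background: (H, W, 3) uint8 image
    obj:        (h, w, 3) uint8 object crop
    mask:       (h, w)    bool object mask
    Returns the composite image and the (x, y, w, h) box label.
    """
    H, W, _ = background.shape

    # Simple augmentations: random 90-degree rotation and horizontal flip.
    k = int(rng.integers(0, 4))
    obj, mask = np.rot90(obj, k), np.rot90(mask, k)
    if rng.random() < 0.5:
        obj, mask = obj[:, ::-1], mask[:, ::-1]
    h, w = mask.shape

    # Random top-left corner such that the object fits fully in the image.
    x = int(rng.integers(0, W - w + 1))
    y = int(rng.integers(0, H - h + 1))

    # Composite only the masked pixels; the box label comes from the mask.
    out = background.copy()
    region = out[y:y + h, x:x + w]
    region[mask] = obj[mask]
    return out, (x, y, w, h)
```

Because the annotation is derived from the mask rather than drawn by hand, an arbitrary number of training images can be generated from one segmented view.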
Using the example of a chain, we show that it is possible to pose the detection of a deformable object as the detection of its elementary rigid element: a link. To further tackle this problem, modelling of deformable objects could be used for synthetic data generation.
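The link-based formulation above implies a small post-processing step: individual link detections are merged into a single box approximating the whole chain. A hedged sketch, assuming axis-aligned (x1, y1, x2, y2) boxes and a simple union rule (both are our illustrative assumptions):

```python
def merge_link_boxes(link_boxes):
    """Merge per-link detections (x1, y1, x2, y2) into one chain box.

    The deformable chain is detected via its rigid links; the union
    of the link boxes approximates the extent of the whole chain.
    Returns None if no links were detected.
    """
    if not link_boxes:
        return None
    x1 = min(b[0] for b in link_boxes)
    y1 = min(b[1] for b in link_boxes)
    x2 = max(b[2] for b in link_boxes)
    y2 = max(b[3] for b in link_boxes)
    return (x1, y1, x2, y2)
```

In practice one would likely also cluster the links spatially before merging, so that two separate chains in one image do not collapse into a single box.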
Acknowledgment
This research is partially supported by the Vienna Science and Technology Fund (WWTF), project RALLI (ICT15-045), and Festo AG & Co. KG.
- Title: Joint Austrian Computer Vision and Robotics Workshop 2020
- Publisher: Graz University of Technology
- Place: Graz
- Date: 2020
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-85125-752-6
- Dimensions: 21.0 x 29.7 cm
- Pages: 188
- Categories: Computer Science, Technology