Pose Estimation of Similar Shape Objects using Convolutional Neural
Network trained by Synthetic data
Kiru Park, Johann Prankl, Michael Zillich and Markus Vincze
Abstract—The objective of this paper is accurate 6D pose
estimation from 2.5D point clouds for object classes with a
high shape variation, such as vegetables and fruit. General
pose estimation methods usually focus on calculating rigid
transformations between known models and the target scene,
and do not explicitly consider shape variations. We employ
deep convolutional neural networks (CNN), which show robust
and state of the art performance for the 2D image domain.
In contrast, normally the performance of pose estimation from
point clouds is weak, because it is hard to prepare large enough
annotated training data. To overcome this issue, we propose an
autonomous generation process of synthetic 2.5D point clouds
covering different shape variations of the objects. The synthetic
data is used to train the deep CNN model in order to estimate
the object poses. We propose a novel loss function to guide
the estimator to have larger feature distances for different
poses, and to directly estimate the correct object pose. We
performed an evaluation using real objects, where the training
was conducted with artificial CAD models downloaded from a
public web resource. The results indicate that our approach is
suitable for real world robotic applications.
I. INTRODUCTION
Pose estimation of objects in color and depth images is es-
sential for bin-picking tasks to determine grasping points for
robotic grippers. Man-made objects are usually manufactured
using 3D CAD models having exactly the same shapes with
negligible errors. The well-constrained environment enables
the robot to identify each pose by comparing features of the
pre-created template and an input image [14]. However, it is
not possible to provide 3D CAD models for natural objects,
such as vegetables or fish, where each object has a slightly
different shape. Object pose estimation with template based
approaches would need a huge number of templates in order
to cover each individual pose and the different shape variants.
Hence, these approaches would lead to large databases and
high processing times for template matching.
Recently, CNN-based approaches have provided reasonable
results for most computer vision tasks, including image
classification and object detection in 2D images [13], [15]. This
achievement is accomplished with a large number of training
examples, e.g., [4], [7]. The 2D image datasets are usually
collected from web resources and annotated by non-experts
with user-friendly labeling tools. For RGB-D images or
2.5D point clouds, it is difficult to collect a large number
of examples from public web services, and it is also hard
for non-experts to annotate the exact poses.

All authors are with the Vision4Robotics group, Automation and Control
Institute, Vienna University of Technology, Austria {park, prankl,
zillich, vincze}@acin.tuwien.ac.at

Fig. 1: Overview of the proposed framework. An artificial
3D CAD model is used to generate synthetic scenes with
varied shapes and poses in order to train the deep CNN. The
trained network computes the pose of each segmented cluster.

The resulting lack of training data causes additional
complexity when training a CNN for estimating 6D
poses in the 3D space. Therefore, pre-trained CNNs are
used for extracting features from color or depth images, and
the extracted features are used to train linear regressors to
estimate the poses [16]. Although several datasets provide
6D pose information for more than 15K images [9], [10],
this is still not enough to train a deep CNN, and none of
them consider object classes with large shape variations.
In this paper, we propose a simple pose estimator that can
be used to estimate poses of objects with shape variations,
such as vegetables or fruit, using a CNN and a single depth
image as input. Synthetic depth images containing various
poses and shapes of a CAD model are generated to train the
proposed CNN. No template information is required after
training. This simplicity is one of the advantages of
the proposed model for the robust estimation of object poses
with different shape variants. The experiments show that our
concept is suitable for real world robotic applications.
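To make the data generation step more concrete, the following is a minimal Python sketch of how synthetic depth images with shape variation could be produced from a single CAD model. It is not the authors' exact pipeline: the anisotropic scaling range, the pinhole camera intrinsics, the object distance, and the placeholder point set are illustrative assumptions standing in for details not given in this section.

# Minimal sketch (assumptions noted above): generate one synthetic depth image
# of a CAD model under a random shape variation and a random pose.
import numpy as np

def random_shape_variation(vertices, max_scale=0.2):
    # Anisotropically scale the model to mimic natural shape variation.
    scale = 1.0 + np.random.uniform(-max_scale, max_scale, size=3)
    return vertices * scale

def random_rotation():
    # Sample a uniformly distributed 3D rotation via QR decomposition.
    q, r = np.linalg.qr(np.random.randn(3, 3))
    q = q * np.sign(np.diag(r))   # fix column signs so the sample is uniform
    if np.linalg.det(q) < 0:      # enforce a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q

def render_depth(points, fx=525.0, fy=525.0, cx=320.0, cy=240.0,
                 width=640, height=480, distance=0.8):
    # Project 3D points into a simple z-buffered depth image (pinhole camera).
    depth = np.full((height, width), np.inf)
    pts = points.copy()
    pts[:, 2] += distance                              # place object in front of the camera
    u = np.round(fx * pts[:, 0] / pts[:, 2] + cx).astype(int)
    v = np.round(fy * pts[:, 1] / pts[:, 2] + cy).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height) & (pts[:, 2] > 0)
    for ui, vi, zi in zip(u[ok], v[ok], pts[ok, 2]):
        depth[vi, ui] = min(depth[vi, ui], zi)         # keep the closest surface
    depth[np.isinf(depth)] = 0.0                       # background pixels
    return depth

# Usage: vertices would come from an artificial CAD model; a random placeholder
# point set is used here. The sampled rotation R is the pose label of the image.
vertices = np.random.rand(5000, 3) * 0.1 - 0.05
R = random_rotation()
depth_image = render_depth(random_shape_variation(vertices) @ R.T)

Each rendered depth map, together with the sampled rotation used to produce it, would serve as one annotated training example for the CNN.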
In summary, our paper provides the following contributions:
• We propose a framework that generates synthetic training
images and comprises a deep CNN pose estimator for
natural object classes such as vegetables and fruit.
• Pairwise training is applied to train the deep CNN with
a loss function that minimizes both the errors between
the estimated and ground-truth poses and the low-level
feature distances between similar poses (a sketch of such
a loss follows this list).
• We show that our estimator successfully estimates poses
of real fruit using more than two hundred test images,
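The pairwise loss described in the second contribution can be illustrated with a small sketch. The following is an assumed formulation, not the authors' exact loss: a squared pose-regression term combined with a contrastive term on low-level features, so that similar poses keep small feature distances while different poses are pushed apart by at least a margin. The similarity threshold, margin, and weighting factor are illustrative assumptions.

# Illustrative sketch of a pairwise loss combining pose regression and a
# contrastive feature term (threshold, margin, and alpha are assumptions).
import numpy as np

def pairwise_loss(pred_pose_a, pred_pose_b, gt_pose_a, gt_pose_b,
                  feat_a, feat_b, sim_threshold=0.1, margin=1.0, alpha=0.5):
    # Pose-regression term: squared error against the ground truth for both
    # samples of the pair.
    pose_term = (np.sum((pred_pose_a - gt_pose_a) ** 2)
                 + np.sum((pred_pose_b - gt_pose_b) ** 2))

    # Contrastive term on low-level CNN features: pairs whose ground-truth
    # poses are close should have close features; otherwise the features
    # are pushed apart by at least the margin.
    feat_dist = np.linalg.norm(feat_a - feat_b)
    if np.linalg.norm(gt_pose_a - gt_pose_b) < sim_threshold:
        feat_term = feat_dist ** 2
    else:
        feat_term = max(0.0, margin - feat_dist) ** 2

    return pose_term + alpha * feat_term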