Joint Austrian Computer Vision and Robotics Workshop 2020
jects like chains or cables. Most of the previous
work on object detection focuses on detecting rigid
objects [3, 15, 6, 2]. Our goal is to extend this research
to deformable objects as well. We train a
CNN-based object detector to detect both rigid
and deformable objects. This task requires a large
number of training images. Obtaining this data
manually is time consuming; we therefore propose a
method for synthesizing the training data which includes
an RGB-based segmentation procedure that is
able to handle deformable objects. We then use publicly
available datasets as backgrounds for the synthetic
images and apply augmentation techniques to
increase the variability of the dataset.
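The compositing step described above can be sketched as follows: a segmented object crop is pasted onto a random location of a background image using its mask. The helper below is a hypothetical illustration under stated assumptions (BGR images, a mask of 255 where the object is, a background at least as large as the crop), not the authors' exact code.

```python
import numpy as np


def composite(object_bgr, mask, background_bgr):
    """Paste a segmented object onto a background image.

    object_bgr:     HxWx3 crop of the object (BGR)
    mask:           HxW uint8 mask, 255 where the object is
    background_bgr: background image at least HxW in size
    """
    h, w = mask.shape
    # Choose a random location where the crop fully fits on the background.
    y = np.random.randint(0, background_bgr.shape[0] - h + 1)
    x = np.random.randint(0, background_bgr.shape[1] - w + 1)
    out = background_bgr.copy()
    roi = out[y:y + h, x:x + w]
    # Copy object pixels where the mask is set; keep background elsewhere.
    roi[mask > 0] = object_bgr[mask > 0]
    # Return the image and the bounding box for the detector annotation.
    return out, (x, y, w, h)
```

Returning the paste location as a bounding box means each synthetic image comes with its detection annotation for free.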
Figure 2. Illustration of the mask acquisition process. Top
left: the original RGB image. Top right: the result of
applying the k-means method to the original RGB image.
Bottom left: the automatically selected contour with the
area inside it colored green. Bottom right: the final
extracted object masks.
3.1. Data acquisition
Publicly available datasets which contain annotated
objects are suitable for training a CNN to detect
object classes. However, when it comes to detecting
specific objects, a specialized dataset is required. We
synthesize a dataset by capturing images of the
objects and develop a method to segment them from
the flat surface on top of which they were placed.
For the recording of objects a Microsoft Kinect
camera mounted on a tripod is used. The camera
is placed approximately 30 cm above the flat
surface, facing the object at an angle of approximately
45 degrees. During the recording, both the
camera and the flat surface are stationary. The flat
surface should preferably be uniformly colored so that
the object is clearly distinguishable from it.
After the recording is initiated, the object is
manipulated by hand so that it faces the camera
from as many unique viewing angles as possible.
The advantage of this method is that it can capture
deformable objects by simply changing their shape
while they are being recorded.
3.2. Data processing
In order to synthesize the images needed for
training the network, object masks are required.
The masks can be obtained by manually segmenting
the object from the background or by using a
segmentation method. Manually segmenting objects is
inefficient; we therefore devise a simple method
for object segmentation that is used for both
rigid and deformable objects. To segment the object
from the background, a combination of computer-vision-based
methods is used, consisting of the following five steps:
1. First, k-means clustering with a k value of 2 is
applied to the image. This method successfully
distinguishes the boundaries of interest and is
computationally more efficient than the alternative
of Otsu's thresholding.
2. After applying k-means, morphological
operations such as image closing and erosion are
applied to the image in order to connect possible
discontinuities in the border of the object.
3. Next, contour detection is applied to the whole
image and the centers of gravity of the areas
inside the detected contours are determined. A
red circle is drawn on the live image from the
Kinect camera shown on the screen; the center
of the object should be placed inside this circle
to automatically start the capturing process.
4. The algorithm then checks whether the contour
satisfies conditions on its length and its
distance from the center of the image and, if
so, the recording is started. After the capturing
process is initiated, a predetermined number of
object projections is recorded at a regular time
interval or per keyboard command. The number
of projections recorded is