Joint Austrian Computer Vision and Robotics Workshop 2020
jects like chains or cables. Most of the previous
work on object detection focuses on detecting rigid
objects [3, 15, 6, 2]. Our goal is to extend this research
to deformable objects as well. We train a
CNN-based object detector to detect both rigid
and deformable objects. This task requires a large
number of training images. Obtaining this data
manually is time consuming; we therefore propose a
method for synthesizing the training data which includes
an RGB-based segmentation procedure that is
able to handle deformable objects. We then use publicly
available datasets as backgrounds for the synthetic
images and apply augmentation techniques to
increase the variability of the dataset.
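The compositing step described above can be sketched as follows: a segmented object crop is pasted onto a random location of a background image using its mask. The helper below is a hypothetical illustration under stated assumptions (BGR images, a mask of 255 where the object is, a background at least as large as the crop), not the authors' exact code.

```python
import numpy as np


def composite(object_bgr, mask, background_bgr):
    """Paste a segmented object onto a background image.

    object_bgr:     HxWx3 crop of the object (BGR)
    mask:           HxW uint8 mask, 255 where the object is
    background_bgr: background image at least HxW in size
    """
    h, w = mask.shape
    # Choose a random location where the crop fully fits on the background.
    y = np.random.randint(0, background_bgr.shape[0] - h + 1)
    x = np.random.randint(0, background_bgr.shape[1] - w + 1)
    out = background_bgr.copy()
    roi = out[y:y + h, x:x + w]
    # Copy object pixels where the mask is set; keep background elsewhere.
    roi[mask > 0] = object_bgr[mask > 0]
    # Return the image and the bounding box for the detector annotation.
    return out, (x, y, w, h)
```

Returning the paste location as a bounding box means each synthetic image comes with its detection annotation for free.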
Figure 2. Illustration of the mask acquisition process. Top
left: the original RGB image. Top right: the result of
applying the k-means method to the original RGB image.
Bottom left: the automatically selected contour with the
area inside it colored green. Bottom right: the final
extracted object masks.
3.1. Data acquisition
Publicly available datasets which contain annotated
objects are suitable for training a CNN to detect
object classes. However, when it comes to detecting
specific objects, a specialized dataset is required. We
synthesize a dataset by capturing images of the
objects and develop a method to segment them from
the flat surface on top of which they were placed.
For the recording of objects a Microsoft Kinect
camera mounted on a tripod is used. The camera
is placed approximately 30 cm above the flat
surface, facing the object at an angle of approximately
45 degrees. During the recording, both the
camera and the flat surface are stationary. The flat
surface should preferably be uniformly colored so that
the object is clearly distinguishable from it.
After the recording is initiated, the object is
manipulated by hand so that it faces the camera
from as many unique viewing angles as possible.
The advantage of this method is that it can capture
deformable objects by simply changing their shape
while they are being recorded.
3.2. Data processing
In order to synthesize the images needed for
training the network, object masks are required.
The masks can be obtained by manually segmenting
the object from the background or by using a
segmentation method. Manually segmenting objects is
inefficient; we therefore devise a simple method
for object segmentation that is used for both
rigid and deformable objects. To segment the object
from the background, a combination of computer-vision-based
methods is used, consisting of the following five steps:
1. First, k-means clustering with a k value of 2 is
applied to the image. This method successfully
distinguishes the boundaries of interest and is
computationally more efficient than the alternative
of Otsu's thresholding.
2. After applying k-means, morphological
operations such as image closing and erosion are
applied to the image in order to connect possible
discontinuities in the border of the object.
3. Next, contour detection is applied to the whole
image and the centers of gravity of the areas
inside the detected contours are determined. A
red circle is drawn on the live image from the
Kinect camera shown on the screen; the center
of the object should be placed inside this circle
to automatically start the capturing process.
4. The algorithm then checks whether the contour
satisfies conditions on its length and its
distance from the center of the image and, if
so, the recording is started. After the capturing
process is initiated, a predetermined number of
object projections is recorded at a regular time
interval or per keyboard command. The number
of projections recorded is