between the models, the entire preprocessing is done
separately for each model. First of all, the dataset
is resampled to the respective voxel spacing using
a third-order B-spline interpolation for the scans
and a label-linear interpolation for the ground truth.
Next, the intensity values are clipped to the 0.5th and
99.5th percentile over the entire training dataset of
the fold. Furthermore, the scans are normalized by
subtracting the mean and dividing by the standard deviation
of the clipped training dataset.
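As an illustration, the following Python sketch mirrors this preprocessing pipeline using SciPy; the function name preprocess_case, the argument layout, and the assumption that the clipping percentiles and normalization statistics are precomputed per fold are ours, not part of the original implementation.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_case(scan, label, spacing, target_spacing,
                    clip_low, clip_high, mean, std):
    # Resample to the model-specific voxel spacing: third-order
    # B-spline interpolation for the scan, linear interpolation
    # for the ground-truth labels.
    factors = np.asarray(spacing, float) / np.asarray(target_spacing, float)
    scan = zoom(scan.astype(np.float32), factors, order=3)
    label = zoom(label.astype(np.float32), factors, order=1)

    # Clip intensities to the 0.5th / 99.5th percentile of the
    # training data of the fold.
    scan = np.clip(scan, clip_low, clip_high)

    # Normalize with mean and standard deviation of the clipped
    # training data.
    scan = (scan - mean) / std
    return scan, np.rint(label).astype(np.uint8)
```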
5.2. Architecture
We use the architecture described by
Isensee et al. [9] and implemented in the GitHub
project 3DUnetCNN [5] as a basis for our experi-
ments. We adjusted the following model parameters:
input size, model-depth (number of layers), number
of segmentation levels (used for deep supervision)
and base-filters (number of filters in the first convolution
layer). For M1 (input size of 192 × 192 × 128)
we selected a model-depth of 5 with 3 segmentation
levels and base-filters set to 8. For M2 on the other
hand (input size of 160 × 160 × 128), we chose
an increased model-depth of 6 with 4 segmentation
levels and base-filters set to 16. The changes to M2
were made in order to account for the larger patch
size (compared to 128³ used by Isensee et al.) and
to increase the receptive field of the model. These
changes were omitted for M1, which encompasses
a simpler segmentation task, creating only a coarse
segmentation of the blood lumen label, while M2
segments both the blood lumen and the stent-graft
wire frame.
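For clarity, the two configurations can be summarized as follows; the dictionary keys are illustrative shorthand and do not correspond to the actual 3DUnetCNN configuration format.

```python
# Illustrative summary of the two model configurations
# (key names are placeholders, not the 3DUnetCNN config schema).
M1_CONFIG = {
    "input_size": (192, 192, 128),   # low-resolution patch
    "model_depth": 5,                # number of resolution levels
    "segmentation_levels": 3,        # outputs used for deep supervision
    "base_filters": 8,               # filters in the first convolution
    "labels": ["blood_lumen"],
}

M2_CONFIG = {
    "input_size": (160, 160, 128),   # high-resolution patch
    "model_depth": 6,                # deeper to enlarge the receptive field
    "segmentation_levels": 4,
    "base_filters": 16,
    "labels": ["blood_lumen", "stent_graft"],
}
```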
5.3. Training
We trained both models using a weighted multi-
class Dice loss [9] in combination with an Adam op-
timizer. The initial learning rate was set to η₀ =
5 · 10⁻⁴ with a learning rate drop criterion and early
stopping after 50 epochs. The training ran for 70
to 120 epochs with 200 training samples per epoch.
Due to the 5-fold cross validation used for evaluation,
the following statistics are averaged over all folds,
where for each fold both models M1 and M2 were
trained as follows. M1 was trained first for blood
lumen segmentation on the low resolution large re-
gions. The training reached a DSC of 0.978 and
0.898, on average, for the training and validation
items, respectively. M1 was then used to create the
blood lumen segmentations for centerline extraction.
The resulting centerline graphs were subsequently used during the training of M2,
as the high resolution patches were extracted at random positions along the
graph. The average training and validation DSCs for
the blood lumen are 0.954 and 0.943, respectively,
and 0.843 and 0.841 for the stent-graft.
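A possible formulation of such a weighted multi-class Dice loss is sketched below in PyTorch; it is an illustrative re-implementation rather than the code used in our experiments, and the tensor layout and class weights are assumptions.

```python
import torch

def weighted_dice_loss(logits, target_onehot, class_weights, eps=1e-5):
    """Weighted multi-class soft Dice loss (illustrative sketch).

    logits:         (B, C, D, H, W) raw network outputs
    target_onehot:  (B, C, D, H, W) one-hot encoded ground truth
    class_weights:  (C,) tensor with the relative importance of each label
    """
    probs = torch.softmax(logits, dim=1)
    dims = (0, 2, 3, 4)                       # sum over batch and space
    intersection = (probs * target_onehot).sum(dim=dims)
    denominator = probs.sum(dim=dims) + target_onehot.sum(dim=dims)
    dice_per_class = (2.0 * intersection + eps) / (denominator + eps)

    # Weighted average of the per-class Dice scores, turned into a loss.
    weights = class_weights / class_weights.sum()
    return 1.0 - (weights * dice_per_class).sum()
```

Combining this loss with torch.optim.Adam and an initial learning rate of 5 · 10⁻⁴ corresponds to the optimizer setup described above.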
6. Evaluation
Having trained two models M1 and M2 for each
fold, we use our method to create high resolution
segmentations. Just like during training, M1 is used
to segment the blood lumen used for centerline
extraction. The resulting centerline graph is again
used to place patches, however, not at random
but rather at equally distributed positions along the
entire span of the graph, as described in Section
4.2. In a post-processing step, the largest connected
region of non-background voxels was selected.
To compare the results to the ground truth, the
segmentations were furthermore resampled to their
original voxel spacing. The last step may be skipped
when using the results for further processing rather
than evaluation (e.g., mesh generation for blood-flow
simulations). Using our method, the cross validation
yields an average DSC of 0.961 for the blood lumen
and 0.841 for the stent-graft label. Two examples
are shown in Figure 5.
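The post-processing step of keeping only the largest connected non-background region can be implemented, for instance, with scipy.ndimage; the helper below is a minimal sketch and its name is ours.

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(segmentation):
    """Zero out all but the largest connected non-background region."""
    foreground = segmentation > 0
    labeled, num_components = ndimage.label(foreground)
    if num_components == 0:
        return segmentation

    # Size of every connected component (index 0 is the background).
    sizes = np.bincount(labeled.ravel())
    sizes[0] = 0
    largest = sizes.argmax()

    cleaned = segmentation.copy()
    cleaned[labeled != largest] = 0
    return cleaned
```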
To evaluate the effectiveness of our patch extraction
method, we further conducted an experiment using
only M2, which was trained using a traditional patch
extraction method (see Isensee et al. [10]). Rather
than placing the patches along the aorta centerlines,
they were placed in a sliding-window fashion,
where the patches are aligned in a regular grid of
overlapping tiles. The overlap was set to 32 voxels
in each dimension (corresponding to 11.2 mm
frontal/sagittal and 24 mm longitudinal). While
this technique was used both during training and
inference, the remaining setup (including pre- and
post-processing) was left unchanged. We evaluated
Figure 5. Evaluation results for the two scans shown in Figure 3.
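For reference, the sliding-window baseline places patch origins on a regular grid of overlapping tiles; one way to compute such a grid with the 32-voxel overlap used here is sketched below (the function name and argument layout are illustrative, not the implementation used in our experiments).

```python
import numpy as np

def sliding_window_origins(volume_shape, patch_size, overlap=32):
    """Regular grid of patch origins with a fixed overlap in voxels."""
    steps = [max(p - overlap, 1) for p in patch_size]
    origins = []
    for size, patch, step in zip(volume_shape, patch_size, steps):
        # Start positions along this axis, clamped so every patch fits.
        last = max(size - patch, 0)
        starts = list(range(0, last + 1, step))
        if starts[-1] != last:
            starts.append(last)
        origins.append(starts)
    # Cartesian product of the per-axis start positions.
    grid = np.stack(np.meshgrid(*origins, indexing="ij"), axis=-1)
    return grid.reshape(-1, len(patch_size))
```

These origins would be used both for extracting training patches and for tiling the volume at inference, whereas our method samples positions along the centerline graph instead.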