.. currentmodule:: orthoseg

=====================
Finetune your project
=====================

The default configuration of orthoseg is meant to be a good starting point for most
projects. An overview of all parameters that can be finetuned can be found in the
:doc:`/reference_docs` section of the documentation.

In this section you can find more in-depth information on some specific parameters
that can help you finetune the configuration to your specific project and needs.

What model architecture should I use?
-------------------------------------

For image segmentation, an encoder-decoder architecture is often used, where the
encoder is a (pretrained) DNN that extracts features from the input image, and the
decoder reconstructs the segmentation map from these features. Both the encoder and
decoder can be implemented using different neural network architectures.

The default configuration uses a `UNet` decoder with an `InceptionResNetV2` encoder as
backbone because this gives a good balance of accuracy and inference speed.

However, if you don't have a CUDA GPU available, you might want to switch to a lighter
model to reduce the training and inference time. The sample project uses a `LinkNet`
with a `MobileNetV2` backbone, which is significantly faster and uses less memory, but
gives less accurate results.

Decoder
^^^^^^^

For the decoder, the following model architectures are supported:

* `UNet`
* `LinkNet`
* `FPN`
* `PSPNet`

Based on practical tests, `UNet` and `LinkNet` give the best results. They are very
similar in terms of both accuracy and inference speed.

There are many variants of e.g. `UNet` as well, but based on the literature found, most
seem to give relatively small accuracy improvements for significant increases in
complexity, resulting in slower training and inference. Hence only the architectures
listed above are supported, and the default in orthoseg is the classic `UNet`.

Encoder
^^^^^^^

For the encoder there are even more options.
In the table below some models that were considered are listed and compared to
`InceptionResNetV2`:

* **Model**: the name of the model
* **Acc@1**: the top 1 classification accuracy on ImageNet
* **Speed**: an indication of the inference speed, normalized so that
  `InceptionResNetV2` has speed = 100; higher values are slower
* **Support**: whether the model is supported in orthoseg
* **Remarks**: some remarks on the model, e.g. whether practical tests of the
  segmentation performance have been conducted

================== ===== ===== ======= =======
Model              Acc@1 Speed Support Remarks
================== ===== ===== ======= =======
vgg11              69.15 -             Tested* (=TernausNet): not very accurate
vgg16              70.79 46    Yes     accuracy ~ vgg11
vgg19              70.89 46    Yes     "
resnet18           68.24 29            "
resnet34           72.17 32            "
resnet50           74.81 41    Yes
resnet101          76.58 60    Yes
resnet152          76.66 77    Yes
resnet50v2         69.73 36    Yes
resnet101v2        71.93 53    Yes     accuracy ~ vgg11
resnet152v2        72.29 75    Yes     "
resnext50          77.36 69
resnext101         78.48 110           slower + worse accuracy
densenet121        74.67 51    Yes
densenet169        75.85 62    Yes
densenet201        77.13 77    Yes
inceptionv3        77.55 71    Yes
xception           78.87 77
inceptionresnetv2  80.03 100   Yes     good accuracy, speed OK
seresnet18         69.41 37            accuracy ~ vgg11
seresnet34         72.60 41            accuracy ~ vgg11
seresnet50         76.44 43
seresnet101        77.92 59
seresnet152        78.34 87
seresnext50        78.74 70
seresnext101       79.88 115           slower + worse accuracy
senet154           81.06 251           a lot slower
nasnetlarge        82.12 213           a lot slower
nasnetmobile       74.04 51
mobilenet          70.36 28    Yes     accuracy ~ vgg11
mobilenetv2        71.63 33    Yes     accuracy ~ vgg11
EfficientNetB0     77.1  49            v2 is documented to be faster and more accurate
EfficientNetB1     79.1  56            "
EfficientNetB2     80.1  65            "
EfficientNetB3     81.6  88            "
EfficientNetB4     82.9  151           "
EfficientNetB5     83.6  253           "
EfficientNetB6     84.0  404           "
EfficientNetB7     84.3  616           "
EfficientNetV2B0   78.7
EfficientNetV2B1   79.8
EfficientNetV2B2   80.5
EfficientNetV2B3   82.0
EfficientNetV2S    83.9
EfficientNetV2M    85.3  96    Yes     Tested*: similar theoretical accuracy, worse in practice
EfficientNetV2L    85.7
================== ===== ===== ======= =======

Full segmentation models
^^^^^^^^^^^^^^^^^^^^^^^^

Because orthoseg is about semantic segmentation, in the end it is the full segmentation
model that matters. There is less information available on the performance of full
segmentation models, so the following overview only lists some of the full models that
were tested with orthoseg.

TernausNet/Vanilla UNet
"""""""""""""""""""""""

* Both vanilla `UNet` and `TernausNet` (= `UNet` with a `vgg11` backbone) were tested on
  the detection of greenhouses. The accuracy of both was very similar and was decent,
  but clearly worse than `InceptionResNetV2`.
* Support for these models was removed from orthoseg to avoid having to upgrade them to
  stay supported with newer keras versions.

EfficientNetV2M+UNet
""""""""""""""""""""

* The classification accuracy of just `EfficientNetV2M` on ImageNet is significantly
  better than `InceptionResNetV2` (85.3% vs 80.3% top 1 accuracy), so you would expect
  the segmentation performance to be better as well.
* The number of weights is similar, and train/inference speed is reported to be a lot
  faster than for the first generation (v1) of the EfficientNet family.
* The train/inference speed for a segmentation was about the same as
  `InceptionResNetV2`: training was a few % slower, but inference was a few % faster.
  Both were tested on a CUDA GPU (NVIDIA Quadro P5000).
* The IoU score obtained during training on a "sealed surfaces" detection was slightly
  higher than with `InceptionResNetV2` (0.9791 vs 0.9765). When running an actual
  detection and reviewing the results on-screen though, it seemed that narrow types of
  sealed areas that were less represented in the training data were detected
  significantly worse than with `InceptionResNetV2`. Mainly unpaved roads were often
  missing.
  Possibly this can be solved by adding extra training data of this type, but as the
  IoU score improvement was very small as well, it doesn't seem to be an actual
  improvement compared to `InceptionResNetV2`.

Some links
^^^^^^^^^^

* A very interesting comparison of the performance of different DNNs:
  `Benchmark Analysis of Representative Deep Neural Network Architectures `_.
  It is not specific to image segmentation but rather to object classification, yet it
  is useful for choosing a good backbone DNN for the segmentation.
* An overview of the accuracy, network size and inference speed cost of the different
  pretrained neural networks available in keras.applications: `Keras models `_.
* A similar overview with some other models: `Classification Models `_
* paperswithcode.com is also an interesting website with a huge overview of accuracy
  results achieved using AI, also in the domain of computer vision. Some examples:

  * `Best performing image classifications on the imagenet dataset `_
  * `Best performing image segmentations on the PASCAL VOC 2012 dataset `_
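
Example: shortlisting an encoder
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As a rough illustration of how the encoder table above can be read, the sketch below
picks the supported encoder with the highest ImageNet top 1 accuracy that stays within
a given relative speed value. It is not part of orthoseg: the `Encoder` class and the
`best_encoder` function are hypothetical helpers, and the data is a hand-copied subset
of the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Encoder:
    """One row of the encoder table (hypothetical helper, not an orthoseg class)."""

    name: str
    acc1: float  # ImageNet top 1 accuracy in %
    speed: int  # relative value, inceptionresnetv2 = 100, higher is slower
    supported: bool  # "Yes" in the Support column


# Hand-copied subset of the encoder table; extend as needed.
ENCODERS = [
    Encoder("vgg16", 70.79, 46, True),
    Encoder("resnet50", 74.81, 41, True),
    Encoder("resnext50", 77.36, 69, False),
    Encoder("densenet121", 74.67, 51, True),
    Encoder("inceptionv3", 77.55, 71, True),
    Encoder("inceptionresnetv2", 80.03, 100, True),
    Encoder("mobilenetv2", 71.63, 33, True),
    Encoder("EfficientNetV2M", 85.3, 96, True),
]


def best_encoder(
    max_speed: int, encoders: Optional[List[Encoder]] = None
) -> Optional[Encoder]:
    """Return the supported encoder with the highest Acc@1 and speed <= max_speed."""
    rows = ENCODERS if encoders is None else encoders
    candidates = [e for e in rows if e.supported and e.speed <= max_speed]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.acc1)


if __name__ == "__main__":
    for budget in (50, 75, 100):
        choice = best_encoder(budget)
        print(budget, choice.name if choice else None)
```

Such a ranking is only a first filter. For a speed value of 100 it selects
`EfficientNetV2M`, which according to the practical test described above did not
improve on `InceptionResNetV2` for segmentation, so candidates still need to be
evaluated on the actual segmentation task.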