3. Finetune your project
The default configuration of orthoseg is meant to be a good starting point for most projects.
An overview of all parameters that can be finetuned is available in the Reference documentation section.
This section gives more in-depth information on some specific parameters that can help you tailor the configuration to your project and needs.
3.1. What model architecture should I use?
For image segmentation, an encoder-decoder architecture is often used: the encoder is a (pretrained) DNN that extracts features from the input image, and the decoder reconstructs the segmentation map from these features.
Both the encoder and decoder can be implemented using different neural network architectures.
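To make the encoder-decoder idea more concrete, the sketch below traces only the spatial size of the feature maps, not the actual layers. It assumes a generic design with five stages in which every encoder stage halves the resolution and every decoder stage doubles it again. This is typical for UNet-style models with ImageNet backbones, but it is an assumption here, and `trace_encoder_decoder` is a hypothetical helper, not part of orthoseg.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    height: int
    width: int


def trace_encoder_decoder(height: int, width: int, depth: int = 5) -> list[Stage]:
    """Trace feature map sizes through a generic encoder-decoder.

    Assumption: each encoder stage downsamples by a factor 2 and each
    decoder stage upsamples by a factor 2.
    """
    factor = 2**depth
    if height % factor or width % factor:
        raise ValueError(f"input size must be divisible by {factor}")

    stages = [Stage("input", height, width)]
    # Encoder: every stage halves the resolution while extracting features.
    for i in range(1, depth + 1):
        stages.append(Stage(f"encoder_{i}", height // 2**i, width // 2**i))
    # Decoder: every stage doubles the resolution again. In UNet and LinkNet
    # its output is merged with the encoder feature map of the same size,
    # when there is one (skip connection).
    for i in range(depth - 1, -1, -1):
        stages.append(Stage(f"decoder_{depth - i}", height // 2**i, width // 2**i))
    return stages
```

For example, `trace_encoder_decoder(512, 512)` goes down from 512 to 16 pixels in the encoder and back up to 512 in the decoder, so the segmentation map has the same size as the input image.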
The default configuration uses a UNet decoder with an InceptionResNetV2 encoder as backbone, because this combination gives a good balance between accuracy and inference speed. However, if you don’t have a CUDA GPU available, you might want to switch to a lighter model to reduce training and inference time. The sample project uses LinkNet with a MobileNetV2 backbone, which is significantly faster and uses less memory, but gives less accurate results.
3.1.1. Decoder
For the decoder, the following model architectures are supported:
UNet
LinkNet
FPN
PSPNet
Based on practical tests, UNet and LinkNet give the best results, and they are very similar in terms of both accuracy and inference speed. Many variants of UNet exist as well, but the literature suggests that most of them give relatively small accuracy improvements in exchange for a significant increase in complexity, which results in slower training and inference. Hence only the architectures listed above are supported, and the default in orthoseg is the classic UNet.
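Conceptually, a model is one encoder combined with one decoder. A minimal sketch of how such a combination could be validated, assuming a hypothetical `encoder+decoder` naming string (e.g. `inceptionresnetv2+unet`). The function `parse_architecture` and both constants are illustrative, not orthoseg's API; the encoder set contains the models marked as supported in the encoder table in 3.1.2.

```python
SUPPORTED_DECODERS = {"unet", "linknet", "fpn", "pspnet"}

# Encoders marked "Yes" in the Support column of the encoder table.
SUPPORTED_ENCODERS = {
    "vgg16", "vgg19",
    "resnet50", "resnet101", "resnet152",
    "resnet50v2", "resnet101v2", "resnet152v2",
    "densenet121", "densenet169", "densenet201",
    "inceptionv3", "inceptionresnetv2",
    "mobilenet", "mobilenetv2",
    "efficientnetv2m",
}


def parse_architecture(architecture: str) -> tuple[str, str]:
    """Split a hypothetical 'encoder+decoder' string and validate both parts."""
    encoder, sep, decoder = architecture.strip().lower().partition("+")
    if not sep or not encoder or not decoder:
        raise ValueError(f"expected 'encoder+decoder', got {architecture!r}")
    if encoder not in SUPPORTED_ENCODERS:
        raise ValueError(f"unsupported encoder: {encoder!r}")
    if decoder not in SUPPORTED_DECODERS:
        raise ValueError(f"unsupported decoder: {decoder!r}")
    return encoder, decoder
```

In this sketch, the default combination from 3.1 would be written `inceptionresnetv2+unet` and the sample project's `mobilenetv2+linknet`. Check the Reference documentation for the actual configuration key and format.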
3.1.2. Encoder
For the encoder there are even more options. The table below lists some of the models that were considered and compares them to InceptionResNetV2:
Model: the name of the model
Acc@1: the top-1 classification accuracy (%) on ImageNet
Speed: the relative inference time, normalized so that InceptionResNetV2 = 100 (lower values are faster)
Support: whether the model is supported as encoder in orthoseg
Remarks: some remarks on the model, e.g. whether practical tests on segmentation performance were conducted (marked Tested*, see 3.1.3)
| Model | Acc@1 | Speed | Support | Remarks |
|---|---|---|---|---|
| vgg11 | 69.15 | | | Tested* (=TernausNet): not very accurate |
| vgg16 | 70.79 | 46 | Yes | accuracy ~ vgg11 |
| vgg19 | 70.89 | 46 | Yes | accuracy ~ vgg11 |
| resnet18 | 68.24 | 29 | | accuracy ~ vgg11 |
| resnet34 | 72.17 | 32 | | accuracy ~ vgg11 |
| resnet50 | 74.81 | 41 | Yes | |
| resnet101 | 76.58 | 60 | Yes | |
| resnet152 | 76.66 | 77 | Yes | |
| resnet50v2 | 69.73 | 36 | Yes | |
| resnet101v2 | 71.93 | 53 | Yes | accuracy ~ vgg11 |
| resnet152v2 | 72.29 | 75 | Yes | accuracy ~ vgg11 |
| resnext50 | 77.36 | 69 | | |
| resnext101 | 78.48 | 110 | | slower + worse accuracy |
| densenet121 | 74.67 | 51 | Yes | |
| densenet169 | 75.85 | 62 | Yes | |
| densenet201 | 77.13 | 77 | Yes | |
| inceptionv3 | 77.55 | 71 | Yes | |
| xception | 78.87 | 77 | | |
| inceptionresnetv2 | 80.03 | 100 | Yes | good accuracy, speed OK |
| seresnet18 | 69.41 | 37 | | accuracy ~ vgg11 |
| seresnet34 | 72.60 | 41 | | accuracy ~ vgg11 |
| seresnet50 | 76.44 | 43 | | |
| seresnet101 | 77.92 | 59 | | |
| seresnet152 | 78.34 | 87 | | |
| seresnext50 | 78.74 | 70 | | |
| seresnext101 | 79.88 | 115 | | slower + worse accuracy |
| senet154 | 81.06 | 251 | | a lot slower |
| nasnetlarge | 82.12 | 213 | | a lot slower |
| nasnetmobile | 74.04 | 51 | | |
| mobilenet | 70.36 | 28 | Yes | accuracy ~ vgg11 |
| mobilenetv2 | 71.63 | 33 | Yes | accuracy ~ vgg11 |
| EfficientNetB0 | 77.1 | 49 | | v2 is documented to be faster and more accurate |
| EfficientNetB1 | 79.1 | 56 | | v2 is documented to be faster and more accurate |
| EfficientNetB2 | 80.1 | 65 | | v2 is documented to be faster and more accurate |
| EfficientNetB3 | 81.6 | 88 | | v2 is documented to be faster and more accurate |
| EfficientNetB4 | 82.9 | 151 | | v2 is documented to be faster and more accurate |
| EfficientNetB5 | 83.6 | 253 | | v2 is documented to be faster and more accurate |
| EfficientNetB6 | 84.0 | 404 | | v2 is documented to be faster and more accurate |
| EfficientNetB7 | 84.3 | 616 | | v2 is documented to be faster and more accurate |
| EfficientNetV2B0 | 78.7 | | | |
| EfficientNetV2B1 | 79.8 | | | |
| EfficientNetV2B2 | 80.5 | | | |
| EfficientNetV2B3 | 82.0 | | | |
| EfficientNetV2S | 83.9 | | | |
| EfficientNetV2M | 85.3 | 96 | Yes | Tested*: similar theoretical accuracy, worse in practice |
| EfficientNetV2L | 85.7 | | | |
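Because Speed is normalized on InceptionResNetV2, a value below 100 means faster inference than the default encoder. The sketch below shows this normalization and uses a few rows copied from the table above to list the supported encoders that are faster than InceptionResNetV2, most accurate first. The `Encoder` record and the functions `relative_speed` and `faster_candidates` are hypothetical helpers for illustration.

```python
from typing import NamedTuple


class Encoder(NamedTuple):
    name: str
    acc_at_1: float  # top-1 ImageNet accuracy (%)
    speed: int  # relative inference time, InceptionResNetV2 = 100
    supported: bool  # supported as encoder in orthoseg


# A few rows copied from the encoder table.
ENCODERS = [
    Encoder("resnet50", 74.81, 41, True),
    Encoder("densenet201", 77.13, 77, True),
    Encoder("inceptionv3", 77.55, 71, True),
    Encoder("inceptionresnetv2", 80.03, 100, True),
    Encoder("mobilenetv2", 71.63, 33, True),
    Encoder("seresnext101", 79.88, 115, False),
    Encoder("EfficientNetV2M", 85.3, 96, True),
]


def relative_speed(inference_time: float, reference_time: float) -> float:
    """Normalize an inference time so that the reference model scores 100."""
    return 100 * inference_time / reference_time


def faster_candidates(
    encoders: list[Encoder], max_speed: float = 100, min_acc: float = 0.0
) -> list[Encoder]:
    """Supported encoders strictly faster than max_speed, most accurate first."""
    candidates = [
        e for e in encoders
        if e.supported and e.speed < max_speed and e.acc_at_1 >= min_acc
    ]
    return sorted(candidates, key=lambda e: e.acc_at_1, reverse=True)
```

For example, `faster_candidates(ENCODERS, min_acc=75)` returns EfficientNetV2M, inceptionv3 and densenet201, in that order. ImageNet accuracy is only a proxy though: EfficientNetV2M ranks first here, but performed worse in practice (see 3.1.3.2).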
3.1.3. Full segmentation models
Because orthoseg is about semantic segmentation, in the end it is the full segmentation model that matters. Less information is available on the performance of full segmentation models, so the overview below only lists some of the full models that were tested with orthoseg.
3.1.3.1. TernausNet/Vanilla Unet
Both vanilla Unet and TernausNet (= `Unet` with a vgg11 backbone) were tested on the detection of greenhouses. Both gave very similar and decent accuracy, but clearly worse than with an InceptionResNetV2 encoder.
Support for these models was removed from orthoseg to avoid having to keep them compatible with newer Keras versions.
3.1.3.2. EfficientNetV2M+Unet
The ImageNet classification accuracy of EfficientNetV2M on its own is significantly better than that of InceptionResNetV2 (85.3% vs 80.3% top-1 accuracy), so you would expect its segmentation performance to be better as well.
The number of weights is similar, and training/inference is reported to be a lot faster than for the first-generation EfficientNet (v1) family.
For segmentation, the training/inference speed was about the same as for InceptionResNetV2: training was a few % slower, but inference was a few % faster. Both were tested on a CUDA GPU (NVIDIA Quadro P5000).
The IoU score obtained during training on a “sealed surfaces” detection was slightly higher than with InceptionResNetV2 (0.9791 vs 0.9765). However, when running an actual detection and reviewing the results on-screen, types of sealed surfaces that were narrow and less represented in the training data seemed to be detected significantly worse than with InceptionResNetV2. Mainly unpaved roads were often missing. Possibly this can be solved by adding extra training data of this type, but as the IoU improvement was very small as well, EfficientNetV2M does not seem to be an actual improvement compared to InceptionResNetV2.
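The IoU (intersection over union) of binary masks is the number of pixels labelled positive in both the prediction and the ground truth, divided by the number of pixels labelled positive in at least one of them. The sketch below, on synthetic masks, illustrates why a high IoU can hide poor detection of narrow objects: a 1-pixel wide road that is missed entirely barely lowers the score. The masks and the `iou` helper are illustrative only; this is not necessarily how orthoseg computes its IoU score during training.

```python
def iou(truth: list[list[int]], pred: list[list[int]]) -> float:
    """Intersection over union of two binary masks with the same shape."""
    if len(truth) != len(pred) or any(len(t) != len(p) for t, p in zip(truth, pred)):
        raise ValueError("masks must have the same shape")
    intersection = union = 0
    for row_t, row_p in zip(truth, pred):
        for t, p in zip(row_t, row_p):
            intersection += bool(t) and bool(p)  # positive in both masks
            union += bool(t) or bool(p)  # positive in at least one mask
    # Two empty masks agree perfectly.
    return intersection / union if union else 1.0


SIZE = 100
# Synthetic ground truth: a large sealed area (top 60 rows) plus a narrow,
# 1-pixel wide unpaved road in column 80 of the bottom 40 rows.
truth = [[1 if r < 60 or c == 80 else 0 for c in range(SIZE)] for r in range(SIZE)]
# Synthetic prediction: the large area is detected perfectly, the road is missed.
pred = [[1 if r < 60 else 0 for c in range(SIZE)] for r in range(SIZE)]

road = [(r, 80) for r in range(60, SIZE)]
road_found = sum(pred[r][c] for r, c in road)

print(f"IoU: {iou(truth, pred):.4f}")  # IoU: 0.9934
print(f"road pixels found: {road_found}/{len(road)}")  # road pixels found: 0/40
```

Even though none of the road pixels are detected, the IoU stays above 0.99, because the road covers less than 1% of the positive pixels. Visually reviewing the detections, as described above, catches this kind of error.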
3.1.4. Some links
A very interesting comparison of the performance of different DNNs: Benchmark Analysis of Representative Deep Neural Network Architectures. It focuses on image classification rather than image segmentation, but it is useful to choose a good backbone DNN for segmentation.
An overview of the accuracy, network size and inference time of the pretrained neural networks available in keras.applications: Keras models.
A similar overview with some other models: Classification Models
paperswithcode.com is also an interesting website with a huge overview of accuracy results achieved using AI, also in the domain of computer vision. Some examples: