3. Finetune your project

The default configuration of orthoseg is meant to be a good starting point for most projects.

An overview of all parameters that can be finetuned can be found in the Reference documentation section.

This section gives more in-depth information on some specific parameters that can help you finetune the configuration to your project and needs.

3.1. What model architecture should I use?

For image segmentation, an encoder-decoder architecture is often used. The encoder is a (pretrained) deep neural network (DNN) that extracts features from the input image, and the decoder reconstructs the segmentation map from these features.

Both the encoder and decoder can be implemented using different neural network architectures.
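The sketch below makes the encoder/decoder split concrete by tracking only the spatial size of the feature maps. It assumes five encoder stages that each halve the resolution. That is an output stride of 32, which is typical for ImageNet backbones and is usually why such models require input sizes divisible by 32. The function name is hypothetical, and no actual network is built.

```python
def encoder_decoder_shapes(height: int, width: int, stages: int = 5) -> list[tuple[str, int, int]]:
    """Return (step, height, width) for the input and every encoder/decoder stage.

    Assumption: each encoder stage halves the resolution and each decoder
    stage doubles it again, as in UNet/LinkNet-style models.
    """
    factor = 2 ** stages
    if height % factor or width % factor:
        raise ValueError(f"input size must be divisible by {factor}")
    shapes = [("input", height, width)]
    h, w = height, width
    # Encoder: each stage downsamples by 2 while extracting higher-level features.
    for i in range(1, stages + 1):
        h, w = h // 2, w // 2
        shapes.append((f"encoder stage {i}", h, w))
    # Decoder: each stage upsamples by 2, merging (where available) the skip
    # connection from the encoder stage with the same resolution.
    for i in range(1, stages + 1):
        h, w = h * 2, w * 2
        shapes.append((f"decoder stage {i}", h, w))
    return shapes


for step, h, w in encoder_decoder_shapes(512, 512):
    print(f"{step:>16}: {h}x{w}")
```

The decoder ends at the input resolution, so the model can output one class prediction per input pixel.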

The default configuration uses a UNet decoder with an InceptionResNetV2 encoder as backbone, because this gives a good balance between accuracy and inference speed. However, if you don’t have a CUDA GPU available, you might want to switch to a lighter model to reduce training and inference time. The sample project uses LinkNet with a MobileNetV2 backbone. This is significantly faster and uses less memory, but gives less accurate results.

3.1.1. Decoder

For the decoder, the following model architectures are supported:

UNet
LinkNet
FPN
PSPNet

Based on practical tests, UNet and LinkNet give the best results, and they are very similar in both accuracy and inference speed. Many variants of UNet exist as well. However, the literature suggests that most of them give relatively small accuracy improvements at the cost of significantly more complexity, which slows down training and inference. Hence only the architectures listed above are supported, and the default in orthoseg is the classic UNet.
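In orthoseg a model combines an encoder with one of these decoders. Examples are the default InceptionResNetV2 with UNet, or the sample project's MobileNetV2 with LinkNet. The sketch below assumes the architecture is written as a single "encoder+decoder" string such as `inceptionresnetv2+unet`. `parse_architecture` and `SUPPORTED_DECODERS` are hypothetical names for illustration, not orthoseg's API. Check the Reference documentation for the actual parameter name and accepted values.

```python
# Hypothetical: the decoders listed in this section, in lowercase.
SUPPORTED_DECODERS = {"unet", "linknet", "fpn", "pspnet"}


def parse_architecture(architecture: str) -> tuple[str, str]:
    """Split an assumed "encoder+decoder" string into (encoder, decoder).

    Raises ValueError if the string is malformed or the decoder is unsupported.
    """
    parts = architecture.lower().split("+")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'encoder+decoder', got {architecture!r}")
    encoder, decoder = parts
    if decoder not in SUPPORTED_DECODERS:
        raise ValueError(
            f"unsupported decoder {decoder!r}, choose one of {sorted(SUPPORTED_DECODERS)}"
        )
    return encoder, decoder


print(parse_architecture("inceptionresnetv2+unet"))  # ('inceptionresnetv2', 'unet')
print(parse_architecture("mobilenetv2+linknet"))     # ('mobilenetv2', 'linknet')
```

The first combination favours accuracy and the second favours speed and memory use, as described above.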

3.1.2. Encoder

For the encoder there are even more options. The table below lists some of the models that were considered and compares them to InceptionResNetV2:

Model: the name of the model
Acc@1: the top-1 classification accuracy on ImageNet (%)
Speed: a relative indication of the inference time, normalized so that InceptionResNetV2 = 100 (lower is faster)
Support: whether the model is supported in orthoseg
Remarks: some remarks on the model, e.g. whether practical tests have been conducted on segmentation performance,…
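The Speed column can be read as a relative inference time: higher means slower, as the remarks on e.g. resnext101 and senet154 suggest. Under that reading the normalization is a simple ratio. A minimal sketch follows. The function name and the idea of measuring seconds per image are assumptions for illustration.

```python
def relative_speed(model_seconds: float, reference_seconds: float) -> float:
    """Inference time relative to a reference model, scaled so the reference = 100.

    Assumption: both times are measured on the same hardware and input size.
    Higher values mean slower inference.
    """
    if reference_seconds <= 0:
        raise ValueError("reference time must be positive")
    return 100.0 * model_seconds / reference_seconds


# Hypothetical timings: 0.5 s per image vs. 2.0 s for InceptionResNetV2.
print(relative_speed(0.5, 2.0))  # 25.0
```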

Model	Acc@1	Speed	Support	Remarks
vgg11	69.15			Tested* (=TernausNet): not very accurate
vgg16	70.79	46	Yes	accuracy ~ vgg11
vgg19	70.89	46	Yes	“
resnet18	68.24	29		“
resnet34	72.17	32		“
resnet50	74.81	41	Yes
resnet101	76.58	60	Yes
resnet152	76.66	77	Yes
resnet50v2	69.73	36	Yes
resnet101v2	71.93	53	Yes	accuracy ~ vgg11
resnet152v2	72.29	75	Yes	“
resnext50	77.36	69
resnext101	78.48	110		slower + worse accuracy
densenet121	74.67	51	Yes
densenet169	75.85	62	Yes
densenet201	77.13	77	Yes
inceptionv3	77.55	71	Yes
xception	78.87	77
inceptionresnetv2	80.03	100	Yes	good accuracy, speed OK
seresnet18	69.41	37		accuracy ~ vgg11
seresnet34	72.60	41		accuracy ~ vgg11
seresnet50	76.44	43
seresnet101	77.92	59
seresnet152	78.34	87
seresnext50	78.74	70
seresnext101	79.88	115		slower + worse accuracy
senet154	81.06	251		a lot slower
nasnetlarge	82.12	213		a lot slower
nasnetmobile	74.04	51
mobilenet	70.36	28	Yes	accuracy ~ vgg11
mobilenetv2	71.63	33	Yes	accuracy ~ vgg11
EfficientNetB0	77.1	49		v2 is documented to be faster and more accurate
EfficientNetB1	79.1	56		“
EfficientNetB2	80.1	65		“
EfficientNetB3	81.6	88		“
EfficientNetB4	82.9	151		“
EfficientNetB5	83.6	253		“
EfficientNetB6	84.0	404		“
EfficientNetB7	84.3	616		“
EfficientNetV2B0	78.7
EfficientNetV2B1	79.8
EfficientNetV2B2	80.5
EfficientNetV2B3	82.0
EfficientNetV2S	83.9
EfficientNetV2M	85.3	96	Yes	Tested*: similar theoretical accuracy, worse in practice
EfficientNetV2L	85.7
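The table can also help shortlist encoders. The sketch below keeps only the supported encoders (Support = Yes) and computes their Pareto front. The front contains the encoders for which no other supported encoder is both at least as accurate and at least as fast. The values are copied from the table, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class Encoder(NamedTuple):
    name: str
    acc1: float  # ImageNet top-1 accuracy (%)
    speed: int   # relative inference time, InceptionResNetV2 = 100 (lower is faster)


# Encoders marked "Yes" in the Support column of the table above.
SUPPORTED = [
    Encoder("vgg16", 70.79, 46), Encoder("vgg19", 70.89, 46),
    Encoder("resnet50", 74.81, 41), Encoder("resnet101", 76.58, 60),
    Encoder("resnet152", 76.66, 77), Encoder("resnet50v2", 69.73, 36),
    Encoder("resnet101v2", 71.93, 53), Encoder("resnet152v2", 72.29, 75),
    Encoder("densenet121", 74.67, 51), Encoder("densenet169", 75.85, 62),
    Encoder("densenet201", 77.13, 77), Encoder("inceptionv3", 77.55, 71),
    Encoder("inceptionresnetv2", 80.03, 100), Encoder("mobilenet", 70.36, 28),
    Encoder("mobilenetv2", 71.63, 33), Encoder("EfficientNetV2M", 85.3, 96),
]


def dominates(a: Encoder, b: Encoder) -> bool:
    """True if a is at least as accurate and as fast as b, and strictly better in one."""
    return (a.acc1 >= b.acc1 and a.speed <= b.speed
            and (a.acc1 > b.acc1 or a.speed < b.speed))


def pareto_front(encoders: list[Encoder]) -> list[Encoder]:
    """Keep the encoders that no other encoder beats on both accuracy and speed."""
    return [e for e in encoders if not any(dominates(o, e) for o in encoders)]


for e in sorted(pareto_front(SUPPORTED), key=lambda e: e.speed):
    print(f"{e.name:<18} Acc@1={e.acc1:5.2f}  speed={e.speed}")
```

With these numbers the front is mobilenet, mobilenetv2, resnet50, resnet101, inceptionv3 and EfficientNetV2M. InceptionResNetV2 drops out because EfficientNetV2M beats it in both columns. ImageNet classification results are only a proxy, though. In the segmentation test described in section 3.1.3.2, EfficientNetV2M performed worse in practice, so treat the front as a shortlist to test rather than a final choice.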

3.1.3. Full segmentation models

Because orthoseg is about semantic segmentation, in the end it is the full segmentation model that matters. There is less information available on the performance of full segmentation models, so in the following overview only some of the full models that were tested with orthoseg are listed.

3.1.3.1. TernausNet/Vanilla Unet

Both vanilla Unet and TernausNet (=`Unet` with vgg11 backbone) were tested on the detection of greenhouses. The accuracy of both was very similar and decent, but clearly worse than with an InceptionResNetV2 backbone.
Support for these models was removed from orthoseg to avoid having to keep upgrading them for compatibility with newer Keras versions.

3.1.3.2. EfficientNetV2M+Unet

On its own, EfficientNetV2M has a significantly better ImageNet classification accuracy than InceptionResNetV2 (85.3% vs 80.3% top-1 accuracy), so you would expect its segmentation performance to be better as well.
The number of weights is similar, and training/inference speed is reported to be a lot faster than for the first-generation EfficientNet family (v1).
In practice, the training/inference speed of the segmentation model was about the same as with InceptionResNetV2: training was a few % slower, but inference was a few % faster. Both were tested on the same CUDA GPU (NVIDIA Quadro P5000).
The IOU score obtained during training on a “sealed surfaces” detection was slightly higher than for InceptionResNetV2 (0.9791 vs 0.9765). However, running an actual detection and reviewing the results on-screen told a different story. Narrow types of sealed areas that were less represented in the training data seemed to be detected significantly worse than with InceptionResNetV2; mainly unpaved roads were often missing. Possibly this can be solved by adding extra training data of this type. However, the IOU improvement was also very small, so EfficientNetV2M doesn’t seem to be an actual improvement compared to InceptionResNetV2.
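The IOU (intersection over union) score measures the overlap between a predicted mask and the reference mask. The sketch below is a plain-Python version for flat 0/1 masks. orthoseg's own metric implementation is not shown here and may differ, e.g. in how it averages over batches. Returning 1.0 when both masks are empty is a convention chosen for this sketch. The sketch also shows why a tiny IOU difference can hide missed narrow features. Thin objects such as unpaved roads cover few pixels, so missing them barely lowers the score.

```python
from collections.abc import Sequence


def iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Intersection over union of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Convention for this sketch: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 100-pixel tile: a large sealed area of 80 pixels plus a 2-pixel road segment.
truth = [1] * 80 + [0] * 10 + [1] * 2 + [0] * 8
pred = [1] * 80 + [0] * 20  # the road is missed entirely
print(round(iou(pred, truth), 4))  # 0.9756
```

Missing the entire road still gives a score close to the values reported above. This is why reviewing detections on-screen remains necessary.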

3.1.4. Some links

- A very interesting comparison of the performance of different DNNs: Benchmark Analysis of Representative Deep Neural Network Architectures. It focuses on image classification rather than segmentation, but it is useful for choosing a good backbone DNN for segmentation.
- An overview of the accuracy, network size and inference speed of the different pretrained neural networks available in keras.applications: Keras models.
- A similar overview with some other models: Classification Models
- paperswithcode.com is also an interesting website with a huge overview of results achieved with AI, also in the domain of computer vision. Some examples:
  - Best performing image classifications on the ImageNet dataset
  - Best performing image segmentations on the PASCAL VOC 2012 dataset