3. Finetune your project

The default configuration of orthoseg is meant to be a good starting point for most projects.

An overview of all parameters that can be finetuned can be found in the Reference documentation section.

This section gives more in-depth information on some specific parameters that can help you finetune the configuration to your specific project and needs.

3.1. What model architecture should I use?

For image segmentation, often an encoder-decoder architecture is used, where the encoder is a (pretrained) DNN that extracts features from the input image, and the decoder reconstructs the segmentation map from these features.

Both the encoder and decoder can be implemented using different neural network architectures.

The default configuration uses a UNet decoder with an InceptionResNetV2 encoder as backbone because this gives a good balance of accuracy and inference speed. However, if you don’t have a CUDA GPU available, you might want to switch to a lighter model to reduce the training and inference time. The sample project uses LinkNet with a MobileNetV2 backbone, which is significantly faster and uses less memory, but gives less accurate results.
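As a minimal sketch of what switching to the lighter model could look like, the snippet below parses a small configuration fragment and splits the architecture into its encoder and decoder. The `[model]` section name, the `architecture` key and the `encoder+decoder` value format are assumptions made for illustration; check the Reference documentation for the exact parameter names.

```python
import configparser

# Hypothetical config fragment: the section name, key name and the
# "encoder+decoder" value format are assumptions, not confirmed orthoseg syntax.
CONFIG_TEXT = """
[model]
architecture = mobilenetv2+linknet
"""

# Decoders listed as supported in section 3.1.1.
SUPPORTED_DECODERS = {"unet", "linknet", "fpn", "pspnet"}


def parse_architecture(config_text: str) -> tuple[str, str]:
    """Return (encoder, decoder) from the assumed 'architecture' setting."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    architecture = parser["model"]["architecture"].strip().lower()
    encoder, sep, decoder = architecture.partition("+")
    if not sep or not encoder or not decoder:
        raise ValueError(f"expected 'encoder+decoder', got {architecture!r}")
    if decoder not in SUPPORTED_DECODERS:
        raise ValueError(f"unsupported decoder: {decoder!r}")
    return encoder, decoder


encoder, decoder = parse_architecture(CONFIG_TEXT)
print(f"encoder={encoder}, decoder={decoder}")
```

Validating the decoder name early, as done here, is one way to catch a typo in the configuration before a long training run starts.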

3.1.1. Decoder

For the decoder, the following model architectures are supported:

  • UNet

  • LinkNet

  • FPN

  • PSPNet

Based on practical tests, UNet and LinkNet give the best results, and they are very similar in both accuracy and inference speed. Many variants of e.g. UNet exist as well, but based on the literature, most seem to give relatively small accuracy improvements for a significant increase in complexity, resulting in slower training and inference. Hence only the architectures listed above are supported, and the default in orthoseg is the classic UNet.

3.1.2. Encoder

For the encoder there are even more options. The table below lists some models that were considered and compares them to InceptionResNetV2:

  • Model: the name of the model

  • Acc@1: the top-1 classification accuracy on ImageNet (%)

  • Speed: an indication of the inference time, normalized so that InceptionResNetV2 = 100 (lower is faster)

  • Support: whether the model is supported in orthoseg

  • Remarks: some remarks on the model, e.g. if practical tests have been conducted on segmentation performance,…

| Model             | Acc@1 | Speed | Support | Remarks |
|-------------------|-------|-------|---------|---------|
| vgg11             | 69.15 |       |         | Tested* (=TernausNet): not very accurate |
| vgg16             | 70.79 | 46    | Yes     | accuracy ~ vgg11 |
| vgg19             | 70.89 | 46    | Yes     |  |
| resnet18          | 68.24 | 29    |         |  |
| resnet34          | 72.17 | 32    |         |  |
| resnet50          | 74.81 | 41    | Yes     |  |
| resnet101         | 76.58 | 60    | Yes     |  |
| resnet152         | 76.66 | 77    | Yes     |  |
| resnet50v2        | 69.73 | 36    | Yes     |  |
| resnet101v2       | 71.93 | 53    | Yes     | accuracy ~ vgg11 |
| resnet152v2       | 72.29 | 75    | Yes     |  |
| resnext50         | 77.36 | 69    |         |  |
| resnext101        | 78.48 | 110   |         | slower + worse accuracy |
| densenet121       | 74.67 | 51    | Yes     |  |
| densenet169       | 75.85 | 62    | Yes     |  |
| densenet201       | 77.13 | 77    | Yes     |  |
| inceptionv3       | 77.55 | 71    | Yes     |  |
| xception          | 78.87 | 77    |         |  |
| inceptionresnetv2 | 80.03 | 100   | Yes     | good accuracy, speed OK |
| seresnet18        | 69.41 | 37    |         | accuracy ~ vgg11 |
| seresnet34        | 72.60 | 41    |         | accuracy ~ vgg11 |
| seresnet50        | 76.44 | 43    |         |  |
| seresnet101       | 77.92 | 59    |         |  |
| seresnet152       | 78.34 | 87    |         |  |
| seresnext50       | 78.74 | 70    |         |  |
| seresnext101      | 79.88 | 115   |         | slower + worse accuracy |
| senet154          | 81.06 | 251   |         | a lot slower |
| nasnetlarge       | 82.12 | 213   |         | a lot slower |
| nasnetmobile      | 74.04 | 51    |         |  |
| mobilenet         | 70.36 | 28    | Yes     | accuracy ~ vgg11 |
| mobilenetv2       | 71.63 | 33    | Yes     | accuracy ~ vgg11 |
| EfficientNetB0    | 77.1  | 49    |         | v2 is documented to be faster and more accurate |
| EfficientNetB1    | 79.1  | 56    |         |  |
| EfficientNetB2    | 80.1  | 65    |         |  |
| EfficientNetB3    | 81.6  | 88    |         |  |
| EfficientNetB4    | 82.9  | 151   |         |  |
| EfficientNetB5    | 83.6  | 253   |         |  |
| EfficientNetB6    | 84.0  | 404   |         |  |
| EfficientNetB7    | 84.3  | 616   |         |  |
| EfficientNetV2B0  | 78.7  |       |         |  |
| EfficientNetV2B1  | 79.8  |       |         |  |
| EfficientNetV2B2  | 80.5  |       |         |  |
| EfficientNetV2B3  | 82.0  |       |         |  |
| EfficientNetV2S   | 83.9  |       |         |  |
| EfficientNetV2M   | 85.3  | 96    | Yes     | Tested*: similar theoretical accuracy, worse in practice |
| EfficientNetV2L   | 85.7  |       |         |  |
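To illustrate how the table can be used to shortlist an encoder, the sketch below picks the most accurate supported encoder within a given speed budget. It uses a handful of rows copied from the table and reads the Speed column as a relative inference time (higher is slower), which is the reading implied by remarks such as "a lot slower" on values above 100. The `Encoder` class and `best_encoder` function are hypothetical helpers, not part of orthoseg.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoder:
    name: str
    acc_top1: float  # top-1 ImageNet accuracy (%)
    speed: int       # relative inference time, InceptionResNetV2 = 100
    supported: bool  # True if the Support column says "Yes"


# A few rows copied from the encoder table.
ENCODERS = [
    Encoder("vgg16", 70.79, 46, True),
    Encoder("resnet50", 74.81, 41, True),
    Encoder("densenet121", 74.67, 51, True),
    Encoder("inceptionv3", 77.55, 71, True),
    Encoder("inceptionresnetv2", 80.03, 100, True),
    Encoder("mobilenetv2", 71.63, 33, True),
    Encoder("resnext50", 77.36, 69, False),
]


def best_encoder(max_speed: int) -> Encoder:
    """Most accurate supported encoder whose speed value is <= max_speed."""
    candidates = [e for e in ENCODERS if e.supported and e.speed <= max_speed]
    if not candidates:
        raise ValueError(f"no supported encoder with speed <= {max_speed}")
    return max(candidates, key=lambda e: e.acc_top1)


for budget in (35, 50, 100):
    print(budget, best_encoder(budget).name)
```

Such a shortlist is only a starting point: ImageNet classification accuracy does not guarantee segmentation quality, as the remark on EfficientNetV2M in the table shows.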

3.1.3. Full segmentation models

Because orthoseg is about semantic segmentation, in the end it is the full segmentation model that matters. There is less information available on the performance of full segmentation models, so in the following overview only some of the full models that were tested with orthoseg are listed.

3.1.3.1. TernausNet/Vanilla Unet

  • Both vanilla Unet and TernausNet (=`Unet` with vgg11 backbone) were tested on the detection of greenhouses. The accuracy of both was very similar and was decent, but clearly worse than InceptionResNetV2.

  • Support for these models was removed from orthoseg to avoid having to upgrade them to stay compatible with newer Keras versions.

3.1.3.2. EfficientNetV2M+Unet

  • The classification accuracy of just EfficientNetV2M on imagenet is significantly better than InceptionResNetV2 (85.3% vs 80.3% top 1 accuracy), so you would expect the segmentation performance to be better as well.

  • The number of weights is similar, and train/inference speed is reported to be a lot faster than that of the first-generation (v1) EfficientNet family.

  • The train/inference speed for a segmentation was about the same as InceptionResNetV2: training was a few % slower, but inference was a few % faster. Both were tested on the same CUDA GPU (NVIDIA Quadro P5000).

  • The IOU score obtained from the training on a “sealed surfaces” detection was slightly higher than with InceptionResNetV2 (0.9791 vs 0.9765). When running an actual detection and reviewing the results on-screen, though, it seemed that narrow types of sealed areas that were less represented in the training data were detected significantly worse than with InceptionResNetV2. Mainly unpaved roads were often missing. Possibly this can be solved by adding extra training data of this type, but as the IOU score improvement was very small as well, it doesn’t seem to be an actual improvement compared to InceptionResNetV2.
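For readers unfamiliar with the metric, the following minimal sketch computes an intersection-over-union (IoU) score on two small binary masks. It is a plain-Python illustration with made-up masks, not the metric implementation orthoseg uses during training. Because the score counts pixels, narrow or rare features such as unpaved roads contribute only a few pixels, which is one plausible reason a slightly higher IoU can coexist with visibly worse detections of those features.

```python
def iou(pred: list[list[int]], truth: list[list[int]]) -> float:
    """Intersection over union of two binary masks of equal shape."""
    intersection = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention chosen here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Illustrative masks (1 = sealed surface, 0 = background).
truth = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
pred = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"IoU = {iou(pred, truth):.2f}")  # 3 shared pixels / 5 in the union
```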