Domain Adaptive Semantic Segmentation via Entropy-Ranking and Uncertain Learning-Based Self-Training

IEEE/CAA Journal of Automatica Sinica, 2022, Issue 8

Chengli Peng and Jiayi Ma

Dear Editor,

This letter develops two new self-training strategies for domain adaptive semantic segmentation, which formulate self-training as the processes of mining more training samples and reducing the influence of false pseudo-labels. In particular, a self-training strategy based on entropy-ranking is proposed to mine intra-domain information. Thus, numerous false pseudo-labels can be exploited and rectified, and more pseudo-labels can be involved in training. Meanwhile, another novel self-training strategy is developed to handle the regions that may possess false pseudo-labels. In detail, a specific uncertain loss, which makes the network automatically decide whether the pseudo-labels are true, is proposed to improve the network optimization. Consequently, the influence of false pseudo-labels can be reduced. Experimental results show that, compared with the baseline, the average mIoU gain brought by our method reaches 4.3%. Extensive benchmark experiments further highlight the effectiveness of our method against existing state-of-the-art approaches.

Learning-based image semantic segmentation requires numerous labeled images. However, annotating pixel-wise semantic segmentation labels is extremely time-consuming [1]. To this end, several unsupervised methods [2], [3] have been investigated and have achieved results competitive with supervised methods. Specifically, some recent works [3], [4] use data collected from simulators and game engines, or from similar real-world scenes, with precise pixel-level semantic annotations to train segmentation networks. However, the trained model often suffers significant performance degradation when handling unseen data from a new scene due to the cross-domain difference. To alleviate the gap between data from different domains, several methods based on the unsupervised domain adaptation (UDA) technique have been developed. Conventionally, the data with pixel-level semantic annotations are called the source domain data, and the data from the new scene (e.g., without annotations) are called the target domain data. UDA aims to use the source and target domain data to produce a model that has favorable segmentation performance on the target domain. Among the UDA methods, those based on self-training [5], [6] achieve better segmentation performance on the target domain because they consider the intra-domain relation (e.g., important information for improving segmentation accuracy [7], [8]) of the target domain. These methods first train a network to align the distribution shift between the source and target domain data. Then, they generate pseudo-labels of the target domain images from the trained network. Finally, they select pseudo-labels with high confidence as the training samples to implement self-training. Generally, the pseudo-labels with high confidence are referred to as easy samples, while those with low confidence are referred to as hard samples. However, the hard samples inevitably contain numerous true pseudo-labels, leading to a waste of training samples. Meanwhile, the easy samples also contain false pseudo-labels, which degrade the optimization efficiency of the network. To solve the above-mentioned problems, this letter investigates two distinct self-training strategies for mining more training samples and reducing the influence of false pseudo-labels, respectively.
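
To make the standard self-training step above concrete, the following sketch shows confidence-based pseudo-label selection. The confidence threshold, ignore index, and model interface are illustrative assumptions, not the exact configuration used in this letter.

```python
# A minimal sketch of confidence-based pseudo-label selection for self-training.
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_pseudo_labels(model, target_image, conf_threshold=0.9):
    """Predict a target image and keep only high-confidence pixels.

    Pixels whose confidence falls below the threshold (hard samples) are
    marked with the ignore index 255 and excluded from self-training.
    """
    logits = model(target_image)                      # (B, C, H, W)
    probs = F.softmax(logits, dim=1)
    confidence, pseudo_label = probs.max(dim=1)       # (B, H, W) each
    pseudo_label[confidence < conf_threshold] = 255   # discard low-confidence pixels
    return pseudo_label
```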

Related Work: Due to the domain gap, a semantic segmentation model trained on the source domain usually suffers an obvious performance decrease when processing images from the target domain. UDA techniques address this problem by aligning the distribution shift between the source and the target domain data. For example, AdvEnt [3] uses a generator to produce the predicted feature maps of the source and target domain data, and a discriminator to distinguish which domain the feature maps come from. The distribution shift between the source and target domain data can be aligned by confusing the discriminator. In [9], a maximum squares loss is proposed to balance the gradient of well-classified target samples, which prevents the training process from being dominated by easy-to-transfer samples in the target domain.
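
As a reference, a common formulation of the maximum squares loss of [9] is sketched below: its gradient is linear in the predicted probability, so well-classified target pixels do not dominate training the way they do under entropy minimization. The normalization used here is an assumption.

```python
# A sketch of the maximum squares loss on target-domain predictions.
import torch
import torch.nn.functional as F

def maximum_squares_loss(logits):
    """logits: (B, C, H, W) predictions on target-domain images."""
    probs = F.softmax(logits, dim=1)
    # sum squared probabilities over classes, then average over pixels;
    # minimizing the negative encourages confident, balanced predictions
    return -0.5 * (probs ** 2).sum(dim=1).mean()
```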

The above UDA methods only consider the distribution shift between the source and target domain data while ignoring the distribution shift within the target domain. To address this issue, self-training has been adopted. For instance, Pan et al. [6] used an image-wise entropy-based ranking function to separate the target domain images into easy and hard samples, and then used the easy samples as the source domain and the hard samples as the target domain to implement a new round of UDA training, which can be regarded as a round of self-training inside the target domain. MRNet [10] proposes a memory regularization in vivo to exploit the intra-domain knowledge and regularize the model training, which benefits the initialization of pseudo-labels and reduces the influence of false pseudo-labels. Based on MRNet, RPLUE [11] leverages uncertainty estimation to integrate the memory regularization in vivo, which significantly enhances the network optimization efficiency. However, these methods rarely consider the influence of true hard samples and false easy samples, which limits their performance. In this letter, our proposed self-training not only involves more samples for training but also limits the influence of false samples, resulting in more efficient self-training and higher segmentation accuracy.
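
The image-wise ranking of Pan et al. [6] can be sketched as follows. We assume the target loader yields (image id, image) pairs and that the easy/hard split ratio is a tunable hyper-parameter; both are illustrative assumptions.

```python
# A sketch of image-wise entropy ranking: images with low mean prediction
# entropy are treated as easy, the rest as hard.
import torch
import torch.nn.functional as F

@torch.no_grad()
def rank_target_images(model, target_loader, easy_ratio=0.67):
    scores = []  # (mean entropy, image id) per target image
    for image_id, image in target_loader:
        probs = F.softmax(model(image), dim=1)                      # (1, C, H, W)
        entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1)    # (1, H, W)
        scores.append((entropy.mean().item(), image_id))
    scores.sort(key=lambda s: s[0])                                 # low entropy first
    split = int(easy_ratio * len(scores))
    easy_ids = [i for _, i in scores[:split]]
    hard_ids = [i for _, i in scores[split:]]
    return easy_ids, hard_ids
```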

Self-training with entropy-ranking: Because the hard samples contain a large number of false pseudo-labels, they cannot be used as training samples even though they include some true pseudo-labels. However, if the number of false pseudo-labels among the hard samples can be reduced, they will have a positive effect on the network optimization and can be used as training samples. According to this analysis, we use the first round of self-training to refine the false pseudo-labels. How to select and rectify the false pseudo-labels is critical in this stage. Previous works show that, if a pixel has a high entropy value, it has high uncertainty and a high probability of receiving a false pseudo-label. Considering this property, a pixel-wise entropy-ranking method is developed as a coarse pseudo-label filter to remove pixels likely to carry false pseudo-labels, namely the hard samples. Meanwhile, the remaining pixels after excluding the hard samples can be regarded as pixels with true pseudo-labels, namely the easy samples. Although there are still some easy samples whose pseudo-labels are false, the ratio of false samples is greatly reduced. By mining the relationship among the easy samples, we can obtain beneficial information to rectify the false pseudo-labels.
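
A minimal sketch of such a pixel-wise entropy-ranking filter is given below. We assume the ratio of retained low-entropy pixels corresponds to the hyper-parameter λr studied in Table 1; its exact definition and value are not reproduced here.

```python
# A sketch of the pixel-wise entropy-ranking filter: per image, keep the
# lowest-entropy fraction of pixels as easy samples (mask Mt = 1).
import torch
import torch.nn.functional as F

@torch.no_grad()
def entropy_ranking_mask(logits, keep_ratio=0.67):
    """Return a binary mask: 1 for easy (low-entropy) pixels, 0 for hard ones."""
    probs = F.softmax(logits, dim=1)                             # (B, C, H, W)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1)     # (B, H, W)
    flat = entropy.flatten(1)                                    # (B, H*W)
    k = max(1, int(keep_ratio * flat.shape[1]))
    threshold = flat.kthvalue(k, dim=1).values.view(-1, 1, 1)    # per-image cutoff
    return (entropy <= threshold).float()
```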

Fig. 1. Pseudo-label, entropy map, and mask after entropy-ranking of a target image. In the entropy map, bright colors represent high-entropy regions and dark colors represent low-entropy regions. In the mask, white regions represent easy samples and black regions represent hard samples.

Fig. 2. The proposed self-training strategy. In the first round of self-training, the network takes the original target domain image It, the pseudo-label PLt, and the mask Mt as inputs. The network is optimized by the loss Lseg1. After the first round of self-training, the model (i.e., M1) used to update the pseudo-labels is obtained. We input the original target domain image It into M1 to obtain the updated pseudo-label PLu. In the second round of self-training, the network takes the original target domain image It, the original pseudo-label PLt, and the updated pseudo-label PLu as inputs. The network is optimized by the loss Lseg1. The model (i.e., M2) used to evaluate network performance is produced after the second round of self-training.
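
A plausible form of the first-round objective is a cross-entropy on the pseudo-label PLt restricted to the easy pixels indicated by the mask Mt, as sketched below; the exact definition of Lseg1 in this letter is not reproduced, so treat this as an assumption.

```python
# A hedged sketch of a masked self-training loss for the first round.
import torch
import torch.nn.functional as F

def masked_self_training_loss(logits, pseudo_label, mask):
    """logits: (B, C, H, W); pseudo_label: (B, H, W) long; mask: (B, H, W) in {0, 1}."""
    per_pixel = F.cross_entropy(logits, pseudo_label, reduction="none")  # (B, H, W)
    # average only over easy pixels; hard pixels contribute no gradient
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)
```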

Table 1. The Performance of the Network With Different Settings of λr on the GTA5→Cityscapes Task

Fig. 3. Several typical segmentation results at different stages. (a) Original images; (b) Ground truth; (c) Results before the first round of self-training; (d) Results after the first round of self-training; (e) Results after the second round of self-training.

Table 2. Illustration of the Effect of Different Terms in UC. None Means That Neither Ent nor Dis Is Introduced Into UC
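
The uncertainty map UC ablated in Table 2 combines an entropy term (Ent) and a discrepancy term (Dis). The sketch below shows one way such a map could down-weight unreliable pseudo-labels, in the spirit of the uncertainty weighting of RPLUE [11]; the concrete definitions of Ent, Dis, and the uncertain loss used in this letter are assumptions here, and ignore-label handling is omitted.

```python
# A heavily hedged sketch of an uncertainty-weighted self-training loss.
import torch
import torch.nn.functional as F

def uncertain_loss(logits, pl_t, pl_u):
    """logits: (B, C, H, W); pl_t / pl_u: (B, H, W) original / updated pseudo-labels."""
    probs = F.softmax(logits, dim=1)
    ent = -(probs * torch.log(probs + 1e-12)).sum(dim=1)  # Ent: prediction entropy
    dis = (pl_t != pl_u).float()                           # Dis: pseudo-label disagreement
    uc = ent + dis                                         # assumed combination into UC
    ce = F.cross_entropy(logits, pl_t, reduction="none")
    # confident pixels keep full weight; uncertain pixels are down-weighted,
    # while the additive UC term keeps the network from marking everything uncertain
    return (torch.exp(-uc) * ce + uc).mean()
```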

Table 3. The mIoU on GTA5→Cityscapes. M1, M2, M3, M4, M5, M6, M7 Represent PatchAlign [13], AdvEnt [3], MRKLD [14], MRNet [10], RPLUE [11], AdvEnt + Ours, and MRNet + Ours, Respectively. Bold Indicates the Best

Table 4. Quantitative Results on SYNTHIA→Cityscapes. We Present Per-Class IoU, mIoU, and mIoU*. mIoU and mIoU* Are Averaged Over 16 and 13 Categories, Respectively. M1, M2, M3, M4, M5, M6 Represent AdvEnt [3], MRNet [10], CBST [5], MRKLD [14], RPLUE [11], and MRNet + Ours, Respectively. Bold Indicates the Best

Fig. 4. Qualitative results of semantic segmentation adaptation on GTA5→Cityscapes task.

Fig. 5. Qualitative results of semantic segmentation adaptation on the SYNTHIA→Cityscapes task.

Table 5. mIoU on the Cityscapes→Oxford RobotCar Task. M1, M2, M3, M4 Represent PatchAlign [13], MRNet [10], PKA [15], and MRNet + Ours, Respectively. Bold Indicates the Best

On this task, the gain of our method is relatively limited. On the one hand, the images from Oxford RobotCar are real-world data, the same as Cityscapes, leading to a small domain discrepancy and a limited effect of the first round of self-training. On the other hand, the images from Oxford RobotCar have a more similar style to the Cityscapes dataset, resulting in a weak influence of the second round of self-training. Hence, our proposed method is more suitable for images that possess obvious style differences.

Conclusions: In this work, we proposed a novel self-training strategy for domain adaptive semantic segmentation. In particular, we used two rounds of self-training to address the pseudo-label waste and false pseudo-label problems, respectively. We first developed a round of self-training based on entropy-ranking to generate more easy samples, so that more pseudo-labels can be involved in training and the waste of pseudo-labels is avoided. Subsequently, we developed another round of self-training based on uncertainty learning to reduce the influence of misleading information on network optimization. Experimental results showed that our proposed method can significantly increase the performance of baseline methods. Meanwhile, our proposed method outperforms several state-of-the-art approaches.

Fig. 6. Qualitative results of semantic segmentation adaptation on the Cityscapes→Oxford RobotCar task.

Acknowledgments: This work was supported by the Key Research and Development Program of Hubei Province (2020BAB113), and the Natural Science Fund of Hubei Province (2019CFA037).