A color image segmentation method based on deep learning and level set

2021-07-15 09:45

(1.School of Mathematics, Sichuan University, Chengdu 610064, China; 2.West China Second University Hospital, Sichuan University, Chengdu 610041, China; 3.Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu 610041, China)

Abstract: In this study, we combine semantic segmentation based on a fully convolutional neural network with level-set-based image segmentation, using DeepLab V2 and the Distance Regularized Level Set Evolution (DRLSE) model to segment general color images. To speed up the evolution of the zero level set, a new shape energy term is added to the DRLSE model. Numerical experiments verify the efficiency of our method.

Keywords: Deep Learning; Level set method; Image segmentation; DRLSE; DeepLab

1 Introduction

Since the 1970s, image segmentation has attracted the attention of many researchers. The level set method, developed over the past few decades, has achieved good results in image segmentation. Its idea is to represent a planar contour as the zero level set of a higher-dimensional implicit function (whose graph is a surface in three-dimensional space), called the Level Set Function (LSF), and to evolve the LSF according to a Partial Differential Equation (PDE). Many level-set-based segmentation models have been presented in recent years.

Recently, deep learning and Convolutional Neural Networks (CNN) have achieved many breakthroughs in image segmentation. In these methods, a CNN is trained on large amounts of data to extract object features effectively; it converts the image to be segmented into simple feature maps for pixel-level recognition and classification and thereby learns a mapping from the input image to the desired segmentation result. Such methods have strong learning ability: they can handle complex natural images and multi-object segmentation, are robust, and achieve semantic understanding and segmentation of images. However, the quantity and quality of the data they require grow with the difficulty of the segmentation task, training costs much more time than conventional algorithms, and tens of thousands of images are usually needed to segment well.

1.1 Overview on level set method

The Active Contour Model (ACM)[1] proposed by Kass et al. laid the foundation for the level set method in image segmentation. The level set method was first proposed by Osher et al.[2] in 1988 to study the propagation of fronts such as the outer contour of a flame. The idea is to express a two-dimensional closed curve as the zero level set of a higher-dimensional implicit function and to approximate the edge contour of the target by the evolution of the zero level set. The concept of reinitialization was introduced by Chopp[3]. Reinitialization periodically resets the LSF to a signed distance function, usually only in the vicinity of the zero level set, which keeps the evolution stable; however, when it is repeated many times, the amount of computation increases considerably. The classical Mumford-Shah (M-S) model proposed by Mumford et al.[4] drives the evolution of the zero level set by minimizing an energy functional. Chan et al.[5] then proposed the C-V model, which simplifies the energy functional of the M-S model. Because it does not rely on gradient information, the C-V model can segment images with blurred contours and is somewhat robust to noise. All the models mentioned above need reinitialization during numerical computation, which raises two difficulties: whether the regularized LSF after reinitialization still drives the zero level set towards the target edge, and how to choose the appropriate time for reinitialization in practice. Li et al.[6] proposed the DRLSE method to solve both problems. By introducing a distance regularization term, the LSF is kept close to a signed distance function and the reinitialization step is avoided.

Generally speaking, active contour models can be divided into two categories: edge-based and region-based segmentation models. We first review several edge-based models. Li et al.[7] introduced a double-well potential function to improve the DRLSE model; however, the model still has shortcomings such as unidirectional evolution and a reduced convergence rate. Wang et al.[8] proposed an Adaptive Level Set Evolution (ALSE) model, which, compared with DRLSE, depends less on the initial contour and accelerates the evolution of the zero level set, although some shortcomings remain. Zhang et al.[9] proposed a level set evolution model driven by an optimal area energy term; it is efficient, stable and robust to noise, but it is not adaptive and is sensitive to initialization. Next, we review several region-based models. Chan and Vese[5] put forward a Piecewise Constant (PC) model, which can segment images with blurred edges but is not suitable for images with inhomogeneous intensity. Li et al.[10] proposed a Region Scalable Fitting (RSF) model, which can segment inhomogeneous images but is computationally expensive and sensitive to the initial contour. Zhang et al.[11] proposed the Local Image Fitting (LIF) model, which converges faster than RSF, although its robustness to initialization needs improvement. Wang et al.[12] proposed a Local Gaussian Distribution Fitting (LGDF) model, which uses the LGDF energy to compute local variance statistics and can segment regions with similar intensities but different variances; however, it is time-consuming. Ding et al.[13] added a logarithmic energy term to the RSF model and proposed the Laplacian-optimized region-scalable fitting (RSFLog) model, which achieves higher segmentation speed, accuracy and robustness than RSF.
Ref.[14] achieves fast segmentation and good robustness by introducing a local pre-fitting (LPF) energy into the variational level set method. Xia et al.[15] improved the C-V model and proposed a new smooth convergence function based on region confidence.

1.2 Overview on image segmentation based on deep learning

In recent years, deep learning and neural networks have produced many important results in image processing. The CNN[16] is a branch of artificial neural networks; after sufficient training, it can effectively extract image information and perform recognition and classification at the pixel level, so it is widely used in image processing. In 2015, Long et al.[17] proposed the Fully Convolutional Network (FCN) and realized semantic segmentation of images. FCN accepts input images of arbitrary size, uses convolution to predict the category of each pixel, and produces an output of the same size as the input image. It was the first use of a fully convolutional network for image segmentation, but its segmentation is not accurate enough and its computational cost is large.

Chen et al.[18] proposed the DeepLab V1 network in 2015, which applied atrous (hole) convolution to the VGG-16 network, enlarging the receptive field and improving segmentation accuracy. In 2016, Chen et al.[19] proposed DeepLab V2, which extracts global semantic information with atrous spatial pyramid pooling (ASPP) to enhance multi-scale object recognition. Although the accuracy improved, the ASPP module made the computation very heavy. Chen et al.[20] then proposed DeepLab V3, which adds a convolution layer and global average pooling to the ASPP module. By combining cascaded atrous convolutions with the ASPP module, richer global semantic information can be obtained.

In Section 2, we review the level set method and DRLSE model. The method based on deep learning and DRLSE model is presented in Section 3. Numerical experiments are shown in Section 4. The conclusion is presented in Section 5 with some remarks.

2 Basic knowledge

2.1 Level Set Method

We briefly introduce the level set method in this section. Suppose the curve C is expressed as

$$C(t)=\{(x,y)\mid \phi(x,y,t)=c\} \tag{1}$$

The evolution of the level set function is governed by

$$\frac{\partial \phi}{\partial t}=F(p,t)\,|\nabla\phi| \tag{2}$$

where F(p,t) is the speed function that controls the motion of the contour. A vertical (top-down) view of the level set function is shown in Fig.1. Let d_C(x,y) denote the shortest distance from a point (x,y) to the initial curve C; the LSF can then be initialized as a Signed Distance Function (SDF),

$$\phi(x,y,0)=\begin{cases}-d_C(x,y), & (x,y)\ \text{inside}\ C\\ 0, & (x,y)\in C\\ d_C(x,y), & (x,y)\ \text{outside}\ C\end{cases} \tag{3}$$

where (x,y) denotes a pixel of the image. During iterative evolution, some values may deviate severely, because the level set function gradually becomes irregular near the zero level set and errors accumulate. Reinitialization was introduced to prevent such errors: it keeps the level set function stable and regular during evolution and reduces the risk of errors. However, reinitialization has drawbacks: it greatly increases the computational cost of evolving the LSF, it is difficult to choose when to reinitialize, and it cannot guarantee that the regularized LSF after reinitialization drives the zero level set to the edge of the target.
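To make (3) concrete, the following Python sketch initializes the LSF for a circular initial curve C, for which d_C has a closed form. It assumes the sign convention of Li et al.'s DRLSE code (negative inside C, positive outside); the function name `circle_sdf` is illustrative only. Because φ is then an exact SDF, |∇φ| ≈ 1 almost everywhere, which is the property that reinitialization tries to restore.

```python
import numpy as np

def circle_sdf(shape, center, radius):
    """Signed distance function (3) for a circular initial curve C.

    Assumption: phi < 0 inside C and phi > 0 outside, as in Li et al.'s DRLSE code.
    For a circle, d_C(x, y) = |distance to the centre - radius|, so the signed
    version is simply (distance to the centre - radius).
    """
    rows, cols = np.indices(shape, dtype=float)
    return np.hypot(rows - center[0], cols - center[1]) - radius

phi0 = circle_sdf((64, 64), center=(32, 32), radius=20)
gy, gx = np.gradient(phi0)      # derivatives along rows (y) and columns (x)
grad_norm = np.hypot(gx, gy)    # |grad phi|, close to 1 away from the centre
print(phi0[32, 32], phi0[32, 52])   # -20.0 0.0
```

A general initial region (e.g. a hand-drawn mask) would need a distance transform instead of the closed form used here.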

Fig.1 Vertical view of level set function

2.2 The DRLSE model

To avoid reinitialization during the iterative process, we use the DRLSE model proposed by Li et al.[6]. Let φ: Ω → ℝ be a level set function on the domain Ω. The energy functional of the DRLSE model is

$$E(\phi)=\mu R(\phi)+\lambda L(\phi)+\alpha A(\phi) \tag{4}$$

where R(φ) is the distance regularization term, which keeps the LSF smooth during evolution and avoids reinitialization; L(φ) is the length term, which drives the zero level set towards the edge of the target; and A(φ) is the area term: minimizing it makes the level set contract (α > 0) or expand (α < 0), guiding it step by step towards the target. μ, λ and α are the coefficients of the distance regularization term, the length term and the area term, respectively. Li et al. use a potential function P(s), defined as

$$P(s)=\frac{1}{2}(s-1)^2 \tag{5}$$

which attains its minimum at s = 1. The distance regularization term R(φ) is then defined as

$$R(\phi)=\int_\Omega P(|\nabla\phi|)\,dx \tag{6}$$

which is minimized when |∇φ| = 1, the characteristic property of an SDF; hence the regularization keeps φ close to an SDF without reinitialization. The edge indicator function g is defined by

$$g=\frac{1}{1+|\nabla(C_\sigma * I)|^2} \tag{7}$$

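where C_σ is a Gaussian kernel with standard deviation σ and I is the image; the convolution in (7) smooths the image to reduce noise. Since g is small where the image gradient is large, it attains its smallest values at the boundary of the target, which is where the LSF should stop evolving. The length term L(φ) and the area term A(φ) are defined with g as follows,

$$L(\phi)=\int_\Omega g\,\delta(\phi)\,|\nabla\phi|\,dx \tag{8}$$

$$A(\phi)=\int_\Omega g\,H(-\phi)\,dx \tag{9}$$

where δ is the Dirac delta function and H is the Heaviside function.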
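The quantities in (7), (8) and (9) can be evaluated on a pixel grid as sketched below. This is a hedged NumPy illustration, not the authors' MATLAB code: the Gaussian smoothing is written out in NumPy to avoid extra dependencies, σ = 1.5 and ε = 1.5 are assumed values, and δ and H are replaced by the smoothed versions δ_ε and H_ε (H_ε is the antiderivative of δ_ε) of the kind used in Li et al.'s implementation. All function names are illustrative.

```python
import numpy as np

def gaussian_smooth(img, sigma):
    """C_sigma * I: separable Gaussian convolution with edge padding."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    out = np.asarray(img, dtype=float)
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)
    return out

def edge_indicator(img, sigma=1.5):
    """Edge indicator (7): g = 1 / (1 + |grad(C_sigma * I)|^2)."""
    gy, gx = np.gradient(gaussian_smooth(img, sigma))
    return 1.0 / (1.0 + gx**2 + gy**2)

def dirac(x, eps=1.5):
    """Smoothed Dirac delta, supported on |x| <= eps."""
    d = (1 + np.cos(np.pi * x / eps)) / (2 * eps)
    return np.where(np.abs(x) <= eps, d, 0.0)

def heaviside(x, eps=1.5):
    """Smoothed Heaviside function whose derivative is dirac()."""
    h = 0.5 * (1 + x / eps + np.sin(np.pi * x / eps) / np.pi)
    return np.where(x > eps, 1.0, np.where(x < -eps, 0.0, h))

def length_and_area(phi, g, eps=1.5):
    """Discrete L(phi) of (8) and A(phi) of (9), with unit pixel area."""
    gy, gx = np.gradient(phi)
    length = np.sum(g * dirac(phi, eps) * np.hypot(gx, gy))
    area = np.sum(g * heaviside(-phi, eps))
    return length, area

# On a flat image g == 1, so L and A reduce to the perimeter and area of the circle.
rows, cols = np.indices((64, 64), dtype=float)
phi = np.hypot(rows - 32, cols - 32) - 20.0    # circle of radius 20, negative inside
g_flat = edge_indicator(np.full((64, 64), 0.5))
L, A = length_and_area(phi, g_flat)            # approx. 2*pi*20 and pi*20**2
```

On a real image, g drops towards 0 along edges, so placing the zero level set on an edge makes L(φ) small, which is exactly what minimizing (8) rewards.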

In summary, L(φ) computes the line integral of g along the zero level set and is minimized when the zero level set lies on the edge of the target to be segmented, while A(φ) computes the g-weighted area of the region {φ < 0} and accelerates the evolution of the LSF. When the initial contour φ_initial lies outside the target contour, a positive α makes φ_initial shrink rapidly inward; when φ_initial lies inside the target contour, a negative α makes it expand rapidly. Li et al.[7] also improved the potential function P(s) by using the double-well potential P_2(s), defined as

$$P_2(s)=\begin{cases}\dfrac{1}{(2\pi)^2}\bigl(1-\cos(2\pi s)\bigr), & s\le 1\\[2mm] \dfrac{1}{2}(s-1)^2, & s\ge 1\end{cases} \tag{10}$$

This potential function maintains the SDF property |∇φ| = 1 near the zero level set, while in the region far away from the zero level set the LSF is constant and remains unchanged, i.e. |∇φ| = 0. To have these properties, the potential function must have two minimum points, s = 0 and s = 1 (P_2(s) is written as P(s) below). By (6), (8) and (9), the energy functional (4) becomes

$$E(\phi)=\mu\int_\Omega P(|\nabla\phi|)\,dx+\lambda\int_\Omega g\,\delta(\phi)\,|\nabla\phi|\,dx+\alpha\int_\Omega g\,H(-\phi)\,dx \tag{11}$$

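The corresponding gradient flow of the energy E(φ) is

$$\frac{\partial\phi}{\partial t}=\mu\,\mathrm{div}\big(d_p(|\nabla\phi|)\,\nabla\phi\big)+\lambda\,\delta(\phi)\,\mathrm{div}\Big(g\,\frac{\nabla\phi}{|\nabla\phi|}\Big)+\alpha\,g\,\delta(\phi) \tag{12}$$

where d_p(s) = P'(s)/s and div is the divergence operator.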
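The following Python sketch advances (12) by one explicit Euler step with P = P_2 from (10). It is a minimal discretization under assumptions: central differences from `np.gradient`, a simple zero-flux (Neumann) boundary condition, a smoothed Dirac delta δ_ε with ε = 1.5 in place of δ, and illustrative parameter values (μ = 0.2, λ = 5, α = 1.5, Δt = 1) rather than the values of Tab.2. The function names are hypothetical.

```python
import numpy as np

def dp(s):
    """d_p(s) = P'(s) / s for the double-well potential P = P_2 of (10)."""
    s = np.asarray(s, dtype=float)
    ps = np.where(s <= 1, np.sin(2 * np.pi * s) / (2 * np.pi), s - 1)  # P_2'(s)
    safe_s = np.where(s == 0, 1.0, s)
    # sin(2*pi*s) / (2*pi*s) -> 1 as s -> 0
    return np.where(s == 0, 1.0, ps / safe_s)

def dirac(x, eps=1.5):
    """Smoothed Dirac delta, supported on |x| <= eps."""
    d = (1 + np.cos(np.pi * x / eps)) / (2 * eps)
    return np.where(np.abs(x) <= eps, d, 0.0)

def div(fx, fy):
    """Divergence of the vector field (fx, fy) using central differences."""
    return np.gradient(fx, axis=1) + np.gradient(fy, axis=0)

def neumann(phi):
    """Zero-flux boundary condition: copy interior values onto the border."""
    phi = phi.copy()
    phi[[0, -1], :] = phi[[2, -3], :]
    phi[:, [0, -1]] = phi[:, [2, -3]]
    return phi

def drlse_step(phi, g, mu, lam, alpha, dt=1.0, eps=1.5):
    """One explicit Euler step of the gradient flow (12)."""
    phi = neumann(phi)
    phi_y, phi_x = np.gradient(phi)
    s = np.hypot(phi_x, phi_y)
    nx, ny = phi_x / (s + 1e-10), phi_y / (s + 1e-10)     # unit normal
    dist_reg = div(dp(s) * phi_x, dp(s) * phi_y)           # div(d_p(|grad phi|) grad phi)
    delta = dirac(phi, eps)
    edge = delta * div(g * nx, g * ny)                     # delta(phi) div(g grad phi/|grad phi|)
    area = delta * g                                       # g delta(phi)
    return phi + dt * (mu * dist_reg + lam * edge + alpha * area)

# Flat image (g = 1) and a circular initial contour, negative inside.
rows, cols = np.indices((64, 64), dtype=float)
phi = np.hypot(rows - 32, cols - 32) - 20.0
g = np.ones_like(phi)
area_before = np.sum(phi < 0)
for _ in range(10):
    phi = drlse_step(phi, g, mu=0.2, lam=5.0, alpha=1.5, dt=1.0)
area_after = np.sum(phi < 0)     # alpha > 0: the region {phi < 0} shrinks
```

With g ≡ 1 nothing stops the contour, so it simply shrinks; on a real image g becomes small at edges and slows the front there.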

2.3 Semantic segmentation based on deep learning

Semantic segmentation based on deep learning dates from 2015, when Long et al.[17] proposed the FCN. FCN evolved from the convolutional neural network, which is mainly used for image-level prediction and classification because its outputs are mostly class probabilities. FCN classifies each pixel of the input image and thus realizes pixel-level image segmentation, namely semantic segmentation. Unlike a classification CNN, which places fully connected layers after the convolutional layers, FCN uses deconvolution layers to upsample the feature maps produced by the last convolutional layer back to the size of the input image. The output therefore retains the spatial information of the input image and predicts a class for every pixel of the original image.
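The upsampling step can be illustrated with a single-channel transposed ("deconvolution") layer. The NumPy sketch below is an assumption-laden toy: it uses stride 2 and a fixed bilinear kernel, which is how Long et al. initialize FCN's upsampling layers before learning them; in a real network the kernel weights are trained and there are many channels. The helper names are illustrative.

```python
import numpy as np

def bilinear_kernel(factor):
    """2-D bilinear interpolation kernel for upsampling by an integer factor."""
    size = 2 * factor - factor % 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    taps = 1 - np.abs(np.arange(size) - center) / factor
    return np.outer(taps, taps)

def transposed_conv2d(x, kernel, stride):
    """Single-channel transposed convolution without padding or learned weights."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            # every input value scatters a scaled copy of the kernel into the output
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

coarse = np.ones((3, 3))                   # e.g. a 3x3 class-score map
up = transposed_conv2d(coarse, bilinear_kernel(2), stride=2)
up = up[1:-1, 1:-1]                        # crop (k - stride) / 2 = 1 pixel per side
print(up.shape)                            # (6, 6): twice the input size
```

Away from the border the bilinear kernel reproduces a constant map exactly, so the upsampled scores stay on the same scale as the coarse ones.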

The neural network used in this paper is DeepLab V2, which was developed from FCN. DeepLab V2 has three characteristics. First, it uses atrous (hole) convolution, which modifies the traditional convolution so that the receptive field is enlarged while the resolution of the feature maps is maintained. Second, it proposes Atrous Spatial Pyramid Pooling (ASPP) for multi-scale processing of the input. Third, it refines the coarse output of the convolutional layers with a fully connected Conditional Random Field (CRF), so that object boundaries are recovered accurately. As shown in Fig.2, the pixel-level result obtained by the convolutional network from the original image has a good contour after CRF refinement.
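Atrous convolution spaces the kernel taps `rate` pixels apart, so a k×k kernel covers k + (k − 1)(rate − 1) pixels per side with no extra weights. The NumPy sketch below (single channel, zero padding, illustrative names) shows this on an impulse image and lists the field of view of 3×3 branches at rates 6, 12, 18 and 24, the rates used in DeepLab V2's large ASPP variant.

```python
import numpy as np

def atrous_conv2d(x, kernel, rate):
    """'Same'-size 2-D atrous (dilated) correlation with zero padding, odd kernel size."""
    k = kernel.shape[0]
    span = (k - 1) * rate              # distance between the first and last tap
    pad = span // 2
    xp = np.pad(x, pad)
    out = np.zeros(x.shape, dtype=float)
    for a in range(k):
        for b in range(k):
            # tap (a, b) reads the input shifted by (a * rate, b * rate)
            out += kernel[a, b] * xp[a * rate:a * rate + x.shape[0],
                                     b * rate:b * rate + x.shape[1]]
    return out

def effective_kernel_size(k, rate):
    """Pixels spanned per side by a k x k kernel dilated with the given rate."""
    return k + (k - 1) * (rate - 1)

impulse = np.zeros((31, 31))
impulse[15, 15] = 1.0
response = atrous_conv2d(impulse, np.ones((3, 3)), rate=4)   # 9 taps, 4 pixels apart
aspp_fov = [effective_kernel_size(3, r) for r in (6, 12, 18, 24)]  # [13, 25, 37, 49]
```

The same nine weights thus see a 9×9 neighborhood at rate 4 and up to 49×49 at rate 24, which is why ASPP can aggregate context at several scales cheaply.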

Fig.2 Functions of DeepLab

3 Preprocessing based on deep learning and improved DRLSE model

Compared with the traditional DRLSE segmentation method, this paper makes two main improvements: (i) neural-network-based semantic segmentation is used for image preprocessing; (ii) the image information obtained by semantic segmentation is turned into a term of the energy functional, which is added to the iterative process of the DRLSE model.

3.1 Preprocess

The DRLSE model usually segments the grayscale version of the input image. However, if the grayscale image is segmented directly, the result is often inaccurate or even very poor, as shown in Fig.3.

(a) original image (b) grayscale image (c) segmentation result

Our goal is to segment the three ducks in Fig.3(a). Image (a) contains few colors, and the color of the targets (white) differs clearly from the background (green); correspondingly, in the grayscale image (b) the gray level of the targets also differs significantly from that of the background. However, the DRLSE segmentation shown in (c) only roughly locates the region of the targets and does not reach their edges accurately. We therefore introduce the Otsu threshold algorithm, which is often used for image preprocessing with the DRLSE model.

The Otsu threshold algorithm is an adaptive thresholding algorithm proposed by the Japanese scholar Otsu[21]. It divides the original image into two parts, target and background, according to the gray-level characteristics of the image itself. Its criterion is the maximum between-class variance: the larger the variance between the target and the background, the greater the difference between them, and the smaller the probability that target and background pixels are wrongly assigned to the same class. In Fig.4, (a) is transformed into (b) by the Otsu threshold algorithm, and (b) is segmented by the level set method of the DRLSE model to obtain (c). Compared with (d), the result of Fig.3, (c) approximates the target contour much better and is very close to the target edge, which is a good segmentation. However, when the input image contains many colors between the target and the background, or their difference is not obvious, Otsu preprocessing is much less effective, as shown in Fig.5.
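The maximum between-class variance criterion can be written in a few lines. The NumPy sketch below is an illustrative implementation for 8-bit grayscale images (the name `otsu_threshold` is ours); library versions such as MATLAB's `graythresh` may differ in details such as tie-breaking and normalization.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold t for an 8-bit image (class 0: gray <= t, class 1: gray > t).

    Maximises the between-class variance
        sigma_B^2(t) = w0 * w1 * (mu0 - mu1)^2 = (mu_T * w0 - m)^2 / (w0 * w1),
    where w0, w1 are the class probabilities, m is the cumulative first moment
    of class 0 and mu_T is the mean gray level of the whole image.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    w0 = np.cumsum(p)
    w1 = 1.0 - w0
    m = np.cumsum(p * np.arange(256))
    mu_t = m[-1]
    sigma_b = np.zeros(256)
    valid = (w0 > 1e-12) & (w1 > 1e-12)     # both classes must be non-empty
    sigma_b[valid] = (mu_t * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(sigma_b))

# Two-tone test image: "target" at gray level 200 on a background of 50.
img = np.full((20, 20), 50, dtype=np.uint8)
img[5:15, 5:15] = 200
t = otsu_threshold(img)
binary = img > t          # True on the target
```

Because the criterion only looks at the gray-level histogram, it works well for images like the ducks in Fig.3 but cannot separate a target whose gray levels overlap those of the background.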

(a) original image (b) Otsu

(c) segmentation result (d) Fig.3 result

In Fig.5, our target is the bird in (a). The Otsu threshold algorithm turns (a) into (b), and it is evident that the image becomes very poor after preprocessing. Consequently, the segmentation result (d) is visibly imprecise and deviates seriously from the target, which is a bad segmentation. The DRLSE model with Otsu preprocessing is therefore unsuitable for color-rich images, which is one of the reasons why DRLSE is commonly used for medical CT image segmentation.

To overcome this difficulty, we preprocess general color images with neural-network-based semantic segmentation. Semantic segmentation classifies image pixels, assigning pixels of the same category to one class. Therefore, after semantic segmentation the target and the background carry clearly different color labels, and after conversion to a grayscale image this makes it easier for the DRLSE model to approximate the target contour.
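The paper does not state exactly how the DeepLab label map becomes the grayscale image used by DRLSE. The sketch below shows one plausible route under explicit assumptions: the network outputs PASCAL VOC class indices (0 = background, 3 = bird), labels are colored with the standard VOC palette, and colors are converted to gray with the luma weights 0.299/0.587/0.114 (the weights used by MATLAB's `rgb2gray`). The label map and helper names are hypothetical.

```python
import numpy as np

def voc_colormap(n=256):
    """Standard PASCAL VOC label palette (bit-interleaving construction)."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap

def labels_to_gray(labels):
    """Color a label map with the VOC palette, then convert it to 8-bit gray."""
    rgb = voc_colormap()[labels].astype(float)        # shape (H, W, 3)
    gray = rgb @ np.array([0.299, 0.587, 0.114])      # luma conversion
    return np.rint(gray).astype(np.uint8)

labels = np.zeros((40, 40), dtype=np.int64)   # hypothetical DeepLab output
labels[10:30, 12:28] = 3                      # a "bird" region (VOC class 3)
gray = labels_to_gray(labels)                 # background -> 0, bird (128, 128, 0) -> 113
```

The resulting image is piecewise constant, so the edge indicator g of (7) is close to 1 everywhere except along class boundaries, which is what makes the subsequent DRLSE evolution easier.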

(a) original image (b) Otsu

(c) initial contour (d) segmentation result

(a) original image (b) DeepLab (c) grayscale image

(d) segmentation result (e) ground truth

In Fig.6, (b) is obtained from (a) by neural-network-based semantic segmentation. Compared with the result of the Otsu threshold algorithm, (b) shows a much larger difference between the target and the background. (c) is the grayscale image converted from (b), (d) is the segmentation result of the DRLSE model, and (e) is the ground truth of the original image. The zero level set at which the evolution finally stops is very close to the contour of the target.

In Section 4, more examples are presented to demonstrate the advantage of semantic segmentation over the Otsu algorithm for image preprocessing.

3.2 Improvement of DRLSE model

When the DRLSE model is used to segment an image I, an initial contour φ_initial usually has to be selected manually.
Fig.7 shows the contour φ_g detected by the edge indicator function g and the actual contour φ_groundtruth (hereinafter φ_GT); in general, φ_g and φ_GT have similar outlines. Once φ_initial is selected, the contour represented by the zero level set converges to φ_final under the guidance of the edge indicator function g through iterative updates of the level set. The process is shown in Fig.8: the initial contour φ_initial, selected inside the target, expands during the iterations until it reaches the edge of the target and completes the segmentation.

(a) φ_g (b) φ_GT

Recall from Section 3.1 that after semantic-segmentation preprocessing the color difference between the target and the background is very significant, as shown in Fig.9. In this case, a contour similar to the final contour can be obtained directly from the color difference between the target and the background. Because DeepLab is used for preprocessing, we call this contour the prior contour φ_DPL.
Fig.10 compares the prior contour φ_DPL with the contour φ_g detected by the edge indicator function and the actual contour φ_GT of the target. Compared with φ_DPL, φ_g contains some redundant image information, which is often caused by errors in the DeepLab preprocessing; this redundant information is gradually filtered out as the zero level set is updated. The outlines of φ_g and φ_DPL are similar, but φ_DPL contains much less redundant information than φ_g.

(a) initial contour (b) 100 iterations

(c) 200 iterations (d) segmentation result

Therefore, it is not reasonable to drive the initial contour φ_initial all the way to φ_DPL. However, we can use φ_DPL to guide φ_initial to evolve rapidly at the beginning of the evolution from φ_initial to φ_GT, so that it quickly forms a contour similar to φ_GT; the subsequent iterative updates then approximate the true contour φ_GT more closely. According to this analysis, our model is most effective in two cases: when the image has a simple contour, so that φ_GT and φ_DPL have similar shapes, and when φ_g contains redundant information while φ_DPL contains little or none.

(a) original image (b) DeepLab

(c) original image (d) DeepLab

Recalling the energy functional (11) of the DRLSE model, we add a new energy term based on φ_DPL, designed to contribute strongly in the early stage of the evolution of the zero level set so that the contour quickly approaches the general outline of the target. Following Ref.[22], we define it as:

(13)

which is a shape term that drives φ_initial towards the shape φ_DPL, where the weight ω decreases gradually to 0 as the number of iterations increases. This matches the purpose of φ_DPL: it helps the zero level set approach the contour in the early stage but does not affect its evolution in the later stage. Therefore we have

(14)
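The exact form of the shape term (13) follows Ref.[22]. Purely to illustrate how a weight that decays to 0 confines the influence of φ_DPL to the early iterations, the sketch below assumes a simple quadratic shape energy E_S(φ) = ω Σ (φ − φ_DPL)² and a linearly decaying ω; both the energy and the schedule are hypothetical stand-ins, as are the names `shape_force` and `omega_schedule`. In a full solver, the shape force would simply be added to the right-hand side of (12) at each iteration.

```python
import numpy as np

def omega_schedule(k, omega0=1.0, k_stop=300):
    """HYPOTHETICAL weight: decreases linearly from omega0 to 0 at iteration k_stop."""
    return omega0 * max(0.0, 1.0 - k / k_stop)

def shape_force(phi, phi_dpl, omega):
    """-dE_S/dphi for the ASSUMED shape energy E_S = omega * sum((phi - phi_dpl)**2).

    Illustration only; the paper's (13) follows Ref.[22] and may differ.
    """
    return -2.0 * omega * (phi - phi_dpl)

# Toy check with the shape force alone and a fixed omega = 1:
# each step multiplies the difference phi - phi_dpl by (1 - 2 * dt * omega) = 0.8.
rng = np.random.default_rng(0)
phi_dpl = rng.normal(size=(8, 8))
phi = np.zeros((8, 8))
dt = 0.1
for _ in range(10):
    phi = phi + dt * shape_force(phi, phi_dpl, omega=1.0)
```

Once `omega_schedule` reaches 0, the shape force vanishes and only the DRLSE terms of (12) act, so the final contour is determined by the image rather than by φ_DPL.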

(a) original image (b) DeepLab (c) φ_g

(d) φ_DPL (e) φ_GT

Fig.11 compares the traditional DRLSE model and the improved DRLSE model on an image preprocessed by semantic segmentation. The contour evolution of the improved model handles redundant information better than that of the traditional model. As (i) and (j) in Fig.11 show, the contour of the improved model has converged to the final contour after 400 iterations, while the contour of the traditional model is still evolving and converges only after 500 iterations.

(a) initial contour (b) 200 iterations (c) 300 iterations

(d) 400 iterations (e) 500 iterations (f) initial contour

(g) 200 iterations (h) 300 iterations (i) 400 iterations

(j) 500 iterations

4 Numerical example

4.1 Data set

In this example, the DeepLab V2 model is trained on the PASCAL VOC2012 data set, using a total of 17 125 color images. PASCAL VOC2012 provides labeled data for supervised learning in visual tasks and is often the preferred data set for semantic segmentation. It has 20 object categories, including common animals (such as birds, cats and horses), vehicles, furniture and people.

4.2 Parameter setting

The parameters consist of two parts, for DeepLab V2 and for the DRLSE model, listed in Tab.1 and Tab.2, respectively. DeepLab V2 is trained and run with Python 3.7, and the DRLSE model is implemented in MATLAB R2016a.

Tab.1 Parameter setting of DeepLab V2

Tab.2 Parameter setting of DRLSE

4.3 Measurement criteria

In this paper, we use the Dice coefficient (DC) to measure segmentation accuracy. It is defined as

$$DC=\frac{2\,|A\cap B|}{|A|+|B|} \tag{15}$$

where A is the set corresponding to the final contour φ_final obtained by the model evolution, and B is the set corresponding to the actual contour φ_GT of the target.
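For binary masks, (15) is a two-line computation. The sketch below is a minimal NumPy version; obtaining A and B as the regions {φ < 0} assumes the negative-inside sign convention, and returning 1.0 for two empty masks is our own convention.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient (15): DC = 2|A ∩ B| / (|A| + |B|) for binary masks A, B."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0      # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total

# With phi < 0 inside the contour, the sets are A = phi_final < 0 and B = phi_gt < 0.
a = np.zeros((4, 4), dtype=bool)
a[:2, :] = True     # top half, 8 pixels
b = np.zeros((4, 4), dtype=bool)
b[:, :2] = True     # left half, 8 pixels
print(dice(a, b))   # 0.5  (overlap 4 pixels: 2 * 4 / (8 + 8))
```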

4.4 Experimental process

In this experiment, images (a) and (c) in Fig.12 are selected; their sizes are 500×332 and 240×320 pixels, respectively. Image (a) has a simple target contour, as discussed in Section 3.2, and image (c) contains more redundant information after DeepLab preprocessing, also discussed in Section 3.2.
Fig.12 shows the selected images and the corresponding ground truth φ_GT.

First, the traditional DRLSE model is used for image segmentation.
Fig.13 shows the results of Otsu preprocessing and the subsequent DRLSE segmentation. The gap between the final contour φ_final and φ_GT is visible to the naked eye, so segmentation by the traditional “Otsu threshold algorithm + DRLSE model” leads to very poor results. Our method uses “DeepLab + DRLSE model”.
Fig.14 shows the preprocessing results: the images preprocessed by DeepLab retain the information of the target contour well. The DRLSE model and the improved DRLSE model are then used for image segmentation.

(a) original image (b) φ_GT

(c) original image (d) φ_GT

(a) original image (b) Otsu

(c) initial contour (d) segmentation result

(e) original image (f) Otsu

(g) initial contour (h) segmentation result

(a) original image (b) DeepLab (c) φ_g (d) φ_DPL

(e) original image (f) DeepLab (g) φ_g (h) φ_DPL

(a) initial contour (b) 200 iterations (c) 500 iterations

(d) 800 iterations (e) 1 050 iterations (f) initial contour

(g) 200 iterations (h) 500 iterations (i) 800 iterations

(j) 1 050 iterations

Fig.15 shows that, for the same number of iterations, the improved model converges faster than the traditional DRLSE model. As (i) and (j) in Fig.15 show, the proposed model has completed the segmentation after 800 iterations, while (e) shows that the traditional model reaches the final contour only after 1 050 iterations. Panels (b), (c), (g) and (h) of Fig.16 show that the proposed model converges faster at the same iteration number. Panels (d) and (i) of Fig.16 show that both models have rough outlines after 300 iterations, but redundant information (the red part of the head) still remains to be processed. Panels (e) and (j) of Fig.16 show that the improved model finishes processing the redundant information and converges to the final contour after 650 iterations, while the traditional model completes the segmentation after 750 iterations. Therefore, compared with the traditional DRLSE model, the proposed model converges faster; the convergence accuracy is compared in Section 4.5.

(a) initial contour (b) 100 iterations (c) 200 iterations

(d) 300 iterations (e) 750 iterations (f) initial contour

(g) 100 iterations (h) 200 iterations (i) 300 iterations

(j) 650 iterations

4.5 Comparison of methods

Tab.3 lists the DC values for images (a) and (c) in Fig.12 under three settings: (i) the traditional DRLSE model after Otsu preprocessing; (ii) the traditional DRLSE model after DeepLab V2 preprocessing; (iii) the improved DRLSE model after DeepLab V2 preprocessing.

Tab.3 Comparison

There is almost no difference in segmentation accuracy between the two models: their DC values differ only at the level of thousandths. In terms of convergence speed, however, the improved model proposed in this paper has a clear advantage over the traditional model.

5 Conclusions

Compared with the traditional DRLSE model, our method uses a neural network to preprocess general color images and incorporates the preprocessing information into a new energy term, which is added to the DRLSE model to segment the color image and obtain the target contour. The segmentation accuracy is preserved, while the number of iterations of the zero level set is reduced and segmentation is accelerated.

In the future, we will extend our method to semantic segmentation of images with several objects and to more kinds of color images. We will also optimize the setting of the parameter ω and reduce manual tuning.