An expert system for predicting shear stress distribution in circular open channels using gene expression programming

2018-08-17 09:51ZohrehSheikhKhozaniHosseinBonakdariIsaEbtehaj
Water Science and Engineering 2018年2期

Zohreh Sheikh Khozani,Hossein Bonakdari*,Isa Ebtehaj

Department of Civil Engineering,Razi University,Kermanshah 67131,Iran

Abstract The shear stress distribution in circular channels was modeled in this study using gene expression programming(GEP).173 sets of reliable data were collected under four flow conditions for use in the training and testing stages.The effect of input variables on GEP modeling was studied and 15 different GEP models with individual,binary,ternary,and quaternary input combinations were investigated.The sensitivity analysis results demonstrate that dimensionless parameter y/P,where y is the transverse coordinate,and P is the wetted perimeter,is the most in fluential parameter with regard to the shear stress distribution in circular channels.GEP model 10,with the parameter y/P and Reynolds number(Re)as inputs,outperformed the other GEP models,with a coefficient of determination of 0.7814 for the testing data set.An equation was derived from the best GEP model and its results were compared with an artificial neural network(ANN)model and an equation based on the Shannon entropy proposed by other researchers.The GEP model,with an average RMSE of 0.0301,exhibits superior performance over the Shannon entropy-based equation,with an average RMSE of 0.1049,and the ANN model,with an average RMSE of 0.2815 for all flow depths.©2018 Hohai University.Production and hosting by Elsevier B.V.This is an open access article under the CC BY-NC-ND license(http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords:Circular channel;Gene expression programming(GEP);Sensitivity analysis;Shear stress distribution;Soft computing

1.Introduction

Predicting boundary shear stress is an important part of stable channel section design as well as estimation of the sediment transport rate and channel erosion and deposition.Researchers have employed various methods to study the measurement of boundary shear stress in channels with different cross-sections and both smooth and rough boundaries(Knight,1981;Tominaga et al.,1989;Rhodes and Knight,1994;Knight and Sterling,2000;Seckin et al.,2006;Blanckaertetal.,2008;Pechlivanidis et al.,2015).Berlamont et al.(2003)used computational fluid dynamics(CFD)to estimate the shear stress distribution in circular channels.The ability of the software package to estimate the shear stress in rectangular channels was verified,after which CFD was employed to simulate the shear stress distribution with and without sediment.Yang et al.(2013)established a depth-averaged equation for flow in a rectangular compound channel with secondary flows by analyzing the forces acting on the elemental water body using Newton's second law.Then,they modeled the depth-averaged velocity and bed shear stress in rectangular compound channels.Sheikh Khozani and Bonakdari(2016)used Tsallis entropy to estimate the shear stress distribution in compound channels and compared their model with four different analytical models.Some researchers have used the Shannon entropy concept to estimate the shear stress distribution in circular channels(Sterling and Knight,2002;Sheikh Khozaniand Bonakdari,2015,2017).Bonakdari et al.(2015)employed the Tsallis entropy concept to predict the shear stress distribution in circular and trapezoidal channels and achieved acceptable precision.

Utilizing soft computing(SC)methods to solve different problemsisanotherprogressing concept.SC methods,including artificial neural networks(ANN),particle swarm optimization(PSO),adaptive neuro-fuzzy inference systems(ANFIS),the group method of data handling(GMDH),genetic algorithms(GA),and genetic programming(GP),are used in diverse engineering fields(Melin et al.,2013;Kaydani et al.,2014;Najafzadeh et al.,2014;Najafzadeh and Azamathulla,2015;Sheikh Khozani et al.,2016a,2017).One of the newest and most powerful SC methods is gene expression programming(GEP),which is an extension of GP and GA(Azamathulla and Ab Ghani,2010;Sattar and Gharabaghi,2015),and was first introduced by Ferreira(2001).The GEP method mitigates the majority of problems of principal SC methods related to the absence of equations for practical engineering by presenting explicit equations.In GEP,there is no predefined function to be considered in modeling of nonlinear problems,which is the main advantage of this method over other SC methods.The formed functions are generated randomly in the GEP model and the one that best fits the results is selected.

Sheikh Khozani et al.(2016b)estimated the percentage of shear force carried by walls in rectangular channels with rough boundaries using two SC methods.Kisi et al.(2012)predicted the discharge capacity of triangular labyrinth side weirs using generalized regression neural networks(GRNN)and GEP.Shiri et al.(2014)evaluated the function of GEP-based models in estimating reference evapotranspiration according to temporal and spatial criteria and dataset scanning procedures for coastal environments of Iran.Due to the poor performance of existing equations in estimating the discharge coefficient of side weirs,Ebtehaj et al.(2015)used GEP to predict this parameter.They presented a functional equation to predict the discharge coefficient of side weirs and indicated that GEP exhibits superior performance to existing equations.Najafzadeh et al.(2016)employed GEP,evolutionary polynomial regression(EPR),and the model tree(MT)method to estimate scour depth around bridge piers with debris effects.Few studies have focused on estimating the shear stress distribution using soft computing methods.The aim of this study was to use GEP modeling to predict the shear stress distribution in circular channels.The results of the best GEP model are presented as an equation and compared with an ANN model and an equation proposed by Sterling and Knight(2002).

2.Materials and methods

2.1.Data presentation

In the present work,experimental data on the shear stress distribution in circular channels measured by Knight and Sterling(2000)were used.Knight and Sterling used the Preston pipe technique to measure the shear stress distribution in a circular channel with a 244-mm diameter at four different flow depths.About 173 sets of data for GEP modeling were used from the experiments of Knight and Sterling(2000).The experimental flume is illustrated in Fig.1.

Fig.1.Experimental flume(Knight and Sterling,2000).

Based on the results obtained by other researchers,the shear stress τ in a circular channel depends on the channel geometry(D,P,and y,where D is the channel diameter,P is the wetted perimeter,and y is the traverse coordinate), flow depth(h),energy slope(S), flow velocity(V),kinematic viscosity(ν),gravitational acceleration(g),and hydraulic radius(R).Therefore,the functional relationship between the shear stress and the effective parameters is defined as

Using dimensional analysis,the dimensionless function to express the shear stress is as follows:

where ρ is the fluid density.

In shear stress distribution modeling with GEP,the abovementioned parameters serve as input variables.Table 1 displays the statistical parameters of the dataset applied.

As can be seen in Table 1,half of all data were used to train the dataset and the remainder were used for testing.In order to compare different similarity values,all data were normalized between 0 and 1.

As mentioned above,Shannon entropy was one of the analytical methods employed by Sterling and Knight(2002)to predict the shear stress distribution in circular,circular-with-aflat bed,and trapezoidal channels.They extracted an equation to estimate the shear stress distribution along the wetted perimeter of a circular channel as follows:

where τmaxis the maximum value of shear stress,and λ is the Lagrange multiplier calculated as follows:

Eq.(4)was used to verify the accuracy of the GEP method in this study.

Table 1 Range of input variables for modeling.

2.2.Sensitivity analysis

To identify the input parameter with the greatest effect on the shear stress distribution prediction using GEP modeling,fifteen different input combinations were applied to the models.These input combinations involved single,binary,ternary,and quaternary combinations.Table 2 shows the input combinations applied to each model.All possible combinations were considered and investigated.

2.3.Overview of gene expression programming(GEP)

GEPbene fitsfromtheadvantagesofGAandGP,butdoesnot havetheirlimitationsanddisadvantages.IntheGEPmethod,the genomes are encoded with constant lengths similar to GA.However,the genes in GEP are presented as phenotypes in the form of expression trees(ETs).Unlike GP,which isa refractory system,GEP is a full- fledged,evolved phenotype(ET)/genotype(chromosome)method that deals with each parameter separately.In some cases,GEP surpasses the GP technique by a factor of 100-60000(Ferreira,2001,2002).

Like other evolutionary algorithms,the modeling process in GEP begins with a randomly generated initial population.This population includes individual chromosomes with a fixed length but potentially different numbers of genes.Each chromosome from the initial population undergoes fitness evaluation using existing fitness function equations from the literature.The chromosomes for the next generation are selected based on the fitness value of each chromosome in the current generation as well as the roulette wheel selection method,which is among the most powerful selection processes(Ebtehaj and Bonakdari,2016).Following superior chromosome selection according tothe randomly generated initial population,the new chromosomes of the subsequent generation are reproduced through amendments carried out by genetic operators(i.e.,mutation,transposition,inversion,and recombination).The new individuals are exposed to the same amendments by these operators.This process continues until one of the stop criteria are reached,such as a given number of generations or acceptable precision(Ferreira,2001,2002).The performance of each genetic operator in GEP is brie fly described below.

Table 2 Input variables of each model for sensitivity analysis.

Mutation:ThisisanimportantgeneticoperatorusedinGEP.It can occur anywhere in the genome.However,the structural organization of chromosomes should be maintained.Thus,the existing function at the head can be replaced with a function or terminal,but at the tail a terminal must be exchanged with only oneoftheotherterminals.Therefore,mutationhelpsmaintainthe structuralorganizationofchromosomes,andthenewindividuals generated by this operator are structurally correct programs.

Inversion operator:This operator inverts a sequence in the gene head following selection.To modify the gene,the terminal and initial points of the head portion are inverted through the random selection of chromosomes.

Insertion sequence(IS)transposition:Every sequence in the genome may become an IS element.Thus,IS transposition elements are randomly selected throughout the chromosomes;these elements are short genomic fragments with a terminal or function in the first position.IS transposition randomly selects a chromosome,a gene to be improved,the start of an element,the target site,and the transposon length.

Root insertion sequence(RIS)transposition:RIS transposition elements are short genomic segments similar to IS transposition elements.The difference between IS and RIS is the starting point.The starting point in IS can be a function or terminal,whereas in RIS it is always a function.If no function is found,the RIS operator is ineffective.

Gene transposition:With this operator,a complete gene operates as a transposon and transposes itself to the beginning of the chromosome.The gene transposition operator deletes the gene at the place of origin,unlike IS and RIS transposition.

One-point and two-point recombination:In one-point recombination,the parent chromosomes are coupled at the same point.The gene's contribution downstream of the crossover point is replaced between two chromosomes.In twopoint recombination,two crossover points are randomly selected and two parent chromosomes are coupled.

Gene crossover:In this operator,genes are replaced between two parent chromosomes,and daughter chromosomes are created,including genes from both parents.The displaced genes are randomly selected and situated in the same location as the parent chromosomes.

2.4.Derivation of shear stress modeling based on GEP

The shear stress (τ/(ρgRS))was modeled with the GEP technique in terms of four dimensionless parameters:Fr,Re,h/D,and y/P.First,the dataset was classified in two groups:training and testing.50%of all data were selected randomly to train the GEP model and the remaining 50%were used to test the model performance.After selecting the training set,the GEP learning environment should be defined.The five main steps in GEP training are as follows:

First step:selecting fitness function.The fitness(fi)of an individual program i is defined as

where Tjis the target value for fitness case j,Ci,jis the value estimated by the individual program i for fitness case j,Ciis the number offitness cases for the individual program i,and M is the selection range.If the difference between the value predicted by GEP and the actual value is less than 0.01,that isthen the precision is zero,and fi=fmax=CtM,where fmaxis the maximum fitness,and Ctis the number offitness cases.In this study,the values of M and Ctwere 100 and 10,respectively.Thus,fmaxwas 1000.The fitness function in Eq.(5)results in an optimal solution.As such,the runs continued until the maximum fitness was attained(Ferreira,2006).

Second step:determining the function set(F)and terminal set(T)for chromosome generation.Trial and error was used to obtain the basic arithmetic operators(+,-,×,/)and some basic mathematical operators(pow,sqrt,ln,x4,3Rt,and 5Rt)that serve as function sets.lnx,pow(x,y),x4,sqrt(x),3Rt(x),and 5Rt(x)return lnThe terminal set included four dimensionless independent parameters:T={h/D,Fr,Re,y/P}.

Third step:specifying the number of genes and the head length.GEP modeling considers a value of 1-5 for the number of genes and 1-10 for the head length.According to an evaluation of different head lengths and numbers of gene combinations in the training and testing stages,increasing the number of genes and head length beyond 3 and 7,respectively,does not significantly enhance GEP performance.Therefore,the head length and number of genes utilized in this study were 7 and 3,respectively.In addition to these two parameters,the number of chromosomes should be defined.A number of chromosomes between 30 and 100 could lead to strong model performance(Ferreira,2001).In the present study,the initial number of chromosomes used was 30,which was increased to 50 during GEP modeling.Based on the training and testing results,30 is the best number of chromosomes.

Fourth step:defining the linking function for linking different sub-ETs.Recent studies in various engineering fields have shown that using addition as a linking function yields better results(Gharagheizi et al.,2012;Zhang et al.,2013;Alavi et al.,2013;Ebtehaj et al.,2015).Therefore,the linking function applied in this study was addition.

Fifth step:setting the values of different genetic operators,such as inversion,transposition,and recombination.The values of these operators and other parameters used in GEP are presented in Table 3.Fig.2 represents the GEP flowchart and best GEP model in training and testing.

2.5.Goodness-of- fit of model performance

Four statistical evaluation criteria were used to assess the models’performance:

(1)Coefficient of determination(R2):

(2)Root mean square error(RMSE):

(3)Mean absolute error(MAE):

(4)Thecoefficientofresidualmass(CRM),whichisanindex for recognizing model underestimation or overestimation compared to measured values in the laboratory:

Table 3 Parameter settings for GEP modeling.

Fig.2.GEP flowchart.

where Tmiis the measured shear stress value,Tpiis the simulated shear stress value,and n is the number of shear stress values.A combination of the mentioned statistical parameters is sufficient for model evaluation.

3.Results and discussion

3.1.Individual combinations

Fig.3.Scatterplots of models with one input variable.

The results of applying the input variables to the models individually are illustrated in Fig.3.Models 1,2,and 3,with the ratio of the flow depth to the pipe diameter(h/D),Froude number(Fr),and Reynolds number(Re)as individual inputs,respectively,cannot model the shear stress distribution.This is evident from the results,which fall in straight lines in the scatterplots.The R2values of models 1,2,and 3 are very low and,hence,unacceptable.Therefore,it can be said that these models cannot predict the shear stress distribution with these inputs.It is concluded that model 4,with y/P as an input variable,performs well to some extent.The R2value of 0.477 for this model indicates that y/P is the most effective parameter in predicting the shear stress distribution among the effective parameters obtained by dimensional analysis in Eq.(2).Although model 4 is more accurate than other models with individual parameters(models 1,2,and 3),it does not predict the shear stress distribution well.Therefore,it is not sufficient to use individual parameters for predicting the shear stress distribution.Rather,a combination of different dimensionless parameters should be used to achieve an accurate and flexible model.

3.2.Binary combinations

The models containing two input variables are examined in this section.It is noted that,with an increase in the number of input variables,the GEP models perform better.Fig.4 displays the results of models 5 to 10 with different two-input variable combinations.In models 5,6,and 7 the parameter h/D is constant.The R2values of models 5 and 6 are very low,whereas for model 7 the value is 0.7031.Therefore,the twoinput variable combinations presented for models 5 and 6 cause very poor performance.However,using h/D and y/P increase the GEP model accuracy,as the value of R2increases from 0.477 in model 4 to 0.7031 in model 7.Model 8,with Fr and Re as input combinations,exhibits weak performance in predicting the shear stress distribution,similar to the individual models as well as models 5 and 6(binary).Models 9 and 10,with y/P as one of two inputs in their combinations,perform well,and better than all individual and other binary models.Therefore,using y/P with Fr or Re as effective parameters in estimating the shear stress distribution enhances prediction accuracy.Fig.4 indicates that model 9 makes some predictions with a high relative error.Thus,model 10 is the best among the models with one and two dimensionless parameters;that is,combining y/P with Re creates a robust GEP model(model 10)with precise results.

3.3.Ternary combinations

Fig.4.Scatterplots of models with two input variables.

Fig.5.Scatterplots of models with three input variables.

Of the models with ternary parameter combinations,model 11 performs poorly.This model,with h/D,Fr,and Re,cannot estimate the shear stress distribution,as the predicted values also fall in a straight line in the scatter plots.As seen in Fig.5,the absence of effective parameter y/P in model 11 significantly impacts this model's results.With the addition of parameter y/P as an input variable,the GEP model results are improved significantly(models 12,13,and 14).The parameters of model 12 are utilized in models 5,7,and 9 with binary combinations.A comparison between these models demonstrates that model 12 only outperforms model 5 in predicting the shear stress distribution;model 5 does not include y/P as an effective parameter.Actually,adding one dimensionless parameter to model 7 or 9 decreases GEP accuracy.This trend is also evident for models 13 and 14(ternary models).Of all GEP models with ternary input variables,model 13,with h/D,Re,and y/P as input variables,and with an R2value of 0.6654,performs the best.Therefore,the Re and y/P combination substantially affects GEP model 13 and validates the results of model 10.

3.4.Quaternary combinations

The last model contains all input variables presented in Eq.(2)as effective parameters in predicting the shear stress distribution.The results of model 15 are illustrated in Fig.6;it is clear that this model makes good shear stress distribution estimations,with an R2value of 0.7549.The predicted values are less scattered from the fitted line,showing accurate shear stress distribution predictions.Considering all 15 GEP models,model 10 with y/P and Re as input variables and an R2value of 0.7814 is the best GEP model.

After extending the GEP models and investigating the sensitivity analysis results,GEP model 10 with y/P and Re as input variables was chosen as the best GEP model,with accurate shear stress distribution estimation in circular channels.A straightforward equation obtained from this GEP model for predicting the shear stress distribution is as follows:

According to Eq.(10),the dimensionless shear stress can be calculated with y/P and Re.

Fig.6.Scatterplot of model with four input variables.

3.5.Sensitivity analysis

Sensitivity analysis was used to analyze thevariation trend of τ/(ρgRS)accordingtotheinputvariablesforthebestmodel.The sensitivityanalysisresults areillustratedinFig.7.Asseeninthis figure,lower y/P values(y/P<0.5)correspond to positive sensitivity values,and therefore,with increasing y/P,the dimensionless shear stress increases.However,for y/P>0.5,an increase in this input variable causes the dimensionless shear stress to decrease.For input variable Re,with increasing Re,the output variable τ/(ρgRS)decreases,since all sensitivity values are in the negative range.With an increase in the Re value,the effect of this parameter decreases.

3.6.Comparison of results of GEP with ANN model and Shannon entropy-based equation

The GEP equation(Eq.(10))obtained from GEP model 10 was compared with an ANN model and Shannon entropybased equation.The statistical results of this comparison are tabulated in Table 4.Evidently,at greater flow depths,the GEP model performs better in terms of statistical parameter values:RMSE=0.0339 and MAE=0.0297 for h/D=0.333,RMSE=0.0297 and MAE=0.0214 for h/D=0.506,and RMSE=0.0228 and MAE=0.0187 for h/D=0.666.The ANN model predicts the shear stress distribution in circular channels with lower accuracy.For all flow depths the ANN model demonstrates poor results and at greater flow depths it exhibits the worst results.It can be deduced that the ANN model cannot simulate the shear stress distribution in circular channels.Moreover,the results of the Shannon entropy-based equation are better for greater flow depths than for lower ones.Comparison of the GEP model results with the Shannon entropy-based equation shows that the GEP model outperforms the Shannon entropy-based equation for all flow depths.For h/D=0.826,due to the presence of strong secondary flows,shear stress prediction is more complicated.Furthermore,the effect of secondary flow is not considered in the GEP model.Therefore,its results are less precise than those at h/D=0.666.Nonetheless,the results for this flow depth are once again more accurate than the shear stress predicted by the Shannon entropy-based equation.

The shear stress distribution along the wetted perimeter of a circular channel with different flow depths predicted by the GEP model,ANN model,and the Shannon entropy-based equation is illustrated in Fig.8.From this figure,it is clear that the GEP model presents a better pattern of the shear stress distribution prediction at increasing flow depths.Moreover,the predicted shear stress values are better adapted to the experimental results for increasing flow depths.It is clear that the shear stress predictions by the Shannon entropy-based equation for 0<y/P<0.2 and 0.8<y/P<1 are underestimated for greater flow depths and overestimated for lower flow depths.The GEP model displays better performance with increasing flow depths.For h/D=0.826,the GEP model and Shannon entropy-based equation perform well,and the GEP model is superior to some extent.The ANN model demonstrates inferior performance in the shear stress distribution estimation to the GEP and the Shannon entropy-based equation.It is clear that the ANN model predicts the same shear stress pro file for all flow depths and lower shear stress values for the entire wetted perimeter.Although the ANN model can estimate the shear stress distribution pattern,the location of maximum shear stress in circular channels is not at the channel centerline.

Fig.7.Sensitivity analysis of different inputs for best model.

Table 4 Statistical analysis of GEP,ANN and Shannon entropy-based equation.

Fig.8.Comparison of shear stress distributions predicted by most appropriate GEP model,ANN model,and Shannon entropy-based equation.

4.Conclusions

Determining the shear stress distribution in open channels is an essential problem that engineers have been attempting to solve.A new formulation for estimating the shear stress distribution in circular channels in the form of GEP was investigated.To identify the effective parameters in shear stress distribution modeling with GEP,sensitivity analysis was applied and 15 different GEP models were run.The results indicate that y/P is a sensitive parameter in shear stress distribution estimation.When Re and y/P were added as inputs,the GEP models show the most accurate shear stress distribution prediction results.A simple equation was derived from the best GEP model.According to the sensitivity analysis,parameter y/P has a complex effect on the dimensionless shear stress,but the input variable Re has a direct relation with the dimensionless shear stress.The shear stress distribution predicted by the most appropriate GEP model selected was compared with those predicted by an ANN model and the Shannon entropy-based equation.Both the GEP model and Shannon entropy-based equation performed better with increasing flow depths.The ANN model exhibited the worst results and produced lower shear stress estimations than the experimental values.The GEP model,with an average RMSE of 0.0301,indicated better functionality than the Shannon entropy-based equation and the ANN model,with average RMSE values of 0.1049 and 0.2815,respectively.A simple and applicable equation was thus extracted from the GEP model for predicting the shear stress distribution in circular channels.