Improved metrics for evaluating fault detection efficiency of test suite


Wang Ziyuan1,2 Chen Lin2 Wang Peng3 Zhang Xueling1

(1 School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210006, China) (2 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China) (3 School of Computer Science and Engineering, Southeast University, Nanjing 210096, China)




By analyzing the average percent of faults detected (APFD) metric and its variant versions, which are widely utilized as metrics to evaluate the fault detection efficiency of the test suite, this paper points out some limitations of the APFD series metrics. These limitations include APFD series metrics having inaccurate physical explanations and being unable to precisely describe the process of fault detection. To avoid the limitations of existing metrics, this paper proposes two improved metrics for evaluating fault detection efficiency of a test suite, including relative-APFD and relative-APFDC. The proposed metrics refer to both the speed of fault detection and the constraint of the testing source. The case study shows that the two proposed metrics can provide much more precise descriptions of the fault detection process and the fault detection efficiency of the test suite.

software testing; test case prioritization; fault detection efficiency; metric

The test case prioritization technique schedules the test cases in an initial test suite into an order in which faults can be detected earlier, forming a prioritized test suite with increased fault detection efficiency. Given an existing initial test suite Tinit, test case prioritization aims to find the best prioritized test suite σ∈P such that

(∀σ′)(σ′∈P)(σ′≠σ)[f(σ)≥f(σ′)]

where P is the set of all possible permutations of Tinit, and f is an objective function[1].
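The definition above is an exhaustive search over all permutations of Tinit. As a minimal illustration only (not a practical technique, since |P| = |Tinit|! grows factorially), the Python sketch below enumerates P and keeps the permutation that maximizes a caller-supplied objective f; the test names and the toy objective are hypothetical.

```python
from itertools import permutations

def best_order(tests, f):
    # Enumerate every permutation in P and return one that maximizes the
    # objective function f; feasible only for very small initial suites.
    return max(permutations(tests), key=f)

# Toy objective (hypothetical): reward placing test "T3" as early as possible.
best = best_order(["T1", "T2", "T3"], lambda order: -order.index("T3"))
print(best[0])  # "T3"
```

Real prioritization techniques rely on greedy or search-based heuristics instead of enumeration, precisely because this search space is factorial in the suite size.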

An objective function called the average percent of faults detected (APFD) is usually utilized as the metric to evaluate the fault detection efficiency of a prioritized test suite σ∈P[1]. There are also some variants of the APFD metric, including NAPFD[2], APFDC[3], etc. In this paper, we jointly call these metrics the APFD series.

To address these problems, we propose an improved metric, relative-APFD, which is related to a given testing resource constraint that determines how many test cases can be run, to replace the existing APFD and NAPFD. Furthermore, we also discuss the scenarios where test costs and fault severities are taken into consideration, and propose relative-APFDC to replace the existing APFDC. The case study shows that the proposed metrics provide much more precise descriptions of the fault detection efficiency of a prioritized test suite.

1 APFD Series Metrics

Let σ, Φ, and TF(φ, σ) be the prioritized test suite under evaluation, the set of faults contained in the software, and the index of the first test case in σ that exposes fault φ∈Φ, respectively (with TF(φ, σ) = 0 if φ is not detected); then the APFD of σ is defined as[1]

APFD(σ) = p − (∑φ∈Φ TF(φ, σ))/(|σ||Φ|) + p/(2|σ|)

where p is the rate of faults detected by σ, i.e.,

p = |{φ∈Φ : TF(φ, σ) > 0}|/|Φ|
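As a concrete illustration, this definition can be evaluated directly from the first-detection indices. The Python sketch below assumes the form p − ΣTF/(nm) + p/(2n); the TF values in the usage example are hypothetical, chosen only to be consistent with the σ1 and σ2 values reported in Section 2.1.

```python
def apfd(tf, n, num_faults):
    # tf holds, for each *detected* fault, the 1-based index of the first
    # test case in the prioritized suite that exposes it; n is the suite size.
    p = len(tf) / num_faults                       # rate of faults detected
    return p - sum(tf) / (n * num_faults) + p / (2 * n)

# Hypothetical first-detection indices consistent with Section 2.1:
print(round(apfd([1, 1, 1, 2, 2, 4, 4, 5], n=5, num_faults=8), 4))  # 0.6
print(round(apfd([1, 1, 1, 2, 2, 3, 3, 3], n=3, num_faults=8), 4))  # 0.5
```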

In recent years, other metrics have been proposed by extending APFD for special applications, including metrics for parallel testing scenarios[8] and metrics for evaluating the ratio of achieved efficiency[9].

2 Limitations of APFD Series Metrics

2.1 Constraint on the sizes of test suites

We take the test cases and faults in Tab.1 as examples to show some incorrect results when using the APFD series metrics in scenarios where the sizes of prioritized test suites vary.

Tab.1 Faults detected by test cases

1) For the situation where all faults are detected, we construct two prioritized test suites σ1: T3-T5-T2-T4-T1 and σ2: T3-T5-T6. Note that both σ1 and σ2 can detect all faults. Then we obtain the APFD values (see Fig.1).

APFD(σ1) = APFDC(σ1) = 0.6

APFD(σ2) = APFDC(σ2) = 0.5

However, it is incorrect to say that σ1 is more efficient than σ2. After running 1 (or 2) test case(s), both σ1 and σ2 detect 3 (or 5) faults; after running 3 test cases, σ2 detects all 8 faults while σ1 detects only 5. This means that σ2 detects faults more rapidly than σ1.

2) For the situation where some faults remain undetected, we construct two prioritized test suites σ3: T3-T2-T5 and σ4: T3-T5. Note that σ3 and σ4 detect the same faults. Then we obtain the NAPFD values.

NAPFD(σ3) = 0.3542

NAPFD(σ4) = 0.3438

It is also incorrect to say that σ3 is more efficient than σ4. After running 1 test case, both σ3 and σ4 detect 3 faults; after running 2 test cases, σ4 detects 5 faults while σ3 still detects only 3. This means that σ4 detects faults more rapidly than σ3.
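For reference, these NAPFD values can be reproduced from the formula p − ΣTF/(nm) + p/(2n) once undetected faults are excluded from the TF sum. The first-detection indices below are hypothetical, inferred only from the detection counts stated in this section (3 faults after the first test case, 5 after T5).

```python
def napfd(tf, n, num_faults):
    # tf: 1-based first-detection index per *detected* fault; faults that the
    # suite never detects are simply left out of the list.
    p = len(tf) / num_faults
    return p - sum(tf) / (n * num_faults) + p / (2 * n)

# sigma3 = T3-T2-T5: 3 faults exposed at position 1, 2 more at position 3.
# sigma4 = T3-T5:    the same faults exposed at positions 1 and 2.
print(round(napfd([1, 1, 1, 3, 3], n=3, num_faults=8), 4))  # 0.3542
print(round(napfd([1, 1, 1, 2, 2], n=2, num_faults=8), 4))  # 0.3438
```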

Fig.1 Fault detection processes of the prioritized test suites: (a) σ1; (b) σ2

This limitation, which has often been overlooked, may sometimes lead to incorrect and confusing experimental results in applications of the APFD series metrics[2,5].

2.2 Process of fault detection

Another limitation is that the APFD series metrics cannot precisely describe the process of fault detection in the real world. They assume that, while one test case is running, the number of newly detected faults (for APFD and NAPFD) or the total severity of newly detected faults (for APFDC) grows linearly with the consumed time. In reality, however, a test case that is still running cannot detect any faults, since we cannot check whether it has passed or failed until it finishes.

3 Improved Metrics

3.1 Relative-APFD

Formally, let σ, Φ, and TF(φ, σ) be the prioritized test suite under evaluation, the set of faults contained in the software, and the position of the first test case in σ that exposes fault φ, respectively. We specifically set TF(φ, σ) = 0 for non-detected faults. For a given testing resource constraint m, the relative-APFD of σ is defined as

RAPFD(σ, m) = p(m) − (∑φ∈Φ TFm(φ, σ))/(m|Φ|)

where

TFm(φ, σ) = TF(φ, σ) if 0 < TF(φ, σ) ≤ m, and TFm(φ, σ) = 0 otherwise

In addition, p(m) is the ratio of the number of faults detected by the first m test cases in σ to the number of faults in Φ; i.e.,

p(m) = |{φ∈Φ : 0 < TF(φ, σ) ≤ m}|/|Φ|
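A minimal Python sketch of this computation, assuming relative-APFD takes the form p(m) − ΣTFm/(m·|Φ|); the first-detection indices in the usage example are hypothetical but consistent with the σ1 values in the case study (Section 4).

```python
def rapfd(tf, m, num_faults):
    # tf: 1-based first-detection index per fault, 0 for non-detected faults.
    # Only faults exposed within the budget of m test cases contribute.
    within = [t for t in tf if 0 < t <= m]
    p_m = len(within) / num_faults             # p(m)
    return p_m - sum(within) / (m * num_faults)

# Hypothetical indices for sigma1 = T3-T5-T2-T4-T1 from the case study:
print(rapfd([1, 1, 1, 2, 2, 4, 4, 5], m=4, num_faults=8))  # 0.40625 (= 13/32)
```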

3.2 Relative-APFDC

By considering test costs, the testing resource constraint is no longer a number of test cases but a budget on the total cost of the executed test cases, given as a positive real number mC. Then we can propose the metric relative-APFDC by extending relative-APFD.

RAPFDC(σ, mC) = p(mC) − (∑φ∈Φ s(φ)·CmC(φ, σ))/(mC·∑φ∈Φ s(φ))

where s(φ) is the severity of fault φ, and CmC(φ, σ) is the total cost of the first TF(φ, σ) test cases in σ (i.e., the cost consumed when φ is first exposed) if this cost does not exceed mC, and CmC(φ, σ) = 0 otherwise,

and p(mC) is the ratio of the total severities of faults detected by σ within the testing resource constraint to the total severities of all the faults in Φ; i.e.,

p(mC) = (∑ of s(φ) over the faults φ first exposed within cost mC)/(∑φ∈Φ s(φ))

4 Case Study

Considering the prioritized test suites σ1: T3-T5-T2-T4-T1 and σ2: T3-T5-T6, their relative-APFD values for the testing resource constraints m=1, 2, 3, 4, and 5 are shown in Fig.2 as the areas under the step functions:

• RAPFD(σ1, 1)=RAPFD(σ2, 1)=0;

• RAPFD(σ1, 2)=RAPFD(σ2, 2)=3/16;

• RAPFD(σ1, 3)=RAPFD(σ2, 3)=1/3;

• RAPFD(σ1, 4)=13/32 < RAPFD(σ2, 4)=1/2;

• RAPFD(σ1, 5)=1/2 < RAPFD(σ2, 5)=3/5.

Fig.2 Relative-APFD values shown as areas under step functions: (a) σ1; (b) σ2

The overall results show that if the testing resource constraint is less than or equal to 3 (i.e., 3 or fewer test cases can be run), σ1 and σ2 have the same efficiency; if the constraint is greater than 3 (more than 3 test cases can be run), σ2 is more efficient than σ1.
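The step-function values listed above can be checked mechanically. The sketch below uses exact rational arithmetic; as before, the first-detection indices are hypothetical reconstructions consistent with the numbers in this section, and the formula assumed is p(m) − ΣTFm/(m·|Φ|).

```python
from fractions import Fraction

def rapfd(tf, m, num_faults):
    # Relative-APFD with exact rationals: p(m) minus the normalized sum of
    # first-detection positions of the faults exposed within the first m tests.
    within = [t for t in tf if 0 < t <= m]
    return Fraction(len(within), num_faults) - Fraction(sum(within), m * num_faults)

s1 = [1, 1, 1, 2, 2, 4, 4, 5]   # sigma1 = T3-T5-T2-T4-T1 (hypothetical indices)
s2 = [1, 1, 1, 2, 2, 3, 3, 3]   # sigma2 = T3-T5-T6
for m in range(1, 6):
    print(m, rapfd(s1, m, 8), rapfd(s2, m, 8))
```

For m = 1, 2, 3 the two columns agree (0, 3/16, 1/3), and for m = 4, 5 the σ2 column pulls ahead (1/2 vs 13/32, then 3/5 vs 1/2), matching the list above.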

Considering the other two prioritized test suites σ3: T3-T2-T5 and σ4: T3-T5, their relative-APFD values for the testing resource constraints m=1, 2, and 3 are as follows:

• RAPFD(σ3, 1)=RAPFD(σ4, 1)=0;

• RAPFD(σ3, 2)=RAPFD(σ4, 2)=3/10;

• RAPFD(σ3, 3)=6/15 < RAPFD(σ4, 3)=8/15.

The overall results show that if the testing resource constraint is less than or equal to 2 (i.e., 2 or fewer test cases can be run), σ3 and σ4 have the same efficiency; if the constraint is greater than 2 (more than 2 test cases can be run), σ4 is more efficient than σ3.

The above two cases show that relative-APFD avoids the incorrect results obtained with the existing APFD and NAPFD metrics. The relative-APFDC metric has the same advantage; the details are omitted here.

5 Conclusion

We briefly reviewed the widely used APFD series metrics, including APFD, NAPFD and APFDC, and discussed their limitations. To avoid these limitations, two improved metrics, relative-APFD and relative-APFDC, are proposed in this paper. The proposed metrics describe the process of fault detection more precisely and practically, and provide more reliable results for evaluating and comparing the efficiency of prioritized test suites. In future work, metrics for parallel testing processes are needed, since cloud computing techniques have been widely applied to software testing.

[1] Rothermel G, Untch R H, Chu C Y, et al. Prioritizing test cases for regression testing [J]. IEEE Transactions on Software Engineering, 2001, 27(10): 929-948.

[2] Qu X, Cohen M B, Woolf K M. Combinatorial interaction regression testing: a study of test case generation and prioritization [C]//Proceedings of the IEEE International Conference on Software Maintenance. Paris, France, 2007: 255-264.

[3] Elbaum S, Malishevsky A G, Rothermel G. Incorporating varying test costs and fault severities into test case prioritization [C]//Proceedings of the International Conference on Software Engineering. Toronto, Canada, 2001: 329-338.

[4] Chen X, Gu Q, Zhang X, et al. Building prioritized pairwise interaction test suites with ant colony [C]//Proceedings of the 9th International Conference on Quality Software. Jeju, Korea, 2009: 347-352.

[5] Walcott K R, Soffa M L, Kapfhammer G M, et al. Time-aware test suite prioritization [C]//Proceedings of the 23rd International Symposium on Software Testing and Analysis. Portland, Maine, USA, 2006: 1-11.

[6] Harrold M J, Gupta R, Soffa M L. A methodology for controlling the size of a test suite [J]. ACM Transactions on Software Engineering and Methodology, 1993, 2(3): 270-285.

[7] Weißleder S. Towards impact analysis of test goal prioritization on the efficient execution of automatically generated test suites based on state machines [C]//Proceedings of the 11th International Conference on Quality Software. Madrid, Spain, 2011: 150-155.

[8] Qu B, Xu B, Nie C, et al. A new metrics for test case prioritization in parallel scenario [J]. Journal of Southeast University: Natural Science Edition, 2009, 39(6): 1104-1108. (in Chinese)

[9] Zhang X, Qu B. An improved metric for test case prioritization [C]//Proceedings of the Web Information Systems and Applications Conference. Chongqing, China, 2011: 125-130.


Foundation items: The National Natural Science Foundation of China (No.61300054), the Natural Science Foundation of Jiangsu Province (No.BK2011190, BK20130879), the Natural Science Foundation of Higher Education Institutions of Jiangsu Province (No.13KJB520018), and the Science Foundation of Nanjing University of Posts & Telecommunications (No.NY212023).

Citation: Wang Ziyuan, Chen Lin, Wang Peng, et al. Improved metrics for evaluating fault detection efficiency of test suite[J]. Journal of Southeast University (English Edition), 2014, 30(3): 285-288.

10.3969/j.issn.1003-7985.2014.03.005


Received 2013-12-28.

Biography:Wang Ziyuan (1982—), male, graduate, associate professor, wangziyuan@njupt.edu.cn.
