S/W TEST ...bringing people, processes, and practices together...

IBM Software TEST News: September 1999

Coverage and What It Can Do For You

By: Shmuel Ur, TopGun, Haifa Research Lab
 
 


Testing is one of the biggest problems of the software industry. Testing usually accounts for 40-80% of the cost of the development process, compared with less than 20% for the coding itself [4]. The practice of letting users find the bugs and fixing them in the next release is becoming dangerous and costly for three main reasons: reputation and brand name are harmed, replacing the software can be very costly when there is a large installed base, and litigation can be expected if a software error causes harm to the user. Therefore, one has to be certain that testing resources are used efficiently and that the testing is thorough.

The main technique for demonstrating that the testing has been thorough is called test coverage analysis [10]. Simply stated, the idea is to create, in some systematic fashion, a large and comprehensive list of tasks and check that each task is covered in the testing phase. Coverage can help in monitoring the quality of testing, assist in creating tests for areas that have not been tested before, and help with forming small yet comprehensive regression suites [5].

Coverage, in general, can be divided into two types: code-based and functional. Code-based coverage concentrates on measuring syntactic properties of the execution, for example, that each statement was executed or that each branch was taken. This makes code-based coverage a generic method that is usually easy to measure and for which many tools are available. Examples include code-based coverage tools for C [9], C++ [13], and Java [14]. Functional coverage, on the other hand, focuses on the functionality of the program and is used to check that every aspect of the functionality is tested. Therefore, functional coverage is design and implementation specific, and is more costly to measure.

Inside IBM, a number of coverage tools are freely available:

    • Focus - a tool that implements the functional coverage methodology. To download it, go to Alphaworks and look for Focus.
    • xSuds - a C and C++ code coverage tool from Bell, available internally for free use within IBM.
    • PureCoverage - a C, C++, Fortran, and assembler coverage tool from Rational.


A number of papers on coverage can be found at the IBM Haifa Research web site.

The Benefits and Risks in Using Coverage
Coverage is defined as any metric of completeness with respect to a test selection criterion [3]. Many such metrics have been suggested in the past, of which statement coverage is the most common. Full statement coverage means that every statement in the program has been executed by the tests. Coverage is one of the more systematic ways to check that the testing has been thorough. When using any coverage model, of which many are available [11], a metric is created against which the quality and completeness of the testing are measured.

The most commonly used coverage metrics, such as statement coverage and branch coverage, are based on the control flow of the program. However, many other metrics exist. Some coverage metrics are based on the data flow of variables, like define-use [3], while others are based not on the program code but on the inputs or the specifications.

Coverage is usually used to find new testing requirements that have been overlooked in the test plan. Test requirements are often written during the design and do not take into account the details of the implementation. For example, the implementation of a sorting function might use two different algorithms, depending on the size of the array sorted, a detail which is not in the specifications. In this case, statement coverage might show that the inputs never included a short array, which uses one of the algorithms, and that a new test is needed. Working with coverage as a guide to improve the quality of testing has been shown to be a cost-effective use of resources [12].
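The sorting example can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the original study: the cutoff SHORT_ARRAY_LIMIT, the function names, and the use of Python's sys.settrace hook as a stand-in for a real coverage tool. The statements of hybrid_sort are identified by their offset from the "def" line.

```python
import sys

SHORT_ARRAY_LIMIT = 8  # hypothetical cutoff between the two algorithms


def insertion_sort(items):
    result = list(items)
    for i in range(1, len(result)):
        j = i
        while j > 0 and result[j - 1] > result[j]:
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
    return result


def hybrid_sort(items):
    if len(items) < SHORT_ARRAY_LIMIT:
        return insertion_sort(items)  # algorithm used for short arrays
    return sorted(items)  # algorithm used for long arrays


# Statements in the body of hybrid_sort, as offsets from its "def" line.
HYBRID_SORT_STATEMENTS = {1, 2, 3}


def uncovered_statements(test_inputs):
    """Return offsets of hybrid_sort statements that no test input executed."""
    code = hybrid_sort.__code__
    executed = set()

    def tracer(frame, event, arg):
        # Record every line executed inside hybrid_sort itself.
        if frame.f_code is code and event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        for data in test_inputs:
            hybrid_sort(data)
    finally:
        sys.settrace(None)
    return HYBRID_SORT_STATEMENTS - executed
```

A test plan that contains only long arrays leaves offset 2, the short-array call, unexecuted. Adding a single short input such as [3, 1, 2] closes the gap.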

Another common application of coverage is the generation of regression suites [10]. Generation of regression suites has to deal with two contradictory requirements: the suite must be small so that it is economical to execute it after every design change, yet it must be comprehensive in order to find the bugs that were introduced. Coverage enables us to find a relatively small set of tests which is comprehensive in the sense that it covers the required metric [5].
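One simple way to build such a suite is a greedy selection: repeatedly take the test that covers the most tasks not yet covered. The sketch below assumes this greedy heuristic and hypothetical test and task names; it is not necessarily the algorithm used in [5], and it does not guarantee the smallest possible suite.

```python
def select_regression_suite(coverage_by_test):
    """Greedily choose tests until every task covered by any test is covered.

    coverage_by_test maps a test name to the set of coverage tasks it hits.
    """
    remaining = set().union(*coverage_by_test.values())
    suite = []
    while remaining:
        # Sorting the names first makes tie-breaking deterministic.
        best = max(
            sorted(coverage_by_test),
            key=lambda name: len(coverage_by_test[name] & remaining),
        )
        suite.append(best)
        remaining -= coverage_by_test[best]
    return suite


# Hypothetical coverage data: five tests, five coverage tasks.
coverage_by_test = {
    "t1": {"s1", "s2", "s3"},
    "t2": {"s3", "s4"},
    "t3": {"s1", "s2"},
    "t4": {"s4", "s5"},
    "t5": {"s5"},
}
suite = select_regression_suite(coverage_by_test)
```

For this data the selection keeps t1 and t4, which together cover the same five tasks as the full set of five tests.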

Besides these uses, coverage provides other benefits to the testing process that are often overlooked. One such benefit is the use of coverage or, more specifically, functional coverage, to assist in defining testing requirements and specifications. Another benefit of functional coverage is that it helps to achieve a better understanding of the tested program during definition of the coverage models.

While the use of coverage as an aid to the testing process has many benefits, centering the testing process around coverage has its own risks. A common misconception about coverage is that the testing methodology should be to decide on an appropriate coverage metric and then generate a set of tests that covers it. This is not advisable for a number of reasons. The main one is that tests created to achieve coverage goals are usually very simple tests.

Another drawback of coverage is that many coverage models are ill suited to deal with many common problems. For example, control flow models, such as statement and branch coverage, are ill suited to deal with missing code. If, for example, a case statement should have six cases but, in practice, it has only four, statement or branch coverage will not help you find it. One way to overcome this difficulty is to use several coverage models, which are derived from different domains, so that one model will cover the weaknesses of another model.
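The missing-code problem can be sketched as follows. The zone names, the costs, and the hand-placed probes are hypothetical. The implementation handles four cases where the assumed specification requires six. Four tests reach every branch, so a control flow model reports nothing missing, while a second model derived from the input domain exposes the two untested cases.

```python
# The (hypothetical) specification requires six cases.
SPEC_ZONES = ("A", "B", "C", "D", "E", "F")

ALL_BRANCHES = {"case A", "case B", "case C", "default"}
branches_taken = set()  # filled in by the hand-placed probes below


def shipping_cost(zone):
    """Implementation that handles only four of the six zones."""
    if zone == "A":
        branches_taken.add("case A")
        return 5
    elif zone == "B":
        branches_taken.add("case B")
        return 7
    elif zone == "C":
        branches_taken.add("case C")
        return 9
    else:
        # Written with zone D in mind; zones E and F silently end up here.
        branches_taken.add("default")
        return 12


def coverage_gaps(tested_zones):
    """Return (branches never taken, specified zones never tested)."""
    branches_taken.clear()
    for zone in tested_zones:
        shipping_cost(zone)
    control_flow_gaps = ALL_BRANCHES - branches_taken
    input_domain_gaps = set(SPEC_ZONES) - set(tested_zones)
    return control_flow_gaps, input_domain_gaps
```

Calling coverage_gaps(["A", "B", "C", "D"]) yields an empty set for the control flow model and the set containing "E" and "F" for the input domain model.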

A different risk in using coverage is setting low coverage goals. It has been shown that using coverage to assess quality with a lower coverage target (50%-90%) is not useful [12]. The reason is that the probability of having bugs in hard-to-cover areas tends to be larger than the probability of bugs in well covered areas. Therefore, it is better to use simpler coverage models with high coverage goals than more complex models with lower coverage goals.

Code Based Coverage
Code based coverage, usually just called coverage, is a technique that measures the execution of tests against the source code of the program. For example, one can measure whether all the statements of the program have been executed. The main uses of program based coverage are assessing the quality of the testing, finding missing requirements in the test plan and constructing regression suites.

A number of standards, as well as internal company policies, require the testing program to achieve some level of coverage under some model. For example, one of the requirements of the DOA (Department of the Army) standard [15] is 100% statement coverage.

Coverage tools exist for all major programming languages. Each tool implements a number of coverage models for a particular combination of operating system, compiler, and programming language. Most of them work by instrumenting the source code with counters, which the tool's user interface later uses to show the status and progress of the coverage in some detail. To apply such a tool, one typically has to recompile the software with the tool and execute the tests. After the tests are executed, there is usually some interface that highlights the parts of the program that were not covered.
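The instrumentation step can be pictured with a hand-written sketch. A real tool inserts the probes automatically during compilation; here the probe calls are placed manually, and the names probe, PROBES, and absolute_value are hypothetical.

```python
from collections import Counter

counters = Counter()  # one execution counter per probe

# Human-readable description of each probe, used by the report.
PROBES = {1: "entry", 2: "negative branch", 3: "non-negative branch"}


def probe(probe_id):
    """Increment the counter that the instrumentation attached to a location."""
    counters[probe_id] += 1


def absolute_value(x):
    """Instrumented version: each probe call marks one block of the original code."""
    probe(1)
    if x < 0:
        probe(2)
        return -x
    probe(3)
    return x


def coverage_report():
    """Map each probe description to its execution count; zero means uncovered."""
    return {PROBES[probe_id]: counters[probe_id] for probe_id in sorted(PROBES)}


# A test run that never passes a negative number.
for value in (3, 7, 0):
    absolute_value(value)
report = coverage_report()
```

The report shows a count of zero for the negative branch, which is the kind of gap a tool's interface would highlight.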

Almost all coverage tools implement the statement and branch coverage models. Multi-condition coverage, a model that checks that each part of a condition (e.g., A or B and C) had an impact on the outcome, is also implemented by many tools. Fewer tools implement the more complex models such as define-use, mutation, and path coverage variants [6].
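One possible reading of "had an impact" is that, for each part of the condition, the tests contain two cases that differ only in that part and produce different outcomes. The sketch below assumes this reading, which tools may define differently, and uses the condition A or (B and C) from the example above.

```python
CONDITIONS = ("A", "B", "C")


def decision(a, b, c):
    """The compound condition under test."""
    return a or (b and c)


def conditions_shown_to_matter(tests):
    """Return the conditions whose impact the tests demonstrate.

    A condition counts as demonstrated when two tests differ only in
    that condition and the decision outcome differs between them.
    """
    tests = set(tests)
    shown = set()
    for test in tests:
        for index, name in enumerate(CONDITIONS):
            flipped = test[:index] + (not test[index],) + test[index + 1:]
            if flipped in tests and decision(*test) != decision(*flipped):
                shown.add(name)
    return shown


# Two tests that take the decision both ways, so branch coverage is full.
branch_only = [(True, True, True), (False, False, False)]

# Four tests chosen so that each condition is flipped on its own.
thorough = [
    (False, False, True),
    (True, False, True),
    (False, True, True),
    (False, True, False),
]
```

The two tests in branch_only demonstrate no individual condition, whereas the four tests in thorough demonstrate all three.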

The main advantage of code based coverage tools is their simplicity of use. The tools come ready for the testing environment. No special preparations are needed in the programs, and understanding the feedback from the tool is straightforward. The main disadvantage of code coverage tools is that the tools do not "understand" the application domain. Therefore, it is very hard to tune the tools to areas which the user thinks are significant.

Functional Coverage
Unlike code based coverage, where the execution of tests is measured against the program source code, functional coverage focuses on the functionality of the program, and it is used to check that every aspect of the functionality is tested. Therefore, functional coverage is design and implementation specific, and is harder to measure. Currently, functional coverage is mostly done manually.

Functional coverage is considered by some to be black-box testing [6], since it involves models based on the specifications of the application. We believe that functional coverage is much more varied. Functional coverage models can be based on the specifications of the application, but they can also be derived from the implementation.

Functional coverage models come in many flavors. Models can cover the inputs and outputs of the program, or they can look at the internal state of the program (e.g., values of variables). They can be snapshot models, which look at the state of the program at a certain time, or temporal models, which deal with scenarios. A simple example of a snapshot model is covering all the possible values of the input parameters of a function. An example of a temporal model is looking at the changes in the values of global variables between consecutive activations of a function.

Usually, functional coverage models involve looking at several properties in parallel. Our experience shows that many bugs can be found only when a number of events happen concurrently [1]. Therefore, covering each event on its own is not sufficient.

Coverage models can also be based on bug models. (A bug model is a set of requirements for finding bugs of a type that has been uncovered before.) Thread interleaving and synchronization in a multi-threaded system are a source of many bugs. Therefore, a coverage model that looks at all the reasons for thread switching is a good example of a coverage model that is based on a bug model.
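A snapshot model that looks at several properties in parallel can be sketched as a cross-product of attribute values, where each combination is one coverage task. The attributes and values below are hypothetical, and the representation is an assumption made for illustration rather than the format of any particular tool.

```python
from itertools import product

# Hypothetical snapshot model: two properties observed at the same moment.
MODEL = {
    "operation": ("read", "write"),
    "buffer_state": ("empty", "partial", "full"),
}


def coverage_tasks(model):
    """Every combination of attribute values is one coverage task."""
    names = sorted(model)
    return {
        tuple(zip(names, values))
        for values in product(*(model[name] for name in names))
    }


def uncovered_tasks(model, observed_events):
    """Return the tasks of the model that no observed event matched."""
    names = sorted(model)
    seen = {tuple((name, event[name]) for name in names) for event in observed_events}
    return coverage_tasks(model) - seen


# Events recorded during testing: every value of each attribute appears
# on its own, but one combination never occurs.
events = [
    {"operation": "read", "buffer_state": "empty"},
    {"operation": "read", "buffer_state": "partial"},
    {"operation": "read", "buffer_state": "full"},
    {"operation": "write", "buffer_state": "empty"},
    {"operation": "write", "buffer_state": "partial"},
]
missing = uncovered_tasks(MODEL, events)
```

Here each event is covered on its own, yet the combination of a write with a full buffer remains uncovered, which is the kind of hole that a per-event model cannot show.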

General Guidelines for Usage of Coverage
Coverage should not be used if the resources used for it can be better spent elsewhere. This is the case when the budget is very tight and there is not enough time to even finish the test plan. In such a case, designing new tests is not useful, as not all the old tests will be run. Coverage should be used only if there is a full commitment to make use of the data collected. Measuring coverage only in order to report a coverage percentage is practically worthless. The value of coverage is that it points out parts of the application that have not been tested and guides test generation to these parts. Moreover, it is very important to try to reach full coverage or at least set high coverage goals, since many bugs hide in hard-to-reach places.

Coverage is a very useful criterion for test selection for regression suites. Whenever a small set of tests is needed, the test suite should be selected so that it will cover as many requirements or coverage tasks as possible.

When coverage and reviews are used for the same project, reviews can put less emphasis on things that coverage is likely to find. For example, a review for dead code is unnecessary if statement coverage is used, a review for boundary errors in loops is redundant if the appropriate mutation coverage model [8] is used, and manually checking that some values of a variable can be attained is not needed if the appropriate functional coverage model is used.
 

Conclusions
In this article, we compared functional coverage to code based coverage. We have shown that each has its own merits and drawbacks. It has been our experience, as well as the experience of everyone we know who has used coverage, that coverage is worth doing. Almost anyone, under any timing and budget considerations, can benefit from some form of coverage. However, one has to commit to coverage and use it properly. Functional coverage is a more powerful testing technique than code based coverage, since it can focus on areas of concern and contribute to the design and verification processes in many more ways. On the other hand, functional coverage is more complicated and requires more resources than code based coverage. We therefore recommend that it be reserved for those parts of a program that are of special concern. Code based coverage, on the other hand, can and should be uniformly applied to the entire application.

Acknowledgment

This article is part of the paper Off-The-Shelf Vs. Custom Made Coverage Models, Which Is The One for You? by Shmuel Ur and Avi Ziv, which was published in STAR98.

References
[1] Y. Abarbanel-Vinov and S. Ur. Processor Bug Classification and Modeling, IBM's Haifa Research Lab internal document, 1996.
[2] J. Baumgartner and R. Raghavan. Method to compute test coverage in complex computer system simulation. IBM Technical Disclosure Bulletin, 40(3):1-4, March 1997.
[3] B. Beizer. Software Testing Techniques. Van Nostrand Reinhold, 1990.
[4] F. P. Brooks. The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, 1995.
[5] E. Buchnik and S. Ur. Compacting regression-suites on-the-fly. In Proceedings of the 4th Asia Pacific Software Engineering Conference, pages 385-94, December 1997.
[6] S. Cornett. Software Test Coverage Analysis, http://www.bullseye.com/webCoverage.html
[7] R. Grinwald, E. Harel, M. Orgad, S. Ur, and A. Ziv. User defined coverage - a tool supported methodology for design verification, to appear in Proceedings of the 35th Design Automation Conference, June 1998.
[8] W.E. Howden. Weak mutation testing and completeness of test sets, IEEE Transactions on Software Engineering, 8(4):371-379, July 1982.
[9] J.R. Horgan, S. London and M.R. Lyu. Achieving software quality with testing coverage measures, Computer, 27(9):60-69, September 1994.
[10] B. Marick. The Craft of Software Testing, Subsystem Testing Including Object-Based and Object-Oriented Testing. Prentice-Hall, 1995.
[11] C. Kaner. Software negligence and testing coverage, In Proceedings of STAR 96: the Fifth International Conference, Software Testing, Analysis and Review, pages 299-327, June 1996.
[12] R. Stewart. Unit test coverage as leading indicator of rework, EuroSTAR 97, November 1997.
[13] C-Cover - Test Coverage Analyzer for C/C++, http://www.bullseye.com/webCcover.html
[14] DeepCover for Java, http://www.rstcorp.com/DCJava.html
[15] Software test and evaluation guidelines, Department of the Army, Pamphlet 73-7

