IBM
Software TEST News: September 1999
Coverage and What It Can Do For You
By: Shmuel Ur, TopGun, Haifa Research Lab
Testing is one of the biggest
problems of the software industry. The cost of testing is usually between
40% and 80% of the cost of the development process, compared with less than 20% for the
coding itself [4]. The practice of letting the users find the bugs and
fixing them in the next release is becoming dangerous and costly for three
main reasons: reputation and brand-name are harmed, replacing the software
can be very costly when there is a large install base, and litigation can
be expected if the software error caused harm to the user. Therefore, one
has to be certain that testing resources are used efficiently and that
the testing is thorough.
The main technique for demonstrating
that the testing has been thorough is called test coverage analysis
[10]. Simply stated, the idea is to create, in some systematic fashion,
a large and comprehensive list of tasks and check that each task is covered
in the testing phase. Coverage can help in monitoring the quality of testing,
assist in creating tests for areas that have not been tested before, and
help with forming small yet comprehensive regression suites [5].
Coverage, in general, can
be divided into two types: code-based or functional. Code-based coverage
concentrates on measuring syntactic properties in the execution, for example,
that each statement was executed, or each branch was taken. This makes
program-based coverage a generic method which is usually easy to measure,
and for which many tools are available. Examples include program based
coverage tools for C [9], C++ [13], and Java [14]. Functional coverage,
on the other hand, focuses on the functionality of the program, and it
is used to check that every aspect of the functionality is tested. Therefore,
functional coverage is design and implementation specific, and is more
costly to measure.
Inside IBM there are a number
of coverage tools that are freely available:
- Focus - a tool that implements the functional coverage methodology. To download it, go to Alphaworks and look for Focus.
- xSuds - a C and C++ code coverage tool from Bell, available internally for free use within IBM.
- PureCoverage - a C, C++, Fortran and assembler coverage tool from Rational.
A number of papers on
coverage can be found at the IBM
Haifa Research web site.
The Benefits and Risks in Using Coverage
Coverage is defined as
any metric of completeness with respect to a test selection criterion [3].
Many such metrics have been suggested in the past [3], of which statement
coverage is the most common. Full statement coverage means that every
statement in the program has been executed by the tests. Coverage is one
of the more systematic ways to check that the testing has been thorough.
When using any coverage model, of which many are available [11], a metric
is created against which the quality and completeness of the testing is
measured.
The most commonly used coverage
metrics are based on the control flow of the program, such as statement
coverage and branch coverage; however, many other metrics exist. Some coverage
metrics are based on the data flow of variables, like define-use [3], while
others are not based on the program code but on the inputs or the specifications.
Coverage is usually used
to find new testing requirements that have been overlooked in the test
plan. Many times the test requirements are written during the design and
do not take into account the details of the implementation. For example,
the implementation of a sorting function might use two different algorithms,
depending on the size of the array sorted, a detail which is not in the
specifications. In this case, statement coverage might show that the inputs
never included a short array, which exercises one of the algorithms,
and that a new test is needed. Working with coverage as a guide to improve
the quality of testing has been shown to be a cost effective use of resources
[12].
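The sorting scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `hybrid_sort`, the threshold of 8 elements, and the `covered` set (a stand-in for the counters a real coverage tool would maintain) are all hypothetical names invented for the example.

```python
# Hypothetical stand-in for the counters a coverage tool would add.
covered = set()

SHORT_ARRAY_LIMIT = 8  # assumed threshold; an implementation detail, not in the spec


def hybrid_sort(values):
    """Sort a list, switching algorithm depending on the input size."""
    if len(values) < SHORT_ARRAY_LIMIT:
        covered.add("insertion sort")
        result = list(values)
        for i in range(1, len(result)):
            key = result[i]
            j = i - 1
            # Shift larger elements one slot right to make room for key.
            while j >= 0 and result[j] > key:
                result[j + 1] = result[j]
                j -= 1
            result[j + 1] = key
        return result
    covered.add("general sort")
    return sorted(values)


# Tests written from the specification alone happen to use only long inputs.
spec_tests = [list(range(50, 0, -1)), [7] * 20, list(range(100))]
for data in spec_tests:
    assert hybrid_sort(data) == sorted(data)

# The coverage report points at the algorithm that no test reached.
uncovered = {"insertion sort", "general sort"} - covered
print("not covered:", uncovered)
```

All the specification-based tests pass, yet the report shows that the short-array algorithm never ran, which is exactly the kind of missing test requirement coverage is meant to expose.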
Another common application of coverage
is the generation of regression suites [10]. Generation
of regression suites has to deal with two contradictory requirements: the
suite must be small so that it is economical to execute it after every
design change, yet it must be comprehensive in order to find the bugs that
were introduced. Coverage enables us to find a relatively small set of
tests which is comprehensive in the sense that it covers the required metric
[5].
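One simple way to pick such a suite is a greedy heuristic: repeatedly choose the test that covers the most tasks not yet covered. The sketch below is a generic illustration and is not claimed to be the algorithm of [5]; the test names and coverage tasks are made-up data.

```python
def compact_suite(coverage_by_test):
    """Greedily select tests until every covered task is covered again.

    coverage_by_test maps a test name to the set of coverage tasks it hits.
    The result is small but not guaranteed to be the minimum possible.
    """
    remaining = set().union(*coverage_by_test.values())
    selected = []
    while remaining:
        # Sorting the names makes tie-breaking deterministic.
        best = max(
            sorted(coverage_by_test),
            key=lambda name: len(coverage_by_test[name] & remaining),
        )
        selected.append(best)
        remaining -= coverage_by_test[best]
    return selected


# Hypothetical coverage data collected from a full test run.
coverage_by_test = {
    "t1": {"a", "b", "c"},
    "t2": {"c", "d"},
    "t3": {"a", "b"},
    "t4": {"d", "e"},
    "t5": {"e"},
}

suite = compact_suite(coverage_by_test)
print(suite)  # ['t1', 't4']
```

Here two of the five tests already cover every task that the full suite covers, so they form a candidate regression suite with respect to this metric.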
Besides these uses, coverage
provides other benefits to the testing process that are often overlooked.
One such benefit is the use of coverage or, more specifically, functional
coverage, to assist in defining testing requirements and specifications.
Another benefit of functional coverage is that it helps to achieve a better
understanding of the tested program during definition of the coverage models.
While the use of coverage
as an aid to the testing process has a lot of benefits, centering the testing
process around coverage has its own risks. A common misconception about
coverage is that the testing methodology should be to decide on an appropriate
coverage metric and then generate a set of tests that covers it. This is
not advisable for a number of reasons, the main one being that the tests created
to achieve coverage goals are usually very simple tests.
Another drawback of coverage
is that many coverage models are ill suited to deal with many common problems.
For example, control flow models, such as statement and branch coverage,
are ill suited to deal with missing code. If, for example, a case statement
should have six cases but, in practice, it has only four, statement or
branch coverage will not help you find it. One way to overcome this difficulty
is to use several coverage models, which are derived from different domains,
so that one model will cover the weaknesses of another model.
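The missing-code problem can be illustrated with a small sketch. The command names, the `handle` function, and the list `SPEC_COMMANDS` are hypothetical; they stand for a specification that requires six cases and an implementation that has only four.

```python
# Hypothetical specification: the handler must support six commands.
SPEC_COMMANDS = ["open", "close", "read", "write", "seek", "flush"]


def handle(command):
    """Implementation under test; "seek" and "flush" were never written."""
    if command == "open":
        return "opened"
    if command == "close":
        return "closed"
    if command == "read":
        return "data"
    if command == "write":
        return "stored"
    return "unsupported"


# These five tests execute every statement and both outcomes of every branch.
code_based_tests = ["open", "close", "read", "write", "bogus"]
results = {command: handle(command) for command in code_based_tests}

# A model derived from the specification asks a different question:
# was every required command exercised?
spec_holes = [c for c in SPEC_COMMANDS if c not in code_based_tests]
print("spec commands never tested:", spec_holes)
print("handle('seek') ->", handle("seek"))
```

Statement and branch coverage are complete, yet nothing in that report hints at the two missing cases. The second model, which comes from a different domain, exposes them immediately.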
A different risk in using
coverage is setting low coverage goals. It has been shown that using coverage
to assess quality with a lower coverage target (50%-90%) is not useful
[12]. The reason is that the probability of having bugs in hard-to-cover
areas tends to be larger than the probability of bugs in well covered areas.
Therefore, it is better to use simpler coverage models with high coverage
goals than more complex models with lower coverage goals.
Code Based Coverage
Code based coverage, usually
just called coverage, is a technique that measures the execution of tests
against the source code of the program. For example, one can measure whether
all the statements of the program have been executed. The main uses of
program based coverage are assessing the quality of the testing, finding
missing requirements in the test plan and constructing regression suites.
A number of standards, as
well as internal company policies, require the testing program to achieve
some level of coverage, under some model. For example, one of the requirements
of the Department of the Army (DOA) standard [15] is 100% statement coverage.
Coverage tools exist
for all major programming languages. Every tool implements a
number of coverage models for a particular combination of operating system,
compiler and programming language. Most of them work by instrumenting the
source code and adding counters which can later be used by the tool's user
interface to show the status and progress of the coverage in some detail.
To apply such a tool, one typically has to recompile the software with
the tool and execute the tests. After the tests are executed, there is
usually some interface that highlights the parts of the program that were
not covered.
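The counter-based workflow can be imitated in Python without a separate tool. The sketch below is an assumption-laden illustration, not how any particular product works: instead of rewriting the source, it uses the interpreter's tracing hook `sys.settrace` to maintain one counter per line, and `dis.findlinestarts` to list the lines that could be executed. The function `classify` and the helper names are invented for the example.

```python
import dis
import sys


def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def executable_offsets(func):
    """Offsets (relative to the def line) of the lines that carry bytecode."""
    code = func.__code__
    lines = {line for _, line in dis.findlinestarts(code) if line is not None}
    lines.discard(code.co_firstlineno)  # ignore the def line itself
    return {line - code.co_firstlineno for line in lines}


def run_with_line_counters(func, inputs):
    """Run func on each input and count how often each of its lines executes."""
    code = func.__code__
    counters = {}

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            counters[offset] = counters.get(offset, 0) + 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for value in inputs:
            func(value)
    finally:
        sys.settrace(previous)
    return counters


def uncovered_offsets(func, inputs):
    """Lines of func that no input executed, as offsets from the def line."""
    counters = run_with_line_counters(func, inputs)
    return sorted(executable_offsets(func) - set(counters))


print("missed with one test:", uncovered_offsets(classify, [5]))
print("missed with three tests:", uncovered_offsets(classify, [-1, 0, 5]))
```

With a single positive input, the report lists the two return statements that were never reached; adding a negative and a zero input leaves nothing uncovered.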
Almost all coverage tools
implement the statement and branch coverage models. Multi-condition coverage,
a model that checks that each part of a condition (e.g., A, B, and C in the condition
A or B and C) had an impact on the result, is also implemented by many tools. Fewer tools implement the more
complex models such as define-use, mutation, and path coverage variants
[6].
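The difference between branch coverage and a condition-level model can be sketched as follows. This assumes one possible reading of "had impact": a part of the condition has shown impact if, for some test, flipping that part alone changes the result. Tool definitions vary, and the function names here are hypothetical.

```python
def decision(a, b, c):
    # Python parses "a or b and c" as "a or (b and c)".
    return a or (b and c)


def parts_with_impact(tests):
    """Names of the condition parts whose value mattered in at least one test."""
    shown = set()
    for values in tests:
        outcome = decision(*values)
        for index, name in enumerate("abc"):
            flipped = list(values)
            flipped[index] = not flipped[index]
            if decision(*flipped) != outcome:
                shown.add(name)
    return shown


# Two tests are enough for branch coverage: one true and one false result.
branch_tests = [(True, False, False), (False, False, False)]
print(sorted(parts_with_impact(branch_tests)))  # ['a']

# One more test is needed before b and c are shown to matter.
more_tests = branch_tests + [(False, True, True)]
print(sorted(parts_with_impact(more_tests)))  # ['a', 'b', 'c']
```

The first suite takes the branch both ways but would pass even if `b and c` were deleted from the condition, which is the gap the condition-level model is designed to reveal.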
The main advantage of code
based coverage tools is their simplicity of use. The tools come ready for
the testing environment. No special preparations are needed in the programs
and understanding the feedback from the tool is straightforward. The main
disadvantage of code coverage tools is that the tools do not "understand"
the application domain. Therefore, it is very hard to tune the tools to
areas that the user considers significant.
Functional Coverage
Unlike code based coverage,
where the execution of tests is measured against the program source code,
functional coverage focuses on the functionality of the program, and it
is used to check that every aspect of the functionality is tested. Therefore,
functional coverage is design and implementation specific, and is harder
to measure. Currently, functional coverage is mostly done manually.
Functional coverage is considered
by some to be black-box testing [6], since it involves models based on
the specifications of the application. We believe that functional coverage
is much more varied. Functional coverage models can be based on the specifications
of the application, but they can also be derived from the implementation.
Functional coverage models have many flavors. Models can cover the inputs
and outputs of the program or they can look at the internal state of the
program (e.g., values of variables). Functional coverage models can be
snapshot models, that look at the state of the program at a certain time,
or they can be temporal models that deal with scenarios. Usually, functional
coverage models involve looking at several properties in parallel. Our
experience shows that many bugs can be found only when a number of events
happen concurrently [1]. Therefore, covering each event on its own is not
sufficient. A simple example of a snapshot model is covering all the possible
values of the input parameters of a function. An example of a temporal
model is looking at the changes in the values of global variables between
consecutive activations of a function. Thread interleaving and synchronization
in a multi-threaded system is a source of many bugs. Therefore, a coverage
model that looks at all the reasons for thread switching is a good example
of a coverage model that is based on a bug model. (A bug model is a set
of requirements for finding bugs of a type that has been uncovered before.)
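A snapshot model and a temporal model can both be sketched as sets of coverage tasks. The attributes below (an operation and a buffer size class) and the recorded trace are hypothetical; the point is only that the tasks are combinations of properties, not single events.

```python
from itertools import product

# Hypothetical attributes of one call to the function under test.
OPERATIONS = ["read", "write"]
SIZES = ["empty", "small", "large"]

# Snapshot model: every combination of operation and size is a task.
snapshot_tasks = set(product(OPERATIONS, SIZES))

# Temporal model: every pair of operations in consecutive calls is a task.
temporal_tasks = set(product(OPERATIONS, repeat=2))

# Trace of calls observed while running the tests (made-up data).
trace = [
    ("read", "small"),
    ("write", "small"),
    ("write", "large"),
    ("read", "empty"),
]

snapshot_covered = set(trace)
operations_seen = [operation for operation, _ in trace]
temporal_covered = set(zip(operations_seen, operations_seen[1:]))

snapshot_holes = sorted(snapshot_tasks - snapshot_covered)
temporal_holes = sorted(temporal_tasks - temporal_covered)

print("snapshot holes:", snapshot_holes)
print("temporal holes:", temporal_holes)
```

Every operation and every size appears in the trace, so covering each property on its own would report no holes. The combined models still find two untested combinations and one untested sequence (two reads in a row).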
General Guidelines for Usage of Coverage
Coverage should not be
used if the resources used for it can be better spent elsewhere. This is
the case when the budget is very tight and there is not enough time to
even finish the test plan. In such a case, designing new tests is not useful
as not all the old tests will be run. Coverage should be used only if there
is a full commitment to make use of the data collected. Measuring coverage
in order to report a coverage percentage is practically worthless. Coverage
points out parts of the application that have not been tested and guides
test generation to these parts. Moreover, it is very important to try to
reach full coverage or at least set high coverage goals, since many bugs
hide in hard-to-reach places.
Coverage is a very useful
criterion for test selection for regression suites. Whenever a small set
of tests is needed, the test suite should be selected so that it will cover
as many requirements or coverage tasks as possible.
When coverage and reviews
are used for the same project, reviews can put less emphasis on things that
coverage is likely to find. For example, a review for dead code is unnecessary
if statement coverage is used, a review for boundary errors in loops is
redundant if the appropriate mutation coverage model [8] is used, and manually
checking that some values of a variable can be attained is not needed if
the appropriate functional coverage model is used.
Conclusions
In this article, we compared
functional coverage to code based coverage. We have shown that each has
its own merits and drawbacks. It has been our experience, as well as the
experience of everyone we know who has used coverage, that coverage is worth
doing. Almost anyone, under any schedule and budget constraints, can
benefit from some form of coverage. However, one has to commit to coverage
and use it properly. Functional coverage is a more powerful testing technique
than code based coverage, since it can focus on areas of concern, and contribute
to the design and verification processes in many more ways. On the other
hand, functional coverage is more complicated and requires more resources
than program based coverage. We therefore recommend that it be reserved
for those parts of a program that are of special concern. Code based coverage,
on the other hand, can and should be uniformly applied to the entire application.
Acknowledgment
This article is part of
the paper "Off-The-Shelf Vs. Custom Made Coverage Models, Which Is The
One for You?" by Shmuel Ur and Avi Ziv,
which was published in STAR98.
References
[1] Y. Abarbanel-Vinov,
and S. Ur. Processor Bug Classification and Modeling, IBM's Haifa Research
Lab internal document, 1996.
[2] J. Baumgartner and
R. Raghavan. Method to compute test coverage in complex computer system
simulation. IBM Technical Disclosure Bulletin, 40(3):1-4, March 1997.
[3] B. Beizer. Software
Testing Techniques. Van Nostrand Reinhold, 1990.
[4] F. P. Brooks. The Mythical
Man-Month: Essays on Software Engineering, Addison-Wesley, 1995.
[5] E. Buchnik and S. Ur.
Compacting regression-suites on-the-fly. In Proceedings of the 4th Asia
Pacific Software Engineering Conference, pages 385-94, December 1997.
[6] S. Cornett. Software
Test Coverage Analysis, http://www.bullseye.com/webCoverage.html
[7] R. Grinwald, E. Harel,
M. Orgad, S. Ur, and A. Ziv. User defined coverage - a tool supported methodology
for design verification, to appear in Proceedings of the 35th Design Automation
Conference, June 1998.
[8] W.E. Howden. Weak mutation
testing and completeness of test sets, IEEE Transactions on Software Engineering,
8(4):371-379, July 1982.
[9] J.R. Horgan, S. London
and M.R. Lyu. Achieving software quality with testing coverage measures,
Computer, 27(9):60-69, September 1994.
[10] B. Marick. The Craft
of Software Testing, Subsystem Testing Including Object-Based and Object-Oriented
Testing. Prentice-Hall, 1995.
[11] C. Kaner. Software
negligence and testing coverage, In Proceedings of STAR 96: the Fifth International
Conference, Software Testing, Analysis and Review, pages 299-327, June
1996.
[12] R. Stewart. Unit test
coverage as leading indicator of rework, EuroSTAR 97, November 1997.
[13] C-Cover - Test Coverage
Analyzer for C/C++, http://www.bullseye.com/webCcover.html
[14] DeepCover for Java,
http://www.rstcorp.com/DCJava.html
[15] Software test and
evaluation guidelines, Department of the Army, Pamphlet 73-7