Excerpt from Boris Beizer’s latest book

(c) Copyright 1999 by Software Research, Inc.
Reprinted by permission of the author.
                   (c) Copyright 1999 by Boris Beizer
                    bbeizer@sprintmail.com

Good testing practices can't be defined by simple rules of thumb or fixed formulas.  What is best in one circumstance can be worst in another.  Here, for your consideration, is a baker's dozen of best and worst testing practices -- you'll notice that both columns feature the same practice.

1. Unit Testing to a 100% Coverage Standard

Testing at the lowest level (i.e., a unit is a piece of source code that does not include any called subroutines or functions) using a coverage tool; as a minimum, sufficient testing to assure that every source statement has been executed at least once under test.  In most practical cases, it is necessary to test to assure branch cover (each branch tested both TRUE and FALSE) and predicate cover (each term of a compound predicate tested both TRUE and FALSE).  Unit testing is done to find unit bugs, which are the second most frequent kind of bug.
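
The following minimal Python sketch makes the three coverage levels concrete. It is built around a hypothetical `shipping_fee` unit with one compound predicate. A real coverage tool instruments the code for you. Here the outcomes are recorded by hand, and the `term` helper and `outcomes` table are illustrative assumptions. The sketch shows that one test achieves statement cover, two achieve branch cover, and three are needed for predicate cover.

```python
from collections import defaultdict

# Hand-rolled instrumentation (illustrative; a coverage tool does this for you).
outcomes = defaultdict(set)

def term(name, value):
    """Record the truth value a predicate term took, then return it unchanged."""
    outcomes[name].add(value)
    return value

def shipping_fee(weight_kg, is_member):
    """Hypothetical unit under test: free shipping for heavy orders or members."""
    fee = 10
    if term("heavy", weight_kg > 20) or term("member", is_member):
        outcomes["branch"].add(True)
        fee = 0
    else:
        outcomes["branch"].add(False)
    return fee

def coverage(cases):
    """Run the cases; report which of (statement, branch, predicate) cover was achieved."""
    outcomes.clear()
    for weight, member in cases:
        shipping_fee(weight, member)
    statement = True in outcomes["branch"]  # 'fee = 0' is the only conditional statement
    branch = outcomes["branch"] == {True, False}
    predicate = all(outcomes[t] == {True, False} for t in ("heavy", "member"))
    return statement, branch, predicate

print(coverage([(25, False)]))                         # (True, False, False)
print(coverage([(25, False), (5, False)]))             # (True, True, False)
print(coverage([(25, False), (5, True), (5, False)]))  # (True, True, True)
```

Python's `or` short-circuits. When `heavy` is TRUE, the `member` term is never evaluated, so the first case alone says nothing about `member`.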

Best Practice

Testing to a 100% statement (also branch and predicate) coverage standard is a mandatory, minimum testing requirement because only by doing so can we assure that all the code has been tested at least once in its life.  We must assure that, because if we don't, we are guaranteed not to find the bugs that may be in the untested code.  Not only is this common sense, but it is a fundamental axiom of testing -- if you don't test it, you won't find the bugs in it.

Today, with widely available coverage tools for most popular languages, such testing is as easy as routinely using a spell-checker.

If you don't get the unit bugs out during unit testing, you'll be finding and fixing them in system testing or, worse, in the field, where the cost will be orders of magnitude higher.

Worst Practice

Applying the unit test coverage standard to stable code that has not been modified and that had previously been tested to the standard -- especially code that is heavily in maintenance.  If neither the code nor that code's requirements have changed, then what can possibly be learned from re-running unit tests?

Attempting to apply the unit test standard to big components that are aggregates of many units -- e.g., 50,000+ lines of source code.  These are not "units" -- they are higher-level components, and the unit test standard does not apply.

Using the unit test standard as the sole testing criterion -- that is, stopping unit testing once coverage has been achieved.  Just because you've covered the code under test doesn't mean that it does anything useful, or even that it works.
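
Here is a minimal sketch of that last point, using a hypothetical leap-year unit. Three tests execute every statement and drive each predicate term both TRUE and FALSE. All of them pass, and the unit is still wrong.

```python
def is_leap(year):
    """Hypothetical unit meant to implement the Gregorian leap-year rules."""
    # Bug: omits the rule that years divisible by 400 ARE leap years.
    return year % 4 == 0 and year % 100 != 0

# These three cases execute every statement and drive each predicate term
# both TRUE and FALSE, and every one of them passes...
covering_suite = {2024: True, 2023: False, 1900: False}
assert all(is_leap(y) == expected for y, expected in covering_suite.items())

# ...yet the unit fails a requirement the suite never probed.
print(is_leap(2000))  # prints False; the correct answer is True
```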

2. Integration Testing

Testing the interfaces between otherwise correct components to assure that they are compatible.  Integration bugs are just behind unit bugs in frequency and, for object-oriented software, are probably more frequent.

Best Practice

When two or more components are combined to create a bigger component, a new, previously untested, interface comes into play.  Integration bugs concern incompatibilities across interfaces.  If the interface is not explicitly tested, then you won't find the bugs.  Integration testing, therefore, is not a single event, but takes place at every level of the build process.
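
Here is a minimal sketch of an interface bug, assuming two hypothetical components. One reports elapsed time in milliseconds, and the other expects seconds. Each passes its own unit tests. Only a test that exercises the interface between them exposes the mismatch.

```python
def measure_elapsed_ms(start_s, end_s):
    """Hypothetical component A: reports elapsed time in MILLISECONDS."""
    return (end_s - start_s) * 1000

def timed_out(elapsed_s, limit_s=5.0):
    """Hypothetical component B: its interface expects elapsed time in SECONDS."""
    return elapsed_s > limit_s

# Each component passes its own unit tests...
assert measure_elapsed_ms(0.0, 2.0) == 2000
assert timed_out(6.0) and not timed_out(2.0)

# ...but the new interface created by combining them has never been tested.
def request_timed_out(start_s, end_s):
    return timed_out(measure_elapsed_ms(start_s, end_s))

print(request_timed_out(0.0, 2.0))  # prints True: a 2 s request is wrongly a timeout
```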

Worst Practice

Confusing integration testing with testing something that's already been integrated.  If it's not concerned with testing component interfaces, then no matter what you call it, it isn't integration testing as the term is defined.

A project plan with a single step or phase labeled "integration testing" means that it's unlikely that any integration testing whatsoever is being done.

3. System Testing

Testing an entire software system end-to-end to discover common system bugs such as resource loss, synchronization and timing problems, and shared file conflicts.

Best Practice

Once we've found and fixed the low-level bugs, it's time to address real system bugs.  We do this because if we don't, our end-users will find these bugs for us.  Most system bugs, such as resource loss, timing, and synchronization bugs, can be found by applying specific system testing methods.  There's no need to be victimized by such bugs if you've made a concerted effort to find them.  And there's also no excuse for them.
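
Resource-loss bugs typically hide on error paths and show up only after many transactions. One such method is to run a long, mixed workload that includes bad inputs, then audit whether every resource came back. Here is a minimal sketch of that audit. The `Pool` class and both request handlers are hypothetical.

```python
class Pool:
    """Hypothetical fixed-size pool of connections (or buffers, handles, ...)."""

    def __init__(self, size):
        self.free = list(range(size))

    def acquire(self):
        if not self.free:
            raise RuntimeError("pool exhausted")
        return self.free.pop()

    def release(self, item):
        self.free.append(item)


def handle_leaky(pool, request):
    conn = pool.acquire()
    if request < 0:
        raise ValueError("bad request")  # bug: conn is never released on this path
    pool.release(conn)


def handle_fixed(pool, request):
    conn = pool.acquire()
    try:
        if request < 0:
            raise ValueError("bad request")
    finally:
        pool.release(conn)  # released on every path, including errors


def items_lost(handler, requests, pool_size=8):
    """Run a long mixed workload, then count pool items never returned."""
    pool = Pool(pool_size)
    for request in requests:
        try:
            handler(pool, request)
        except (ValueError, RuntimeError):
            pass  # the system keeps running; resources are audited afterwards
    return pool_size - len(pool.free)


workload = [1, -1] * 20  # alternating good and bad requests
print(items_lost(handle_leaky, workload))  # prints 8: the pool ends up exhausted
print(items_lost(handle_fixed, workload))  # prints 0
```

A single bad request would leak only one item and might pass unnoticed. The long workload is what turns the leak into a visible loss.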

Worst Practice

Doing system testing without prior, meticulous unit testing, lower-level component testing, and integration testing at every stage of the build.  If you bypass unit testing, all you'll be doing is discovering low-level unit and integration bugs in the context of expensive, end-to-end system testing, leaving no time for real system tests.

4. Testing to Requirements

Testing from the users' perspective, typically end-to-end, to verify the operability of every feature.

Best Practice

We build software for users.  They aren't (or shouldn't be) concerned with how the software works, how it's organized, or the cute programming tricks we used.  They're only concerned with how it behaves, and therefore testing that behavior must be central to the testing effort.

This philosophy must be applied at every level, because the "user" of an internal subroutine, say, is the programmer who calls that subroutine.  Therefore, testing each component to its requirements should be an objective at every testing level.

Worst Practice

Assuming that by "user" we mean only the end user.  In typical software, only 10%-15% of the code directly concerns things seen by the end user.  The remaining 85%-90% concerns infrastructure items such as resource management, protocols, databases, etc., that the user doesn't know and doesn't want to know about.

Testing only to end-user perceived requirements is like inspecting a building based on the work done by the interior decorator at the expense of the foundations, girders, and plumbing.

5A. Test Execution Automation

Use of automated test drivers running from test scripts or test data files.
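
Here is a minimal sketch of a data-driven test driver. It assumes a hypothetical `apply_discount` unit and a CSV test data file, inlined as a string so the example is self-contained. The driver, not a person, reads each case, runs it, and compares the result, so a rerun costs next to nothing.

```python
import csv
import io

def apply_discount(price_cents, percent):
    """Hypothetical unit under test: price after a percentage discount."""
    return price_cents * (100 - percent) // 100

# Hypothetical test data file: one case per row. Inlined so the sketch runs
# as-is; in practice it would be a separate, version-controlled file.
TEST_DATA = """case_id,price_cents,percent,expected
t1,1000,10,900
t2,999,0,999
t3,500,100,0
"""

def run_suite(data_file, unit):
    """Drive the unit from the data file; return the ids of failing cases."""
    failures = []
    for row in csv.DictReader(data_file):
        actual = unit(int(row["price_cents"]), int(row["percent"]))
        if actual != int(row["expected"]):
            failures.append(row["case_id"])
    return failures

print(run_suite(io.StringIO(TEST_DATA), apply_discount))  # prints []
```

Adding a test means adding a row rather than writing code, and the fortieth run costs the same as the first.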

Best Practice

Manual testing is the pits -- not only that, it doesn't work very well, because the test execution error rate is about 50% or worse.  Testing is not about monkeys pounding keys.  Today, with automated test drivers and capture/playback tools widely available, there's no difficulty in achieving a high degree of test execution automation -- 95% or better for most applications.

The biggest advantage of test execution automation can't be reaped unless you intend to run the tests many times.  However, most organizations underestimate the number of times tests will be run.  While they think in terms of once or twice, the actual number is likelier to be forty or fifty times over the life of the software.

If you treat test execution automation as a capital investment in testware, no less important than software, then you'll see a truly remarkable long-term return on investment, one that beats that of any other software development tool.

Worst Practice

Getting that last 5% can be a bear.  There's not much point in test execution automation if it means that you have to build key-pounding robots.  Some things are really difficult to automate: for example, judging print quality, graphics rendering, color fidelity, and sound quality.

Such inherently difficult areas aside, the biggest barrier to test execution automation is failing to plan the automation at the outset.  If that happens, you may (unfortunately) be better off with manual testing.

The next most important source of test execution automation difficulty is failing to train your people properly in the use of these tools.  If they haven't been properly trained and given the time and resources to master the tools, they'll actually be more productive doing it manually.

Finally, don't inject automation in the middle of a project and expect to get anything but chaos, lowered productivity, and reduced quality.

5B. Test Design Automation

Use of models and/or semi-formal requirements to automatically generate test cases.

Best Practice

This is where the action is on the leading edge of testing.  Instead of writing individual tests you create formal models such as finite-state machine models, regular expressions, domain specifications, constraint sets, or formal specifications. The tool then automatically generates a covering set of tests from the model. The focus of testing then shifts from designing individual test cases to creating models that faithfully express the requirements.
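
The following minimal Python sketch illustrates the idea without representing any particular tool. It generates tests from a hypothetical finite-state model of a login dialog using transition cover, one of the simplest criteria. Each test reaches a transition's source state by the shortest input sequence and then fires that transition. The expected state after every step is taken from the model.

```python
from collections import deque

# Hypothetical finite-state model of a login dialog: (state, input) -> next state.
MODEL = {
    ("LoggedOut", "enter_good_pw"): "LoggedIn",
    ("LoggedOut", "enter_bad_pw"): "Retry",
    ("Retry", "enter_good_pw"): "LoggedIn",
    ("Retry", "enter_bad_pw"): "Locked",
    ("LoggedIn", "logout"): "LoggedOut",
    ("Locked", "admin_reset"): "LoggedOut",
}
START = "LoggedOut"


def shortest_paths(model, start):
    """Breadth-first search: the shortest input sequence reaching each state."""
    paths = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for (src, inp), dst in model.items():
            if src == state and dst not in paths:
                paths[dst] = paths[state] + [inp]
                queue.append(dst)
    return paths


def transition_cover(model, start):
    """One test per reachable transition. Each test is a list of
    (input, expected_state) steps, with expectations taken from the model."""
    paths = shortest_paths(model, start)
    tests = []
    for (src, inp) in model:
        if src not in paths:
            continue  # unreachable transition: a model bug worth reporting
        state, steps = start, []
        for step in paths[src] + [inp]:
            state = model[(state, step)]
            steps.append((step, state))
        tests.append(steps)
    return tests


for test in transition_cover(MODEL, START):
    print(test)
```

Each generated test would then be run against the real software, comparing its actual state after every input with the model's expectation. Stronger criteria, such as transition pairs, generate many more tests from the same model.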

But you don't have to be on the leading edge to exploit test design automation.  The simplest method is to use a capture/playback system to create a base scenario.  Once you've got that base test case debugged you create the variants by simple editing of the test script.  It's an effective method because while the typical test case contains 250 keystrokes, the variations from case to case involve changes in only a dozen or so.  It isn't a new idea -- we were doing it 35 years ago with paper tape.
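
Here is a minimal sketch of the base-plus-variants method. It assumes a captured script can be represented as a list of (action, field, value) steps, whereas real capture/playback tools use their own script formats. The field names and messages are hypothetical.

```python
# Hypothetical base scenario, as a capture/playback tool might record it:
# an ordered list of (action, field, value) steps.
BASE_SCRIPT = [
    ("type", "name", "Alice Smith"),
    ("type", "zip", "02139"),
    ("select", "country", "US"),
    ("click", "submit", None),
    ("expect", "status", "Order accepted"),
]


def make_variant(base, edits):
    """Copy the base script, substituting new values for the edited fields.
    Every other step, and the base script itself, is left unchanged."""
    return [(action, field, edits.get(field, value)) for action, field, value in base]


# Each variant differs from the debugged base in only a field or two,
# including the expected outcome.
VARIANTS = {
    "empty_name": {"name": "", "status": "Name required"},
    "bad_zip": {"zip": "ABCDE", "status": "Invalid ZIP code"},
}

scripts = {case: make_variant(BASE_SCRIPT, edits) for case, edits in VARIANTS.items()}
print(scripts["bad_zip"][1])  # prints ('type', 'zip', 'ABCDE')
```

If the same field appears in more than one step, this sketch edits every occurrence. A real variant generator would need a way to address individual steps.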

Worst Practice

If you're looking for excitement and trouble, then introduce test design automation before you've got your tests under configuration management and have implemented test execution automation.  A good test design automation tool generates tests by the thousands and tens of thousands.

You can't run those tests by hand, and there's no practical way to figure out which tests are worth running and which are not.  And if they're not under strict configuration management, how will you know which tests were run, and why?

Education, education, education.  If you're going to use sophisticated tools, then your people must be trained in those tools and given enough time to internalize the methodologies behind the tools.  If you don't figure that into the equation, your attempt at test design automation will fare worse than manual testing.  You can't expect a person who's never flown an airplane to take off in an F-16 on the first day.

6. Stress Testing

Subjecting a software system to an unreasonable load while denying it the resources needed to process that load.

Best Practice

If you're trying to find true system bugs and have never subjected this software to a real stress test, then it is high time you started. Dollar for dollar, stress testing has probably the highest payoff of any test technique for finding system bugs.

Proper stress testing is useful in finding synchronization and timing bugs, interlock problems, priority problems, resource loss bugs, and general abuse of the API.

Stress testing is usually easy to set up.  You can create stress loads by looping transactions back on themselves so that the system stresses itself: e.g., an incoming message is output on a looped-back line to generate another incoming message.  Alternatively you can use another system of comparable size to create the stress load.

Worst Practice

Stress tests wear out faster than almost any other test technique.  The kinds of bugs found by stress testing are well understood and limited.  Once they've been found and fixed, the stress test becomes only marginally effective.  Most organizations find that the stress test adds little or no value after the third or fourth run.  The worst practice, then, is to continue with stress testing long after it has worn out -- everybody feels good because the test isn't finding any bugs.  In fact, if stress testing doesn't wear out, something is seriously wrong with your QA, because nobody seems to be learning how to avoid these common system bugs.

Another terrible stress testing practice is to attempt manual stress testing -- only the testers get stressed.  It's another worthless, feel-good testing method.

7. Regression Testing


More specifically, equivalency testing, that is, rerunning a suite of tests to assure that the current version behaves identically to the previous version except in those areas known to have been changed.
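
Here is a minimal sketch of the comparison at the heart of equivalency testing. It assumes each test's output from both versions has been recorded as text keyed by a test id. The ids and outputs are hypothetical. Any difference outside the areas known to have changed is flagged, and so is any test that produced no output on the new version.

```python
def equivalency_diffs(old_results, new_results, known_changes):
    """Return ids of tests whose output differs between versions, excluding
    tests that exercise areas known to have been changed."""
    return sorted(
        tid for tid in old_results
        if tid not in known_changes and new_results.get(tid) != old_results[tid]
    )

# Hypothetical recorded outputs from running the same suite on two versions.
old = {"t1": "total=100", "t2": "total=250", "t3": "tax=8.25", "t4": "ok"}
new = {"t1": "total=100", "t2": "total=251", "t3": "tax=8.50", "t4": "ok"}

# t3 exercises the tax table, which was deliberately changed in this release.
print(equivalency_diffs(old, new, known_changes={"t3"}))  # prints ['t2']
```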

Best Practice

To quote Robert Burns: "The best laid schemes of Mice and Men gang oft awry and leave us nought but grief and pain for promised joy."  It's not possible to keep a project perfectly aligned so that all events happen exactly as in the initial schedule.  Therefore, it's inevitable that some things will go into the build before all interfaces have been tested and some things will be integrated before lower level testing has been done.  Continual regression testing at every stage of the build is cheap insurance against the bugs that slip through that way.

Worst Practice

Manual testing when surrounded by high-powered computers is patently silly.  Attempting manual regression testing is sillier still.  Your people will hate it, and their test execution error rate will be so high that the entire effort is an exercise in feel-good self-deception and false confidence.  Besides which, almost nobody actually ever does it.

Not using a fully automated regression test (e.g., attempted manual regression testing) is such a worthless practice that I advise against any regression testing if the only way you can do it is manually.

8. Reliability Testing

Testing to determine the expected failure rate of software under a statistically specified user load (operational profile).
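
Here is a minimal sketch of the idea. It assumes a hypothetical three-transaction profile and a stand-in system whose `report` transaction fails 10% of the time. Test transactions are drawn with the profile's probabilities, so the observed failure fraction estimates what users would experience. A real reliability study would also compute confidence bounds, which this sketch omits.

```python
import random

# Hypothetical operational profile: transaction type -> probability of use.
PROFILE = {"query": 0.70, "update": 0.25, "report": 0.05}


def system_under_test(txn, rng):
    """Stand-in for the real system (hypothetical): 'report' fails 10% of
    the time and everything else always succeeds. Returns True on success."""
    return not (txn == "report" and rng.random() < 0.10)


def estimate_failure_rate(profile, runs, seed=1):
    """Run `runs` transactions drawn per the profile; return failures per run."""
    rng = random.Random(seed)
    kinds, weights = zip(*profile.items())
    failures = 0
    for txn in rng.choices(kinds, weights=weights, k=runs):
        if not system_under_test(txn, rng):
            failures += 1
    return failures / runs


print(estimate_failure_rate(PROFILE, 100_000))  # close to 0.05 * 0.10 = 0.005
```

If the real users' mix differs from `PROFILE`, the estimate is simply wrong, which is why a valid profile is a precondition.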

Best Practice

Reliability testing under a statistically valid user profile has been established for many applications as an effective method to determine when enough testing has been done to warrant using the product.  Among the primary areas where it has been validated are telecommunications and control software.  Where its applicability has been confirmed, it is an accepted part of the testing tool kit.  Publication of Lyu's Handbook of Software Reliability Engineering (McGraw-Hill, 1996) attests to that.

Worst Practice

If you don't have a valid user profile, you can't do this kind of testing (you can do it, but you won't get meaningful results).  Also, changes in your application can change user behavior and therefore the profile -- to the point where the entire test is invalidated.

Worst practice?  Being naive and expecting this to be done by statistically unsophisticated people who can't or won't do the math in order to discover which caveats do and don't apply in your case.

9. Performance Testing

Testing to determine the expected processing delay as a function of the applied load; also to determine resource utilization under load. Equivalently, testing to determine the maximum number of simultaneous users and/or transactions that the system can sustain.
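
Delay typically climbs steeply as load approaches capacity. As a worked sketch, assume the textbook M/M/1 queueing model, with Poisson arrivals at rate λ, exponential service at rate μ, and one server. For λ < μ, its mean response time is W = 1/(μ − λ). Real systems rarely fit this model exactly, so measured curves matter. The capacity figure below is hypothetical.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system, W = 1 / (mu - lambda), for an M/M/1 queue (assumed model)."""
    if arrival_rate >= service_rate:
        return float("inf")  # offered load >= capacity: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


def max_sustainable_rate(service_rate, target_response):
    """Largest arrival rate whose mean response time stays within the target."""
    return max(0.0, service_rate - 1.0 / target_response)


MU = 100.0  # hypothetical capacity: 100 transactions per second
for load in (50.0, 80.0, 90.0, 99.0):
    print(f"{load:5.1f} tx/s -> {mm1_response_time(load, MU) * 1000:7.1f} ms")
print(max_sustainable_rate(MU, 0.050))  # 80.0 tx/s for a 50 ms mean response target
```

Going from 50% to 99% utilization multiplies the mean delay fiftyfold (20 ms to 1000 ms), which is why the maximum sustainable load is usually well below raw capacity.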

Best Practice

If the cost of testing and analysis is far lower than the expected gain and if you have stable software and valid user profiles, then this is an important part of the toolkit.  Its use is widespread in telecommunications, embedded systems, and other system software.

Given a heavy capital investment in a performance testing laboratory staffed by experts, this kind of testing can forestall the worst kinds of performance problems met in the field.  If throughput issues are meaningful for your application, then you can't afford not to do this testing.

Worst Practice

With today's cheap hardware, performance testing is rarely justified unless the production run is measured in hundreds of thousands of units. Alternatively, it can be justified for a big site or network where the cost issues are measured in millions of dollars.   In most cases, upgrading the hardware is cheaper than doing the performance test and the associated analysis.

Performance testing is for experts.  It takes a lot of heavy mathematics and specialized testing expertise to avoid the risk of getting impressive but meaningless numbers.  Worst practice?  Give the job to amateurs.

11. Independent Test Groups

A test group that reports along a different hierarchy than the development group.  True independence implies the ability to block a release if it is in the organization's best interest to do so.

Best Practice

The modern independent test group is a value-added group that brings special expertise to the test floor.  Because of their specialized expertise and dedicated resources, they are more effective at doing many kinds of testing than the typical development group would be.  Among the kinds of testing effectively done by a value-added independent test group are: network, configuration compatibility, usability, performance, security, acceptance, hardware/software integration, distributed processing, recovery, platform compatibility, and third-party software testing.

Worst Practice

Using an independent test group to do the testing that developers don't want to do, or should do but can't; to do independent unit and component testing; as a safety net to catch any bugs that slip through the developers' testing; or in the hope of getting more objective testing.  Having the group test in total ignorance of the underlying code; staffing it with people who can't program; and using it as a dumping ground for failed programmers, underqualified workers, and undesirable employees.

12. Usability Testing

Testing the human/machine interface with respect to things such as screen and menu layouts, help features, instruction manuals, icon style and placement in order to confirm that such things are well thought out and that the system can be learned and used with a minimum of hassle.

Best Practice

Usability issues are worked out and tested on prototypes long before code is written.  Operational concepts are tested in a usability test laboratory, where trained observers behind a one-way mirror record actual hand and eye motions and other user actions in order to judge the effectiveness of the placement of menu items, screen appearances, etc.  The frequency of references to help screens and paper manuals is also measured.  This saves both money and embarrassment.

Worst Practice

Doing it after the software has been written when it's too late to make any substantive changes. Expecting amateurs not trained in human factors to understand the issues and to measure the responses correctly.

Not knowing how to set up experiments involving human psychology with the result that the subjects tell you what they think you want to hear rather than what's real.

13. Beta Testing

Testing done by (usually) unpaid, but representative, users -- usually the final stage of testing prior to official release.

Best Practice

Using a small, representative sample of your installed user base (e.g., under 1%) can be an effective way to wring out latent configuration-sensitivity and performance bugs not previously found.  The sample must be representative of both high-end and low-end configurations and of both high-end and low-end users.
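
One way to make the sample representative is stratified sampling. The installed base is divided by configuration tier and usage tier, and the same small fraction is drawn from each stratum, with at least one user per stratum. Here is a minimal sketch. The installed base, tier labels, and 0.8% fraction are hypothetical.

```python
import random
from collections import defaultdict


def beta_sample(users, fraction=0.008, seed=7):
    """Stratified sample: the same fraction of every (configuration, usage)
    stratum, with at least one user per stratum so none is left out."""
    strata = defaultdict(list)
    for user in users:
        strata[(user["config"], user["usage"])].append(user)
    rng = random.Random(seed)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        count = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, count))
    return sample


# Hypothetical installed base of 10,000 users, mostly low-end and light.
installed_base = [
    {"id": i,
     "config": "high-end" if i % 5 == 0 else "low-end",
     "usage": "heavy" if i % 3 == 0 else "light"}
    for i in range(10_000)
]

sample = beta_sample(installed_base)
print(len(sample))  # prints 80 (0.8% of the base)
```

A simple random sample of the same size could, by chance, contain few or no high-end heavy users. Stratifying guarantees that each tier is represented.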

Worst Practice

Using beta testing instead of proper in-house testing.  We got away from that a long time ago.  If beta testing finds a lot of new bugs, then the software was released prematurely.   Beta testing "instead of" certainly isn't free and it isn't very cost-effective.  But watch out for the popular press and the public that still swears by the beta-testing myths.