5th International Conference

EuroSTAR '99, November 8 - 12, 1999, Barcelona, Spain


Risk Based Testing and Metrics


Risk Analysis Fundamentals and Metrics for software testing including a Financial Application case study


Ståle Amland

Hulda Garborgsv. 2, N-4020 STAVANGER, NORWAY


Phone: +47 51 58 05 87 Mobile: +47 905 28 930 FAX: +47 51 58 55 24


E-mail: stale@amland.no


Abstract

This paper provides an overview of risk analysis fundamentals, focusing on software testing with the key objectives of reducing the cost of the project test phase and reducing future potential production costs by optimising the test process. The phases of Risk Identification, Risk Strategy, Risk Assessment, Risk Mitigation (Reduction) and Risk Prediction are discussed. Of particular interest is the use of metrics to identify the probability and the consequences of individual risks (errors) if they occur, and to monitor test progress.


The body of this paper contains a case study of the system test stage of a project to develop a very flexible retail banking application with complex test requirements. The project required a methodology that would identify the functions in the system where the consequence of a fault would be most costly (either to the vendor or to the vendor’s customers), and also a technique to identify those functions with the highest probability of faults.


A risk analysis was performed and the functions with the highest risk exposure, in terms of probability and cost, were identified. A risk based approach to testing was introduced, i.e. during testing, resources would be focused on the areas representing the highest risk exposure. To support this approach, a well-defined, but flexible, test organisation was developed.


The test process was strengthened and well-defined control procedures were introduced. The level of test documentation produced prior to test execution was kept to a minimum and as a result, more responsibility was passed to the individual performing the test. To support this approach, progress tracking metrics were essential to show the actual progress made and to calculate the resources required to complete the test activities.

  1. Introduction

    The risk based approach to testing is explained in six sections:


    1. Risk Analysis Fundamentals: Chapter 2 contains a brief introduction to risk analysis in general with particular focus on using risk analysis to improve the software test process.

    2. Metrics: Chapter 3 gives a basic introduction to the metrics recorded as part of the case study contained in this document.

    3. The Case: Chapter 4 is the first chapter of the case study. It explains the background of how the methodology was implemented in one particular project.

    4. The Challenge: Chapters 5 and 6 further summarise what had to be done in the case project, why it should be done and how it should be done.

    5. The Risk Analysis: Chapter 7 explains how the probability and cost of a fault were identified. Further, it discusses how the risk exposure of a given function was calculated to identify the most important functions and used as input to the test process.

    6. The Process and Organisation: Chapter 8 goes through the test process and discusses improvements made to the organisation and processes to support the risk based approach to testing in the case project.


    In addition, chapter 9 briefly discusses the importance of automated testing as part of a risk based approach. Some areas for further research and of general interest are listed in chapter 10.


  2. Risk Analysis fundamentals in software testing

    This chapter provides a high level overview of risk analysis fundamentals and is only intended to be a basic introduction to the topic. Each of the activities described in this chapter is expanded upon as part of the included case study.


    According to Webster’s New World Dictionary, risk is “the chance of injury, damage or loss; dangerous chance; hazard”.


    The objective of Risk Analysis is to identify potential problems that could affect the cost or outcome of the project.


    The objective of risk assessment is to take control over the potential problems before the problems control you, and remember: “prevention is always better than the cure”.


    The following figure shows the activities involved in risk analysis. Each activity will be further discussed below.

    [Figure 1 diagram: the risk activities Risk Strategy, Risk Identification, Risk Assessment, Risk Mitigation, Risk Reporting and Risk Prediction, linked to the test-process elements Test Plan, Test Item Tree, Testing/Inspection, Matrix of Cost and Probability, and Test Metrics.]

    Figure 1: Risk analysis activity model. This model is taken from Karolak’s book “Software Engineering Risk Management”, 1996 [6] with some additions made (the oval boxes) to show how this activity model fits in with the test process.

    1. Risk Identification

      The activity of identifying risk answers these questions:


      • Is there risk to this function or activity?

      • How can it be classified?


        Risk identification involves collecting information about the project and classifying it to determine the amount of potential risk in the test phase and in production (in the future).


        The risk could be related to system complexity (i.e. embedded systems or distributed systems), new technology or methodology involved that could cause problems, limited business knowledge or poor design and code quality.


    2. Risk Strategy

      Risk based strategizing and planning involves the identification and assessment of risks and the development of contingency plans for possible alternative project activity or the mitigation of all risks. These plans are then used to direct the management of risks during the software testing activities. It is therefore possible to define an appropriate level of testing per function based on the risk assessment of the function. This approach also allows for additional testing to be defined for functions that are critical or are identified as high risk as a result of testing (due to poor design, quality, documentation, etc.).


    3. Risk Assessment

      Assessing risks means determining the effects (including costs) of potential risks. Risk assessment involves asking questions such as: Is this a risk or not? How serious is the risk? What are the consequences? What is the likelihood of this risk happening? Decisions are made based on the risk being assessed. The decision(s) may be to mitigate, manage or ignore.


      The important things to identify (and quantify) are:


      • What indicators can be used to predict the probability of a failure?

        The key is to identify what is important to the quality of this function. This may include design quality (e.g. how many change requests had to be raised), program size, complexity, programmers’ skills, etc.

      • What are the consequences if this particular function fails?

        Very often it is impossible to quantify this accurately, but using low-medium-high (1-2-3) may be good enough to rank the individual functions.


        By combining the consequence and the probability (from risk identification above) it should now be possible to rank the individual functions of a system. The ranking could be done based on “experience” or by empirical calculations. Examples of both are shown in the case study later in this paper.
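
        As an illustration of the empirical variant, the sketch below (Python) combines a weighted set of probability indicators, each scored 1 to 3, with a 1-3 cost figure into a risk exposure value and ranks the functions. The indicator names, weights and sample data are invented for illustration; they are not the model used in the case study.

```python
# Illustrative ranking of functions by risk exposure (probability x cost).
# Indicator names, weights and sample data are invented for illustration only.

WEIGHTS = {"changes": 0.4, "size": 0.2, "complexity": 0.2, "new_technology": 0.2}

functions = [
    # Each probability indicator and the cost are scored 1 (low) to 3 (high).
    {"name": "interest calculation", "changes": 3, "size": 2, "complexity": 3, "new_technology": 2, "cost": 3},
    {"name": "address update",       "changes": 1, "size": 1, "complexity": 1, "new_technology": 1, "cost": 1},
    {"name": "account opening",      "changes": 2, "size": 3, "complexity": 2, "new_technology": 3, "cost": 2},
]

def risk_exposure(function):
    """Weighted average of the probability indicators, multiplied by the cost."""
    probability = sum(weight * function[indicator] for indicator, weight in WEIGHTS.items())
    return probability * function["cost"]

# Rank the functions: the highest exposure gets the most test attention.
for function in sorted(functions, key=risk_exposure, reverse=True):
    print(f"{function['name']:22s} exposure = {risk_exposure(function):.2f}")
```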


    4. Risk Mitigation

      The activity of mitigating and avoiding risks is based on information gained from the previous activities of identifying, planning, and assessing risks. Risk mitigation/avoidance activities avoid risks or minimise their impact.


      The idea is to use inspection and/or focus testing on the critical functions to minimise the impact a failure in this function will have in production.

    5. Risk Reporting

      Risk reporting is based on information obtained from the previous topics (those of identifying, planning, assessing, and mitigating risks).


      Risk reporting is very often done in a standard graph like the following:


      [Figure 2 diagram: risks numbered 1-4 plotted in a matrix with Probability (low to high) on the vertical axis and Consequence (low to high) on the horizontal axis.]
      Figure 2: Standard risk reporting - concentrate on those in the upper right corner!


      In the test phase it is important to monitor the number of errors found, the number of errors per function, the classification of errors, the number of hours of testing per error, the number of hours of fixing per error, etc. The test metrics are discussed in detail in the case study later in this paper.
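
      A minimal sketch of this kind of bookkeeping is shown below; the fault-record fields and figures are assumptions for illustration, not the project’s actual log format.

```python
# Sketch of per-function fault metrics collected during the test phase.
# The fault-record fields and numbers are assumed for illustration only.
from collections import defaultdict

# (function, severity, hours spent testing before the fault was found, hours spent fixing it)
fault_log = [
    ("interest calculation", "high",   6.0, 8.0),
    ("interest calculation", "low",    2.0, 1.5),
    ("account opening",      "medium", 4.0, 3.0),
]

per_function = defaultdict(lambda: {"errors": 0, "high": 0, "test_hours": 0.0, "fix_hours": 0.0})
for function, severity, test_hours, fix_hours in fault_log:
    stats = per_function[function]
    stats["errors"] += 1
    stats["high"] += severity == "high"
    stats["test_hours"] += test_hours
    stats["fix_hours"] += fix_hours

for function, stats in per_function.items():
    print(function,
          "| errors:", stats["errors"],
          "| high severity:", stats["high"],
          "| test hours per error:", round(stats["test_hours"] / stats["errors"], 1),
          "| fix hours per error:", round(stats["fix_hours"] / stats["errors"], 1))
```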


    6. Risk Prediction

      Risk prediction is derived from the previous activities of identifying, planning, assessing, mitigating, and reporting risks. Risk prediction involves forecasting risks using the history and knowledge of previously identified risks.


      During test execution it is important to monitor the quality of each individual function (number of errors found), and to add additional testing or even reject the function and send it back to development if the quality is unacceptable. This is an ongoing activity throughout the test phase.


  3. Metrics

    This chapter will give a very brief introduction to metrics used in this document. There are several reasons to use metrics, for instance:


  4. The Case

    The rest of this paper will discuss a case study using the risk based approach to software testing, relating the different activities to the activity model discussed in the previous chapter.


    1. The Application

      This paper is based on the system test stage of a project developing a retail banking application. The project included an upgrade of a Customer Information System used by clients as a central customer, account and product database, and a complete reengineering of a Deposit Management System. The project scope included reengineering of the data model, a technology change from IMS/DL1 to CICS/DB2, a rewrite from JSP COBOL to COBOL-2 and a completely new physical design. During this rewrite, large investments were made in productivity tools, design, quality assurance and testing.


      The project started in June 1994 and was delivered in October 1995. The total project effort was approximately 40 man-years over 17 months. This paper documents experiences from the system test stage, which consumed approximately 22% of the total project resources.

      The applications consist of approximately 300 on-line transactions and 300 batch programs, totalling 730,000 source lines of code (SLOC) and 187 DB2 tables. This is the server part only; no client GUI was tested in this project.


    2. The Scope

      The system test stage included:


      1. Technical System Test, i.e. what is usually referred to as environment test and integration test. Due to differences between the development environment and the production environment, the system test stage had to test all programs in the production environment. During system test the test team had to do the integration test of the on-line system by testing and documenting all on-line interfaces (called modules). The team also had to perform the integration test of the batch system(s) by testing and documenting that all modules had been called and also testing the complete batch flow.

      2. Functional System Test, i.e. black box testing of all programs and modules to detect any discrepancies between the behaviour of the system and its specifications. Whereas the integration test verified that all modules had been called, the functional system test was designed based on application functionality.

      3. Non-functional System Test. The system test also tested the non-functional requirements, i.e. security, performance (volume- and stress-test), configuration (application consistency), backup and recovery procedures and documentation (system, operation and installation documentation).


      As for all projects, time and resources were limited. At the beginning of construction (programming), the system test strategy had still not been agreed upon. Since the development project was a very large project for the vendor and therefore consumed nearly all available resources, the number of experienced people available for test planning was limited.


      The final system test strategy was agreed approximately one month before the end of construction, and the time for planning was extremely short. A traditional approach to system test planning, based on test preparation done in parallel with design and construction, could therefore not be used.


      The following project stages were executed before the system test:

      • Project Initiation - PI (organising the project, staffing and development environment)

      • Requirement Analysis - RA (documents the functional requirements to the application)

      • Logical Design - LD (data model and process model)

      • Physical Design - PD (program design - executed as part of construction)

      • Construction and Unit Test - CUT (programming and testing, including a 100% code coverage test)


  5. The Challenge

    Why did the vendor need a Risk Based Approach to the System Test?


    Because:

  6. The Strategy

    The project started with a Traditional Approach to testing, i.e. the tests should be prepared, with input and output, as part of the design and construction stages, prior to system test start. However, as time passed and only limited resources were available to prepare the System Test, it became obvious that this strategy was impossible to fulfil.


    The original system test strategy document (based on a traditional test approach), identified the following test structure for both on-line and batch testing:


    1. System Test Plan, documenting the test scope, environment and deliverables, test control procedures, test tools to be used, test schedule and phases, and listing start and stop criteria related to each phase.

    2. Test Specification, i.e. a detailed break down of the application into testable units.

    3. Test Cases, i.e. documentation of what to test, basically listing all requirements enabling a tester to easily read them.

    4. Test Scripts, i.e. documentation of how to test “step by step”, including test data to be used by the tester.


    Implementing a structure like the one above is very time consuming, especially step 4 - documenting test scripts.


    Midway through construction it became obvious that it was impossible to document everything before end of construction. Either the project would be delayed, or the test planning process had to be changed.


    The main problem at this stage was the preliminary system test strategy document already delivered to the customer. How do you get the customer to accept that you will not be able to document all tests prior to test execution as thoroughly as you originally intended to? By convincing him that the new process will improve the product quality!


    The key words became “Risk Based Approach” to testing. We agreed with the customer (reference to the risk activity model in chapter 2 is given in italic):


    1. The vendor will test all functionality in the application to “a minimum level” (in addition to all interfaces and all non-functional tests). This will not be documented prior to the test, but logging of details for all tests (i.e. input, expected output and actual output) will, after test execution, prove this “minimum level of testing” (Risk Strategy).

    2. All test cases (“what to test”) will be documented prior to test start and will be available for the customer to review (Risk Strategy).

    3. Only highly qualified testers, i.e. system analysts experienced in the application area, were to be used for testing, and the testers would be responsible for planning all “test shots”, including providing test data and documenting the executed tests. (Tools were available to the testers for documenting the tests.) (Risk Strategy).

    4. The vendor will do a risk analysis together with the customer to identify those areas of highest risk, either to the customer or to the vendor (Risk Identification and Risk Assessment).

    5. Based on the Risk Analysis, the vendor will focus “extra testing” in those areas of highest risk (Risk Mitigation).

    6. “Extra testing” will be planned and performed by a specialist in the application area who is not involved in the “minimum level of testing” (Risk Mitigation and Risk Reporting).


    The six points above cover all activities from Risk Identification through Risk Reporting. How risk reporting was used as input to risk prediction is explained later.


    The customer approved the idea, and the vendor was ready to start. The project was now in a hurry and had to do the following:


    1. Complete the documentation of all test cases (what to test) for on-line and batch.

    2. Perform the Risk Analysis for on-line and batch and plan any “extra testing”.

    3. Document the new risk based test process, including procedures, check lists, training the testers and preparing the test organisation.


  7. The Risk Analysis

(Risk Identification, Risk Strategy and Risk Assessment)

The risk analysis was performed prior to system test start, but was continuously updated during test execution. Separate analyses were performed for on-line and batch.


The vendor developed a model for calculating the risk exposure based on:



    1. The Control Procedures

      The project implemented separate procedures for control issues (i.e. changes to scope, process or schedule not related to system design) and change requests (i.e. changes related to system design and implementation). All projects will be affected by issues and change requests, but the risk based approach made this project even more vulnerable to changes. The reason is that the planning process had been limited and the detailed test scripts were not prepared until test execution. Usually several faults are identified during the test planning phase, but with the risk based approach to testing, the planning phase is part of the test execution.


      All change requests had to be accepted by the product manager outside the system test team.

    2. Progress Tracking and Progress Indicators

      (Risk Reporting and Risk Prediction)

      The introduction of the risk based approach made it very important to document the test progress to the customer as well as the quality of the test. The quality of the test was documented as part of the output documentation from the test execution, including the complete listing of the test log from the test tool.


      Progress tracking was critical for giving the customer confidence in the product and in the end date. The reporting included basically two elements:


      1. The number of tests started and completed according to plan.

      2. Indicators to show the load of faults found and corrected.


          1. On-line Progress Tracking

            The following graphs show the on-line tests started and completed. Because of the limited material prepared prior to test execution, the quality control of the test documentation prior to execution was very limited. This made the quality control (QC) and quality assurance (QA) processes during test execution even more critical. Therefore, the curve most interesting to the customer was “tests actually completed from QA” in the graph On-line Test Cases Completed.


            [Two graphs: “On-line Test Cases Started” and “On-line Test Cases Completed”, plotting the number of test cases against date.]
            Figure 8: Progress Tracking. To the left is a graph showing planned and actual of test cases started. The graph to the right visualises test cases completed, showing planned complete, actually completed by tester (executed) and actually completed from QA.


          2. Batch Progress Tracking

            The batch process of ProDeposits is very complex. Approximately 300 batch programs constitute a daily batch run. A traditional approach to batch testing for the vendor would have been to set up one batch system and process day by day, fixing problems as they occurred.


            The approach was to run as many test runs as possible, as early as possible, to identify problem areas (i.e. areas of high risk) and to focus the test on those areas.


            The consequence was that 3 sub-systems were set up, each with a batch cycle consisting of 12-15 batch runs, i.e. processing 12-15 periods. A period (point in time) is a single day, a week end, a month end, a year end, etc. Each batch cycle would be processed at least 3 times for all three systems during the system test period.


            If possible, the complete batch cycle was completed, not waiting for fixes of faults identified in one batch run before continuing. The result was an early detection of problem areas and the possibility of focusing the test.


            Because of the planned strategy to run all batch cycles three times per system, the “number of tests started” did not make sense in batch testing as it did for on-line. The result was that the plan for batch testing was calculated as the total number of batch tests (i.e. the number of verifies to be executed), evenly spread over the total number of days for batch testing. As a result, the progress did not look good in the beginning, when a lot of outstanding integration tests were executed. However, after a while the progress improved, and during the last few weeks the graphs showed a steady slope.
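
            The plan line itself is simple arithmetic; a small sketch under assumed figures is shown below (the totals and dates are invented, not the project’s).

```python
# Sketch of the straight-line batch plan: the total number of verifies spread
# evenly over the days available for batch testing. All figures are invented.
from datetime import date, timedelta

total_verifies = 450                 # assumed total verifies across the three sub-systems
test_days = 60                       # assumed number of days allocated to batch testing
start = date(1995, 6, 1)             # assumed start date of batch testing

per_day = total_verifies / test_days
plan = [(start + timedelta(days=day), round(per_day * (day + 1))) for day in range(test_days)]

for plan_date, cumulative in plan[:3] + plan[-1:]:
    print(plan_date, "planned verifies completed:", cumulative)
```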


            [Graph: “Number of Batch Tests Verified”, showing planned, executed and QAed verifies against date.]

            Figure 9: Progress Tracking Batch. The plan was calculated based on number of verifies to be executed and number of days of testing. The graph also shows number of verifies actually executed and number of tests QAed by date.

            The graph above gave the customer a snapshot of the current situation. In addition, we needed some indicators that provided the customer with good visibility of, and therefore confidence in, the test process.


          3. Progress Indicators

            We used two indicators, one related to the test process and one related to the fix process. The first one showed number of faults reported to the fix team and number of faults fixed (i.e. reported back to the test team). The second indicator showed number of faults reported back to the test team for re-test from the fixers and number of faults re-tested. The second indicator also included a graph showing number of fixes from the fix team being rejected by the testers as part of the re-test.


            [Two graphs: “On-line Faults To Be Fixed and Actually Fixed” and “On-line Faults to be Re-tested, Actually Re-tested and Rejected”, plotting the number of faults against date.]
            Figure 10: Progress Indicators. The left graph shows the number of faults delivered to the fix team and number of faults fixed. The graph at the right shows the number of reported faults that have been fixed and returned to re-test (to be re-tested) and the number of faults actually re-tested. The lower curve is the number of fixes being rejected in re-test.


            Note: A verify document is a document with the expected result of a particular batch test. The document lists all results to look for after a test run. The number of expected results can vary depending on the type of test. To “execute a verify” is to compare the actual result with the expected result after a test run.

            Similar graphs to the above were developed for batch faults.


          4. Estimated To Complete (ETC)

      The calculation of ETC for test projects is always complex, and even more complicated when the preparation work is as limited as it was in this project. Again, indicators to predict the number of resources needed to meet the end date were essential to the chosen test approach.


      We closely monitored the number of hours spent on testing and on fixing, related to the number of faults identified and fixed. The following graphs were used for both on-line and batch.


      [Two graphs: “On-line: Hours per Fault for Test and Fix” and “Batch: Hours per Fault for Test and Fix”, plotting hours per fault against date.]

      Figure 11: Estimated to Complete. The number of hours testing per fault found and number of hours analysis / programming per fault fixed were used as indicators to calculate ETC.

      In addition to the number of hours per fault, the following numbers were used for calculating the ETC for on-line:


      1. Number of faults found per on-line transaction (i.e. per on-line test case)

      2. Number of fixes being rejected (i.e. generating a new fault to be sent to fixing and re-test)

      3. Number of remaining on-line test cases


      By combining 1, 2 and 3 above, the remaining number of faults could be estimated, and by using the numbers from the figure above, the total resource requirements could be estimated.
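
      One plausible way of combining these figures is sketched below; the exact formula and all numbers are assumptions for illustration, not the vendor’s actual calculation.

```python
# Illustrative ETC calculation for on-line testing. The way the indicators are
# combined is one plausible reading of the approach; all numbers are invented.

faults_per_test_case = 1.8       # 1: average number of faults found per on-line test case
reject_rate = 0.10               # 2: share of fixes rejected in re-test (each generates a new fault)
remaining_test_cases = 40        # 3: on-line test cases still to be executed

test_hours_per_fault = 3.5       # from the "hours per fault" tracking (Figure 11)
fix_hours_per_fault = 5.0

# A rejected fix re-enters the fix/re-test loop, so scale the fault count by 1 / (1 - reject rate).
remaining_faults = faults_per_test_case * remaining_test_cases / (1 - reject_rate)
etc_hours = remaining_faults * (test_hours_per_fault + fix_hours_per_fault)

print(f"Estimated remaining faults: {remaining_faults:.0f}")
print(f"Estimated hours to complete (ETC): {etc_hours:.0f}")
```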

      For batch the calculation method was somewhat different. In addition to the number of hours per fault found, the numbers used were:


      1. Number of faults found per verify document.

      2. Number of fixes being rejected (i.e. generating a new fault to be sent to fixing and re-test)

      3. Number of verify documents being accepted out of total number of verify documents reviewed

      4. Number of verify documents still to be verified.


      The resulting graphs for on-line and batch are shown in the following figure. The rising curve is the accumulated hours spent and the falling curve is the calculated ETC over time. It took some weeks before the ETC calculations were reliable, but they proved to be very accurate during the last few weeks. If more historical data could have gone into the ETC calculation, a reliable result could have been provided at an earlier stage.

      [Two graphs: “On-line: Calculated Hours ETC and Actual Hours” and “Batch: Calculated Hours ETC and Actual Hours”, plotting hours against date.]

      Figure 12: Calculated ETC and Actual hours spent for on-line and batch.


  1. Automated Testing

    (Risk Strategy and Risk Mitigation)

    The project was committed to using automated regression testing by utilising the tool AutoTester from AutoTester Inc [1]. This proved to be a commitment that was very hard to fulfil. Originally the intention was to develop all AutoTester test scripts prior to test execution.


    Due to the changed test approach, the information required to develop AutoTester scripts was not available, i.e. the test data and the scripts would be provided by the tester during test execution.


    The Risk Based approach was based on each tester using AutoTester to log all test shots and to record test scripts for automated regression testing. This proved to be very complicated because:


    1. The tester had to think of regression testing all through test execution. This included planning the test data, the sequence of transactions, etc.

    2. All testers shared the same database. They could easily damage each other’s test data if they did not pay attention.


    The project used 25% of the total on-line test resources on automated regression testing. The regression test team managed to regression test 15% of all on-line transactions, and found 2.5% of all faults.

    The recommendation for the next similar project will be:


    1. Let the manual testers focus on doing manual tests, using a tool for documentation / recording without thinking about automation. The result should be readable, not re-playable. The tester should be able to set up his own test data within his limits.

    2. Set up a separate database for automated regression testing.

    3. Select the “worst” transactions for automated regression testing.

    4. Identify a separate test team to focus on automated testing.

    5. Do not start recording for automated regression testing until the function is stable, i.e. most faults have been identified and fixed.

    6. Over time develop a “lifetime” test script for all transactions, i.e. an automated test script to be used as an installation test at customer’s site.


    For those wanting to start with or improve automated testing, I strongly recommend a book published this summer (1999) by Dorothy Graham and Mark Fewster, "Automating Software Testing" [7].


  2. Further Research

    As a “test”, the McCabe complexity (see McCabe, 1976 [3]) was checked for a random list of the 15 on-line transactions with the highest number of faults identified and 15 on-line transactions with the lowest number of faults identified. The result showed that the McCabe complexity is on average 100% higher for those with a high number of faults than for those with a low number of faults.


    This material, however, needs more investigation. Particularly interesting is the analysis of a function’s logical design, in order to identify functions with a potentially large number of faults at an early stage.
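
    For completeness, the comparison itself is straightforward; a minimal sketch with invented complexity values (not the project’s measurements) is shown below.

```python
# Sketch of the complexity comparison: average McCabe (cyclomatic) complexity of the
# highest-fault transactions versus the lowest-fault ones. All values are invented.

high_fault_complexity = [28, 35, 22, 41, 30]   # cyclomatic complexity, high-fault transactions
low_fault_complexity = [12, 15, 10, 18, 14]    # cyclomatic complexity, low-fault transactions

def average(values):
    return sum(values) / len(values)

ratio = average(high_fault_complexity) / average(low_fault_complexity)
print(f"High-fault transactions are on average {100 * (ratio - 1):.0f}% more complex")
```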


  3. Acknowledgement

    A lot of people have helped in making this document. AVENIR ASA, Norway, funded the work, and I thank Per Bakseter for being my sponsor. I also thank Gro Bjerknes, Hans Schaefer, Bjørnar Evenshaug, Stephen Løken, Stein Onshus and John Curtin for providing valuable comments on the initial version of this paper. Special thanks to Bo Kähler, Ged Hawkins, Travers Sampson and Joy Hanson, who gave me examples and corrections to the latest version of this document.


  4. References


  1. AutoTester, AutoTester Inc., 6688 North Central Expressway, Suite 600, Dallas, Texas 75206, Tel: + 1 800 328 1196

  2. Boris Beizer, “Software Testing Techniques” Second Edition, Van Nostrand Reinhold, 1990.

  3. McCabe, T.J., “A Complexity Measure”, IEEE Transactions on Software Engineering, Vol. 2, No. 4, December 1976 (initial paper defining cyclomatic complexity).

  4. Systems Engineering, LBMS Europe (now part of Platinum Technology Inc.), 1815 South Meyers Road, Oakbrook Terrace, IL 60181, USA.

  5. Øvstedal, E. Ø. and Stålhane, Tor, “A goal oriented approach to software testing”, Reliability Engineering and System Safety, Elsevier Science Publishers Ltd., England, 1992.

  6. Dale Walter Karolak, “Software Engineering Risk Management”, IEEE Computer Society Press, 1996.

  7. Dorothy Graham and Mark Fewster, "Automating Software Testing", 1999

  8. Norman E. Fenton & Shari Lawrence Pfleeger, "Software Metrics, a rigorous & practical approach", 2nd edition, International Thomson Computer Press, 1997.