Summary Software Testing
Intro
1.1. Definitions
Testing Activities
2. Test Selection
2.1. Categories of Test Selection
2.2. Definitions
2.3. Requirements Based Testing
2.3.1. Cause Effect Graphs
2.3.2. Equivalence Partitioning
2.4. Control Flow Based Testing
2.4.1. Motivation
2.4.2. Control Flow Graph
2.4.3. Statement, Path, Branch Coverage
2.4.4. Condition Coverage
2.4.5. Loops
2.4.6. Conclusions about Coverage Criteria
2.4.7. Subsumption
2.5. Data Flow Based Testing
2.5.1. Motivation
2.5.2. Intraprocedural data flows
2.5.3. Interprocedural data flows
2.5.4. Conclusions about Data Flow Testing
3. Combinatorial Testing
3.1. Motivation
3.2. Latin Squares
3.3. Conclusions
4. Random and Statistical Testing
4.1. Motivation
4.2. Grounds
4.3. Weyuker and Jeng (fixed failure rates)
4.4. Gutjahr (failure rates as random variables)
4.5. Summary
4.6. Statistical Testing
4.6.1. Usage profiles in Software Reliability Engineering
4.6.2. Usage Profiles based on Markov chains
4.6.3. Conclusion
5. Model Based Testing
5.1. Motivation
5.2. Models
5.3. Scenarios
5.4. Selection Criteria
5.5. Test Case Generation
5.6. Cost Effectiveness
6. OO Testing
6.1. Differences of OO Software
6.2. State-based Testing
6.3. Testing with Inheritance
6.4. Testing with polymorphism
6.5. Conclusions
7. From Failures to Tests and Faults
7.1 Motivation
7.2 Test extraction by reproducing program executions
7.2.1. State-Based Test Extraction
7.2.2. Event-Based Test Extraction
7.3. Delta Debugging
7.4. Hit Spectra, Code Metrics, Code Churn
7.5. Alternatives: directly target faults
8. Assessment of the quality of test suites
8.1. Motivation
8.2. Fault Injection
8.3. Mutation Testing
9. Concurrency Testing
10. Fault Models
1. Intro
1.1. Definitions
Goal of testing
Show functionality is implemented and detect failures (while maintaining reasonable costs).
(Usually done by comparing actual and intended behaviors.)
Test Cases
Definition: test input → expected output (+ environment conditions)
Expected output
→ a specification is needed
→ abstract: no exception occurs
Definition of a good test case:
Ability to detect likely failures with good cost effectiveness.
What Testing is not:
Improvement of quality → debugging
Fault localization
Types of Defects
Failure: deviation of actual I/O behavior from intended behavior.
Error: deviation of the system's actual state from its intended state.
Fault: actual or hypothesized reason for the deviation.
Control flow and decisions → covering the control flow graph
Data flow → covering the data flow graph
Stochastic profiles → pick test cases at random (w.r.t. a distribution)
Combinatorial Testing
E.g. Automotive Infotainment Network.
With n parameters, each of which can take p values, we have p^n combinations.
Assuming that failures are a result of the interaction of only 2 (3, 4, ...) parameters:
➨ Pairwise (t-wise) interactions
Random (uniform) Testing
Any test selection criterion should be “better” than random testing!
Weyuker and Jeng
Partition-based testing compared with random testing.
Metric: find at least one failure-causing input.
Result: partition testing can be the same, better or worse than random testing.
Example: input domain of 100 elements, 8 failure-causing inputs, 2 tests.
➨ In general we don't know a priori which blocks are problematic!
(This is why partition-based testing has not been shown superior to other strategies.)
Statistical Testing
Risk-Based Testing: Risk = Likelihood * Expected Damage
Do most of the testing for the high risk parts of the system’s functionality
→ Purposely live with remaining faults!
Model-Based Testing
Tests don't describe the intended behavior of a system at all
→ describe the intended behavior independently of a test (state charts, sequence diagrams)
Specify a test selection criterion and have some magic machine generate tests and code
→ for robustness testing, describe just the environment?
Statistical testing is one instance
OO Software Testing
Encapsulation, inheritance, polymorphism
Encapsulation: objects are state machines → model-based testing!
Expensive: testing a method with an argument of type T means testing it with all subtypes of T as well!
Test Case Extraction
In case people just play around with the system → problem: turn these executions into managed tests
Fault Localization
Ideas to get from failure to the fault:
Minimizing input
Consider two failing tests (Check intersection of invoked methods)
Consider the states of a program (Infer relevant variables)
Consider frequency of changes in svn repository
Fault Injection
To assess the quality of a test suite or a test selection criterion with a given fault model in mind,
we can inject the corresponding fault and see if the test suite finds it.
Mutation testing
e.g. fault model of simple syntactic problems (use “=” instead of “>=”)
Testing Activities
Planning
Organization
Documentation
Test Case Derivation
Execution
Monitoring
Evaluation
Different Levels of Testing
Unit tests
→ all called units are stubbed
Integration Tests
→ individual software modules are combined and tested as a group
System tests
→ testing conducted on a complete, integrated system to check the requirements are
met
Acceptance tests
→ determine whether or not a system satisfies the acceptance criteria (user needs,
requirements) → the user then decides if he accepts the system
2. Test Selection
2.1. Categories of Test Selection
Requirements-based Testing, Control-flow-based Testing, Data-flow-based Testing, Statistical Testing
Loops
2.2. Definitions
Kinds of Testing
Black Box Testing: test without knowing about the internal structure of the program
→ test based upon requirements/specification
→ functionality only
White Box Testing: test with respect to the internal structure
→ e.g. testing of paths
Usability Testing: determine how well people can use the system
Stress and Volume Testing: intense testing to determine stability
Robustness Testing: functionality is assumed to be correct
Performance Testing: e.g. test response times
Security Testing: test for flaws in the security mechanism
Reliability and recovery testing: stability during an interval of time; determine how well the system recovers from crashes
Configuration and compatibility testing: evaluate the application's compatibility with the computing environment
Regression testing: check if old tests and requirements can still be applied to a changed system
Stubs simulate callees
Drivers simulate callers
Big Bang Testing: test all modules alone, then combine all
Incremental Testing add one module after another
Top-Down: start testing with the top module
only stubs are needed, no drivers
requirements can be validated easily, early skeletal version
Bottom-Up: start with the leaf modules
only drivers needed (simple to write)
disadvantage: no early skeleton of the program
Random variable: "In probability and statistics, a random variable, aleatory variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense)." (wiki)
Stochastic process Family of random variables indexed by time
2.3.1. Cause Effect Graphs
a cause–effect graph is a directed graph
that maps a set of causes to a set of effects
causes = input condition of the program
effects = output or state transformation
translate natural-language specifications into a formal scheme / logical network
Use it when:
helpful in identifying all relevant situations
structure / reproducibility
also good for presentations / meetings since very visible
we don’t know about completeness
Helpful websites for decision table generation:
http://www.softwaretestinghelp.com/causeandeffectgraphtestcasewritingtechnique/
https://amybughunter.wordpress.com/2013/10/02/blackboxtestingwithcauseeffectgraphs
/
2.4.1. Motivation
Why use CFGbased Coverage Criteria?
Use it when:
Finding dead code
Intuitively: each line of code should be executed at least once. (e.g. as a specification
requirement .)
Dual use:
A posteriori (measure existing test suite)
A priori (use as selection criterion) + as stopping criterion
We want a structure / methodology how to derive test cases (Partition based
testing tells us to make partitions, but not how)
We want to measure some performance → easier to show management people
One could think that CFG-based coverage criteria will find all failures, because all parts of the code are covered, BUT:
No CFG-based coverage criterion guarantees the revelation of all failures!
specification wrong
missing paths in the CFG
dependence on data values
2.4.2. Control Flow Graph
Graphical representation of all paths in the code
Nodes: program statements
Edges: transfers of control
2.4.3. Statement, Path, Branch Coverage
Statement Coverage: covers all nodes; metric = # executed statements / # statements
Branch Coverage: every edge is executed at least once; composite conditions are not taken into account
Path Coverage: every path is executed once
→ equivalence relation: one class includes all executions with identical paths → lower number of paths
→ loops → infinite number of paths; loops with upper bound n → n paths
→ exhaustive path testing == exhaustive input testing → not realistic, usefulness hard to prove
→ there are infeasible paths (e.g. paths depending on the same condition cannot be executed alternately)
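A minimal, hypothetical example (my own, not from the lecture) to make the difference between the criteria concrete:

    // Hypothetical example: two independent decisions.
    static int classify(int x, int y) {
        int r = 0;
        if (x > 0) {        // decision 1
            r += 1;
        }
        if (y > 0) {        // decision 2
            r += 2;
        }
        return r;
    }
    // Statement coverage: classify(1, 1) alone executes every statement.
    // Branch coverage:    classify(1, 1) and classify(0, 0) cover both outcomes of each decision.
    // Path coverage:      all 4 paths are needed: (1,1), (1,0), (0,1), (0,0).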
2.4.4. Condition Coverage
Advantage : structured way of deriving test cases (predictable)
Disadvantage : might be better or worse than choosing randomly, we don’t know
Literal
= fixed values, such as true/false, integers or characters, in this case: atomic boolean
expressions
Conditions consist of literals → e.g. a<b, a||b, a&&b
Condition Coverage: each literal of a condition evaluates once to true and once to false
→ for (a&&b)||c a minimum of two test cases suffices (all literals 0, all literals 1)
→ outcome of the whole condition irrelevant → branch coverage is not subsumed
Condition/Decision Coverage: condition coverage + the whole condition must evaluate both to true and false
→ outcome relevant → subsumes branch coverage
Modified Condition/Decision Coverage (MC/DC): additionally, each literal must be shown to independently affect the outcome of the whole condition
Multiple Condition Coverage: all combinations of literal values
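A small, hypothetical illustration for the decision (a&&b)||c used above (my own test vectors, not from the slides):

    // MC/DC for (a && b) || c: each literal must be shown to independently flip the outcome.
    static boolean decision(boolean a, boolean b, boolean c) {
        return (a && b) || c;
    }
    // A minimal MC/DC set (n + 1 = 4 tests for 3 literals):
    //   a=T b=T c=F -> true
    //   a=F b=T c=F -> false   (differs from row 1 only in a -> shows the influence of a)
    //   a=T b=F c=F -> false   (differs from row 1 only in b -> shows the influence of b)
    //   a=T b=F c=T -> true    (differs from row 3 only in c -> shows the influence of c)
    // Plain condition coverage would already be satisfied by the two tests "all false" and
    // "all true", without demonstrating the influence of any single literal.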
Counter Examples (for subsumption of data flow criteria)
all defs → all c-uses / all p-uses: three branchings with calculations in the DFG
all c-uses → all p-uses (& vice versa): only p-uses in the program
all c-uses → all c-uses/some p-uses (& vice versa):
all/some is more test cases, because whenever there is no c-use, a p-use is taken instead;
the other way round it works, because all c-uses covered by all/some are also covered by all c-uses
Pros: data flow testing overcomes the 'locality' of decision-based testing; structured; measurable
Cons: almost no tool support; requirements not taken into account; missing def-uses cannot be tested; might be expensive
Agreement : Use as complement, not alone!
Underlying Fault Model:
failures that materialize as erroneous def/use pairs
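A tiny, hypothetical snippet to illustrate the def/use vocabulary (c-use = computational use, p-use = predicate use):

    static int example(int x) {
        int y = x * 2;       // def of y (and a c-use of x)
        if (y > 10) {        // p-use of y (y appears in a predicate)
            return y + 1;    // c-use of y (y appears in a computation)
        }
        return 0;
    }
    // "all defs": every def must reach at least one use -> a single test such as x = 6 suffices.
    // "all uses": every feasible def-use pair must be exercised; for the p-use both predicate
    // outcomes are usually required, so a test with x <= 5 has to be added.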
3. Combinatorial Testing
3.1. Motivation
Example: Automotive Infotainment, multiple components
All combinations are a lot of combinations → combinatorial testing to reduce the
number of test cases
Fault Model: failures are a result of combinations of n components, not all.
With n parameters, each of which can take p values, we have p^n combinations.
Assuming that failures are a result of the interaction of only 2 (3, 4, ...) parameters:
➨ Pairwise (t-wise) interactions
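A small, hypothetical illustration of the reduction (parameter names made up):

    // 3 boolean parameters A, B, C -> 2^3 = 8 exhaustive combinations, but the 4 rows below
    // already contain every value pair (FF, FT, TF, TT) for each of the three parameter pairs.
    class PairwiseExample {
        static final boolean[][] SUITE = {
            // A      B      C
            { false, false, false },
            { false, true,  true  },
            { true,  false, true  },
            { true,  true,  false },
        };
    }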
3.3. Conclusions
Fault models
sneak into requirements based testing in special cases
Combinatorial Testing relies on a specific fault model
CAUTION!
→ No guarantee that this works/ the fault model holds!
4. Random and Statistical Testing
4.1. Motivation
Random Testing is considered the “gold standard” for testing, everything is compared
to it
How can we prove a method is ‘better’ than random testing?
Maximization of the cost-benefit ratio → test what is used more often (usage profiles)
4.2. Grounds
Definitions:
Program → P
Input domain → D, of size d
m points of D produce incorrect output
n → number of test cases (test input data)
→ In general, d >> m
Failure rate θ = m/d (per block i: θ_i = m_i / |D_i|)
= the probability that a failure-causing input will be selected as a test case.
// Here the comparison of RT vs. PT with better, worse, or the same follows again, via Pr(p) = Pr(r) /
Pr(p) > Pr(r) / Pr(p) < Pr(r) → why does this appear here? //
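The comparison referred to in that comment is usually formalized as follows (standard Weyuker & Jeng formulation, reconstructed here rather than copied from the slides): with n random tests over the whole domain versus n_i tests in each of k blocks,

    P_r = 1 - (1 - \theta)^n, \qquad \theta = m/d
    P_p = 1 - \prod_{i=1}^{k} (1 - \theta_i)^{n_i}, \qquad \theta_i = m_i / |D_i|

Partition-based testing is "better" in the at-least-one-failure sense iff P_p > P_r.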
4.5. Summary
Weyuker & Jeng
Random Testing can be better, worse or the same as partition testing
In special cases, partition testing is better: equal failure rates and equal
distribution of failures
Gutjahr
deterministic assumptions on failure rates favor random testing
if failure rates are considered
random variables , partition testing is favored
(assumption: equal expected failure rates and one test per block )
Different measures for ‘better’
At least one failure
# of faults detected
Weighted # of faults detected
Is random testing worthless? No, because:
Assumption: Identical expected failure rates
Uniform distributions of test selection from the input domain
It’s cheap, but there is no oracle
"Dear Gutjahr, how do you know that the expected failure rates are equal?"
"We don't know. But we assume that they are. If they weren't, we would simply pick tests from the block with the highest failure rate, right?"
This is the weakest point in Gutjahr's theory, by the way.
4.6. Statistical Testing
Goal:
maximize the cost-benefit ratio → test the most frequent interactions first/more
(i.e. the app should not crash at the login screen…)
80% of users use 20% of the functionality
Two concepts: frequency and severity of failures
Stopping criteria: should relate to reliability of software
Reliability: probability of failure free functioning for a given time
→ FoneFollower Example
Musa advises 2-3 tests/KLOC in order to get a reliable program, and 20-33 tests/KLOC in order to get a high-reliability program. (Note that these figures are old.)
Number of test cases = (available time * available staff) / average time for preparation of one test case
Number of new tests = N / (1 - N) * T
where N is the sum of the occurrence probabilities of the new operations and T is the total cumulative number of test cases from all previous versions,
e.g. if a new component has occurrence probability 0.2, then 0.2/0.8 * number of test cases.
Between base product and variations:
U (Unique new Uses) = F (expected fraction of field use) * S (Sum of
NEW occurrence probabilities)
Test cases are assigned in proportion with the “U” value of each
version.
To test crucial elements (severity)
FIO (Failure Intensity Objective): # of failures per time unit or operations
Acceleration factor (A) = FIO(system)/FIO(Operation)
Test cases for critical tasks: A*Occurrence proportion * #total test cases
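A small worked example with made-up numbers, only to make the formulas above concrete:

    \text{Total test cases} = \frac{10\ \text{days} \times 2\ \text{testers}}{0.5\ \text{days per test}} = 40
    \text{New tests for } N = 0.2: \quad \frac{0.2}{1 - 0.2} \times 40 = 10
    \text{Critical tests for } A = 4,\ \text{occurrence proportion } 0.05: \quad 4 \times 0.05 \times 40 = 8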
4.6.3. Conclusion
Motivation
use actual usage data → test most frequent interactions
take into account critical operations
optimize costbenefit ratio
Good when…
...an earlier version of the system exists, and thus we have an operational profile ?!
Difference between Usage Profiles / Markov chain based usage profiles
Granularity (attention to detail → Markov chains are more detailed, I guess...)
Probability of a function easier to define than the one of a transition
Use it when:
Your boss tells you that he doesn't want the program crashing while commercial users use it (i.e. the 20% of the program that is used by 80% of the people should run flawlessly).
When severity is not crucial. (Nothing is going to happen if my POM app crashes.)
You have a former "base" product from which you can get the statistics. If not, you can still derive them from simulation, expectancy, measurements.
DON'T use it when:
When severity is critical: nuclear reactor (go with code reading then)
5. Model Based Testing
5.1. Motivation
Oracle Problem
Automatically derive tests that include finegranular expected output
information (That’s our main motive!)
Specifications tend to be bad
→ derive test cases from an abstract model
5.2. Models
Abstraction
Model needs to be more abstract than the system
→ information is lost
2 kinds of abstraction:
Encapsulation (e.g. libraries) → encapsulate complexity
Subroutines, Garbage collector etc.
Simplification / Omission → omit details → information cannot be reinserted
Abstraction is needed to validate the model → cannot be too complex
If the model is as precise as the SUT → directly validate SUT
→ You have the same work, that’s not cost effective!
If you design a model that is 100% SUT (but 0% environment), you have to test for all possible inputs. In the case of a car this would mean that you have a VERY precise model (unnecessarily precise: with 100% SUT we didn't make any abstraction), but you are going to test this model from 0 to 1000 km/h, since you have no clue about the environment. That doesn't make sense. Also, the reason for modelling is to make abstractions, not to copy the real world.
If you model 100% environment, then you don't have any clue about the system, so the best thing you can do is robustness testing (e.g. you apply all the inputs and see whether the system crashes).
Models can be described as Mealy Machines (which are state machines with input /
output!)
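A minimal sketch of what such a model could look like in code (data structure and names are my own, not from the lecture); a test case is then a path through the model, with the expected outputs read off the traversed transitions:

    import java.util.HashMap;
    import java.util.Map;

    class MealyModel {
        // (state, input) -> {nextState, expectedOutput}
        private final Map<String, String[]> transitions = new HashMap<>();

        void add(String state, String input, String nextState, String output) {
            transitions.put(state + "/" + input, new String[] { nextState, output });
        }

        // returns null if no transition is defined (candidate for a sneak path test)
        String[] step(String state, String input) {
            return transitions.get(state + "/" + input);
        }
    }

A selection criterion such as "all transitions" then picks the paths from which concrete test cases are generated.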
5.3. Scenarios
One model for both, test cases and code generation
generated from models, e.g. Matlab/Simulink
Use it when:
Code generators
Assumption on the environment
Performance
Exceptions
Cons:
no redundancy → no verification
BUT: it is not useful to generate code and test cases from the same model! For test case generation you need a separate model! → no redundancy!
(so basically the same as the first con?!)
Two Models
One model for test cases, one for code
Cons
Expensive → possible solution: split between OEM and supplier
Also a problem when there are changes in the requirements (double expense to modify both test and code model! → However, if the specification is solid (automotive industry etc.), this probably won't occur.)
if there is a fault you need to find out which model was faulty
Pro:
Redundancy → need to make sure the models are different
Different levels of abstraction possible
Use it when:
Specification is VERY exact (i.e. car manufacturer (system model) / suppliers (test model))
Model for test only
Pro:
Redundancy
Con:
(Expensive) → we don’t know if it’s cheaper
Code and model need to be kept in sync (interleaving) → hard!
Model has to be tested itself
Specification doesn’t profit from model based testing.
Hard when the requirements are changing.
Use it when:
Conformance tests: OEM builds the model, suppliers must show adherence to
model / conform to the model
Scenario of our running chip card example
Model Extraction from Code
Pro :
May make sense if you extract the model manually
Cons :
While having automatic generation there is NO redundancy.
Use it when:
Ex-post development of test cases only
(exception / no exception possible?)
(Also useful if there are no requirements documents?)
6. OO Testing
6.1. Differences of OO Software
Inheritance
Object/Class based on another Object/Class using the same implementation
Mechanism for code reuse
Polymorphism
Single Interface for different types
Dynamic Binding
mechanism in which the method being called upon an object is looked up by
name at runtime
Encapsulation and state, sequencing of messages
Method a sets flag f, method b uses f (branch coverage may or may not provoke the failure)
Example: sneak path testing. You check for every state whether each method leads to a defined state. (PSP if it is not defined from the given state, (?) if conditional, and (OK) if it is defined.)
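A hypothetical sneak-path test for the running chip-card idea (class, states and the expected reaction are made up): call a method in a state where the model defines no transition and check that the object reacts in a defined, safe way.

    import org.junit.Test;

    public class SneakPathTest {
        static class CardReader {                 // minimal stand-in for the example object
            private boolean cardInserted = false;
            void insert() { cardInserted = true; }
            void eject() {
                if (!cardInserted) throw new IllegalStateException("no card inserted");
                cardInserted = false;
            }
        }

        // eject is undefined in the EMPTY state -> the sneak path must be rejected cleanly
        @Test(expected = IllegalStateException.class)
        public void ejectWithoutInsertedCardIsRejected() {
            new CardReader().eject();
        }
    }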
6.5. Conclusions
Special problems induced by special language constructs
Inheritance & Polymorphism
Encapsulation
Trade-off between benefits of inheritance/polymorphism on one side and cost of testing on the other
State-based testing reflects discussions on model-based testing
Flattening Classes
Good design helps avoid errors → Contracts about design practices!
6.6. Examples
Mike has written a C++ program with fantastic object orientation and encapsulation. Mike is, however, a very inexperienced programmer, and now struggles to find the faults in his program. What would you suggest?
Flatten the program / state machine to see the interactions better
Write test cases that specifically target fault models (Naughty children, incorrect
initialization, Inadvertent binding etc.)
7.1 Motivation
Are there further ways to extract tests?
How to get from failures to faults? → Fault Localization
Stack consists of frames
frames → method invocations
frames contain:
values of arguments
function-local variables
return address
…
elements of a frame can reference a heap address
Heap Dynamically allocated memory
contains data
When to use:
When you are interested in generating test cases that trigger faults.
When there are no external events (or other stochastic influences)
Conclusions
Applicability depends on circumstances (serial input is good)
Scalability difficult → "Still requires 2^|c| - 2 test cases!" We don't fancy exponential stuff...
How to get around this problem: try adding pieces as long as we still don't cause the error. Hopefully at the end we get a minimal difference between the failing and the successful run. This is not the fault itself, but it may lead the programmer to the fault.
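A very simplified sketch of that idea (this is not the full ddmin algorithm, just the "keep shrinking while it still fails" core, with the test oracle passed in as a predicate):

    import java.util.function.Predicate;

    class SimplifiedDeltaDebug {
        static String minimize(String failingInput, Predicate<String> stillFails) {
            String current = failingInput;
            boolean shrunk = true;
            while (shrunk && current.length() > 1) {
                shrunk = false;
                int half = current.length() / 2;
                String left = current.substring(0, half);
                String right = current.substring(half);
                if (stillFails.test(left)) { current = left; shrunk = true; }
                else if (stillFails.test(right)) { current = right; shrunk = true; }
                // the real ddmin additionally tries finer-grained subsets and their complements
            }
            return current;  // hopefully close to a minimal failure-inducing input
        }
    }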
Fault localization approaches:
Abstractions of program runs → similarity (many different measures for similarity)
Complexity → defects (code metrics)
Amount of change → defects (code churn)
8. Assessment of the quality of test suites
8.1. Motivation
The quality of a test suite is difficult to assess
Definition of a good test suite? Problem:
First Failure, All failures, severe failures
Cost
Cost of debugging, fault localization
...
Defect Model
Fault Model: exact fault type in the code, e.g. division by zero
Failure Model: e.g. limit testing; try to find methods to cause failures; no fault localization
From the Exercise Quoting Dominik:
Fault Model:
2 Step Process
1st step (if you want to describe a fault): transformation from a fault-free to a faulty program. You wanted <= but got =. You wanted try/catch, but didn't get it, so it is crashing your system.
2nd step: what does this transformation do to my program? (In the case of x<5 vs. x<=5, the path for 5 changed!) → You come up with input space partitions that reflect this behaviour.
Failure Model:
You DON'T need the transformation, you only need the input space partitioning, because from it alone you can create test cases that target potential faults with good cost effectiveness. E.g. someone tells you that block 7 is most probably going to cause failures. What do I do? Create a partition full of test cases just for block seven! And now back to Weyuker/Jeng: I have created a partition where I have concentrated the failure-causing elements, so the partition is likely failure-causing!
In conclusion: with a failure model, someone tells you that Jon is a sloppy programmer and probably got every single line of code he wrote wrong. In this case you know, or have a strong hypothesis, where the potential faults are, so you come up with a test suite that directly targets Jon's messy code. With a fault model, however, you just see that instead of "chocolate" the program writes "strawberries". As a first step you think about the possible transformations that could cause this output (e.g. you look at the data flow of this faulty output and see what parts of the code affect the output and how). As a second step you come up with (probably more than one) groups of tests that try to locate the mistake in the code (e.g. one partition aims for the limits, another for typos; the point is, at least as I see it, that here you are not sure where the faults are, you are only trying to come up with tests that will trigger the failures).
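A hypothetical example of the resulting test: for the x<5 vs. x<=5 transformation mentioned above, the only input whose path changes is the boundary value 5, so a limit test targets exactly that value.

    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    public class BoundaryTest {
        // suppose the specification says "small" means x <= 5, but the code was written with "<"
        static boolean isSmall(int x) { return x < 5; }

        @Test
        public void boundaryValueFive() {
            assertTrue(isSmall(5));   // fails for the faulty "<", would pass for the correct "<="
        }
    }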
Intuitive Definition of a Fault Model: understanding of specific things that can go wrong in
a program
Fault Model
1. Consider a class of programs written to a specific specification
→ There are some that satisfy that specification, some not
2. Fault = Syntactic difference between a correct and incorrect
program
→ Transformation from a correct to a faulty program possible by
fault injection
3. Classes of Faults
→ Consider all classes with a fault of class K (e.g. division by zero)
4. Consider all programs, that are derived from programs that satisfy the specification by
injecting a fault of class K
5. Approximate fault models
→ we can only make an approximation of these programs (otherwise we would already
know exactly where the failures are)
6. General Idea: Partition the input space according to mapping or heuristic with empirical
evidence to reveal a fault
→ Create Partitions with a potentially high failure rate (cf. ideas of Gutjahr and
Weyuker&Jeng: Partition based testing is better than random testing if there is an underlying
defect model!)
Good Fault Model:
Large number of specifications for which induced failure domains largely overlap with actual
failure domains.
→ Try to approximate a failure domain, that is close to the program's actual faults
→ In other words: Using a fault model can create partitions that are likely failure causing
Failure Model:
No transformation! Only input space partition. E.g. limit testing or combinatorial testing.
List of Fault Models
Boundary Problems → Limit testing (there is empirical evidence)
Division by zero
Minor syntactical problems
Sneak Paths
Stuck-at-one
Combinatorial Testing
Atomicity Violations
Naughty Children (OO)
Some things make sense, even though there is no defect model!
→ Requirements based testing
→ Severity Testing (e.g. nuclear power plant shutdown could result in huge damage/cost)
Glossary
Code Churn Measure of activity in the repository
Testability degree to which a software artifact supports testing in a given
test context
high testability → finding faults in the system is easier
cannot be measured directly (extrinsic)
lower testability → increased test effort
Scalability algorithm works nicely on small dataset, but not on a big
dataset → time/memory grow exponentially
Software Artifact Code, module, method
Questions
General Questions
1) Definition of a test case?
(Test input → Expected output) + environment conditions
1.1) What makes a test case a good test case?
Ability to detect likely failures with good cost-effectiveness
2) When to stop testing?
When one of the coverage criteria is met → useful if there is a relationship with failure detection.
When a specific number of failures has been detected → great, but requires good estimates.
(Stopping, selection and assessment criteria are the same thing!)
3) Does it make sense to use failure detection as a comparison criterion for different types of
testing?
no, because we are interested in faults not in failures
4) Why is every partitioning scheme problematic?
because we don’t know anything about the underlying fault model, therefore it is
difficult to create revealing subdomains
5) Is requirements based testing a variant of partition based testing?
yes, because each requirement leads to a set of paths through the program, from
the set of paths, the input blocks can be computed
6) Is it a good idea to use requirements based testing?
no, because it is no defect model (if you consider it a variant of partition based
testing)
yes, because the requirements define the most important/ most used parts of the
code
problem: requirements change, they might be incomplete
Questions: Combinatorial based testing
Would you use it? / You are the CTO of pornhub.com: would you use CBT?
Yes, if you know there are problems between some Components!
Questions: Random based Testing
1) What are the Problems with random testing?
→ input is generated, expected outcome is unclear
→ implicit oracle: system does not crash
2) Should you do random testing?
Problem: even though random testing is effective in detecting failures, debugging is
extremely difficult
Pros: cheap (in generating test cases), no information about an underlying defect
model needed
Questions: Control Flow Based Testing
1) Can MC/DC find all faults?
→ No, not all possibilities are tested
2) Why does MC/DC coverage make sense?
We see the influence of each single literal
Programmers often make mistakes in boolean conditions (AND | OR | NOR)
Partition based testing:
What was the assumption of Weyuker and Jeng?
That the failure rate is a fixed value.
This usually does not hold, since a) it is not fixed and b) we don't know it in advance
Questions: Statistical Testing
1) Does it make sense to use statistical testing?
→ depends on if you have usage data
→ doesn’t help for security testing (security scenarios such as shutdown happen
rarely)
Assignment 1: Discuss the approach of testing based on usage profiles for libraries. What
strategy would you use to test libraries?
It doesn’t make sense. But it depends on the scenario. How would you test java’s UI
library? How people use the certain elements (i.e. who uses buttons for what)
depends on the context, even if you have statistics. When you know the context –
you know what and how people are going to use you can. BUT when it is a general
library (i.e. a User Interface) it doesn’t make sense, since you usually don’t know the
context they use the functions.
Suggestion: use a different strategy, but which?!
"As the developer of a program depending on the library you can use usage profiles."
Confusing: you tell me not to use it and then suggest using it?
Questions: Models
1) Do coverage based criteria for models make sense?
YES if you want to test the safety critical part of your System
→ Create a Model of it and test it!
It depends on whether or not the model is conform to some defect model
2) Would you use model based testing?
It depends on:
building a model helps the developers understand their systems
better communication between customer and developer
money/time for building a model
if you have a small important part of a system → create a model only for that part
Model can be adapted to new versions, problematic: for new functionality you need a
new model
3) Why do you need different levels of abstraction?
The more you can abstract away in the model, the higher the cost saving.
The level of abstraction of the test model needs to be higher than that of the SUT, because otherwise validating the test model takes as much effort as validating the whole system.
This works very well if you test a certain aspect of your system, e.g. safety.
Difference in abstraction (Black Box and White Box) (from slides)
►Black box testing relies on the specification
►The box is abstraction by omission
►Black box allows abstracted inputs and expected outputs
►White box testing shows all the details
Questions: OO Testing
1) Would you test a program written in C similar as one in Java?
→ No
→ Many classes/objects are similar to state machines
→ Specific defect models for OO
2) How would you build the control states for a program?
→ CFG is the projection of code on the program counter
3) Does it make sense to represent an object as a state machine?
→ Depends
→ Stack: No, might not be the most suitable way
→ Network Controller: yes, very adequate
→ Traffic Light: yes
Questions: From failures to tests and faults
1) Would you apply code reading?
→ Depends on time and money: should be combined with other testing (not replace it)
→ Really effective if you have professional programmers.
→ Don't use it for the assessment of employees
→ Different methods find different faults AND find different faults than testing!
Review: informal code reading by team members
Walkthrough: code review as a kind of informal meeting
Checklist (Inspection): meeting with defined roles and activities
2) Would you use inspection?
→ depends on time, process, quality requirement
→ Tests can be repeated, but code reading cannot
Questions: Fault models
1) What is a fault model?
→ A fault model is an abstract model of something that could go wrong
Answers for exercise 2:
A set/partition includes all possible values for all input parameters.
The three rules for creating a partition:
1) No empty set → no block is empty
2) Unique values → no value occurs in two blocks
3) The union of all blocks is the whole set
Assignment 2:
1. Explain the idea behind partitionbased testing?
Dividing the input domain of a system into blocks, and then selecting random
test cases from each block.
2. What is limit testing and when does it have a high probability of triggering failures?
Testing programs at their limits, e.g. boundaries of loops, max/min inputs
The fault model: a lot of faults occur at loop boundaries, indices and so on
3. How are coverage-based and partition-based testing related?
Partition-based testing includes coverage-based testing: for instance, we check both paths of an if-else statement → the blocks map to paths through the program
What exactly do coverage criteria show?
If you have just 80% coverage, there may be dead code or a problem you haven't figured out yet.
Would you, as the CEO of PornHub.com with a 20 million budget, use coverage criteria?
DEPENDS!!!
No, because we have no clue if it's better than RT! (Since coverage criteria don't partition the input space based on a defect model, according to Weyuker/Jeng you don't know if it is the same, better or worse than random testing.)
But if we have to meet norms and standards, it's good / you have to do it.
Which Testing strategy to use?
Cause Effect Graphs
Use it when:
helpful in identifying all relevant situations
structure / reproducibility
also good for presentations / meetings since very visible
we don’t know about completeness
Coverage Criteria
Use it when:
Finding dead code
Intuitively: each line of code should be executed at least once. (e.g. as a specification requirement.)
Dual use:
A posteriori (measure existing test suite)
A priori (use as selection criterion) + as stopping criterion
Problematic when:
in most programming languages,
evaluation of code is different
Dependencies
Data Flow Testing
Use it when:
Interested in flow of data / dependencies want to see the “scope” of each
variable / definition. (i.e. how far does the given definition go, what is messed
up by changing the value of the variable)
Latin square stuff:
Use it when:
You have a fault model that tells you that the fault is caused by a combination of 2, 3 parameters which can have limited values (max 3-4 values)
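A tiny worked example (hypothetical parameters): 3 parameters A, B, C with 3 values each would need 3^3 = 27 exhaustive tests; a 3x3 Latin square gives 9 tests that cover all pairs.

    value of C:   B=1  B=2  B=3
        A=1        1    2    3
        A=2        2    3    1
        A=3        3    1    2

Each cell is one test (A = row value, B = column value, C = cell entry); every (A,B), (A,C) and (B,C) value pair occurs exactly once.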
Partition based testing
Use it when:
You can ensure that certain partitions will have an outstanding failure rate (i.e. you can concentrate failure-causing tests in a certain block)
Statistical Testing:
Use it when:
Dumb commercial programs (Android games/apps): your boss tells you that he doesn't want the program crashing while commercial users use it (i.e. the 20% of the program that is used by 80% of the people should run flawlessly).
When severity is not crucial. (Nothing is going to happen if my POM app crashes.)
You have former existing profiles . If not, you can still derive them from simulation,
expectancy, measurements but then it comes down to money whether it is worth it.
DON'T use it when:
When severity is critical: Nuclear reactor (Go with codereading then)
Model based Testing
One model for both, test cases and code generation
generated from models, e.g. Matlab/Simulink
Use it when:
Code generators
Assumption on the environment
Performance
Exceptions
Cons:
no redundancy → no verification
BUT: it is not useful to generate code and test cases from the same model! For test case generation you need a separate model! → no redundancy!
(so basically the same as the first con?!)
Two Models
One model for test cases, one for code
Cons
Expensive → possible solution: split between OEM and supplier
Also a problem when there are changes in the requirements (double expense to modify both test and code model! → However, if the specification is solid (automotive industry etc.), this probably won't occur.)
if there is a fault you need to find out which model was faulty
Pro:
Redundancy → need to make sure the models are different
Different levels of abstraction possible
Use it when:
Specification is VERY exact (i.e. car manufacturer (system model) / suppliers (test model))
Model for test only
Pro:
Redundancy
Con:
(Expensive) → we don’t know if it’s cheaper
Code and model need to be kept in sync (interleaving) → hard!
Model has to be tested itself
Specification doesn’t profit from model based testing.
Hard when the requirements are changing.
Use it when:
Conformance tests: OEM builds the model, suppliers must show adherence to
model / conform to the model
Scenario of our running chip card example
Model Extraction from Code
Pro:
May make sense if you extract the model manually
Cons :
While having automatic generation there is NO redundancy.
Use it when:
Ex-post development of test cases only
(exception / no exception possible?)
(Also useful if there are no requirements documents?)
Test case extraction
Copying the stack
When you don’t have external events, thus you simply press “start” on the
program and it runs autonomously, purely based on the code.
Event based
Use it when you have external events such as user interactions.
Shallow copy / deep copy
If you are really interested in the line / execution step where the program went
off the rails, then use deep copy.
In order to just reproduce the failure, a shallow copy is okay (90% of the time).
Test case minimization
Delta debugging
Use it when the test case is relatively small. Since the number of test runs can grow exponentially with the number of characters, it gets nasty when you have a long string.
Fault localization
Code Churn is probably the best way to estimate the number of failures.
Hit spectra / code metrics: very hard to choose the best method a priori. Not really useful.
Code Reading
usually the most efficient way to actually localize faults.
Especially used when you have a high severity program (NASA, Nuclear
reactor, and their friends)
Fault injection
The Andrews '05 paper says there is a correlation between the kill rate of a test suite and its failure detection capability
Pretschner, however, is of another opinion.
Concurrency testing strategies
Sleep
Logic based
Actual scheduler
I would rather focus on the actual scheduler or the logic-based approach. Sleep is not exact, especially at low numbers / fast executions (e.g. you say sleep(50) and the program sleeps 150+ ms).
FORMER EXAM QUESTIONS
Does it make sense to use path coverage etc. to generate test cases?
Answer 1: No, because you may then optimize the wrong parameter. You can reach 100% coverage without finding all faults. You may also never reach 100% because of dead code.
Answer 2: Depends ^.^ But normally not, since coverage criteria are not based on a fault model, thus the partitions (with regard to finding faults) are not going to be any better than random testing. If you have the time and money you can do it (but just for fun :))
What is Mutation Testing? What is tested? Is it possible to make a syntactical change
and the output stays the same?
Mutation Testing is automatically inserting small syntactical faults, thus creating a lot of mutant programs with k mutations. The test suite is assessed by measuring #killed mutants / #non-equivalent mutants.
In reality, you cannot know which mutants are equivalent → the value is useless
You can make a syntactical change that is not reflected in the outcome: insert the change into dead code, or make a change that has no effect on the output (e.g. the changed value is directly overwritten)
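A small, hypothetical mutant for the operator fault class mentioned above (in Java, ">=" mutated to "=="):

    static boolean original(int x) { return x >= 10; }   // intended behaviour
    static boolean mutant(int x)   { return x == 10; }   // mutated comparison operator
    // A test with x = 11 kills this mutant (original: true, mutant: false);
    // a suite that only ever uses x = 10 and x = 5 would leave it alive.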
What kinds of Model Driven Testing are there?
Answer 1: Symbolic execution, model checking, theorem proving, search algorithms.
Answer 2: Dunno. What does he mean by "MDT"? The kinds of abstraction (encapsulation / omission), or the 4 model scenarios?
If a place in the code is changed, which method could be used to determine the test cases that are affected by it?
Answer: Hit spectra, coverage
Is use-case testing sensible? (= operational profile stuff)
Answer 1: Yes, because requirements-based testing is always sensible.
What is a fault model?
Answer: Hypothesis for the reason of a failure/fault. Things that can go wrong, and usually go wrong. A fault model is good when the induced failure domains largely overlap with the actual failure domains. ("Intuitively, a fault model is the understanding of 'specific things that can go wrong' when writing a program. In a first approximation, we define fault models to be descriptions of the differences with a correct program that contain instances of fault class K. … A fault model for class K therefore is a description of the computation of α_K or a direct description of the failure domains induced by α_K, F_i(α_K, s) for all s ∈ S.")
What possibilities are there to reduce the number of test cases for a function with 3 parameters?
Answer: Combinatorial testing → Latin squares.
How can parallel systems be tested? What is difficult about it? (= concurrency)
Answer 1: Test all schedules / the important schedules. Problems with concurrency: deadlock, livelock, atomicity violation, order violation. Test schedules by: using a scheduler, sleep, ticks, events. Plus info: with transactional memory, ca. 20-40% of these problems can be fixed. ("Findings V")
Inheritance, Polymorphism, Flattening (see slides)
What is the Coupling Hypothesis?
Small syntactical faults correlate with complex failures.
Test Cases that find small faults also find complex ones.
What is Big Bang Testing? Should you use it?
Answer: Integration testing with all components at once. Bad, because fault localization is then hard (incremental or combinatorial testing would be better).
http://istqbexamcertification.com/whatisbigbangintegrationtesting/
Cyclomatic Complexity
> Code metric; if it is high, the program is very complex and error-prone.
"Cyclomatic complexity is a software metric (measurement), used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code."
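As a reminder, the standard formula with a small made-up example:

    V(G) = E - N + 2P
    \text{e.g. two sequential if-statements: } E = 6,\ N = 5,\ P = 1 \Rightarrow V(G) = 6 - 5 + 2 = 3

i.e. three linearly independent paths (= number of decisions + 1 for structured code).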
P-Uses / C-Uses: predicate uses / computational uses of a variable definition (cf. data flow testing)