When A Failing Test Might Be OK - Random Tech Thoughts
In automated tests, each test does a different job – why bother having more than one test to deliver the same information? It’s common for there to be more test code than production code, and maintaining test code is a cost that can’t be ignored. Redundancy in the test code makes this maintenance cost bigger than it needs to be.
This means that any test failing is a sign that at least one part of the system isn’t behaving as
expected.
However, there are times where there is important variability – either intrinsic to the problem
being solved, or unavoidable variability in the way the problem is solved. When it’s too hard to
predict this variability accurately or how it will affect the test outcome, one approach is to create a
set of related tests, and this set is in many ways treated as a single test. The idea is that, while it’s
too hard to predict the behaviour of the system via a single (component) test, it is still possible to
predict its behaviour in general, i.e. across the set of tests.
I’ll go into some examples below, where I’ll describe what gives rise to the variability and how the
success criteria are defined for the tests/set of tests.
Note that flaky tests are a similar but different problem. By flaky tests I mean tests that sometimes
pass and sometimes fail, and so give unreliable results. This is often due to variability in the order
in which different parts of the production or test code are executed, and this variability trips the
tests up. Flaky tests are something that should be fixable so that individual tests reliably pass or
fail, but this might require changes to the production code as well as to the test code. The rest of
this article concerns times when variability can’t be dealt with such that individual tests are
reliable.
Performance tests
In this context, I’m using performance as a synonym for latency – how long will the system take to
respond to a request? One way to specify performance requirements is in terms of percentages. For instance: no request may take longer than 0.5 seconds, and most requests (an agreed percentage of them) must complete within 0.1 seconds.
The performance requirements might be motivated by several things. One might be to ensure the
user gets a good user experience (https://randomtechthoughts.blog/category/user-experience/)
via a GUI. How quickly the GUI, and the systems behind it such as APIs and databases, respond to a user request will influence the user’s perception and enjoyment of the system. Alternatively, there
might not be any GUI or user involved directly, but the production system is an API that’s called
by other code. The performance requirements might be to ensure that many separate bits of code
can collaborate to create a bigger system, such as a phone network
(https://en.wikipedia.org/wiki/Signalling_System_No._7).
To test the system against its performance requirements, a suitable set of requests is created and
sent to the system under test. The meaning of suitable depends on context, but it is likely to follow
the pattern of requests that has been observed in production already. For instance, for an online banking system one part of the pattern could be a given mix of request types, such as checking a balance, viewing recent transactions, and making a payment.
In the light of the requirements above, if a request takes 0.7 seconds during a test – is that OK?
Definitely not. If a request takes 0.09 seconds, that’s fine. If a request takes 0.3 seconds, then
things might be OK. If too many other requests are also in the 0.1 – 0.5 second range, then the set
of tests as a whole fails.
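As a rough sketch of what that aggregate judgement could look like in a test harness (the 0.5 second hard limit, 0.1 second target and 90% threshold below are illustrative assumptions, not requirements from any real system):

```python
# Sketch: judge a performance test run as a whole, rather than request by request.
# All thresholds are illustrative assumptions.

HARD_LIMIT_SECONDS = 0.5   # no single request may take longer than this
TARGET_SECONDS = 0.1       # most requests should finish within this
TARGET_FRACTION = 0.9      # "most" = at least 90% of requests

def performance_run_passes(latencies_seconds: list[float]) -> bool:
    """Return True if the whole set of measured latencies meets the criteria."""
    if not latencies_seconds:
        return False  # no measurements, no verdict
    if max(latencies_seconds) > HARD_LIMIT_SECONDS:
        return False  # a single 0.7 second request fails the run outright
    fast_enough = sum(1 for t in latencies_seconds if t <= TARGET_SECONDS)
    return fast_enough / len(latencies_seconds) >= TARGET_FRACTION

# A 0.3 second request is tolerable on its own, but only while it stays rare.
print(performance_run_passes([0.09, 0.08, 0.3, 0.07, 0.06, 0.09, 0.05, 0.08, 0.07, 0.09]))  # True
print(performance_run_passes([0.09, 0.3, 0.3, 0.3, 0.06]))                                  # False
```

The point is that no single measurement decides the outcome; the verdict comes from the whole run.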
Why can’t performance be specified more tightly? There are two sources of variation – the
mixture of different kinds of request, and real-world limitations of the implementation (which I
will explain shortly). The different kinds of request (e.g. the banking related ones above) all need
to be completed before the user thinks things are too slow, even if they need different amounts of
work to complete.
The other source of variation is the limitations of the implementation. There are many helpful lies
that one part of the system tells to other parts. The lies are essentially saying that things are
simpler or better than they actually are. Enough of the time the pretence holds, but occasionally the lie shows through and this affects performance. For example:
- the database / CPU / etc. is faster than it actually is – a lie involving some kind of cache;
- the CPU, memory, network, and other important resources that some process needs are used only by that process and nothing else – a lie involving virtualisation and other ways to share things.
Very early computers exposed programmers to all the details of their hardware, which made having two or more programs running on a computer at once tricky. How would they share the CPU, memory, disk space and so on? More modern computers take that burden off the programmer and handle it in things like the operating system.
The operating system creates illusions such as virtual memory – a contiguous chunk of memory
that is solely for one program, even though behind the scenes this is made of several separate
chunks of physical memory, and many different programs are using the physical memory at once.
Similarly, each program thinks it’s running on a CPU dedicated to running just that program. In
reality, each program gets a series of slices of CPU time, to allow the CPU to be shared across
many programs.
Enough of the time, the difference between appearance and reality is fine. However, it can
sometimes cause delays in the execution of code. For example, imagine a program that is blocked waiting for a long database query, so the operating system decides to divert some of its physical memory to another program that isn’t currently blocked on anything. Before this happens, the contents of that memory are written to disk. A little while later, when the database query finishes, the program is ready to run again and so needs that bit of memory back. There will therefore be a delay while the data is read from disk back into memory.
These costs will happen at hard to predict times, as they are based on the interactions of many
moving parts at many levels of abstraction. Therefore, it’s easier to describe the system’s latency
in general terms such as percentages and ranges.
Data science
A typical example is a classifier that takes an image and says whether it shows a cat, a dog, a volcano, or something else. In this situation, the variability is intrinsic to the problem. Not all cats look the same, and a given cat will look different from different angles, in different lights, or in different poses. It is usually unrealistic to expect a classifier to get 100% accurate results all the time. Sometimes it will come up with the wrong answer (poor accuracy) or won’t be able to give any answer (poor coverage).
One way to represent the behaviour of a classifier is with a confusion matrix, such as the one
below:
                  Actual
                  Cat       Dog       Volcano   Other     Don’t know
Expected Cat      24                  2                   2
         Dog                38                            1
         Volcano                      15
         Other    6                             11        1
The numbers show a percentage of all images (tests) in the set of tests. For instance, 24% of images
are cats that are correctly classified as a cat. 2% of images are cats that are (incorrectly) classified
as a volcano, etc. Cells containing zero are left blank for clarity. The diagonal (top left to bottom right) is where the system is behaving as intended – the actual result matches the expected result.
Everything else is some kind of error.
The right-hand column shows how much of a coverage problem the classifier has, i.e. how many
times it has failed to come up with any answer. The cells that are neither on the diagonal nor in the right-hand column show how much of an accuracy problem the classifier has, i.e. how many times it has
come up with the wrong answer.
If there are only two categories (and no don’t know column) then a confusion matrix can be thought
of as another way of representing the information in a table showing false positives, false
negatives, true positives and true negatives.
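As a sketch of how such a matrix could be built from a labelled set of test images (the function below is hypothetical, not from any particular library; the category names follow the example above):

```python
from collections import Counter

def confusion_matrix(expected: list[str], actual: list[str]) -> dict[tuple[str, str], float]:
    """Map each (expected, actual) pair of categories to its percentage of all test images."""
    counts = Counter(zip(expected, actual))
    total = len(expected)
    return {pair: 100 * count / total for pair, count in counts.items()}

# Four test images: three classified correctly, one cat mistaken for a volcano.
expected = ["Cat", "Cat", "Dog", "Volcano"]
actual = ["Cat", "Volcano", "Dog", "Volcano"]
print(confusion_matrix(expected, actual))
# {('Cat', 'Cat'): 25.0, ('Cat', 'Volcano'): 25.0, ('Dog', 'Dog'): 25.0, ('Volcano', 'Volcano'): 25.0}
```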
How to define criteria for when the test set passes or fails will depend on the context. Given that
it’s likely to be impossible to have both 100% coverage and 100% accuracy, is it better to have high
coverage or high accuracy? Within accuracy, are some categories, or some kinds of mis-
categorisation, more important than others? For instance, given that cats are more similar to dogs in how they fit into human society than they are to volcanoes, is a cat mis-categorised as a dog a bigger or smaller problem than a cat mis-categorised as a volcano?
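Continuing that sketch, the pass/fail decision for the whole test set might then weigh coverage and accuracy separately (the budgets below – 5% for “don’t know” answers and 10% for wrong answers – are again illustrative assumptions):

```python
def classifier_test_set_passes(matrix: dict[tuple[str, str], float],
                               max_dont_know_pct: float = 5.0,
                               max_wrong_pct: float = 10.0) -> bool:
    """Pass the whole test set only if coverage and accuracy problems stay within budget.

    matrix maps (expected, actual) category pairs to percentages of all test images,
    as produced by confusion_matrix() above.
    """
    dont_know = sum(pct for (exp, act), pct in matrix.items() if act == "Don't know")
    wrong = sum(pct for (exp, act), pct in matrix.items()
                if act != exp and act != "Don't know")
    return dont_know <= max_dont_know_pct and wrong <= max_wrong_pct
```

A more refined version could weight particular mis-categorisations, such as cat-as-volcano, more heavily than others.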
Quantum
[Image credit (https://www.flickr.com/photos/31690139@N02/2965956885), under Creative Commons Attribution 2.0 Generic (https://creativecommons.org/licenses/by/2.0/)]
Quantum code runs on qubits which, unlike classical bits, can be in a superposition of the states 0 and 1. The important part is that, before measurement, a qubit has a probability that it will deliver the value 0 when it’s measured (and 100% minus that probability that it will deliver 1 when it’s measured). The goal of quantum code is to massage the probabilities of its qubits such that probability is moved towards the correct answer[s] and away from the incorrect answer[s]. (These operations that move probability aren’t the same as measurement, so a qubit is still in a superposition of states, it’s just that one state becomes more likely than the other.)
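In the standard notation (not specific to this article), a qubit’s state and its measurement probabilities can be written as

\[
\lvert\psi\rangle = \alpha\,\lvert 0\rangle + \beta\,\lvert 1\rangle,
\qquad
P(0) = \lvert\alpha\rvert^{2},
\qquad
P(1) = \lvert\beta\rvert^{2} = 1 - \lvert\alpha\rvert^{2},
\]

so the massaging above means applying operations that change α and β without measuring the qubit.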
This means that even correct quantum code can deliver the wrong answer some of the time (and the correct answer the rest of the time, on the same inputs).
If you run some code and get the wrong answer, is that evidence that your code has a bug? It
depends on how often it delivers the wrong answer, and how this compares to the expected
frequency of a wrong answer. At this point you might start to treat this as a physical (quantum) system that happens to include code, and reach for tools usually used for analysing data in science, such as hypothesis testing and p-values (https://www.youtube.com/watch?v=vemZtEM63GY). For instance, you create a hypothesis that the code delivers correct results 87% of the time, and you choose a significance level of p = 0.05 for testing it. This then directs how you will test your code.
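A minimal sketch of that check, using the 87% success rate and p = 0.05 from the example above and SciPy’s one-sided binomial test (the run counts below are made up):

```python
from scipy.stats import binomtest

EXPECTED_SUCCESS_RATE = 0.87   # hypothesis: the code gives the correct answer 87% of the time
SIGNIFICANCE_LEVEL = 0.05      # p = 0.05, as in the example above

def looks_buggy(correct_runs: int, total_runs: int) -> bool:
    """Return True if the observed success rate is significantly below the expected 87%."""
    result = binomtest(correct_runs, total_runs, EXPECTED_SUCCESS_RATE, alternative="less")
    return result.pvalue < SIGNIFICANCE_LEVEL

print(looks_buggy(800, 1000))  # True: 80% correct is very unlikely if the code really succeeds 87% of the time
print(looks_buggy(860, 1000))  # False: 86% correct is within the range expected from random variation
```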
The source of variability in this case is quantum physics! It is probabilistic, and so code that uses it
(quantum code) will also be probabilistic.
Summary
Much of the time, a single failing test reliably tells you something useful about some code: that the code isn’t behaving as expected. However, there are cases where there is too
much variation in either the problem or the implementation of its solution for this to be possible.
These range from conventional cases, such as performance tests, to less conventional cases, such as
quantum.
If a single failing test is unreliable, it’s worth looking at grouping together a set of related tests, in
case the system’s behaviour is predictable over enough cases, i.e. in aggregate.