2 RELATED WORK
Software Testing is an important part of any Software Engineering program [2, 8, 26, 42], and poses several challenges of its own to educators. Unfortunately, the topic still does not receive the attention it deserves in several CS programs. Wong [42] argues that many engineers are not well trained in software testing because most CS programs offer software testing as an elective course. Clarke et al. [8] also point to the fact that, due to the large number of topics to be covered in a Software Engineering program, little attention is given to Software Testing. Astigarraga et al. [2] show that most CS programs tend to emphasize development at the expense of testing as a formal engineering discipline. Lemos et al. [26] show that software testing education can improve code reliability in terms of correctness; however, the authors also argue that university instructors tend to lack the knowledge that would help students increase their programming skills toward more reliable code.

Educators have suggested different approaches for introducing testing in a CS curriculum: from students submitting their assignments together with test plans or test sets [16, 17, 21], performing black-box testing on software seeded with errors [21, 24, 31], and students testing each other’s programs [36], to having students use a test-first approach from the very beginning of the program [9, 10, 22, 27]. Many of these authors even suggest that testing should be incorporated into the Computer Science and Software Engineering curricula not only as an elective discipline, but throughout the curriculum. More specifically, Jones [23] suggests that students need to see the practice of software testing as part of the educational experience and that each core course in the curriculum should impart one or more testing experiences.

In addition, educators have proposed tools that are solely focused on software testing education. Elbaum et al. [11] propose BugHunt, a tool that contains four different lessons on software testing (terminology, black box, white box, efficiency in testing). 79% of the students in their experiment agreed that BugHunt added significant value to the material presented in the lecture(s) on software testing, and 61% of the students agreed that BugHunt could replace the classes on testing. Spacco and Pugh propose Marmoset [38], a tool to help incentivize students to test their software. Marmoset’s innovative element is that if a submission passes all of the public test cases, students are given the opportunity to test their code against a test suite that is not publicly disclosed.

3 PRAGMATIC SOFTWARE TESTING EDUCATION
The Software Testing and Quality Engineering course at Delft University of Technology covers several different aspects of software testing, ranging from topics in the ISTQB industry certification [5] to software testing automation, as well as the future of testing by means of selected research papers.

The course is currently a compulsory part of the 4th quarter of the first year of the Computer Science bachelor. The course corresponds to 5 ECTS (140 hours). Students have two lectures of 1.5 hours plus 4 hours of labwork per week. As a pre-requisite, students should have at least basic knowledge of the Java programming language.

The teaching team is currently composed of two teachers and teaching assistants (TAs). The number of TAs varies, as our university has a policy of 1 TA per 30 students. Teachers are responsible for the course design, the lectures, and creating and assessing the multiple choice exams, and they have the overall responsibility for the course. TAs are responsible for helping students, grading all labwork deliverables, and giving concrete and specific feedback on what students can improve.

Learning goals. At the end of the course, students (1) are able to create unit, integration, and system tests, using current tools (e.g., JUnit, Mockito), that successfully test complex software systems, (2) are able to derive test cases that deal with exceptional, corner, and bad weather cases by applying several different techniques (i.e., boundary analysis, state-based testing, decision tables), (3) are able to measure and reflect on the effectiveness of the developed test suites by means of different test adequacy metrics (e.g., line and branch code coverage, MC/DC), (4) are able to reflect on the limitations of current testing techniques, on when and when not to apply them in a given context, and to design testable software systems, and (5) are able to write maintainable test code by avoiding well-known test code smells (e.g., Assertion Roulette, Slow or Obscure Tests).

Program. The course covers software quality attributes, maintainability and testability, manual and exploratory testing, automated testing, devops, test adequacy, model-based testing, state-based testing, decision tables, reviews and inspections, design-by-contract, embedded system testing, test-driven design, unit versus integration testing, and mocks and stubs. More specifically:
• Week 1: Introduction to software testing, fault vs failure, principles of testing, (un)decidability, introduction to JUnit, introduction to the labwork.
• Week 2: Life cycle, validation vs verification, V-model, code reviews. Functional testing, partition testing, boundary testing, and domain testing.
• Week 3: Structural testing, adequacy criteria, code coverage. Unit vs integration vs system testing, mock objects, and test-driven development.
• Week 4: State-based testing, model-based testing, and decision tables (a sketch of how such a table can be automated follows this list).
• Week 5: Test code quality, test code smells. Design for testability. Design-by-contract.
• Week 6: Security testing. Search-based software testing.
• Week 7: Guest lectures from industry.
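The sketch below shows one possible way in which a small decision table from the Week 4 material could later be automated as a JUnit 5 parameterized test. The ShippingPolicy class and its free-shipping rule are invented for illustration and are not taken from the course material.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

// Hypothetical rule under test: free shipping for gold members
// or for orders of at least 50 euros.
class ShippingPolicy {
    boolean freeShipping(boolean goldMember, double orderTotal) {
        return goldMember || orderTotal >= 50.0;
    }
}

class ShippingPolicyTest {
    private final ShippingPolicy policy = new ShippingPolicy();

    // Each @CsvSource row encodes one rule (column) of the decision table:
    // gold member? | order total | expected: free shipping?
    @ParameterizedTest
    @CsvSource({
            "true,  60.0, true",
            "true,  10.0, true",
            "false, 60.0, true",
            "false, 10.0, false"
    })
    void decisionTableRules(boolean goldMember, double orderTotal, boolean expected) {
        assertEquals(expected, policy.freeShipping(goldMember, orderTotal));
    }
}
```

Keeping the rows of the table and the rows of the @CsvSource in sync makes it easy to check that every rule of the table is covered by an automated test.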
Key elements. To achieve a pragmatic software testing course, we have devised, and currently follow, a set of key elements:

Theory applied in the lecture. We put our efforts into developing lectures where students can see theory being applied in practice. Our lectures often have the following structure: we present a (buggy) code implementation (initially on slides, and later in the IDE), we discuss where the bug is, we explore, at a conceptual level, a systematic approach to detect the bug, and we apply the approach to a set of concrete examples. In other words, we do not only focus on explaining abstract ideas, but on concretely showing how to apply them to different real-world problems, using real-world tools such as JUnit, Mockito, and Cucumber.
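To make this lecture structure concrete, the following is a minimal sketch of such a session, assuming a hypothetical DiscountCalculator with an off-by-one boundary bug; the JUnit tests apply boundary analysis (just below, exactly on, and just above the boundary), and the on-point test is the one that exposes the bug. The class is invented for this paper and is not code used in the course.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Buggy implementation presented in the lecture: the discount should apply
// to orders of 100.00 or more, but the condition uses '>' instead of '>='.
class DiscountCalculator {
    double discountFor(double orderTotal) {
        return orderTotal > 100.0 ? 0.10 : 0.0; // bug: the boundary value 100.0 is excluded
    }
}

// Boundary analysis: test just below, exactly on, and just above the boundary.
class DiscountCalculatorTest {
    private final DiscountCalculator calculator = new DiscountCalculator();

    @Test
    void noDiscountJustBelowTheBoundary() {
        assertEquals(0.0, calculator.discountFor(99.99));
    }

    @Test
    void discountExactlyOnTheBoundary() {
        // The on-point test fails against the buggy implementation above,
        // which is precisely how the systematic approach reveals the bug.
        assertEquals(0.10, calculator.discountFor(100.00));
    }

    @Test
    void discountJustAboveTheBoundary() {
        assertEquals(0.10, calculator.discountFor(100.01));
    }
}
```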
Real-world pragmatic discussions. Software testing is a challenging activity to perform in practice. This means that developers often make trade-offs when deciding what and how much to test. Engineering questions that arise when complex software systems are being tested, such as “how much should I test?”, “how should I test a mobile application that communicates with a web server?”, and “should I use mocks to test this application?”, are often discussed in the classroom so that students see how to extrapolate from our often small exercises to their future real lives as developers.

Build a testing mindset. Software testing is not seen as an important task by many students. A software testing course should inspire students to think about testing whenever they implement any piece of code. In our testing course, we aim to achieve such a testing mindset by (1) showing how testing can be a creative activity, requiring strong developers, by means of several live coding sessions and rich pragmatic discussions, (2) demonstrating not only the usefulness of every testing technique we teach, but also how it is applied and what trade-offs it has in the real world, and (3) bringing in guest lecturers who talk about the importance of software testing for their companies.

Software testing automation. The software engineering industry has long been advocating the automation of any software testing activity [12, 35, 41]. However, some software testing courses still focus on writing test case specifications solely as documents, and do not discuss how to automate them. In our course, for all the theoretical and systematic test design techniques we present, from functional testing to structural testing, and from unit-level to system-level tests, students later write the resulting test cases in the form of automated tests. Mastering tools such as JUnit and Mockito, the standard tools for test automation in Java, is a clear learning goal of our course. The importance of automation also strongly appears in our labwork, which we discuss next.
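As an illustration of what “writing them in the form of automated tests” can look like, the sketch below automates one simple specification-level case with JUnit 5 and Mockito, mocking a repository dependency. The OutstandingBalance and InvoiceRepository names are hypothetical and are not taken from our labwork.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import java.util.List;
import org.junit.jupiter.api.Test;

// Hypothetical production code: the class depends on a repository interface,
// so the test can replace the real data source with a mock.
interface InvoiceRepository {
    List<Double> openInvoiceAmounts(String customerId);
}

class OutstandingBalance {
    private final InvoiceRepository repository;

    OutstandingBalance(InvoiceRepository repository) {
        this.repository = repository;
    }

    double totalFor(String customerId) {
        return repository.openInvoiceAmounts(customerId)
                .stream()
                .mapToDouble(Double::doubleValue)
                .sum();
    }
}

class OutstandingBalanceTest {
    @Test
    void sumsAllOpenInvoices() {
        // The repository is mocked, so the test needs no database at all.
        InvoiceRepository repository = mock(InvoiceRepository.class);
        when(repository.openInvoiceAmounts("c-42")).thenReturn(List.of(10.0, 20.0, 5.0));

        OutstandingBalance balance = new OutstandingBalance(repository);

        assertEquals(35.0, balance.totalFor("c-42"));
        verify(repository).openInvoiceAmounts("c-42");
    }
}
```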
A hands-on labwork. We see the labwork as an important learning method. In our course, by means of a practical labwork assignment, students apply a selection of techniques to a 3k lines of code game written in Java, namely JPacMan. The labwork contains a set of 50 exercises in which students exercise all the techniques we teach. It is important to notice that students not only design test cases on paper, but also automate them. A great amount of their work lies in actually producing automated JUnit test cases.

In the following, we present the main deliverables of our labwork. The complete assignment can be found in our online appendix [1].
• Part 0 (Pre-requisites). Clone the project from GitHub, configure the project in your IDE, write your first JUnit test, run coverage analysis.
• Part 1. Write a smoke test, perform functional black-box testing and boundary testing, and reflect on test understandability and best practices.
• Part 2. Perform white-box testing, use mock objects, calculate code coverage and apply structural testing, use decision tables for complex scenarios, and reflect on how to reduce test complexity and how to avoid flaky tests.
• Part 3. Apply state-based testing, work on test reusability, and refactor and reflect on test smells.

Test code quality matters. Due to the importance of automated testing activities, software testers will deal with large test codebases. Empirical research has indeed shown that test code smells often occur in software systems, and that their presence has a strong negative impact on the maintainability of the affected classes [3]. We often reinforce the importance of refactoring test code and keeping it free of smells, and we make sure that any test code we write during live coding sessions is as free of smells as possible. Test smell catalogues, such as the one proposed by Meszaros [32], are discussed in depth in a dedicated lecture.
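To illustrate one of the smells we discuss, the sketch below shows an Assertion Roulette test next to a possible refactoring; the Order class is a made-up example rather than code from JPacMan or from the lecture material.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Minimal production class, used only for this illustration.
class Order {
    private int items = 0;
    private double subtotal = 0.0;

    void add(int quantity, double unitPrice) {
        items += quantity;
        subtotal += quantity * unitPrice;
    }

    int itemCount() { return items; }
    double subtotal() { return subtotal; }
    double totalWithVat() { return subtotal * 1.21; }
}

// Assertion Roulette: several unexplained assertions in a single test,
// so a failure does not tell the reader which expectation actually broke.
class OrderSmellyTest {
    @Test
    void totals() {
        Order order = new Order();
        order.add(2, 10.0);
        order.add(3, 1.5);
        assertEquals(5, order.itemCount());
        assertEquals(24.5, order.subtotal());
        assertEquals(29.645, order.totalWithVat(), 0.001);
    }
}

// Refactored: one focused expectation per test, each with a message.
class OrderTest {
    private Order sampleOrder() {
        Order order = new Order();
        order.add(2, 10.0);
        order.add(3, 1.5);
        return order;
    }

    @Test
    void countsAllItems() {
        assertEquals(5, sampleOrder().itemCount(), "item count should sum the quantities");
    }

    @Test
    void computesSubtotalBeforeVat() {
        assertEquals(24.5, sampleOrder().subtotal(), 0.001, "subtotal should exclude VAT");
    }

    @Test
    void addsVatToTheTotal() {
        assertEquals(29.645, sampleOrder().totalWithVat(), 0.001, "total should be subtotal * 1.21");
    }
}
```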
Design systems for testability. Designing software in such a way that it eases testability is a common practice among practitioners [13, 18, 29]. This requires us to discuss not only software testing in our course, but also software architecture and the design principles of testable software systems, such as dependency inversion [28], observability, and controllability, in an entire lecture dedicated to the topic. Questions like “Do I need to test this behavior via a unit test or a system test?” and “How can I test my mobile application?” are extensively discussed not only through the eyes of software testing, but also through the eyes of software design.
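A minimal sketch of the kind of refactoring we have in mind when we discuss controllability and dependency inversion, assuming a hypothetical Voucher class: injecting the clock lets a test fix “now” instead of depending on the machine’s real time.

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import org.junit.jupiter.api.Test;

// Hard to test: a version that calls Instant.now() directly cannot be
// controlled from a test, because its outcome depends on the real time.
// Testable version: the clock is injected (dependency inversion), which
// makes "now" controllable and the behavior observable in a unit test.
class Voucher {
    private final Clock clock;

    Voucher(Clock clock) {
        this.clock = clock;
    }

    boolean isExpired(Instant expiresAt) {
        return clock.instant().isAfter(expiresAt);
    }
}

class VoucherTest {
    @Test
    void vouchersExpireAfterTheirDeadline() {
        // The test fully controls the current time via a fixed clock.
        Clock fixed = Clock.fixed(Instant.parse("2019-03-01T00:00:00Z"), ZoneOffset.UTC);
        Voucher voucher = new Voucher(fixed);

        assertTrue(voucher.isExpired(Instant.parse("2019-02-27T00:00:00Z")));
    }
}
```

In production, a convenience constructor that defaults to Clock.systemUTC() keeps such a class just as easy to use.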
Mixture of pragmatic and theoretical books. The two books we use as textbooks in the course are “Foundations of Software Testing: ISTQB Certification” [5], which gives students a solid foundation in testing theory, and “Pragmatic Unit Testing in Java 8 with JUnit” [25], which gives students concrete and practical examples of how to use testing tools such as JUnit. We believe the two complement each other, and both are important for students who will soon become software testers.

Interaction with practitioners. We strongly encourage students to interact with practitioners throughout our course. Having guest lectures from industry practitioners helps us to show the pragmatic side of software testing. Guests focus their lectures on how they apply software testing at their companies, the tools they use with their pros and cons, and the mistakes and challenges they face. In the 2017 edition, we also experimented with Ask-Me-Anything (AMA) sessions, in which we called experts from all over the world via Skype and students had 15 minutes to ask any software-testing related questions.

Grading. We currently use the following formula to grade our students: 0.25 * labwork + 0.75 * exam. The labwork (as explained above) is composed of 4 deliverables, each graded by our TAs on a scale of 0 to 10. We then average the grades of the four deliverables, which together compose the labwork component of the grade. At the end of the course, students take a 40-question multiple choice exam. Students may take a resit 6 weeks later if they did not pass the first time. We also offer an optional midterm exam for students who want to practice beforehand.

4 RESEARCH METHODOLOGY
The goal of this study is to provide a better understanding of the difficulties and challenges that students face when learning pragmatic software testing. To that aim, we analyze the data from 230 students of the 2016-2017 edition of our software testing course. We propose three research questions:
RQ1: What common mistakes do students make when learning software testing?
To answer our research questions, we collect and analyze data from three different sources: the feedback reports that TAs give to students throughout the course, a survey with students, and a survey with the TAs, both performed after the course. We characterize …

50% of the students are neutral, and 21% perceive it as a hard topic (Q3). Not a single TA perceived this topic as easy. We believe these findings further highlight the importance of discussing the pragmatic side of software testing.

When it comes to test code best practices, students had contradicting perceptions. The usage of mocks to simulate a dependency (Q4) and writing fast, reproducible, and non-flaky tests (Q17) were considered easy topics to learn by 42% and 56% of students, respectively. TAs agree that students learn these topics with less difficulty. However, when it comes to following testing best practices (Q9), 46% of students perceive it as an easy topic, while 71% of TAs perceive it as a hard topic for students. The students’ perceptions also contradict the results of RQ1, where we observe a large amount of feedback focused on best practices in their assignments.

Finally, testability seems less challenging for students than for TAs. While students perceive optimizing code for testability (Q10) as just somewhat challenging (35% find it easy, 41% are neutral, and 25% find it hard), 67% of TAs believe that testability is a hard topic for students. As we conjecture that TAs have a better understanding of testability than the students, these findings suggest that the students are not sufficiently aware of the difficulty of testability.

Figure 3: Importance of different activities in software testing learning. Scale = strongly disagree, disagree, neutral, agree, strongly agree. The full questionnaire can be found in our appendix [1]. Per activity, the percentages of students who disagree / are neutral / agree that it was important:
Lectures (Q1): 0% / 7% / 93%
Guest lectures (Q2): 10% / 18% / 72%
Live coding (Q3): 6% / 19% / 75%
Live discussions (Q4): 7% / 28% / 65%
PragProg book (Q5): 29% / 51% / 20%
ISTQB book (Q6): 31% / 36% / 33%
Labwork (Q7): 1% / 6% / 93%
Support from TAs (Q8): 7% / 12% / 80%
Related papers (Q9): 35% / 34% / 30%
AMA sessions (Q10): 30% / 38% / 32%
Midterm exam (Q11): 9% / 19% / 73%

6.3 RQ3: Which teaching methods do students find most helpful?
In Figure 3, we show how students perceive the importance of each learning activity we have in our software testing course.

Students perceive activities that involve practitioners as highly important. More specifically, guest lectures from industry (Q2) were considered important by 72% of participants. The Ask-Me-Anything sessions (Q10), on the other hand, were considered important by only 32% of participants; 38% are neutral, and 30% do not consider them important.

Moreover, different interactions during the lecture are also considered important by students. Teachers performing live coding (Q3) and discussions and interactions during the lecture (Q4) are considered important by 75% and 65% of students, respectively. We conjecture that discussions and live coding are moments in which students have the opportunity to discuss the topics they consider hard, such as how much testing is enough, which test level to use, and test code best practices (as seen in RQ1 and RQ2).

On the other hand, the two books we use as textbooks in the course are not considered fundamental by students. More specifically, 31% of students find the ISTQB book [5] not important and 36% are neutral (Q6), whereas 29% of them find the PragProg book [25] not important and 51% are neutral (Q5). Reading related papers (Q9) is also considered not important by 35% of them.

6.4 Limitations of our study
The qualitative analysis of the open questions in the survey was manually conducted by the first author of this paper. The analysis, therefore, could be biased towards the views of the authors. To mitigate this threat, we make all the data available for inspection in our online appendix [1].

TAs were responsible for giving feedback to students throughout the study. Although we instruct all TAs on how to grade and what kind of feedback to give (they all follow the same rubrics), different TAs have different personalities. In practice, we observed that some TAs provided more feedback than others. While we believe this could have had little impact on the percentages of each theme in RQ1, we do not expect any other theme to emerge.

In terms of generalizability, although we analyzed the behavior of 230 students, we do not claim that our results are complete and/or generalizable. Furthermore, most students were Dutch (we only had 3 international students answering our survey), which may introduce cultural bias into our results. We urge researchers to perform replications of this study in different countries and universities.

7 CONCLUSIONS
Software testing is a vital discipline in any Software Engineering curriculum. However, the topic poses several challenges to educators and to students. In this paper, we proposed a pragmatic software testing curriculum and explored students’ common mistakes, hard topics to learn, favourite learning activities, important learning outcomes, and challenges they face when studying software testing.

Researchers and educators agree that software testing education is fundamental not only to industry, but also to research. We hope this paper helps the community to further improve the quality of their software testing courses. As Bertolino [4] states in her paper on the achievements, challenges, and dreams of software testing research: “While it is research that can advance the state of the art, it is only by awareness and adoption of those results by the next-coming generation of testers that we can also advance the state of practice. Education must be continuing, to keep the pace with the advances in testing technology”.

ACKNOWLEDGMENTS
We thank all the students and teaching assistants who followed our course in the last years.
REFERENCES
[1] Maurício Aniche, Felienne Hermans, and Arie van Deursen. 2018. Pragmatic Software Testing Education: Appendix. (2018). [Link]
[2] Tara Astigarraga, Eli M Dow, Christina Lara, Richard Prewitt, and Maria R Ward. 2010. The emerging role of software testing in curricula. In Transforming Engineering Education: Creating Interdisciplinary Skills for Complex Global Environments, 2010 IEEE. IEEE, 1–26.
[3] Gabriele Bavota, Abdallah Qusef, Rocco Oliveto, Andrea De Lucia, and Dave Binkley. 2015. Are test smells really harmful? An empirical study. Empirical Software Engineering 20, 4 (2015), 1052–1094.
[4] Antonia Bertolino. 2007. Software testing research: Achievements, challenges, dreams. In 2007 Future of Software Engineering. IEEE Computer Society, 85–103.
[5] Rex Black, Erik Van Veenendaal, and Dorothy Graham. 2012. Foundations of software testing: ISTQB certification. Cengage Learning.
[6] FT Chan, TH Tse, WH Tang, and TY Chen. 2005. Software testing education and training in Hong Kong. In Quality Software, 2005. (QSIC 2005). Fifth International Conference on. IEEE, 313–316.
[7] John Joseph Chilenski and Steven P Miller. 1994. Applicability of modified condition/decision coverage to software testing. Software Engineering Journal 9, 5 (1994), 193–200.
[8] Peter J Clarke, Debra Davis, Tariq M King, Jairo Pava, and Edward L Jones. 2014. Integrating testing into software engineering courses supported by a collaborative learning environment. ACM Transactions on Computing Education (TOCE) 14, 3 (2014), 18.
[9] Stephen H Edwards. 2003. Rethinking computer science education from a test-first perspective. In Companion of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications. ACM, 148–155.
[10] Stephen H Edwards. 2004. Using software testing to move students from trial-and-error to reflection-in-action. ACM SIGCSE Bulletin 36, 1 (2004), 26–30.
[11] Sebastian Elbaum, Suzette Person, Jon Dokulil, and Matt Jorde. 2007. Bug hunt: Making early software testing lessons engaging and affordable. In Proceedings of the 29th international conference on Software Engineering. IEEE Computer Society, 688–697.
[12] Facebook. [n. d.]. Building and Testing at Facebook. [Link]/notes/facebook-engineering/building-and-testing-at-facebook/10151004157328920/. ([n. d.]). Last visited in October, 2017.
[13] Steve Freeman and Nat Pryce. 2009. Growing object-oriented software, guided by tests. Pearson Education.
[14] Vahid Garousi and Aditya Mathur. 2010. Current state of the software testing education in North American academia and some recommendations for the new educators. In Software Engineering Education and Training (CSEE&T), 2010 23rd IEEE Conference on. IEEE, 89–96.
[15] Vahid Garousi and Junji Zhi. 2013. A survey of software testing practices in Canada. Journal of Systems and Software 86, 5 (2013), 1354–1376.
[16] Judith L Gersting. 1994. A software engineering “frosting” on a traditional CS-1 course. In ACM SIGCSE Bulletin, Vol. 26. ACM, 233–237.
[17] Michael H Goldwasser. 2002. A gimmick to integrate software testing throughout the curriculum. In ACM SIGCSE Bulletin, Vol. 34. ACM, 271–275.
[18] Misko Hevery. 2008. Testability explorer: using byte-code analysis to engineer lasting social changes in an organization’s software development process. In Companion to the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications. ACM, 747–748.
[19] Thomas B Hilburn. 1996. Software engineering-from the beginning. In Software Engineering Education, 1996. Proceedings., Ninth Conference on. IEEE, 29–39.
[20] Thomas B Hilburn and Massood Townhidnejad. 2000. Software quality: a curriculum postscript?. In ACM SIGCSE Bulletin, Vol. 32. ACM, 167–171.
[21] Ursula Jackson, Bill Z Manaris, and Renée A McCauley. 1997. Strategies for effective integration of software engineering concepts and techniques into the undergraduate computer science curriculum. In ACM SIGCSE Bulletin, Vol. 29. ACM, 360–364.
[22] David Janzen and Hossein Saiedian. 2008. Test-driven learning in early programming courses. In ACM SIGCSE Bulletin, Vol. 40. ACM, 532–536.
[23] Edward L Jones. 2001. An experiential approach to incorporating software testing into the computer science curriculum. In Frontiers in Education Conference, 2001. 31st Annual, Vol. 2. IEEE, F3D–7.
[24] Edward L Jones. 2001. Integrating testing into the curriculum—arsenic in small doses. ACM SIGCSE Bulletin 33, 1 (2001), 337–341.
[25] Jeff Langr, Andy Hunt, and Dave Thomas. 2015. Pragmatic unit testing in Java 8 with JUnit. The Pragmatic Bookshelf.
[26] Otávio Augusto Lazzarini Lemos, Fábio Fagundes Silveira, Fabiano Cutigi Ferrari, and Alessandro Garcia. 2017. The impact of Software Testing education on code reliability: An empirical assessment. Journal of Systems and Software (2017).
[27] Will Marrero and Amber Settle. 2005. Testing first: emphasizing testing in early programming courses. In ACM SIGCSE Bulletin, Vol. 37. ACM, 4–8.
[28] Robert C Martin. 2002. Agile software development: principles, patterns, and practices. Prentice Hall.
[29] Robert C Martin. 2017. Clean architecture: a craftsman’s guide to software structure and design. Prentice Hall Press.
[30] Scott Matteson. [n. d.]. Report: Software failure caused 1.7 trillion in financial losses in 2017. [Link] report-software-failure-caused-1-7-trillion-in-financial-losses-in-2017/. ([n. d.]).
[31] Renée McCauley and Ursula Jackson. 1999. Teaching software engineering early: experiences and results. ACM SIGCSE Bulletin 31, 2 (1999), 86–91.
[32] Gerard Meszaros. 2007. xUnit test patterns: Refactoring test code. Pearson Education.
[33] G. Miller. [n. d.]. A Scientist’s Nightmare: Software Problem Leads to Five Retractions. [Link] ([n. d.]). Last visited in October, 2017.
[34] SP Ng, Tafline Murnane, Karl Reed, D Grant, and TY Chen. 2004. A preliminary survey on software testing practices in Australia. In Software Engineering Conference, 2004. Proceedings. 2004 Australian. IEEE, 116–125.
[35] Alan Page, Ken Johnston, and Bj Rollison. 2008. How we test software at Microsoft. Microsoft Press.
[36] James Robergé and Candice Suriano. 1994. Using laboratories to teach software engineering principles in the introductory computer science curriculum. In ACM SIGCSE Bulletin, Vol. 26. ACM, 106–110.
[37] Johnny Saldaña. 2015. The coding manual for qualitative researchers. Sage.
[38] Jaime Spacco and William Pugh. 2006. Helping students appreciate test-driven development (TDD). In Companion to the 21st ACM SIGPLAN symposium on Object-oriented programming systems, languages, and applications. ACM, 907–913.
[39] Muhammad Dhiauddin Mohamed Suffian, Suhaimi Ibrahim, and Mohamed Redzuan Abdullah. 2014. A proposal of postgraduate programme for software testing specialization. In Software Engineering Conference (MySEC), 2014 8th Malaysian. IEEE, 342–347.
[40] Joseph Timoney, Stephen Brown, and Deshi Ye. 2008. Experiences in software testing education: some observations from an international cooperation. In Young Computer Scientists, 2008. ICYCS 2008. The 9th International Conference for. IEEE, 2686–2691.
[41] James A Whittaker, Jason Arbon, and Jeff Carollo. 2012. How Google tests software. Addison-Wesley.
[42] Eric Wong. 2012. Improving the state of undergraduate software testing education. In American Society for Engineering Education. American Society for Engineering Education.