
Teaching Introductory Programming: a Quantitative Evaluation of Different Approaches

THEODORA KOULOURI, Brunel University
STANISLAO LAURIA, Brunel University
ROBERT D. MACREDIE, Brunel University

Teaching programming to beginners is a complex task. In this paper, the effects of three factors – choice of
programming language, problem-solving training and the use of formative assessment – on learning to
program were investigated. The study adopted an iterative methodological approach carried out across
four consecutive years. To evaluate the effects of each factor (implemented as a single change in each
iteration) on students’ learning performance, the study used quantitative, objective metrics. The findings
revealed that using a syntactically-simple language (Python) instead of a more complex one (Java)
facilitated students’ learning of programming concepts. Moreover, teaching problem-solving before
programming yielded significant improvements in students’ performance. These two factors were found to
have variable effects on the acquisition of basic programming concepts. Finally, it was observed that
effective formative feedback in the context of introductory programming depends on multiple parameters.
The paper discusses the implications of these findings, identifies avenues for further research and argues
for the importance of studies in computer science education anchored on sound research methodologies to
produce generalizable results.
Categories and Subject Descriptors: K.3.2 [Computers and Education]: Computer and Information
Science Education.
General Terms: empirical studies, teaching/learning strategies, novice programmers, learning
programming, CS1

1. INTRODUCTION
The past decade has witnessed a boom in industry demand for computing expertise,
driven by the dramatic expansion of computing use, the growing influence of
computing on the economy and the constant influx of new technologies into everyday
life [CC2005 2005]. Yet, while the demand for computing graduates is growing,
university Computer Science departments in the UK [National Audit Office 2007]
and USA [DARPA-RA 2010] report declining enrollment and high attrition rates on
their degree programs. Dropout and failure rates are exacerbated during and
between the first and second years of these programs, reaching as high as 30-40%
[Beaubouef and Mason 2005] and meaning that even where students begin studying
for Computer Science degrees, a high proportion do not go on to graduate.
These high attrition rates are associated with considerable failure rates in
introductory programming courses and/or disenchantment with programming
[McGettrick et al. 2004, in Grand Challenges in Computing: Education; Nikula et al.
2011]. This disenchantment persists with graduates “expressing a dislike of
programming and reluctance to undertake it” [McGettrick et al. 2004 p.12; see also
Ma et al. 2007]. This is also reflected in the outcomes achieved by students, with
large, multinational studies reporting that even at the conclusion of their
introductory programming courses, a large number of students show substandard
performance in elementary programming tasks [McCracken et al. 2001; Lister et al.
2004].
The tension between contemporary society’s need for a supply of skilled computer
scientists who can program and the manifest issues in programming achievement
demonstrated by many students in degree programs presents Higher Education
Institutions with opportunities and challenges in developing skilled computer

Authors’ address: T. Koulouri (corresponding author), S. Lauria, and R. D. Macredie, Department of Information Systems and Computing, Brunel University, UK, email: [email protected]


scientists. Core to any computer science education is the nurturing of the ability to
develop well-designed software, and all of the attendant conceptual and practical
skills that underpin this activity. Programming is the practical mechanism through
which these skills and abilities are realized. It forms an essential part of any
computer science (CS) degree and, as already noted, its teaching typically begins in
the first year of a degree program, with an introductory programming course (often
referred to as CS1). The approach to beginning programming at this early stage in
CS degrees is supported by students, CS and non-CS faculty and near-time
employers [CC2001 2001].
The changes to the computing industry associated with, inter alia, new
technologies, and the problems associated with learning to program, have led to
Higher Education Institutions facing the pressing need to rethink their CS curricula,
with special attention given to redesigning the CS1 course, which then has wider
effects. Developing or revising an introductory programming course has a key role in,
and impact on, the wider curriculum. An introductory programming course prepares
students for, and underpins, subsequent courses, which are typically also revised in
response to changes to CS1; and the overall degree prepares students for their future
careers.
The complexity of teaching introductory programming is widely acknowledged
among educators [Robins et al. 2003], being listed as one of the seven challenges in
Computer Science Education (CSE) [McGettrick et al. 2004]. It is also understood
that learning programming is a complex and multifaceted process. Novice
programmers show a fragile ability to take a problem description, decompose it into
subtasks, and implement them [McCracken et al. 2001]. They also have difficulty in
tracing, reading and understanding pieces of code and fail to grasp basic
programming principles and routines [Lister et al. 2004]. The overhead of learning
the syntax and semantics of a language at the same time, and difficulties in
combining new and previous knowledge and developing their general problem-solving
skills, all add to the complexity of learning how to program [Linn and Dalbey 1989].
Taken together, these issues highlight the importance of identifying both the
appropriate content and pedagogy for CS1 in order to prepare students for the rest of
the degree programs and provide the foundation for effective computing careers.
Different approaches to teaching programming which reflect the mix of pedagogy and
content issues have been set out, notably in the IEEE and ACM Joint Task Force on
Computer Curricula [CC2001 2001].
Debates around the effectiveness of these approaches to teaching introductory
programming are well-established, of course. Pears et al. [2007], for example,
performed a large literature review on the teaching of introductory programming
which led to the identification of four major categories relating to course development:
curriculum; pedagogy; language choice; and tools for supporting learning. They
reviewed studies in each of these four areas and discussed several unresolved issues
before making recommendations in relation to them. First, they noted that there was
a conspicuous lack of an accepted framework/methodology to guide the processes of
planning, designing, developing, revising and implementing CS1 courses. Second,
they pointed out that empirical data from small-scale studies are abundant in the
literature, but that they rarely inform these processes. Pears et al. [2007] concluded
that larger-scale systematic studies that provide significant new empirical results are
needed; and that such research has the potential to provide valid and applicable
recommendations for educators regarding what and how to teach novice
programmers. Taking this argument as our starting point, the aim of this paper is to identify an effective approach to teaching CS1 through applying a valid methodological framework. The remainder of the paper is organized as follows.
Section 2 reviews existing approaches to the design of CS1 courses to confirm the
lack of systematic empirical studies in this area, providing the motivation for the
study reported later in this paper. The analysis reveals that the majority of studies
are small-scale, non-longitudinal or non-quantitative (that is, they are either based
on questionnaires or provide analysis that is non-statistical). The section argues that
relatively few studies that compare teaching approaches and, especially, approaches
that employ simple and complex teaching languages, have reliably evaluated the
outcomes of those approaches. As such, open questions remain with regards to what
should be included in the CS1 curriculum and how it should be taught. The
knowledge gaps identified in Section 2 give rise to the paper’s central research
hypothesis, which is presented in Section 3. The paper focuses on questions targeting
three specific aspects of teaching introductory programming and develops relevant
interventions. By reframing empirical problems as research problems, Section 4
formulates hypotheses that target three teaching interventions and seeks to provide
a foundation from which to address teaching- and curriculum-related questions by
developing an iterative methodology/framework for the design, implementation and
evaluation of a CS1 course. This is reported through a longitudinal (four-year), large-
scale (>750 participants) study. To measure the outcome of each annual iteration –
the extent of student learning – the approach uses quantitative/objective measures,
in particular the frequency of: (i) programming constructs, typically considered as
fundamental concepts, that is, loops, conditionals and libraries; and (ii) bugs,
identified as semantic or syntactic errors in the software that students produce.
Through the statistical analysis presented in Section 5, the study reveals the set of
parameters that were found to improve student learning in the CS1 course. The
paper concludes by evaluating the effectiveness of two popular programming
languages (Java and Python) for CS1 teaching, contributing to the language choice
debate, and the potential benefit of formative feedback and of exposure to problem-
solving before exposure to programming. Finally, the paper discusses how these
parameters affected different elements and aspects of student learning. Taken
together, the study aims to go beyond being an experience report of teaching a
language or a comparison of two languages. Rather, it attempts to provide a model
on which two languages can be compared in this context and to present a framework
to guide the redesign of the CS1 curriculum.

2. LITERATURE REVIEW
One of the major issues influencing the design or restructuring of a CS1 course is the
choice of programming language, covered in the IEEE and ACM Joint Task Force on
Computer Curricula [CC2001 2001] within the context of “implementation strategies”.
CC2001 [2001 p.24] describes and discusses the three different, typical “programming-first” implementations for introductory programming courses as follows: an “imperative-first” approach uses the traditional imperative paradigm; an “objects-first” approach emphasizes early use of objects and object-oriented design; and a “functional-first” approach introduces algorithmic concepts in a language with a simple functional syntax.
Arguments have been made for the value of each approach. For example, the
advantage of the objects-first approach has been suggested to relate to the
importance of object-oriented programming in industry, with early exposure to
object-oriented principles and concepts considered an advantage for students.

However, it has been recognized that curriculum implementations that employ an objects-first approach may be ill-fated, because the languages typically used for
delivery (such as C++ and Java) are much more complex in terms of syntax and
semantics than the classical languages. Unless special care is taken, this complexity
may prove overwhelming for novice programmers [CC2001 2001, p.30]. This has led
many educators and researchers to advocate that a necessary criterion for selecting a
CS1 language is simple syntax and structure [Mannila and deRaadt 2006]. Within
the movement towards less complex teaching languages, Python has gained much
support [Zelle 1999].
Numerous studies have contributed to the debate on which paradigm (“objects-
first” or ”imperative-first”, for example) and on which language is most appropriate
for teaching introductory programming, yet the results remain inconclusive. These
two aspects – of paradigm and language – are, of course, not completely independent,
meaning that it is not always possible to discuss them in isolation. The study
reported in this paper does not address the paradigm debate per se, but will focus on
the choice of programming language, discussing evidence that supports the
suitability of Python as a teaching language.
This is an important issue as there is also a lack of consensus with regards to
which is the most appropriate programming language for first-year computing
students. While Java remains a popular choice because of its association with
commercial use, the difficulty in learning programming is aggravated by its
previously noted complex syntax and semantics. As such, Java’s notational overhead
has been criticized as making it unsuitable for novice programmers. Python, on the
other hand, has a simple and clean syntax and structure and other characteristics
that make it appealing for teachers and learners, such as dynamic typing, powerful
built-in functions and structures, and a simple development environment.
Python has recently tended to be the language of choice, typically within the
“imperative-first” approach, perhaps because it started as an “academic” language
which, nevertheless, evolved to be “commercial” [Zelle 1999]. Further, Python was
designed to have simple syntax and semantics leading to the elimination of the vast
majority of errors commonly made by novice programmers, such as missing semi-
colons, bracketing problems and variable type declaration errors [Jadud 2005].
Python is also increasingly being used in real-world applications, for instance in high-profile organizations like Google and Nokia (see Miller and Ranum [2005]).
Finally, since Python also supports the object-oriented paradigm, it can be used as a
transition language for second-term or -year courses that are based on C++ and Java
[Goldwasser and Letscher 2008].
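As a concrete illustration of the notational overhead at issue, the short sketch below shows a small computation in Python, with rough Java equivalents as comments. It is an example constructed for this text (using Python 3 print syntax), not material from the study.

# Illustrative sketch (not from the study): summing 1..10 in Python.
# The rough Java equivalents in the comments require type declarations,
# semicolons and braces -- the sources of the "trivial" errors cited above.
total = 0                   # Java: int total = 0;
for i in range(1, 11):      # Java: for (int i = 1; i <= 10; i++) {
    total += i              # Java:     total += i;
print(total)                # Java: }  System.out.println(total);  prints 55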
There are a number of empirical studies that document the process and outcomes
of revising introductory programming courses, with an emphasis on switching to
Python as a teaching language. In particular, Grandell et al. [2006] found that
Python facilitated teaching and learning and increased student satisfaction. These
conclusions were based on analysis of grade distributions, self-reports, identifying the
use of constructs in students’ code, and surveys of student attitudes towards
programming. However, despite the multiple measures used, the authors did not
provide evidence of statistical or grounded analysis, which may limit the strength of
the conclusions that they draw from the study. Similarly, Kasurinen and Nikula
[2007] reported that changes to the course and moving from C to Python led to higher
grades, improved student satisfaction and a decline in dropout and failure rates.
Radenski’s [2006] discussion and evaluation of the “Python first” approach led to the
development of a full online study pack implementing this approach. The results of a survey administered to students who made use of the study pack revealed positive
perceptions of Python as a programming language. Along the same lines, empirical
studies by Patterson-McNeill [2006], Stamey and Sheel [2010], Miller and Ranum
[2005], Oldham [2005], Goldwasser and Letscher [2008] and Shannon [2003] detail
the choices involved in redesigning the curriculum and the shift from one language to
another in their colleges and argue for dramatic improvements with the use of
Python, but no explicit evaluation results are reported in any of these studies.
From the wider perspective of computing education research, Pears and Malmi
[2009] argued that a large part of the literature to date has lacked methodological
rigor. In particular, an extensive methodological review of 352 papers concluded that
the majority of published findings cannot be considered reliable, and, as a result,
many educators and researchers are forced to “reinvent the wheel” (as also suggested
by Almstrum et al. [2005]). The review found that one-third of studies did not use
human participants or any type of analysis, but presented descriptions of
interventions and anecdotal evidence, while the rest of the studies employed mostly
questionnaires. Random sampling and control groups were rarely used, and even
when quantitative approaches were adopted, only one-third of these studies
performed robust statistical analysis. Finally, more than half of the studies did not
adequately describe their methods and procedures, while around 25% of them did not
state research questions and/or review existing literature [Randolph et al. 2008].
While this report paints a grim picture of the field of computing education research,
there has been a recent paradigm shift towards more systematic and focused
investigations [Pears and Malmi 2009]. Notable examples include a study by Nikula et al. [2011]: building on a clear methodological framework, the Theory of Constraints, a five-year longitudinal study was undertaken in which problems were diagnosed and suitable interventions (mainly targeting course and student motivational problems) were designed, implemented, and evaluated through pass rates. Remarkable methodological rigor can also be found in a paper by Stefik and
Siebert [2013] that reported four large-scale empirical studies involving randomized
controlled trials. Relevant to the research focus of the present study, their analysis
indicated that novice programmers perform better with Python-style syntax
compared to C-style syntax and identified syntactic elements that are most
problematic for learners. Beck and Chizhik [2013] also employed a robust
methodology in order to compare traditional teaching methods and cooperative
learning techniques using control groups and final grades.
Taken together, the review of the relevant literature clearly demonstrates that
there are teaching approaches and programming languages that offer opportunities
for better supporting teachers and learners. However, the evaluation results
associated with the approaches implemented in these studies are, variously, missing,
partial, based only on qualitative data or have not been validated through statistical
analysis. Moreover, most of these studies present the results of designing and
implementing the new approach during one academic year in comparison with the
results of the previous year’s approach. Therefore, it is argued that while such
studies may provide invaluable insights into good practices and innovative teaching
techniques, if they do not select and apply a suitable research framework, they
remain experience accounts, bound to a specific context, and may not be useful for
offering reliable guidelines applicable to other Higher Education situations. Finally,
the lack of conclusive findings in the field may be attributed, as underlined in Ehlert
and Schulte [2009] to the lack of standard measures based on which the “old” and
“new” teaching approaches can be fairly compared. Following this line of argument, we believe that appropriate key indicators to assess learning outcomes need to be identified.
The relation between education research, practice and policy has become
prominent, such that evidence-based education reform has been a key element in
government policies, which require schools to justify their choices and practices using
findings from rigorous, experimental research [U.S. Department of Education 2001].
Thus, Slavin [2003] maintains that the validity of educational research should follow
the criteria of other scientific fields, such as medical research, which include a control
group, low variance among groups, large sample size, and statistically significant
results. At the same time, it is recommended that applying research methodologies
from other disciplines should be performed with caution, given the fact that
education research aims to address problems in real-world, dynamic situations, and
not in highly controlled settings, like laboratories [Collins et al. 2004; Reeves 2011].
Following these recommendations, the study reported in the remainder of this
paper aims to address the above shortcomings through a number of means: first, by
developing a suitable methodology, which includes multiple development iterations
over more than one academic year; second, by defining objective parameters to
measure the outcomes of the curriculum revisions; and, third, by evaluating the data
associated with the proposed measures through statistical analysis. In this way, the
influence of the curriculum revisions can be assessed by impartially measuring and
analyzing the variance of key indicators that are associated with student
performance. In order to create a quasi-controlled experimental situation, each
iteration in this study corresponds to an academic year and involves a single change
in the module structure. As such, variance in the key indicators may be argued to be
associated with a particular change.
The research adopts an experimental paradigm, which also takes elements from
prominent iterative methodologies in education research, namely Action Research
[Stenhouse 1975] and Design-based Research [Brown 1992]. Like many
experimental methodologies, once a problem is identified, Action Research and
Design-based Research evolve through multiple cycles of analysis, design, and
evaluation. In particular, Action Research and Design-based Research involve a
spiral of self-reflective cycles of: planning a change; acting; observing the
consequences of the change; reflecting on consequences; re-planning; acting;
observing; reflecting; and so on. However, being mostly qualitative, these approaches
have been criticized as being too context-bound, only aiming to improve a situation
but without producing generalizable findings [Koshy 2009]. Yet, these methodologies
embrace the complexities, dynamics and limitations of authentic real-world
educational settings, which are not captured by lab-based experimental paradigms
[Collins 1992; Collins et al. 2004]. Indeed, it is argued that insights and tools
generated and evaluated within a normal setting are more likely to be useful and
relevant [Reeves 2011].
As mentioned above, the use of specific objective measurements as part of the
observation process is an important aspect introduced in this paper. This, in turn,
should enable the potential bias issue (a common criticism of Action Research and
Design-based Research paradigms) to be addressed during the evaluation and
comparison stages. The aim of these objective comparisons introduced in the paper is
to quantitatively measure a student’s ability to master basic programming concepts,
and, to achieve this, key indicators from student assessments are used in the
evaluation. In effect, this study attempts to address the methodological gap
identified by Pears et al. [2007] and Ehlert and Schulte [2009] and proposes a framework to guide course design and revision based on an established research paradigm and objective metrics.
Similar work has already been presented by Mannila et al. [2006], who used both quantitative and qualitative approaches in their investigation. The study involved the
analysis of 30 programs written in Java and 30 programs written in Python produced
by high school students in two different academic years. The results revealed that
the Python programs contained fewer logic and syntax errors and more frequently
fulfilled the required functionality. Interviews with eight of the students who had
learned Python after Java revealed positive perceptions of the language.
In the same vein, the study reported in the remainder of this paper uses
quantitative key indicators as a measure of student programming proficiency; in
particular, in addition to the frequency of syntax errors (bugs), the study also
employs other quantitative indicators such as the presence of loop, conditional and
library constructs, which constitute key programming concepts. Another point of
departure of the study is the sample make-up and size: the investigation involved
over 750 students at undergraduate first-year degree level. Finally, in the
study reported in this paper, several cycles were introduced in the iterative process
as discussed in this section. In the next section, details of the iterations are
presented and discussed.

3. CENTRAL RESEARCH HYPOTHESIS


The discussion so far has stressed the greater-than-ever importance of programming
skills, which increases the urgency for computer science education to identify the
most effective teaching approaches, especially for the CS1 module. However, there
appears to be a limited knowledge base of tested teaching approaches on which
practitioners at different educational institutions can draw. As such, the primary
research question of the study was to determine whether a student’s ability to learn
how to program is dependent on the teaching approach of the CS1 module; this was
the basis of the iterative paradigm employed in the investigation discussed in this
paper. The central research hypothesis was formulated as follows:

Ha: Programming proficiency of novice learners is dependent on the teaching approach of a CS1 module.
In order to address the central research hypothesis and produce teaching
recommendations, the study involved three cycles in which in-class interventions
were developed, implemented, and evaluated based on objective, quantifiable criteria.
A detailed description of the research methods, interventions, set of assumptions and
evaluation metrics is provided in the following sections.

4. RESEARCH METHODOLOGY
Central to this work was the selection and application of a clear research
methodology. The nature of this research and the issues it aims to address were best
served by a controlled experimental design. The appeal of such an approach lies in its
focus on hypothesis testing, statistical analysis of quantitative data, and fixed
variables and procedures that can be replicated. Indeed, it is argued that educational
studies that are not heuristic/exploratory, but have some very specific research
questions or hypotheses (“is [teaching approach A] better than [teaching approach B]
for novice programmers?”) could benefit from using the type of controlled trials
traditionally used in bio-medical sciences [Stefik and Siebert 2013]. At the same time, the work also drew on studies within the tradition of Action Research and Design-
based Research, which embrace the complexity of real-world learning environments
that resist full experimental control. These approaches were developed to guide
formative research, to test and refine educational designs and interventions based on
theoretical findings from previous research. Their aim is not only to refine
educational practice, but also to refine theory and to contribute to the field [Collins
et al. 2004]. The research philosophy underpinning our efforts has been one of
“progressive refinement”, as also described and applied in Action Research and
Design-based Research studies. This approach is cyclic and involves the formulation
of a hypothesis, the design of an intervention, experimentation in the classroom, the
analysis of the collected data, the formulation of a new hypothesis, the design of a new intervention, and so on [Molina et al. 2007]. Within this paradigm, researchers
and/or practitioners initiate a project because they have diagnosed a problematic
situation, such as an ill-suited curriculum. The diagnosis is often triggered by an
external event, such as abnormally poor grades or feedback provided by students.
Indeed, the efforts described in this paper were triggered by the high failure rate of a
student cohort (which was above 36%, as shown in Table VIII). Subsequently, the
formulation of the hypothesis to be tested is based on experience, previous research
and theoretical principles, as analyzed by the team of practitioners and researchers
involved.
For this research, the body of findings discussed in Section 2, suggesting that the
complexity of Java may be overwhelming for learners and favoring Python as a
suitable teaching language, led to the development of the first hypothesis (presented
in Section 4.2). The second hypothesis and related intervention were also the
products of careful review of literature indicating the pedagogical benefit of formative
feedback. This iteration cycle is discussed in Section 4.3. Finally, as described in
Section 4.4, the intervention of the third iteration targeted the problem-solving skills
of learners, exploring the hypothesis of a correlation between problem-solving and
programming, informed by analysis of existing literature.
As detailed in Section 4.5, the study involved four experimental groups of CS1
students, which corresponded to four full student cohorts in four consecutive years: a
control group that were taught using Java; a group that were taught using Python; a
group that received formative feedback; and a group that received initial problem-
solving training.

4.1 The iterative process


The methodological approach consisted of three iterations, in which a single
parameter that could potentially improve programming proficiency was identified,
introduced and evaluated. Sections 4.2 to 4.4 provide detailed descriptions of each
iteration and, in particular, set out the theoretical and research findings that
motivated the intervention and led to the formulation of the associated research
hypothesis, and the design and implementation of the intervention. Students’
programming ability was measured through their use of specific key programming
concepts and the presence of bugs in the implementation of a required computer
program. The computer program was part of a final, formal assessment (as explained
in Section 4.7). The key concepts included basic programming constructs such as
conditionals and loops, and are discussed in Section 4.8.


4.2 First iteration: Python is introduced as the introductory programming language, replacing Java.

Motivation for the intervention and research hypothesis

Java was used as the introductory programming language with the previous cohort of
students (the “control group” for the study). However, as indicated by the findings
considered in Section 2, the use of Java to introduce students to the basic aspects of
programming may be problematic, not least because Java is heavily coupled with
object-oriented concepts, which may interfere with the basic aim of an objects-
later strategy. Java “forces” some of the more advanced concepts into the foreground
– concepts which teachers do not typically want to introduce at an early stage
[Kölling 1999]. As a consequence, a student’s focus may switch from learning the
basic programming concepts to learning the language’s syntax. Drawing on the
evidence that Java may not be well-suited for education, especially when introducing
programming to novices [Siegfried et al. 2008], the use of an alternative
programming language with a lower syntactic burden was considered. As discussed
in the previous sections of the paper, several educators have argued that scripting
languages could offer a more effective alternative. Python is one example, offering a
simple and expressive language with support for procedural programming. It can be
argued that the lower overhead associated with Python should provide a gentler
introduction to the basic concepts of programming. By using Python, a greater
emphasis on core principles was expected with less of an unwanted focus on syntax.
As already noted, Python is also widely used in industry and is therefore considered
to be attractive to students. In Figure 1, a basic program to print “Hello World” is
written in Java and Python and exemplifies the syntactic and semantic differences
between the two languages. The program written in Java requires elements such as access modifiers, classes, methods, the keyword static, types, arrays and method parameters, which the Python program omits.

Java:

public class HelloWorldApp
{
    public static void main(String[] args)
    {
        System.out.println("Hello World");
    }
}

Python:

print "Hello World"

Fig. 1. A “Hello World” program, written in Java (top) and Python (bottom)

For all of these reasons, Python was selected as the programming language to replace
Java in the first iteration where, in effect, we observed the impact of the complexity
of programming languages. The associated research hypothesis is provided below:

Ha0_1: Programming proficiency increases for novice learners using Python compared to Java as the introductory programming language in a CS1 module.

Implementation of the intervention

The experimental design required a “common basis of comparison”. This meant that
a large number of learning and teaching parameters needed to be equal or equivalent
between cycles. In particular, in all iterations, the same programming concepts were taught in the same order (see Section 4.6). This was only possible for the first 10
weeks of instruction. As such, this period formed the basis on which the four cohorts
would be compared. Similarly, the assessment, which involved a one-hour lab test in
week 11, marking procedures, and module organization were kept constant (see
Section 4.7).

4.3 Second iteration: formative feedback is introduced.

Motivation for the intervention and research hypothesis


In educational settings, feedback is crucial to improving knowledge and skill
acquisition as well as motivating learning [Shute 2008]. Formative feedback is the
information provided to students about their strengths and weaknesses in order to
improve their learning and performance, and does not usually contribute towards the
final grade of the student [Lilley and Barker 2007]; this is in contrast to summative
feedback which, as part of a formal assessment such as a final exam or coursework,
contributes to the final grade. There appears to be a general consensus in
educational research and policy that formative feedback is paramount for student
learning [Black and Wiliam 1998; Quality Assurance Agency for Higher Education
2003], and the benefits of formative feedback have been found to be particularly
pronounced for novice students [Moreno 2004; Shute 2008]. Students also appear to
value and expect feedback [Weaver 2006; Higgins et al. 2002]. Within the context of
teaching programming to novices, Corbett and Anderson [2001] showed that for
formative feedback to be most effective for programming students, it has to be
immediate and point to individual steps and features in the learner’s work.
Based on this literature, for the second iteration, it was anticipated that introducing
formative feedback would be beneficial for the students. This leads to the next
research hypothesis:

Ha1_2: Programming proficiency increases for novice learners receiving regular formative feedback in a CS1 module.

Implementation of the intervention


Students were encouraged to submit the programs that they produced as solutions to
weekly lab exercises (see Module Organization in Section 4.6). These students
received formative feedback on their submitted programs within 7 days. A two-stage,
semi-automatic process was used to generate the feedback. First, an ad-hoc
application ran every program and compared its output with the expected output. If
this output was different from that expected, or if the application located a bug (a run-time or compile-time – syntax – error), it would produce feedback and flag the program for manual inspection. Second, the human inspector assessed the code
and provided detailed comments, which were added to the automatically-generated
feedback. The formative feedback involved identifying both errors and successful
aspects of a student’s submitted program and, where appropriate, referred to the
lecture material and useful sources and provided hints towards the completion of the
task, but did not give the solution. As such, students could attempt the tasks and
receive feedback any number of times. Table I contains an example of a syntax error
(missing quotation marks around a string value), and the feedback and information
that would be automatically and manually generated for this error.

Table I. Example of student submission and formative feedback provided.

Task:                Write a program that prints the string Hello World
Student submission:  s = Hello World
                     print s
Automated feedback:  global name 'Hello World' is not defined
Manual feedback:     Strings should be enclosed in quotation marks for the data
                     to be recognized as a string and not a variable name.
                     Please refer to week 2 lecture notes (slides 5-10).

It should be noted that the same tasks were given to students in the previous
iteration. To minimize the likelihood of this activity being perceived as a form of
monitoring students’ outcomes, no grading was associated with it. However, a
consequence of the non-compulsory element of the activity was that the submission
rate was not as high as desired; only 50% of the students engaged at least once in the
process.
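For concreteness, the sketch below outlines the kind of two-stage check described above. It is a minimal illustrative implementation in Python 3, not the ad-hoc application used in the study; the function name and interface are assumptions.

import subprocess

# Minimal sketch (Python 3) of the automatic first stage of the feedback
# process described above: run a submitted program, compare its output with
# the expected output, and flag it for manual inspection if it fails. This
# is an illustrative assumption, not the study's actual application.
def check_submission(script_path, expected_output, timeout=10):
    """Return (automated_feedback, needs_manual_inspection)."""
    try:
        result = subprocess.run(
            ["python", script_path],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "Program did not terminate within the time limit.", True
    if result.returncode != 0:
        # Run-time or syntax error: report the interpreter's last message
        # and flag the submission for detailed manual comments.
        lines = result.stderr.strip().splitlines() or ["Unknown error"]
        return lines[-1], True
    if result.stdout.strip() != expected_output.strip():
        return "Output differs from the expected output.", True
    return "Output matches the expected output.", False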

4.4 Third iteration: problem-solving training is introduced.

Motivation for the intervention and research hypothesis


The purpose of an introductory programming course is to develop students’ problem-
solving skills and introduce them to primary concepts of design and programming.
Yet, this aim is not always clear, and a common misconception of students is that the
course is a “Java class” or a “C++ class”, a tendency sometimes attributed to the complexity of these languages [Zelle 1999]. According to CC2001 [2001], problem-solving is the key skill in the discipline, but is also where
the greatest weakness of novice students lies. It is, again, recognized that language
syntax can detract from developing problem-solving concepts (p.23). As such, it is
recommended that all introductory courses (independent of the underlying – “objects-
first” or “objects-later” – paradigm or choice of language) include sessions in which
problem-solving and problem-solving processes are covered. Thus, at this stage in the
iterative process, it was noted that a lack of problem-solving skills could be one of the
underlying reasons for students having difficulty in utilizing key concepts, such as
loops and conditionals. Therefore, in the last cycle of the iterative process the focus
was on problem-solving training. This resulted in the final research hypothesis:

Ha2_3: Programming proficiency increases for novice learners receiving problem-solving training before the beginning of a CS1 module.

Implementation of the intervention


During the final cycle, a “crash course” was introduced during induction week (that is,
just before the formal start of the module at the beginning of the academic year).
The course attempted to follow the principles of problem-solving training described in
Polya [1973]. It lasted about three hours and the aim was to improve participants’
problem-solving abilities through practical, hands-on activities. In particular,
students controlled a virtual robot through a basic set of natural language
instructions (for example, “turn left”, “move”), with the aim of completing a given
task within a virtual environment. Subsequently, the set of instructions was fed to and translated by an application and executed by a real robot in real time. In
effect, students needed to understand and analyze the problem (task), devise a plan that consisted of a sequence of actions for the robot, carry out the plan and review the
outcome of their strategies. The session did not touch on programming topics such as
loops, conditionals and arrays. For this last iteration, the impact of introducing these
problem-solving tutorials was again assessed using the same approach as for the
other cycles.
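As an illustration of the translation step described above, a minimal sketch follows; the instruction names and action encoding are assumptions made for the example, not details of the application used in the session.

# Minimal sketch of mapping a student's natural-language plan onto robot
# actions. The instruction set and action encoding are illustrative
# assumptions only.
COMMANDS = {
    "move": ("FORWARD", 1),        # move one step forward
    "turn left": ("ROTATE", -90),  # rotate 90 degrees counter-clockwise
    "turn right": ("ROTATE", 90),  # rotate 90 degrees clockwise
}

def translate(plan):
    """Translate a plan (a list of instruction strings) into low-level
    actions, reporting any instruction the robot does not understand."""
    actions = []
    for step in plan:
        if step not in COMMANDS:
            raise ValueError("Unknown instruction: %r" % step)
        actions.append(COMMANDS[step])
    return actions

print(translate(["move", "turn left", "move"]))
# -> [('FORWARD', 1), ('ROTATE', -90), ('FORWARD', 1)]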

The research hypotheses tested in the study are consolidated in Table II.

Table II. Research Hypotheses. The overarching hypothesis of the study and those formulated for each iteration of the study.

Ha     Programming proficiency of novice learners is dependent on the teaching approach of a CS1 module.
Ha0_1  Programming proficiency increases for novice learners using Python compared to Java as the introductory programming language in a CS1 module.
Ha1_2  Programming proficiency increases for novice learners receiving regular formative feedback in a CS1 module.
Ha2_3  Programming proficiency increases for novice learners receiving problem-solving training before the beginning of a CS1 module.

4.5 Participants

The groups.
Four distinct groups of students took part in the different iterations of the study; the
experience of each group is outlined below.
G0: For this group, during the 10 weeks before the test, teaching emphasized
mastery of basic programming skills with Java as the implementation language. In
particular, lecture and laboratory sessions and exercises focused on loops,
conditionals, use of libraries and packages (for example, to generate random numbers,
for input and output, etc.) (the complete list of topics can be found in Table III).
These students used BlueJ as the learning environment. This cohort served as the
control group for the study. The size of this cohort was 157 students (with 19% being
female).
G1: The G0 and G1 groups had essentially the same structure to their
teaching/learning experience, with the programming language used being the only
difference for the groups. Python, rather than Java, was used for the G1 cohort.
These students used Python IDLE as the learning environment. This group consisted
of 195 students (with 18% being female).
G2: The G1 and G2 groups had essentially the same structure to their
teaching/learning experience. Throughout the term, the groups were given weekly,
ungraded tasks (exercises to be implemented using Python), but students in the G2
group were encouraged to also submit their solutions. Formative feedback was
provided for the submitted work. Task submission was not compulsory. The size of
this group was 193 students (25% being female).
G3: The G3 group had essentially the same structure to their teaching/learning
experience as the G2 group but had an additional three-hour problem solving “crash
course” before the beginning of the module. The G3 group consisted of 216 students
(with 22% being female).

Student background.


The four groups corresponded to four full cohorts of students from consecutive years,
enrolled in the first year of the suite of computing degrees offered at a single
university. The students comprising the groups had comparable backgrounds and
levels of prior programming proficiency. Within a cohort, individual students may
have had greater proficiency than others, but this was true in equal measure for each
group. As such, the research assumed that the average levels of initial programming
proficiency for each student cohort were similar. This assumption was supported by
the fact that the admissions criteria applied for the particular programs of study
were the same for the duration of the study and the admission qualifications profiles
of the cohorts were very similar. No student was in more than one cohort.

4.6 Module content and organization

Organization
The organization of the module was one of the fixed parameters of the experimental
design. The module consisted of weekly lectures (duration: 1 hour) and laboratory sessions (duration: 2 hours). While the whole cohort attended the same lecture
session, for the laboratory sessions the cohort was split into three smaller groups.
Each week, the lecture preceded the laboratory session and introduced the
topic/concept. During the laboratory session, students were asked to complete self-
paced tutorials. These tutorials provided short explanations of the concepts and
instructions on how to implement, run and debug simple programs based on the
concepts. Depending on the cohort, students used BlueJ or Python IDLE as the
interactive development environment (see Sections 4.2-4.4). The tutorials also
included exercises – simple and more challenging tasks – without solutions.
Depending on the cohort, students had the opportunity to submit their solutions and
receive feedback. Students also used the university e-learning environment which
provided access to resources such as lecture notes, the laboratory tutorials,
discussion forums and additional learning materials.

Content
Table III shows the module content and teaching sequence followed. The topics, the
sequence, and the depth of coverage of topics were kept constant across the groups.
As can be seen from Table III, seven topics were taught over a period of 10 weeks.
Then, in week 11, students took a formal assessment (as explained in Section 4.7).
It should be noted that the implementation strategy followed in all groups
corresponded to an “objects-later” approach; that is, fundamental programming
constructs were taught without any explicit reference to object-oriented programming,
or to language-specific elements (such as types and access modifiers). While object-
oriented programming aspects may “creep in” even in simple Java programs,
students were not expected to understand or produce any such element.
In week 12, all groups were formally introduced to object-oriented principles, but
their curriculum was no longer comparable given that the three “Python” groups (G1-
G3) were being exposed to Java for the first time at this point.

Table III. Module Content and Teaching Sequence. The topics taught in each session.

Stage  Topics
T1.    Overview
T2.    Variables, primitive data types, and data structures (arrays and lists)
T3.    Control structures: iteration
T4.    Control structures: selection
T5.    Sub-programs: procedures/methods
T6.    Importing libraries
T7.    Syntax Errors and Exceptions

Teaching Staff
The teaching team consisted of a module leader, two supporting lecturers and six
graduate teaching assistants. The module leader delivered all lectures covering the
seven fundamental topics during the first 10 weeks (with the supporting lecturers
being involved later in the term). The module leader and supporting lecturers ran
and coordinated the laboratory sessions, and along with the graduate teaching
assistants, they provided one-to-one help to students. The module leader and
supporting lecturers were the same for every cohort, with G0 being the first cohort
taught by the team. Not all of the graduate teaching assistants remained until the
end of the four-year study. Finally, only the module leader was involved in all of the
marking/grading to ensure consistency.

4.7 Assessment and marking


Students were assessed on their ability to implement programs. A lab test was
chosen as the assessment method, as it has been found to be an accurate assessor of
programming ability [Chamillard and Joiner 2001; Califf and Goodwin 2002;
Bennedsen and Caspersen 2007], and superior to written exams and assignments
[Daly and Waldron, 2004]. During the test, students had one hour to complete the set
task and they could use the same environment as the one used during their
laboratory tutorials (BlueJ or Python IDLE, depending on the group). As such, for
the duration of the test the students were required to implement, run and debug
their programs. The students submitted their programs electronically. The students
had access to both the standard Java/Python documentation and to module resources
(lecture notes, laboratory examples, etc.). The tasks were kept as similar as possible
in each cohort assessment.
The test covered the basic programming language aspects addressed during the
previous 10 weeks (see Table III). The aim of the assessment was to evaluate the
ability of the students to translate the task given into a program that executed
successfully. Moreover, students were tested on their ability to apply basic concepts
such as loops, conditionals, etc. The test consisted of a set of sub-tasks, with some
sub-tasks targeting a specific basic concept and others targeting the ability to
combine some of these basic concepts. For example, a task such as ‘print the string
“Hello World” 20 times’ would have been used to evaluate the students’
understanding of applying a loop, while a task such as ‘generate two random numbers
Y and X and print the value of Y if Y < X’ would have been used to evaluate their
ability to combine two basic concepts together (importing and using libraries, and
understanding of conditionals).
Within the 60-minute time limit, possible submissions by the students for the first
task (‘print the string “Hello World” 20 times’) included the following: no code
implemented for this task; the code included 20 print statements; a loop that repeated the print statement 20 times. For the second example (‘generate two random
numbers Y and X and print the value of Y if Y < X’), possible solutions included:
initiating Y and X with the student’s own values instead of random values;
generating the random values but printing all values of Y, instead of when Y < X; etc.
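For illustration, the sketch below gives the kind of “perfect solutions” the two example tasks anticipate. It is written in Python 3 syntax (the cohorts in the study used a Python 2-era environment), and the 0-100 range for the random numbers is an assumption made for the example.

import random

# Task 1: print the string "Hello World" 20 times, using a loop rather
# than 20 separate print statements.
for i in range(20):
    print("Hello World")

# Task 2: generate two random numbers Y and X and print the value of Y
# if Y < X -- combining library use (random) with a conditional. The
# 0-100 range is an assumption for the example.
y = random.randint(0, 100)
x = random.randint(0, 100)
if y < x:
    print(y)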

Grading
Each task in the lab test was simple enough to have a “perfect solution”. As noted
above, this solution would require the use of one of the taught concepts or a
combination of them. Full marks were awarded if a student submitted a program
which reflected the perfect solution. Half marks were awarded if a student submitted
a program which produced the correct output, but without using the anticipated
concept. For example, if the task was ‘print the string “Hello World” 20 times’, a
solution which included a loop that repeated the print statement 20 times would
receive full marks, while a solution that contained 20 print statements would receive
half marks. The grading process involved two stages. First, an auto grader filtered
out the programs that did not run (because of run-time or compile-time errors) and
the programs that gave incorrect output (semantic errors). Then, a human marker
(the module leader) examined the programs that ran and gave correct output and
awarded marks based on the criteria set out above.

4.8 Evaluation metrics/learning indicators


As explained at the start of Section 4, the research paradigm employed in this study
involved the collection of “hard”, objective data to evaluate a particular intervention
and test the underlying hypothesis. While the literature review presented in Section
2 offered only a “bird’s-eye” view of the field of computing education, it revealed that
researchers have used a variety of metrics to evaluate the impact of their
interventions on learning. Among the evaluation metrics employed in related studies
are assessment results (of examinations and coursework assignments), final grades,
and drop-out rates. There are some limitations with these metrics that discouraged
their use in this study. First, if viewed in isolation, results and grades appear too coarse and reveal little about qualitative elements of the programs produced by students, or about which of the taught concepts were acquired and which were not. Second, as assessment content and methods differ between institutions, meaningful
comparisons using test results are not possible. Third, drop-out rates are influenced
by many factors in addition to, or irrespective of, the CS1 course [Nikula et al. 2011].
Finally, data on final grades and drop-out rates are collected after the completion of a
CS1 course, which means that learning problems are diagnosed when it is already too
late for a particular cohort. In effect, we argue that evaluation metrics for teaching
interventions should be reframed to be “indicators of learning”; this could shift the
focus from the teaching intervention – the process – to learning, which is, after all,
the end point of interest. Taken together, the study sought to identify metrics that:
could indicate that a concept taught was also understood; correspond to concepts that
are universally taught to students at this level and, as such, could be applied by
different institutions to measure performance; could be derived using minimal
resource requirements; and could also be used remedially.

Fundamental programming concepts: looping, conditionals, libraries


These characteristics guided the decision to look at whether students were able to
appropriately use the most important concepts taught in the course up to that point. These concepts were: looping, conditionals and library use. The additional advantage
of selecting these concepts was that their building blocks are discrete keywords
(looping: for, while and do; conditionals: if, case, ?: operator; library use: import),
which can be located and counted in a program without human intervention and
effort. Therefore, the frequencies of appropriate use of these constructs in the
assessment were used as indicators. Simply put, a task in the lab test was
constructed such that its perfect solution would include the use of particular
concept(s) taught (for instance, a loop and/or a conditional). If the student used the
associated construct (“if”, “for”, “while”, etc.) in his/her solution, this could be
indicative of the student understanding the concept and when it should be used. This
type of indicator was independent of syntactic elements and errors. It should be
noted that the exam tasks were simple enough to have model answers, and if a model
answer required one type of construct (for example, one conditional), but the
submitted program contained more than one, this would not increase the count.
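As an illustration of how such counting can be automated, the following sketch flags whether a submission uses each construct family. The helper and keyword sets are hypothetical, the word-based tokenizer is deliberately naive (it would also match keywords inside strings or comments), and operator forms such as ?: would need separate, language-specific handling:

    import re

    # Keyword families mirroring the indicators described above.
    KEYWORDS = {
        "loop": {"for", "while", "do"},
        "conditional": {"if", "case"},
        "library": {"import"},
    }

    def indicators(source):
        # Tokenize on identifier-like words and test each keyword family.
        tokens = set(re.findall(r"[A-Za-z_]+", source))
        return {name: bool(words & tokens) for name, words in KEYWORDS.items()}

    print(indicators("for i in range(20):\n    print(i)"))
    # {'loop': True, 'conditional': False, 'library': False}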

Bugs
Bugs were the second type of indicator used in this study. Bugs refer to logic,
compile-time and run-time errors. However, given the level of the students, the
content of the 10 weeks of instruction and the content of the exam, the bugs present in
student submissions were largely limited to compile-time errors. So, in effect, this
metric solely consisted of syntax errors. This resonates with previous research
findings that indicate that the majority of student errors in CS1 are compile-time
errors, and, in particular, syntax errors. Jadud’s [2005] analysis found that the three
most common student errors are, in fact, rather “trivial”: semicolons, typographic
errors in variable names and bracketing (accounting for 42% of errors found in
students’ work). The study also reported that these mistakes are made repeatedly.
Syntax errors are an important area of programming pedagogy research.
Experienced programmers rarely make syntax errors, and when they do, they have
clear strategies to correct them very quickly. However, syntax errors are significant
for novice programmers [Sleeman et al. 1988; Kummerfeld and Kay 2003]; correcting
them is a time-consuming process and often leads to random debugging behavior,
also influenced by the fact that students do not understand compiler messages
[Kummerfeld and Kay 2003]. It is recognized that logical and run-time errors are
much deeper, more important and difficult to correct. However, a novice programmer
has to get past compiler errors first.
As suggested above, failure to use the correct construct for a task – such as a loop
to solve a loop problem – is indicative of inadequate knowledge/learning, but this is
not the only type of error based on inadequate knowledge/learning. Kummerfeld and
Kay [2003] indicated that the ability to efficiently fix syntax errors necessitates
knowledge of the programming language in order to understand error messages.
Taken together, it is argued that failure to correct simple syntax errors is also
indicative of inadequate knowledge/learning, based on the basic assumption that a
competent student should be able to locate and correct simple errors within a time
limit.
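For Python submissions, flagging this indicator can be automated with the standard library; a minimal sketch (the helper name is hypothetical, and Java submissions would require the compiler instead):

    import py_compile

    # A submission counts towards the 'bugs' indicator if it fails to
    # compile; logic and run-time errors would need separate checks.
    def has_compile_error(path):
        try:
            py_compile.compile(path, doraise=True)
            return False
        except py_compile.PyCompileError:
            return True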
While the number of syntax errors in a program is a valuable indicator, its use
as a metric is problematic when comparing the effectiveness and suitability of
teaching languages, since some languages, like Java and C++, have more elaborate
syntax. As such, it is argued that syntax errors as a metric of programming and
debugging ability should not be used in isolation, but in conjunction with other
metrics, like the ones proposed earlier in this section.


While arguing that these four metrics are useful and appropriate for the study, the
paper is not suggesting that they form an exhaustive set with higher intrinsic value than
any others. Rather, for the specific CS1 content of the 10 weeks, which was the basis
of comparison of the four iterations/years, these metrics were considered adequately
consistent, robust and objective to assess learning and level of understanding of
programming of CS1 students to that point in their degree programmes. Assessing
the ability of CS2 or CS1 term 2 students would have demanded a different set of
metrics; most probably, a larger number of, and more sophisticated and fine-grained,
metrics to correspond to students’ more advanced knowledge.

5. RESULTS
Table IV shows a summary of the measurements for each key indicator relative to
each cohort. In particular, the cells in the table contain the number of programs that
had one (or more) bugs, loop, conditional and import statements. The values in the
last column indicate the sample size of each cohort.

Table IV. Observed Frequencies Summary. Columns show the frequencies of programs containing a key
indicator. Rows show the measured values for each cohort. The last column indicates the total size of each
cohort.

Group   Loop   Conditional   Import   Bugs   Total Cohort Size
G0        27            55       56     90                 157
G1       106            77      100     64                 195
G2       113            92      117     60                 193
G3       148           157      146     36                 216

To validate each hypothesis, the differences in the frequencies observed for the
different groups/iterations were calculated. Table V outlines which groups were
compared in order to address each hypothesis.

Table V. Group Comparisons. The groups compared for testing each hypothesis are indicated.

Hypothesis   Groups compared
Ha           G0 vs. G1, G2, G3
Ha0_1        G0 vs. G1
Ha1_2        G1 vs. G2
Ha2_3        G2 vs. G3

To statistically validate our results, χ-squared tests were conducted at a 0.05
significance level. R [R Development Team 2010] was used to carry out these statistical
tests. It must be noted that the simultaneous occurrence of different key indicators
could be observed for any participant (for example, when the program submitted
contained both a loop and a conditional); the key indicators are therefore not mutually
exclusive. A solution is to calculate the χ-squared statistic for each indicator separately
[Agresti and Liu 1999]. A hypothesis is then considered supported if the majority of the
p-values across the key indicators are below the significance level. Table VI shows a
summary of the results.

Table VI. χ-squared Analysis Summary. This table shows the p-values obtained by the separate χ-squared
analyses performed for each key indicator to test each hypothesis. The last column indicates the overall
rejection decision for the hypothesis at a 0.05 significance level. A hypothesis is accepted if the majority of
the p-values across the key indicators are below the significance level.

Hypothesis   Loop      Conditional   Import    Bugs      Result
Ha           < 0.001   < 0.001       < 0.001   < 0.001   Supported
Ha0_1        < 0.001   0.478         0.005     < 0.001   Supported
Ha1_2        0.431     0.117         0.071     0.82      Not supported
Ha2_3        0.046     < 0.001       0.172     < 0.001   Supported
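Although the analyses were carried out in R, the per-indicator test can be reproduced from the frequencies in Table IV; a minimal sketch in Python (assuming SciPy is available):

    from scipy.stats import chi2_contingency

    # G0 vs. G1 on the 'loop' indicator: programs with and without a
    # loop, with counts taken from Table IV.
    table = [[27, 157 - 27],     # G0: 27 of 157 programs contained a loop
             [106, 195 - 106]]   # G1: 106 of 195 programs contained a loop
    chi2, p, dof, expected = chi2_contingency(table)
    print(p)  # p << 0.001, matching the Ha0_1 'Loop' cell of Table VI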

The results suggest that programming proficiency (as measured by the presence of
bugs and use of key programming concepts such as loops, conditionals and importing
libraries) depends on the module structure (Ha supported). Moreover, both
hypotheses Ha0_1 and Ha2_3 were confirmed while the analysis failed to support
Ha1_2. Therefore, the results suggest that the use of a simple programming
language such as Python can have a significant impact on novices learning how to
program. Also, while formative feedback did not provide an observable benefit,
problem-solving training resulted in an improvement in programming ability.
However, the frequencies of each key indicator in Table IV indicate a more
intricate pattern of differences; it appears that the magnitude of the effect of each
module change is different for each key indicator. These differences raise richer
questions with regard to which module changes bring the greatest benefit to the use
and understanding of a particular programming concept. Thus,
further analysis was undertaken in order to tease apart the individual effects on the
learning of programming concepts and to understand the underlying reasons behind
these differences.
The analysis involved the calculation of the differences between the frequency
values of each key indicator between two groups. This analysis aimed to elucidate
the individual effects on each key concept of: (i) programming language and (ii)
problem-solving. Since the module change implemented in G2 (formative feedback)
was not found to produce a significant effect, only groups G0, G1 and G3 were
considered in this analysis. The differences in the frequencies of each key indicator
between these groups (G0 vs. G1 and G1 vs. G3) were calculated. Each observed
frequency from Table IV above was normalized by turning the value into a
percentage of the total. Table VII shows the percentage difference for each key
indicator using the observed frequency values from Table IV.

Table VII. Calculated Percentage Differences. Columns show the percentage difference for each key indicator
using the observed frequency values from Table IV. Data from Table IV was normalized by turning each
frequency value into a percentage of the total.

Groups   Loop (%)   Conditional (%)   Import (%)   Bugs (%)
G1-G0          37                 4           15       -24
G3-G1          14                33           16       -16
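As a sketch of this normalization, Table VII can be reproduced from the Table IV counts; the computed values agree with the table to within ±1, depending on where rounding is applied:

    # Normalize Table IV counts to percentages of each cohort, then difference.
    freqs = {  # loop, conditional, import, bugs, cohort size
        "G0": (27, 55, 56, 90, 157),
        "G1": (106, 77, 100, 64, 195),
        "G3": (148, 157, 146, 36, 216),
    }

    def percentages(row):
        *counts, total = row
        return [100.0 * c / total for c in counts]

    for a, b in [("G1", "G0"), ("G3", "G1")]:
        diffs = [round(x - y) for x, y in
                 zip(percentages(freqs[a]), percentages(freqs[b]))]
        print(a + "-" + b + ":", diffs)  # order: loop, conditional, import, bugs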

Table VII suggests that the impact of the programming language used is not the
same for each key indicator. In particular, when Python was introduced, replacing
Java as the teaching language, variations in the frequencies related to the use of
loops were higher than the variations related to conditionals. At the same time,
when problem-solving training was introduced, the difference in the use of
conditionals was the most pronounced.

To provide a complete picture of the cohorts, the distribution of final grades
achieved by the students in each group is presented in Table VIII. Chi-squared
analysis yielded a significant association (χ²(12) = 77.862, p < .001), and the data
echo the pattern of results of the analysis presented above. The final
grades reveal a steady improvement in students’ overall performance. As mentioned
in Section 4, the research project was initiated because of high failure rates (36.6% for
G0). With this figure dropping to 10.6% for G3 in the final iteration, it is
reasonable to conclude that the interventions produced a clear positive outcome.

Table VIII. Grade Distribution of all Groups.

Group   A            B            C            D            E/F
G0      7 (4.5%)     15 (9.6%)    22 (14.0%)   56 (35.7%)   57 (36.3%)
G1      5 (2.6%)     18 (9.3%)    29 (14.9%)   78 (40.2%)   64 (33.0%)
G2      14 (7.1%)    23 (11.7%)   50 (25.5%)   78 (39.8%)   31 (15.8%)
G3      25 (11.6%)   34 (15.7%)   66 (30.6%)   68 (31.5%)   23 (10.6%)

Final grades have been traditionally used in CSE research to measure the success
of an intervention. However, this study proposed and applied an alternative
combination of metrics/learning indicators. The fact that the results of both analyses
(of final grades and the identified set of metrics) coincide supports the validity of the
proposed metrics for evaluating whether learning outcomes have been achieved; it
may also indicate that these metrics hold predictive power for students’
overall performance. In this light, the proposed metrics may also be used as
monitoring mechanisms and to trigger intervention during the module.

6. DISCUSSION
There have been suggestions that Higher Education Institutions appear to be unable
to cope with increasing industry demands for graduates with programming skills.
This has been attributed to students’ dissatisfaction with, and failure rates in,
programming modules, which are pronounced in, and after, their first year, and to a
lack of adequate programming skills even after graduation. As such, universities are
impelled to rethink their introductory programming curricula. So that such changes
can be made on a sound evidence base, the study reported in this paper
aimed to conduct rigorous research and provide robust evidence to underpin
recommendations with regard to content and methods for an introductory
programming (CS1) module. This section discusses the findings of the study in light
of related literature and distils them into recommendations for the design of CS1
curricula.

6.1 Research and practice in CSE


A survey of previous publications in CSE revealed that a large body of research
consists of small-scale case studies, which typically involve introducing and
evaluating a new teaching method or tool. Indeed, several literature reviews
[Holmboe et al. 2001; Berglund et al. 2006; Pears et al. 2007; Randolph et al. 2008]
point out that these studies are valuable as a medium for educators to share
experiences, but that their results are hard to generalize and use in other
educational contexts because they often lack a sound methodological framework.
These reviews emphasize that it is only by following a clear methodology in applying
and evaluating teaching techniques that research can offer solid and practical
contributions to the field of computer science education.
The study reported in this paper embraced principles of Action Research and
Design-based Research methodologies, which aim to improve practice and to learn
through action; planned change is implemented, monitored and analyzed in cycles.
These approaches recognize that teaching programming is a real-world situation that
contains multiple dependent variables that co-exist, although not all of them need to
be investigated. The study reported in this paper involved a process of progressive
refinement undertaken in three iteration cycles; in each cycle a single change was
identified and implemented, and the effects of the change were evaluated. Each proposed
change was formulated as a research hypothesis, and its evaluation was performed
based on statistical analysis of verifiably reliable data on measures associated with
programming ability. Unlike previous small-scale studies, this research was driven
by data obtained during four consecutive years and student cohorts (with 761
participants in total).
By adopting this approach, the study aimed to provide valid conclusions that may
help CS1 curriculum developers in different educational settings. It also attempted
to provide observations and guidelines that may be implemented as part of any
teaching paradigm adopted in an institution (functional, object-oriented, or
imperative). Finally, even if the conclusions are not in line with an institution’s
strategies, this study serves to highlight the importance of approaching the issue of
teaching introductory programming as one would any other research problem; that is,
it should be built on careful consideration of published findings, and suitable, explicit
and rigorous experimentation and evaluation.
Much of the controversy in the field of education research concerns the issue of
generalizability. As argued above, approaches based on quantitative and reliable
measurement, large samples and valid statistical analysis, like the one reported in
this paper, tend to be less susceptible to such criticism. Still, quantitative studies are
associated with many limitations and misconceptions. In relation to educational
research, it may be problematic that quantitative studies do not seek to control or
interpret all variables that operate in the real-world educational setting being
analyzed, which may confound the success or contribution of a particular
intervention [Denzin and Lincoln 1998]. As with qualitative approaches, the
involvement of the experimenter in the practice may also introduce bias. As such, it
is important that researchers do not over-interpret results derived from quantitative
analysis, taking them at face value. Instead, such findings should be triangulated
with existing knowledge and, where possible, complemented by qualitative methods.
Finally, studies need to present clear and sufficient information regarding methods,
outcomes, assumptions and situational parameters, in order for educators to assess
the applicability of the observations to their own circumstances and for future
research to reliably review, compare and replicate them.

6.2 The benefit of a syntactically simple language.


One of the fiercest debates in CSE is about which is the most suitable language to
teach novices. For the past 10 years, Java has been ranked as the most popular
language¹, but some educators argue that languages like Java are too
syntactically/semantically rich and complex to serve as a pedagogical and learning
tool, and they instead recommend the use of educational languages in CS1 courses.
However, universities face the pressure and responsibility to select a language with
the greatest practical relevance as well as market appeal. Consequently, moving away
from such commercially-valuable languages may result in a failure to attract
students, as the program looks less relevant from an employability perspective.
Therefore, such decisions should only be made after careful consideration of reliable
data and sources.
The analysis presented in this paper first confirmed the basic premise of this
research: the choice of language has an impact on the development of programming
skills of novices. Moreover, the analysis compared Java and Python, and revealed
that Python facilitated students’ learning of the fundamental programming concepts
and structures.
In effect, this paper makes the case for Python, a syntactically simple language,
which, at the same time, is not a purely educational language, since it is increasingly
being used in real-world applications, currently being ranked among the 10 most
popular languages in industry². Python offers the possibility to write and run
programs without the notational overhead imposed by Java, because of the
straightforward syntax and development environment. Selecting Java as the
introductory programming language results in instructors and students spending
more time on the syntax of the language rather than on the algorithm. Yet, the aim of
an introductory programming course is not to teach a language per se, but to teach
the basic concepts of programming, improve algorithmic thinking and prepare
students for the remainder of their studies. As such, by no means do we argue
against the value and necessity of teaching Java, but we maintain that Python is
more suited for an introductory programming module and can provide the necessary
foundations for students from which they can move on to Java (or a similarly
complex language) in the second term or year, being better equipped and confident.
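As a simple illustration of this overhead (a sketch, not an excerpt from the course material), consider printing the squares of the numbers 1 to 10; a Java equivalent is shown in comments for comparison:

    # Python: a complete, runnable program.
    for n in range(1, 11):
        print(n * n)

    # A Java equivalent needs a class, a main method and typed declarations:
    #
    #   public class Squares {
    #       public static void main(String[] args) {
    #           for (int n = 1; n <= 10; n++) {
    #               System.out.println(n * n);
    #           }
    #       }
    #   }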
Taken together, the recommendation derived from these findings is that Python is
an effective language for teaching introductory programming in the first term or year
of a computer science degree, since it enables solid acquisition of fundamental
concepts and constructs and debugging skills, and, as such, it can be integrated
within the paradigm of choice of an institution (for instance, functional, object-
oriented or imperative paradigms).

¹,² http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

6.3 Formative feedback for improving programming skills.


Formative feedback is a pedagogical tool necessary to improve knowledge and skill
acquisition as well as to motivate learning [Shute 2008]. It is particularly useful for
novice students, because one of the difficulties that they face is that they do not know
what is expected of their work. At the same time, the provision of formative feedback
necessitates large investments in terms of resources such as staff time and
specialized software.
In the second iteration, students could submit their weekly work
and receive feedback on successful and erroneous aspects of their code, as well as
pointers to help them address any errors. It was predicted that this group of
students would outperform the previous cohort. However, the analysis failed to
confirm the related hypothesis. Similar results were reported by a three-year study
exploring the effect of formative feedback using the exam scores of computer science
students, with the author concluding that there was little or no correlation between
the provision of formative feedback and student achievement [Irons 2010].
There are two possible explanations for the absence of the effect of formative
feedback in our study; the first stems from students and the broader context, and the
second relates to the quality of feedback. First, the participation in the formative
feedback process was lower than expected and this could explain the lack of
significant improvements in student learning for the G2 cohort. This may be related
to the fact that young students appear to be less keen to take advantage of
opportunities unless there is a tangible benefit associated with the task. Indeed,
several studies have found that most undergraduate computer science students are
“externally motivated”, that is, they are driven to work in order to perform well in
summative assessments that contribute to their final grades and degree outcome
rather than from a “thirst to develop knowledge” [Carter and Boyle 2002, p. 1; Bostock
2004]. Additional reasons for the low engagement of students with the feedback
process may include fear of participation, seeing feedback as a bad sign, and negative
attitudes towards learning [Black 1999]. Research has also indicated that students
may find feedback difficult to understand [Lillis and Turner 2001], or not even read it
[Ecclestone 1998]. As such, it is argued that simply offering feedback has limited
value; in order for students to make the most of it, a broader change in their
motivations and perceptions should take place [Black 1999], while Rust [2002] also
recommends that students should be required (not encouraged) to actively engage
with it. In addition to providing feedback, then, structured opportunities should be
provided to students to understand, reflect on, and discuss the feedback with tutors.
A second explanation may lie in the nature of the feedback provided in this study.
Research by Corbett and Anderson [1990; 2001] shows that results are mixed with
regard to whether introductory programming students benefit from feedback; for
instance, parameters that determine whether feedback leads to improvement include
whether it is provided on demand or automatically, at the end of each line or of the
whole program, and whether it consists of goal hints or explanations. These parameters may also
interact with the proficiency level and learning style of students. Indeed, such
observations give rise to richer research questions with regard to feedback
parameters that can provide a true advantage to learners of programming.
In light of the findings of this study and those of previous literature, it is advised
that, for formative feedback to yield observable benefits to performance, novice
students of programming may need to be externally motivated and guided. Moreover,
the amount, nature and timing of feedback should be fine-tuned to the particular
characteristics of the task and students in order to be effective and worthwhile.

6.4 The benefit of problem-solving training before programming.


Problem-solving involves understanding a complex problem and decomposing it into
simple, manageable steps. As stated above, the aim of an introductory programming
module should also be to improve students’ problem-solving skills, and the link
between problem-solving and programming skills is well-known [Tu and Johnson
1990]. Preoccupation with syntactic detail and with executable performance
requirements may also distract students from thinking about the algorithm and, then,
mentally following its execution. At the same time, most introductory programming
courses, at least in the beginning, involve simple tasks with straightforward
solutions, with no weight given to design and analysis, which gives students a false

impression of the programming process [CC2001 2001]. The practice of covering
problem-solving concepts at the beginning of an introductory programming course
has been recommended by CC2001 [2001], and evidence of the approach’s value has
been demonstrated by Turner and Hill [2007], who adopted it and reported positive
student perceptions.
Therefore, in the third iteration of this study, the week before the formal start of
the module focused on problem-solving activities that introduced algorithmic
concepts and constructs, independent of a particular programming language. The
analysis compared the performance outcomes of the groups in the two relevant
iterations (G3 and G2; with and without problem-solving training). The results
confirmed that problem-solving yields an actual, quantifiable benefit in the
programming ability of students by the end of the 10 weeks. A possible explanation
is that novice programmers normally tend to begin coding without considering and
analyzing the problem. So, it may be the case that through even a brief exposure to
problem-solving procedures before programming sessions began, students were able
to understand that an exercise is a problem that can be solved if decomposed into
steps that can then be translated into lines of code. In addition, having already
gained some experience with algorithms and concepts relating to data and control
may subsequently have facilitated the students’ easier and more rapid progress
through the remainder of the introductory programming course. This inference is
also supported by the finding (discussed below) that the use of conditionals was
mostly facilitated by problem-solving training.
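The style of decomposition this training encourages can be sketched as follows (an illustrative example, not an exercise from the course): the steps are written first as comments and then translated, one at a time, into code:

    # Task: find the average of the positive numbers in a list.
    # Step 1: keep only the positive numbers.
    # Step 2: guard against there being none.
    # Step 3: sum them, count them and divide.

    def average_of_positives(numbers):
        positives = [n for n in numbers if n > 0]   # step 1
        if not positives:                           # step 2
            return None
        return sum(positives) / len(positives)     # step 3

    print(average_of_positives([3, -1, 4, -5, 8]))  # 5.0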
Palumbo [1990] discussed the “reverse” causal relationship of programming and
problem-solving; that is, whether learning programming improves problem-solving
skills. His conclusions, however, may be viewed from both sides. In particular, he
suggested that the development of problem-solving skills could be facilitated if, after
learning the syntax of the language, the student learns how to: (i) break the program
into subprograms; (ii) use iteration; and (iii) use conditions to solve the programming
problem. From these points, it could be inferred that if, indeed, problem-solving
training is linked to iteration and conditions, then even training of limited duration
can help students to grasp the particular fundamental concepts taught in this course.
In conclusion, it is recommended that novice learners of programming undertake
a brief problem-solving training course before the beginning of programming sessions.
This approach helps students to develop a basic understanding of analysis and design,
algorithmic and problem-solving concepts without relating them to a particular
executable language. Having a conceptual foundation facilitates and accelerates the
transfer to an executable programming context within a functional, imperative or
object-oriented paradigm.

6.5 Impact of a syntactically simple language and problem-solving on the acquisition of
fundamental programming concepts.
Among the objective metrics used to assess learning were the frequencies of use of
loop and conditional constructs. The analysis showed that the changes implemented
in the iterations resulted in effects of different magnitude on the use of these
constructs. In particular, when Python replaced Java as the language of instruction,
the use of loop structures increased more dramatically than the use of conditional
statements. One possible explanation for this phenomenon is that loops are
problematic because novices often fail to understand that “behind the scenes” the
loop control variable is being updated [Du Boulay 1986]. Therefore, a simpler syntax
could have a greater impact, since it makes loops easier to use, and the underlying
concept easier to understand.
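To illustrate the point with Python, the for loop hides exactly the control-variable update that an equivalent while loop makes explicit (a sketch, not course material):

    # The for loop updates its control variable "behind the scenes"...
    for i in range(3):
        print(i)

    # ...whereas the equivalent while loop makes the update explicit:
    i = 0
    while i < 3:
        print(i)
        i = i + 1   # the hidden update, written out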
On the other hand, in the iteration in which “problem-solving before programming”
was implemented, the benefit in the use of conditionals was more pronounced
compared to the use of loops. Thus, it appears that variations in the use of
conditionals may be less associated with programming language complexity and more
associated with abstract thinking skills. This implies that beginners need to acquire
logical and mathematical structures in order to be more proficient in the use of
conditional statements, a point also noted by Rogalski and Samurçay [1990]. This
seems to be achieved by developing students’ problem-solving skills before
programming is addressed, suggesting that this ordering should be followed in the
design of the module.
It is perhaps to be expected that a more syntactically-complex language will result
in novice programmers making more syntax errors. However, it is only recently that
empirical studies have begun to investigate the extent of the impact. Denny, Luxton-
Reilly and Tempero [2011] found that syntax presents a major barrier for novice
learners; in a large-scale experiment, students with excellent final grades submitted
non-compiling programs almost 50% of the time, even in simple exercises, while low
performing students submitted non-compiling code 73% of the time. As the authors
state, it appears that syntax is a greater challenge for all novice students than
anticipated. The same study found that there is a negative correlation between
syntax error frequency and perceptions and attitudes about programming, while
suggesting that spending excessive amounts of time on correcting syntax errors
hinders learners. Denny, Luxton-Reilly and Tempero [2012] also explored the idea
that “not all syntax errors are equal”. Indeed, while all students make the same types
of syntax error, top students can quickly fix missing semicolons (the most common
syntax error) while less able students take twice as long to correct the problem.
However, students of all abilities find it equally difficult to correct the second and
third most common syntax errors. So, if, indeed, novice programmers largely make
syntax errors which they then repeat or are unable to resolve within a specified time,
it may be argued that a language which is more likely to lead to syntax errors may be
unsuitable as a teaching language for novice programmers.

7. CONCLUSIONS AND FUTURE WORK


How to approach the teaching of introductory programming is a multidimensional
problem and has given rise to enduring debates. The outcome of the investigation
reported in this paper has contributed to this body of literature through findings
derived from a rigorous and relatively large-scale, longitudinal research study which
offers benefits over many of the existing studies in the field. First, the paper has
introduced the argument that some objective measurements should be considered to
assess the effects of the strategies adopted. Second, the reported study integrated
principles of experimental and applied research methodologies, arguing that to
evaluate effectively the effects of any didactic change in the module structure, it is
crucial to rigorously control the process in a naturalistic setting. This led to iteration
through a cycle of a single change being implemented, then the effects being observed,
and to the next change being defined based on these objective observations, before
repeating the cycle. Third, quantitative evidence has been produced from these cycles
of data collection and analyzed to frame recommendations for the design of CS1
curricula. The paper further argues that, by approaching questions of what and how

to teach in this context primarily as “research problems”, the resulting findings of
such research can be generalized and the recommendations used by educators across
different settings.
From the iterative process adopted, the following outcomes emerged. First, the
choice of programming language seems to affect student learning. This may be
because some programming languages are too complex (in terms of syntax, semantics,
etc.) for beginners and may distract them from learning basic programming concepts,
which may, in turn, have a lasting impact on students’ confidence in, and perception
of, their programming abilities. Second, introducing problem-solving concepts before
teaching more specific programming aspects has an impact on how students learn to
program. This strategy could help students realize that it is essential to develop the
ability both to break down complex problems into sub-tasks and to produce the correct
sequence of actions, and it may accelerate the consolidation of concepts, such as data
and control structures, introduced later in the course. From this perspective,
programming problems can then be solved systematically, by tackling one sub-task
at a time and backtracking afterwards. It could also be argued that the positive effect
of early exposure to problem-solving on programming ability arises independently of
the language used in the programming course. However, an empirical investigation
of possible interaction effects is worthwhile; that is, an interesting question is
whether the benefit of problem-solving is present, grows or diminishes, when
different instruction languages and techniques are used later in the course. Third,
formative feedback may not be necessarily as effective as expected unless students
are ready to have a pro-active role in seeking and responding to feedback. In the
context of introductory programming, effective formative feedback should have
particular characteristics in terms of timing and targeting specific features of the
code. Identifying the parameters of effective feedback is an interesting future
research challenge. The findings of this study have also identified other potential
avenues for further research, outlined below.
The study does not argue that the proposed three interventions constitute the only
possible set, or even a superior set compared to other candidate interventions.
Research in, and development of, a course is a long-term endeavour. Interventions
should be continuously planned and tested until as many aspects of the problem
space as possible have been addressed and the course improved in a sustainable way.
With the field of computer science education maturing, educators are gaining access
to a growing knowledge base of solid findings to guide them through making
judicious decisions regarding teaching interventions. For example, there is robust
evidence that pair programming and related cooperative learning instructional
techniques lead to improvements in students’ performance, self-efficacy and
perceptions [Braught et al. 2011; Beck and Chizhik 2013]. At the same time, well-
documented previous efforts give educators the resources as well as the confidence to
apply “non-traditional” pedagogical approaches. These include the Media
Computation approach, in which students learn how to program through the
manipulation of visual and audio media [Sloan and Troy 2008; Guzdial 2009]; and
Peer Instruction, which relies on students working in small groups [Lee 2013; Simon
et al. 2010; Porter et al. 2011]. Originating from non-CS classroom environments,
these approaches have been found to increase achievement, attainment and
motivation in CS1 students.
The data in this study were obtained from a (relatively short) teaching cycle of
about 10 weeks. Therefore, how these findings affect the settling time of the learning
curve is one aspect that merits further investigation. A next step in this area would
be to explore the correlation between the choice of the programming language used in
the CS1 course and the length of the transient time for the learning curve. Another
interesting, related research area would be to investigate how using Python first has
an impact on learning Java at a later stage (as also addressed in Mannila
et al. [2006]). Finally, a follow-up study about students’ performance in the second
and third years of their studies could help provide an appreciation of the long-term
effects of curriculum changes introduced in the CS1 course.
This paper has argued for the importance of choices and practices informed by
reliable analysis of experimental data. Richer insight and deeper understanding
could, though, be gained if objective measurements were inspected in light of findings
from analysis of qualitative data, such as video recordings, interviews and verbal
protocols [Renumol et al. 2010].
There are also areas of further study associated with extending the variables that
are considered. For example, human factors in the student population, such as
gender and ethnicity, were not considered in the reported study. However, gender
and ethnic background are consistently found to correlate strongly with attrition
rates of first year students in computing, with female and minority-ethnic students
being particularly vulnerable [Talton et al. 2006; Rosson et al. 2011]. Therefore, it is
planned to develop further this study by analyzing the impact of gender and ethnic
background on the strategies adopted to teach programming to beginners. Through
this investigation, it may be possible to identify the teaching methods and curriculum
changes that equally support and are suitable for all students, without adversely
affecting a particular demographic group. For example, it has been consistently
found that pair programming greatly benefits female programmers while not
impairing the performance of male students [Berenson et al. 2004; McDowell et al.
2003].

REFERENCES
Agarwal, K. K., Agarwal, A., and Celebi, M. E. 2008. Python puts a squeeze on Java for CS0 and beyond.
Journal of Computing Sciences in Colleges, 23, 6, 49-57.
Agresti, A., and Liu, I.M. 1999. Modeling a Categorical Variable Allowing Arbitrarily Many Category
Choices. Biometrics, 55, 3, 936–943.
Almstrum, V. L., Hazzan, O., Guzdial, M., and Petre, M. 2005, February. Challenges to computer science
education research. In ACM SIGCSE Bulletin Vol. 37, No. 1, pp. 191-192. ACM.
Beaubouef, T., and Mason, J. 2005. Why the high attrition rate for computer science students: some
thoughts and observations. ACM SIGCSE Bulletin, 37, 2, 103-106.
Beck, L., & Chizhik, A. 2013. Cooperative learning instructional methods for CS1: Design, implementation,
and evaluation. ACM Transactions on Computing Education (TOCE), 13, 3, 10.
Bennedsen, J., & Caspersen, M. E. 2007. Assessing Process and Product: A Practical Lab Exam for an
Introductory Programming Course. Innovation in Teaching and Learning in Information and
Computer Sciences, 6, 4, 183-202.
Berenson, S. B., Slaten, K. M., Williams, L., & Ho, C. W. 2004. Voices of women in a software engineering
course: reflections on collaboration. Journal on Educational Resources in Computing (JERIC), 4, 1, 3.
Berglund, A., Daniels, M., and Pears, A. 2006, January. Qualitative research projects in computing
education research: an overview. In Proceedings of the 8th Australasian Conference on Computing
Education-Volume 52, pp. 25-33. Australian Computer Society.
Black, P. 1999. Assessment, learning theories and testing systems. Learners, learning and assessment,
118-134.
Black, P., & Wiliam, D. 1998. Assessment and classroom learning. Assessment in education, 5, 1, 7-74.
Bostock, S. J. 2004. Motivation and electronic assessment. Effective learning and teaching in computing,
86-99.
Braught, G., Wahls, T., & Eby, L. M. 2011. The case for pair programming in the computer science
classroom. ACM Transactions on Computing Education (TOCE), 11, 1, 2.
Brown, A. L. 1992. Design experiments: Theoretical and methodological challenges in creating complex
interventions in classroom settings. The Journal of the Learning Sciences, 2, 2, 141-178.
Bruce, K. B. 2004. Controversy on how to teach CS 1: a discussion on the SIGCSE-members mailing list.
ACM SIGCSE Bulletin, 36, 4, 29-34.
Califf, M. E., & Goodwin, M. (2002, February). Testing skills and knowledge: Introducing a laboratory
exam in CS1. In ACM SIGCSE Bulletin, 34, 1, 217-221. ACM.
Carter, J., and Boyle, R. 2002. Teaching delivery issues: Lessons from computer science. Journal of
Information Technology Education, 1, 2, 65-90.
Chamillard, A. T., & Braun, K. A. 2000. Evaluating programming ability in an introductory computer
science course. ACM SIGCSE Bulletin, 32, 1, 212-216.
Clear, T., Edwards, J., Lister, R., Simon, B., Thompson, E., and Whalley, J. 2008, January. The teaching of
novice computer programmers: bringing the scholarly-research approach to Australia. In Proceedings
of the tenth conference on Australasian computing education-Volume 78, pp. 63-68. Australian
Computer Society, Inc.
Collins, A. 1992. Toward a design science of education, pp. 15-22. Springer Berlin Heidelberg.
Collins, A., Joseph, D., and Bielaczyc, K. 2004. Design research: Theoretical and methodological issues.
The Journal of the learning sciences, 13, 1, 15-42.
Computing Curricula. 2001. IEEE CS, ACM Joint Task Force on Computing Curricula. IEEE Computer
Society Press and ACM Press. Retrieved from http://www.acm.org/education/curricula.html
Computing Curricula: the overview report. 2005. IEEE CS, ACM Joint Task Force on Computing
Curricula. IEEE Computer Society Press and ACM Press. Retrieved from
http://www.acm.org/education/curricula.html
Corbett, A. T., and Anderson, J. R. 1990, July. The effect of feedback control on learning to program with
the LISP tutor. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society. pp.
796-803.
Corbett, A. T., and Anderson, J. R. 2001, March. Locus of feedback control in computer-based tutoring:
Impact on learning rate, achievement and attitudes. In Proceedings of the SIGCHI conference on
Human factors in computing systems. pp. 245-252. ACM.
Daly, C., & Waldron, J. 2004, March. Assessing the assessment of programming ability. In ACM SIGCSE
Bulletin. 36, 1, 210-213. ACM.
DARPA 2010. Computer Science - Science, Technology, Engineering, and Mathematics (CS-STEM)
Education Research Announcement (RA). DARPA-RA-10-03.
Denny, P., Luxton-Reilly, A., and Tempero, E. 2012. All syntax errors are not equal. In Proceedings of the
17th ACM Annual Conference on Innovation and Technology in Computer Science Education
(ITiCSE’12). 75–80.
Denny, P., Luxton-Reilly, A., Tempero, E., and Hendrickx, J. 2011. Understanding the syntax barrier
for novices. In Proceedings of the 16th Annual Joint Conference on Innovation and Technology in
Computer Science Education (ITiCSE’11). 208–212.
Denzin, N. K. and Lincoln, Y. S., (Eds.). 1998. The landscape of qualitative research: Theories and issues.
Sage.
Du Boulay, B. 1986. Some Difficulties of Learning to Program. Journal of Educational Computing
Research, 2, 1, 57–73.
Ecclestone, K. 1998, September. ‘Just tell me what to do’: barriers to assessment-in-learning in higher
education. In Scottish Educational Research Association Annual Conference, University of Dundee. 25-26.
Ehlert, A., and Schulte, C. 2009, August. Empirical comparison of objects-first and objects-later. In
Proceedings of the fifth international workshop on Computing education research workshop. pp. 15-26.
ACM.
Goldwasser, M. H., and Letscher, D. 2008. Using Python To Teach Object-Oriented Programming in CS1.
Innovation and Technology in Computer Science Education. June, 30-2.
Grandell, L., Peltomäki, M., Back, R. J., and Salakoski, T. 2006, January. Why complicate things?:
introducing programming in high school using Python. In Proceedings of the 8th Australasian
Conference on Computing Education-Volume 52. pp. 71-80. Australian Computer Society, Inc
Guzdial, M. 2009. Education: Teaching computing to everyone. Communications of the ACM, 52(5), 31-33.
Higgins, R., Hartley, P., & Skelton, A. (2002). The conscientious consumer: reconsidering the role of
assessment feedback in student learning. Studies in Higher Education, 27, 1, 53-64.
Holmboe, C., McIver, L., and George, C. 2001, April. Research agenda for computer science education. In
Proceedings of the 13th Workshop of the Psychology of Programming Interest Group, pp. 207-233.
Bournemouth University, London, England.
Irons, A. 2010. An Investigation into the Impact of Formative Feedback on the Student Learning Experience
(Doctoral dissertation, Durham University).
Jadud, M. C. 2005. A first look at novice compilation behaviour using BlueJ. Computer Science Education,
15, 1, 25-40.
Kasurinen, J., and Nikula, U. 2007, October. Lower dropout rates and better grades through revised
course infrastructure. In Proceedings of the 10th IASTED International Conference on Computers and
Advanced Technology in Education. pp. 152-157. ACTA Press.
Kölling, M. The problem of teaching object-oriented programming. Journal of Object Oriented
Programming.
Koshy, V. 2009. Action research for improving educational practice: a step-by-step guide. Sage.
Kummerfeld, S. K., & Kay, J. 2003, January. The neglected battle fields of syntax errors. In Proceedings of
the fifth Australasian conference on Computing education. 20. 105-111. Australian Computer Society,
Inc.
Lee, C. B. 2013, March. Experience report: CS1 in MATLAB for non-majors, with media computation and
peer instruction. In Proceeding of the 44th ACM technical symposium on Computer science education
35-40. ACM.
Lilley, M., and Barker, T. 2007. Students’ Perceived Usefulness of Formative Feedback for a Computer-
adaptive Test. In Electronic Journal of e-learning. 5, 31-38.
Lillis, T., & Turner, J. 2001. Student writing in higher education: contemporary confusion, traditional
concerns. Teaching in Higher Education, 6, 1, 57-68.
Linn, M. C., and Dalbey, J. 1989. Cognitive consequences of programming instruction. Studying the
novice programmer, 57-81.
Lister, R., Adams, E. S., Fitzgerald, S., Fone, W., Hamer, J., Lindholm, M., ... and Thomas, L. 2004. A
multi-national study of reading and tracing skills in novice programmers. ACM SIGCSE Bulletin, 36, 4,
119-150.
Ma, L., Ferguson, J., Roper, M., and Wood, M. 2007, March. Investigating the viability of mental models
held by novice programmers. In ACM SIGCSE Bulletin, Vol. 39, No. 1, pp. 499-503. ACM.
Mannila, L., and de Raadt, M. 2006, February. An objective comparison of languages for teaching
introductory programming. In Proceedings of the 6th Baltic Sea conference on Computing education
research: Koli Calling 2006. pp. 32-37. ACM.
Mannila, L., Peltomäki, M., and Salakoski, T. 2006. What about a simple language? Analyzing the
difficulties in learning to program. Computer Science Education, 16, 3, 211–227.
McCracken, M., Almstrum, V., Diaz, D., Guzdial, M., Hagan, D., Kolikant, Laxer, C., Thomas, L., Utting, I.,
and Wilusz, T. 2001. A multi-national, multi-institutional study of assessment of programming skills
of first-year CS students. ACM SIGCSE Bulletin, 33, 4, 125-180.
McDowell, C., Werner, L., Bullock, H. E., & Fernald, J. 2003, May. The impact of pair programming on
student performance, perception and persistence. In Proceedings of the 25th international conference
on Software engineering. 602-607. IEEE Computer Society.
McGettrick, A., Boyle, R., Ibbett, R., Lloyd, J., Lovegrove, G., and Mander, K. 2004. Grand challenges in
computing - education. British Computer Society.
Miller, B. N., and Ranum, D. L. 2005, April. Teaching an introductory computer science sequence with
Python. In Proceedings of the 38th Midwest Instructional and Computing Symposium, Eau Claire,
Wisconsin, USA.
Miller, B., and Ranum, D. 2006. Freedom to succeed: a three course introductory sequence using Python
and Java. Journal of Computing Sciences in Colleges, 22, 1, 106-116.
Molina, M., Castro, E., & Castro, E. 2007. Teaching experiments within design research. The International
Journal of Interdisciplinary Social Sciences, 2, 4, 435-440.
Moreno, R. 2004. Decreasing cognitive load for novice students: Effects of explanatory versus corrective
feedback in discovery-based multimedia. Instructional science, 32, 1-2, 99-113.
National Audit Office. Staying the course: The retention of students in higher education. 2007. Report by
the National Audit Office, 44. Retrieved from http://www.nao.org.uk/report/staying-the-course-the-
retention-of-students-in-higher-education/
Necaise, R. D. 2008. Transitioning from Java to Python in CS2. Journal of Computing Sciences in Colleges,
24, 2, 92-97.
Nikula, U., Gotel, O., and Kasurinen, J. 2011. A motivation guided holistic rehabilitation of the first
programming course. ACM Transactions on Computing Education (TOCE), 11, 4, 24.
Oldham, J. D. 2005. What happens after Python in CS1?. Journal of computing sciences in colleges, 20, 6,
7-13.
Palumbo, D. B. 1990. Programming language/problem-solving research: A review of relevant issues.
Review of Educational Research, 60, 1, 65-89.
Patterson-McNeill, H. 2006. Experience: from C++ to Python in 3 easy steps. Journal of Computing
Sciences in Colleges, 22, 2, 92-96.
Pears, A., and Malmi, L. 2009. Values and objectives in computing education research. ACM Transactions
on Computing Education (TOCE), 9, 3, 15.
Pears, A., Seidman, S., Malmi, L., Mannila, L., Adams, E., Bennedsen, J., Devlin, M., and Paterson, J.
2007, December. A survey of literature on the teaching of introductory programming. In ACM SIGCSE
Bulletin. ACM. 39, 4, 204–223.
Polya, G. 1973. How to solve it: A new aspect of mathematical method. Princeton university press.
Porter, L., Bailey Lee, C., Simon, B., Cutts, Q., & Zingaro, D. 2011, June. Experience report: a multi-
classroom report on the value of peer instruction. In Proceedings of the 16th annual joint conference on
Innovation and technology in computer science education. 138-142. ACM.
Quality Assurance Agency for Higher Education. 2003. Learning from subject review. Retrieved from
http://www.qaa.ac.uk/Publications/InformationAndGuidance/Documents/learningFromSubjectReview.pdf
R Development Team. 2010. R: A language and environment for statistical computing. R Foundation for
Statistical Computing.
Radenski, A. 2006, June. Python First: A lab-based digital introduction to computer science. In ACM
SIGCSE Bulletin . ACM. 38, 3, 197–201.
Randolph, J., Julnes, G., Sutinen, E., and Lehman, S. 2008. A methodological review of computer science
education research. Journal of Information Technology Education: Research, 7. 1, 135-162.
Reeves, T. 2011. Can educational research be both rigorous and relevant. Educational Designer, 1, 4, 1-24.
Reges, S. 2006. Back to basics in CS1 and CS2. ACM SIGCSE Bulletin, 38. 1, 293-297.
Renumol, V. G., Janakiram, D., and Jayaprakash, S. 2010. Identification of Cognitive Processes of
Effective and Ineffective Students During Computer Programming. ACM Transactions on Computing
Education (TOCE), 10. 3, 10.
Robins, A., Rountree, J., and Rountree, N. 2003. Learning and teaching programming: A review and
discussion. Computer Science Education, 13. 2, 137-172.
Rogalski, J., and Samurçay, R. 1990. Acquisition of programming knowledge and skills. Psychology of
programming, 18, 157-174.
Rosson, M. B., Carroll, J. M., and Sinha, H. 2011. Orientation of undergraduates toward careers in the
computer and information sciences: Gender, self-efficacy and social support. ACM Transactions on
Computing Education (TOCE), 11, 3, 14.
Rust, C. 2002. The Impact of Assessment on Student Learning How Can the Research Literature
Practically Help to Inform the Development of Departmental Assessment Strategies and Learner-
Centred Assessment Practices?. Active learning in higher education, 3, 2, 145-158.
Shannon, C. 2003, February. Another breadth-first approach to CS I using python. In ACM SIGCSE
Bulletin. Vol. 35, No. 1, pp. 248-251. ACM.
Shute, V. J. 2008. Focus on formative feedback. Review of educational research, 78, 1, 153-189.
Siegfried, R. M., Chays, D., and Herbert, K. G. 2008, July. Will there ever be consensus on cs1. In Proc.
2008 International Conference on Frontiers in Education: Computer Science and Computer
Engineering–FECS. Vol. 8, pp. 18-23.
Simon, B., Kohanfars, M., Lee, J., Tamayo, K., & Cutts, Q. 2010, March. Experience report: Peer
instruction in introductory computing. In Proceedings of the 41st ACM technical symposium on
Computer science education. 341-345. ACM.
Slavin, R. E. 2003. A reader's guide to scientifically based research. Educational Leadership, 60, 5, 12-16.
Sloan, R. H., & Troy, P. 2008, March. CS 0.5: a better approach to introductory computer science for
majors. In ACM SIGCSE Bulletin . 40, 1, 271-275. ACM.
Sleeman, D., Putnam, R. T., Baxter, J., & Kuspa, L. 1988. An introductory Pascal class: A case study of
students' errors. Teaching and Learning Computer Programming: Multiple Research Perspectives. RE
Mayer. Hillsdale, NJ, Lawrence Erlbaum Asociates, 237-257.
Stamey, J., and Sheel, S. 2010. A boot camp approach to learning programming in a CS0 course. Journal of
Computing Sciences in Colleges, 25, 5, 34-40.
Stefik, A., & Siebert, S. 2013. An empirical investigation into programming language syntax. ACM
Transactions on Computing Education (TOCE), 13, 4, 19.
Stenhouse, L. 1975. An Introduction to Curriculum Research and Development. London, Heinemann.
Talton, J. O., Peterson, D. L., Kamin, S., Israel, D., and Al-Muhtadi, J. 2006, March. Scavenger hunt:
computer science retention through orientation. In ACM SIGCSE Bulletin. Vol. 38, No. 1, pp. 443-447.
ACM.
Tu, J. J., and Johnson, J. R. 1990. Can computer programming improve problem-solving ability?. ACM
SIGCSE Bulletin, 22, 2, 30-33.
Turner, S., and Hill, G. 2007. Robots in problem-solving and programming. In 8th Annual Conference of
the Subject Centre for Information and Computer Sciences.
U.S. Department of Education. 2001. No Child Left Behind Act, Retrieved from
http://www2.ed.gov/policy/elsec/leg/esea02/index.html
Vilner, T., Zur, E., and Gal-Ezer, J. 2007, June. Fundamental concepts of CS1: procedural vs. object
oriented paradigm-a case study. In ACM SIGCSE Bulletin. Vol. 39, No. 3, pp. 171-175. ACM.
Weaver, M. R. 2006. Do students value feedback? Student perceptions of tutors’ written responses.
Assessment & Evaluation in Higher Education, 31, 3, 379-394.
Zelle, J. M. 1999, March. Python as a first language. In Proceedings of 13th Annual Midwest Computer
Conference. Vol. 2.
