Teaching Introductory Programming: A Quantitative Evaluation of Different Approaches
Teaching programming to beginners is a complex task. In this paper, the effects of three factors – choice of
programming language, problem-solving training and the use of formative assessment – on learning to
program were investigated. The study adopted an iterative methodological approach carried out across
four consecutive years. To evaluate the effects of each factor (implemented as a single change in each
iteration) on students’ learning performance, the study used quantitative, objective metrics. The findings
revealed that using a syntactically-simple language (Python) instead of a more complex one (Java)
facilitated students’ learning of programming concepts. Moreover, teaching problem-solving before
programming yielded significant improvements in students’ performance. These two factors were found to
have variable effects on the acquisition of basic programming concepts. Finally, it was observed that
effective formative feedback in the context of introductory programming depends on multiple parameters.
The paper discusses the implications of these findings, identifies avenues for further research and argues
for the importance of studies in computer science education anchored in sound research methodologies to
produce generalizable results.
Categories and Subject Descriptors: K.3.2 [Computers and Education]: Computer and Information
Science Education.
Additional Key Words and Phrases: empirical studies, teaching/learning strategies, novice programmers, learning
programming, CS1
1. INTRODUCTION
The past decade has witnessed a boom in industry demand for computing expertise,
driven by the dramatic expansion of computing use, the growing influence of
computing on the economy and the constant influx of new technologies into everyday
life [CC2005 2005]. Yet, while the demand for computing graduates is growing,
university Computer Science departments in the UK [National Audit Office 2007]
and USA [DARPA-RA 2010] report declining enrollment and high attrition rates on
their degree programs. Dropout and failure rates are exacerbated during and
between the first and second years of these programs, reaching as high as 30-40%
[Beaubouef and Mason 2005] and meaning that even where students begin studying
for Computer Science degrees, a high proportion do not go on to graduate.
These high attrition rates are associated with considerable failure rates in
introductory programming courses and/or disenchantment with programming
[McGettrick et al. 2004, in Grand Challenges in Computing: Education; Nikula et al.
2011]. This disenchantment persists with graduates “expressing a dislike of
programming and reluctance to undertake it” [McGettrick et al. 2004 p.12; see also
Ma et al. 2007]. This is also reflected in the outcomes achieved by students, with
large, multinational studies reporting that even at the conclusion of their
introductory programming courses, a large number of students show substandard
performance in elementary programming tasks [McCracken et al. 2001; Lister et al.
2004].
The tension between contemporary society’s need for a supply of skilled computer
scientists who can program and the manifest issues in programming achievement
demonstrated by many students in degree programs presents Higher Education
Institutions with opportunities and challenges in developing skilled computer
scientists. Core to any computer science education is the nurturing of the ability to
develop well-designed software, and all of the attendant conceptual and practical
skills that underpin this activity. Programming is the practical mechanism through
which these skills and abilities are realized. It forms an essential part of any
computer science (CS) degree and, as already noted, its teaching typically begins in
the first year of a degree program, with an introductory programming course (often
referred to as CS1). The approach to beginning programming at this early stage in
CS degrees is supported by students, CS and non-CS faculty, and prospective employers [CC2001 2001].
The changes to the computing industry associated with, inter alia, new
technologies, and the problems associated with learning to program, have led to
Higher Education Institutions facing the pressing need to rethink their CS curricula,
with special attention given to redesigning the CS1 course, which then has wider
effects. Developing or revising an introductory programming course has a key role in,
and impact on, the wider curriculum. An introductory programming course prepares
students for, and underpins, subsequent courses, which are typically also revised in
response to changes to CS1; and the overall degree prepares students for their future
careers.
The complexity of teaching introductory programming is widely acknowledged
among educators [Robins et al. 2003], being listed as one of the seven challenges in
Computer Science Education (CSE) [McGettrick et al. 2004]. It is also understood
that learning programming is a complex and multifaceted process. Novice
programmers show a fragile ability to take a problem description, decompose it into
subtasks, and implement them [McCracken et al. 2001]. They also have difficulty in
tracing, reading and understanding pieces of code and fail to grasp basic
programming principles and routines [Lister et al. 2004]. The overhead of learning
the syntax and semantics of a language at the same time, and difficulties in
combining new and previous knowledge and developing their general problem-solving
skills, all add to the complexity of learning how to program [Linn and Dalbey 1989].
Taken together, these issues highlight the importance of identifying both the
appropriate content and pedagogy for CS1 in order to prepare students for the rest of
the degree programs and provide the foundation for effective computing careers.
Different approaches to teaching programming which reflect the mix of pedagogy and
content issues have been set out, notably in the IEEE and ACM Joint Task Force on
Computer Curricula [CC2001 2001].
Debates around the effectiveness of these approaches to teaching introductory
programming are well-established, of course. Pears et al. [2007], for example,
performed a large literature review on the teaching of introductory programming
which led to the identification of four major categories relating to course development:
curriculum; pedagogy; language choice; and tools for supporting learning. They
reviewed studies in each of these four areas and discussed several unresolved issues
before making recommendations in relation to them. First, they noted that there was
a conspicuous lack of an accepted framework/methodology to guide the processes of
planning, designing, developing, revising and implementing CS1 courses. Second,
they pointed out that empirical data from small-scale studies are abundant in the
literature, but that they rarely inform these processes. Pears et al. [2007] concluded
that larger-scale systematic studies that provide significant new empirical results are
needed; and that such research has the potential to provide valid and applicable
recommendations for educators regarding what and how to teach novice
programmers. Taking this argument as our starting point, the aim of this paper is to present a systematic, quantitative evaluation of different approaches to teaching introductory programming, conducted across four consecutive student cohorts.
2. LITERATURE REVIEW
One of the major issues influencing the design or restructuring of a CS1 course is the
choice of programming language, covered in the IEEE and ACM Joint Task Force on
Computer Curricula [CC2001 2001] within the context of “implementation strategies”.
CC2001 [2001 p.24] describes and discusses three typical "programming-first" implementation strategies for introductory programming courses: an "imperative-first" approach that uses the traditional imperative paradigm; an "objects-first" approach that emphasizes early use of objects and object-oriented design; and a "functional-first" approach that introduces algorithmic concepts in a language with a simple functional syntax.
Arguments have been made for the value of each approach. For example, the
advantage of the objects-first approach has been suggested to relate to the
importance of object-oriented programming in industry, with early exposure to
object-oriented principles and concepts considered an advantage for students.
A survey administered to students who made use of the study pack revealed positive
perceptions of Python as a programming language. Along the same lines, empirical
studies by Patterson-McNeill [2006], Stamey and Sheel [2010], Miller and Ranum
[2005], Oldham [2005], Goldwasser and Letscher [2008] and Shannon [2003] detail
the choices involved in redesigning the curriculum and the shift from one language to
another in their colleges and argue for dramatic improvements with the use of
Python, but no explicit evaluation results are reported in any of these studies.
From the wider perspective of computing education research, Pears and Malmi
[2009] argued that a large part of the literature to date has lacked methodological
rigor. In particular, an extensive methodological review of 352 papers concluded that
the majority of published findings cannot be considered reliable, and, as a result,
many educators and researchers are forced to “reinvent the wheel” (as also suggested
by Almstrum et al. [2005]). The review found that one-third of studies did not use human participants or any type of analysis, but presented descriptions of interventions and anecdotal evidence, while the rest of the studies relied mostly on questionnaires. Random sampling and control groups were rarely used, and even when quantitative approaches were adopted, only one-third of these studies performed robust statistical analysis. Finally, more than half of the studies did not adequately describe their methods and procedures, while around 25% did not state research questions and/or review existing literature [Randolph et al. 2008].
While this report paints a grim picture of the field of computing education research,
there has been a recent paradigm shift towards more systematic and focused
investigations [Pears and Malmi 2009]. A notable example is the study by Nikula et al. [2011]: building on a clear methodological framework, the Theory of Constraints, they undertook a five-year longitudinal study in which problems were diagnosed and suitable interventions (mainly targeting course and student motivational problems) were designed, implemented, and evaluated through pass rates. Remarkable methodological rigor can also be found in a paper by Stefik and
Siebert [2013] that reported four large-scale empirical studies involving randomized
controlled trials. Relevant to the research focus of the present study, their analysis
indicated that novice programmers perform better with Python-style syntax
compared to C-style syntax and identified syntactic elements that are most
problematic for learners. Beck and Chizhik [2013] also employed a robust
methodology in order to compare traditional teaching methods and cooperative
learning techniques using control groups and final grades.
Taken together, the review of the relevant literature clearly demonstrates that
there are teaching approaches and programming languages that offer opportunities
for better supporting teachers and learners. However, the evaluation results
associated with the approaches implemented in these studies are, variously, missing,
partial, based only on qualitative data or have not been validated through statistical
analysis. Moreover, most of these studies present the results of designing and
implementing the new approach during one academic year in comparison with the
results of the previous year’s approach. Therefore, it is argued that while such
studies may provide invaluable insights into good practices and innovative teaching
techniques, if they do not select and apply a suitable research framework, they
remain experience accounts, bound to a specific context, and may not be useful for
offering reliable guidelines applicable to other Higher Education situations. Finally,
the lack of conclusive findings in the field may be attributed, as underlined in Ehlert and Schulte [2009], to the lack of standard measures based on which the "old" and "new" teaching approaches can be fairly compared. Following this line of argument, the present study compared all of its iterations against a fixed, common set of quantitative measures, as described in Section 4.
4. RESEARCH METHODOLOGY
Central to this work was the selection and application of a clear research
methodology. The nature of this research and the issues it aims to address were best
served by a controlled experimental design. The appeal of such an approach lies in its
focus on hypothesis testing, statistical analysis of quantitative data, and fixed
variables and procedures that can be replicated. Indeed, it is argued that educational
studies that are not heuristic/exploratory, but have some very specific research
questions or hypotheses (“is [teaching approach A] better than [teaching approach B]
for novice programmers?”) could benefit from using the type of controlled trials
traditionally used in bio-medical sciences [Stefik and Siebert 2013]. At the same time,
the work also drew on studies within the tradition of Action Research and Design-
based Research, which embrace the complexity of real-world learning environments
that resist full experimental control. These approaches were developed to guide
formative research, to test and refine educational designs and interventions based on
theoretical findings from previous research. Their aim is not only to refine
educational practice, but also to refine theory and to contribute to the field [Collins
et al. 2004]. The research philosophy underpinning our efforts has been one of
“progressive refinement”, as also described and applied in Action Research and
Design-based Research studies. This approach is cyclic and involves the formulation
of a hypothesis, the design of an intervention, experimentation in the classroom, the
analysis of the collected data, the formulation of a new hypothesis, the design of a new intervention and so on [Molina et al. 2007]. Within this paradigm, researchers
and/or practitioners initiate a project because they have diagnosed a problematic
situation, such as an ill-suited curriculum. The diagnosis is often triggered by an
external event, such as abnormally poor grades or feedback provided by students.
Indeed, the efforts described in this paper were triggered by the high failure rate of a
student cohort (which was above 36%, as shown in Table VIII). Subsequently, the
formulation of the hypothesis to be tested is based on experience, previous research
and theoretical principles, as analyzed by the team of practitioners and researchers
involved.
For this research, the body of findings discussed in Section 2, suggesting that the
complexity of Java may be overwhelming for learners and favoring Python as a
suitable teaching language, led to the development of the first hypothesis (presented
in Section 4.2). The second hypothesis and related intervention were also the
products of careful review of literature indicating the pedagogical benefit of formative
feedback. This iteration cycle is discussed in Section 4.3. Finally, as described in
Section 4.4, the intervention of the third iteration targeted the problem-solving skills
of learners, exploring the hypothesis of a correlation between problem-solving and
programming, informed by analysis of existing literature.
As detailed in Section 4.5, the study involved four experimental groups of CS1
students, which corresponded to four full student cohorts in four consecutive years: a
control group that were taught using Java; a group that were taught using Python; a
group that received formative feedback; and a group that received initial problem-
solving training.
Java was used as the introductory programming language with the previous cohort of
students (the “control group” for the study). However, as indicated by the findings
considered in Section 2, the use of Java to introduce students to the basic aspects of
programming may be problematic, not least because Java is heavily coupled with
object-oriented concepts, which may interfere with the basic aim of an objects-later strategy. Java "forces" some of the more advanced concepts into the foreground
– concepts which teachers do not typically want to introduce at an early stage
[Kölling 1999]. As a consequence, a student’s focus may switch from learning the
basic programming concepts to learning the language’s syntax. Drawing on the
evidence that Java may not be well-suited for education, especially when introducing
programming to novices [Siegfried et al. 2008], the use of an alternative
programming language with a lower syntactic burden was considered. As discussed
in the previous sections of the paper, several educators have argued that scripting
languages could offer a more effective alternative. Python is one example, offering a
simple and expressive language with support for procedural programming. It can be
argued that the lower overhead associated with Python should provide a gentler
introduction to the basic concepts of programming. By using Python, a greater
emphasis on core principles was expected with less of an unwanted focus on syntax.
As already noted, Python is also widely used in industry and is therefore considered
to be attractive to students. In Figure 1, a basic program to print “Hello World” is
written in Java and Python and exemplifies the syntactic and semantic differences
between the two languages. The program written in Java requires elements such as
access modifiers, classes, methods, the keyword static, types, arrays and method parameters, which the Python program omits.
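Figure 1 itself is not reproduced in this text. For illustration, the two programs it compares are the standard "Hello World" examples, which presumably resemble the following sketch.

Java:

    public class HelloWorld {
        // A class, an access modifier and a static main method are all
        // required before a single line of output can be produced.
        public static void main(String[] args) {
            System.out.println("Hello World");
        }
    }

Python:

    print("Hello World")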
For all of these reasons, Python was selected as the programming language to replace
Java in the first iteration where, in effect, we observed the impact of the complexity
of programming languages. The associated research hypothesis (Ha0_1, tested by comparing groups G0 and G1; see Tables II and V) was that students taught using Python would show better learning performance than students taught using Java.
The experimental design required a “common basis of comparison”. This meant that
a large number of learning and teaching parameters needed to be equal or equivalent
between cycles. In particular, in all iterations, the same programming concepts were
taught in the same order (see Section 4.6). This was only possible for the first 10
weeks of instruction. As such, this period formed the basis on which the four cohorts
would be compared. Similarly, the assessment, which involved a one-hour lab test in
week 11, marking procedures, and module organization were kept constant (see
Section 4.7).
It should be noted that the same tasks were given to students in the previous
iteration. To minimize the likelihood of this activity being perceived as a form of
monitoring students’ outcomes, no grading was associated with it. However, a
consequence of the non-compulsory element of the activity was that the submission
rate was not as high as desired; only 50% of the students engaged at least once in the
process.
In these sessions, students were asked to devise a plan that consisted of a sequence of actions for a robot, carry out the plan and review the
outcome of their strategies. The session did not touch on programming topics such as
loops, conditionals and arrays. For this last iteration, the impact of introducing these
problem-solving tutorials was again assessed using the same approach as for the
other cycles.
The research hypotheses tested in the study are consolidated in Table II.
Table II. Research Hypotheses. The overarching hypotheses of the study and those formulated for each
iteration of the study.
Hypotheses
4.5 Participants
The groups.
Four distinct groups of students took part in the different iterations of the study; the
experience of each group is outlined below.
G0: For this group, during the 10 weeks before the test, teaching emphasized
mastery of basic programming skills with Java as the implementation language. In
particular, lecture and laboratory sessions and exercises focused on loops,
conditionals, and use of libraries and packages (for example, to generate random numbers or for input and output); the complete list of topics can be found in Table III.
These students used BlueJ as the learning environment. This cohort served as the
control group for the study. The size of this cohort was 157 students (with 19% being
female).
G1: The G0 and G1 groups had essentially the same structure to their
teaching/learning experience, with the programming language used being the only
difference for the groups. Python, rather than Java, was used for the G1 cohort.
These students used Python IDLE as the learning environment. This group consisted
of 195 students (with 18% being female).
G2: The G1 and G2 groups had essentially the same structure to their
teaching/learning experience. Throughout the term, the groups were given weekly,
ungraded tasks (exercises to be implemented using Python), but students in the G2
group were encouraged to also submit their solutions. Formative feedback was
provided for the submitted work. Task submission was not compulsory. The size of
this group was 193 students (25% being female).
G3: The G3 group had essentially the same structure to their teaching/learning
experience as the G2 group but had an additional three-hour problem solving “crash
course” before the beginning of the module. The G3 group consisted of 216 students
(with 22% being female).
Student background.
The four groups corresponded to four full cohorts of students from consecutive years,
enrolled in the first year of the suite of computing degrees offered at a single
university. The students comprising the groups had comparable backgrounds and
levels of prior programming proficiency. Within a cohort, individual students may
have had greater proficiency than others, but this was true in equal measure for each
group. As such, the research assumed that the average levels of initial programming
proficiency for each student cohort were similar. This assumption was supported by
the fact that the admissions criteria applied for the particular programs of study
were the same for the duration of the study and the admission qualifications profiles
of the cohorts were very similar. No student was in more than one cohort.
Organization
The organization of the module was one of the fixed parameters of the experimental
design. The module consisted of weekly lectures (duration: 1 hour) and laboratory sessions (duration: 2 hours). While the whole cohort attended the same lecture
session, for the laboratory sessions the cohort was split into three smaller groups.
Each week, the lecture preceded the laboratory session and introduced the
topic/concept. During the laboratory session, students were asked to complete self-
paced tutorials. These tutorials provided short explanations of the concepts and
instructions on how to implement, run and debug simple programs based on the
concepts. Depending on the cohort, students used BlueJ or Python IDLE as the
interactive development environment (see Sections 4.2-4.4). The tutorials also
included exercises – simple and more challenging tasks – without solutions.
Depending on the cohort, students had the opportunity to submit their solutions and
receive feedback. Students also used the university e-learning environment which
provided access to resources such as lecture notes, the laboratory tutorials,
discussion forums and additional learning materials.
Content
Table III shows the module content and teaching sequence followed. The topics, the
sequence, and the depth of coverage of topics were kept constant across the groups.
As can be seen from Table III, seven topics were taught over a period of 10 weeks.
Then, in week 11, students took a formal assessment (as explained in Section 4.7).
It should be noted that the implementation strategy followed in all groups
corresponded to an “objects-later” approach; that is, fundamental programming
constructs were taught without any explicit reference to object-oriented programming,
or to language-specific elements (such as types and access modifiers). While object-
oriented programming aspects may "creep in" even in simple Java programs,
students were not expected to understand or produce any such element.
In week 12, all groups were formally introduced to object-oriented principles, but
their curriculum was no longer comparable given that the three “Python” groups (G1-
G3) were being exposed to Java for the first time at this point.
Table III. Module Content and Teaching Sequence. The topics taught in each session.
Stage Topics
T1. Overview
Teaching Staff
The teaching team consisted of a module leader, two supporting lecturers and six
graduate teaching assistants. The module leader delivered all lectures covering the
seven fundamental topics during the first 10 weeks (with the supporting lecturers
being involved later in the term). The module leader and supporting lecturers ran
and coordinated the laboratory sessions, and along with the graduate teaching
assistants, they provided one-to-one help to students. The module leader and
supporting lecturers were the same for every cohort, with G0 being the first cohort
taught by the team. Not all of the graduate teaching assistants remained until the
end of the four-year study. Finally, only the module leader was involved in all of the
marking/grading to ensure consistency.
For the first example ('print the string "Hello World" 20 times'), a possible alternative solution simply repeated the print statement 20 times. For the second example ('generate two random
numbers Y and X and print the value of Y if Y < X’), possible solutions included:
initializing Y and X with the student's own values instead of random values;
generating the random values but printing all values of Y, instead of when Y < X; etc.
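For illustration, a "perfect solution" to the second task would presumably resemble the following Python sketch (our reconstruction; the exact number range was not specified):

    import random

    # Generate two random numbers Y and X (the range here is hypothetical).
    y = random.randint(0, 100)
    x = random.randint(0, 100)

    # Print the value of Y only when Y < X, using a conditional as anticipated.
    if y < x:
        print(y)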
Grading
Each task in the lab test was simple enough to have a “perfect solution”. As noted
above, this solution would require the use of one of the taught concepts or a
combination of them. Full marks were awarded if a student submitted a program
which reflected the perfect solution. Half marks were awarded if a student submitted
a program which produced the correct output, but without using the anticipated
concept. For example, if the task was 'print the string "Hello World" 20 times', a
solution which included a loop that repeated the print statement 20 times would
receive full marks, while a solution that contained 20 print statements would receive
half marks. The grading process involved two stages. First, an auto grader filtered
out the programs that did not run (because of run-time or compile-time errors) and
the programs that gave incorrect output (semantic errors). Then, a human marker
(the module leader) examined the programs that ran and gave correct output and
awarded marks based on the criteria set out above.
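As a concrete illustration of these criteria (our reconstruction, not the actual test scripts), the two marking outcomes for the "Hello World" task would look as follows in Python:

    # Full marks: correct output produced using the anticipated concept (a loop).
    for i in range(20):
        print("Hello World")

    # Half marks: correct output produced without the anticipated concept.
    print("Hello World")
    print("Hello World")
    # ... and so on, for 20 separate print statements.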
These concepts were: looping, conditionals and library use. The additional advantage
of selecting these concepts was that their building blocks are discrete keywords
(looping: for, while and do; conditionals: if, case, ?: operator; library use: import),
which can be located and counted in a program without human intervention and
effort. Therefore, the frequencies of appropriate use of these constructs in the
assessment were used as indicators. Simply put, a task in the lab test was
constructed such that its perfect solution would include the use of particular
concept(s) taught (for instance, a loop and/or a conditional). If the student used the
associated construct (“if”, “for”, “while”, etc.) in his/her solution, this could be
indicative of the student understanding the concept and when it should be used. This
type of indicator was independent of syntactic elements and errors. It should be
noted that the exam tasks were simple enough to have model answers, and if a model
answer required one type of construct (for example, one conditional), but the
submitted program contained more than one, this would not increase the count.
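A minimal sketch of how such automated counting might work for Python submissions is given below. This is our illustration of the idea, not the authors' tool, and it omits the cap described above (counting a construct only up to the number required by the model answer):

    import io
    import tokenize

    LOOP_KEYWORDS = {"for", "while"}
    CONDITIONAL_KEYWORDS = {"if"}
    IMPORT_KEYWORDS = {"import"}

    def construct_counts(source):
        """Count loop, conditional and import keywords in a Python program."""
        counts = {"loop": 0, "conditional": 0, "import": 0}
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.NAME:  # keywords are emitted as NAME tokens
                if tok.string in LOOP_KEYWORDS:
                    counts["loop"] += 1
                elif tok.string in CONDITIONAL_KEYWORDS:
                    counts["conditional"] += 1
                elif tok.string in IMPORT_KEYWORDS:
                    counts["import"] += 1
        return counts

    print(construct_counts('import random\nfor i in range(20):\n    print(i)\n'))
    # -> {'loop': 1, 'conditional': 0, 'import': 1}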
Bugs
Bugs were the second type of indicator used in this study. Bugs refer to logic,
compile-time and run-time errors. However, given the level of the students, the content of the 10 weeks of instruction, and the exam content, the bugs present in
student submissions were largely limited to compile-time errors. So, in effect, this
metric solely consisted of syntax errors. This resonates with previous research
findings that indicate that the majority of student errors in CS1 are compile-time
errors, and, in particular, syntax errors. Jadud’s [2005] analysis found that the three
most common student errors are, in fact, rather “trivial”: semicolons, typographic
errors in variable names and bracketing (accounting for 42% of errors found in
students’ work). The study also reported that these mistakes are made repeatedly.
Syntax errors are an important area of programming pedagogy research.
Experienced programmers rarely make syntax errors, and when they do, they have
clear strategies to correct them very quickly. However, syntax errors are significant
for novice programmers [Sleeman et al. 1988; Kummerfeld and Kay 2003]; correcting
them is a time-consuming process and often leads to random debugging behavior,
also influenced by the fact that students do not understand compiler messages
[Kummerfeld and Kay 2003]. It is recognized that logical and run-time errors are
much deeper, more important and difficult to correct. However, a novice programmer
has to get past compiler errors first.
As suggested above, failure to use the correct construct for a task – such as a loop
to solve a loop problem – is indicative of inadequate knowledge/learning, but this is
not the only type of error based on inadequate knowledge/learning. Kummerfeld and
Kay [2003] indicated that the ability to efficiently fix syntax errors necessitates
knowledge of the programming language in order to understand error messages.
Taken together, it is argued that failure to correct simple syntax errors is also
indicative of inadequate knowledge/learning, based on the basic assumption that a
competent student should be able to locate and correct simple errors within a time
limit.
While the number of syntax errors in a program is a valuable indicator, its use
as a metric is problematic when comparing the effectiveness and suitability of
teaching languages since some languages, like Java and C++, have more elaborate
syntax. As such, it is argued that syntax errors as a metric of programming and
debugging ability should not be used in isolation, but in conjunction with other
metrics, like the ones proposed earlier in this section.
While arguing that these four metrics are useful and appropriate for the study, the paper is not suggesting that this is an exhaustive set with higher intrinsic value than
any others. Rather, for the specific CS1 content of the 10 weeks, which was the basis
of comparison of the four iterations/years, these metrics were considered adequately
consistent, robust and objective to assess learning and level of understanding of
programming of CS1 students to that point in their degree programs. Assessing
the ability of CS2 or CS1 term 2 students would have demanded a different set of
metrics; most probably, a larger number of, and more sophisticated and fine-grained,
metrics to correspond to students’ more advanced knowledge.
5. RESULTS
Table IV shows a summary of the measurements for each key indicator relative to
each cohort. In particular, the cells in the table contain the number of programs that contained one (or more) bugs, loops, conditionals and import statements. The values in the
last column indicate the sample size of each cohort.
Table IV. Observed Frequencies Summary. Columns show the frequencies of programs containing a key
indicator. Rows show the measured values for each cohort. The last column indicates the total size of each
cohort.
G0 27 55 56 90 157
Table V. Group Comparisons. The groups compared for testing each hypothesis are indicated.
Hypotheses Groups compared
Ha0_1 G0 vs. G1
Ha1_2 G1 vs. G2
Ha2_3 G2 vs. G3
Table VI. χ-squared Analysis Summary. This table shows the p-values obtained by the separate χ-squared
analyses performed for each key indicator to test each hypothesis. The last column indicates the overall
rejection decision for the hypothesis at a 0.05 significance level. A hypothesis is accepted if the majority of the p-values across the key indicators are below the significance level.
Hypothesis Loop Conditional Import Bugs Result
The results suggest that programming proficiency (as measured by the presence of
bugs and use of key programming concepts such as loops, conditionals and importing
libraries) depends on the module structure (Ha supported). Moreover, both
hypotheses Ha0_1 and Ha2_3 were confirmed while the analysis failed to support
Ha1_2. Therefore, the results suggest that the use of a simple programming
language such as Python can have a significant impact on novices learning how to
program. Also, while formative feedback did not provide an observable benefit,
problem-solving training resulted in an improvement of programming ability.
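For concreteness, the per-indicator χ-squared analysis summarized in Table VI can be sketched as follows. This is our illustration under stated assumptions (a 2x2 contingency table of programs with and without the indicator in each cohort, SciPy available, and purely hypothetical counts):

    from scipy.stats import chi2_contingency

    def indicator_test(with_a, total_a, with_b, total_b):
        # 2x2 table: programs with/without a key indicator in cohorts A and B.
        table = [[with_a, total_a - with_a],
                 [with_b, total_b - with_b]]
        chi2, p, dof, expected = chi2_contingency(table)
        return p

    # Hypothetical counts: programs containing a loop in two cohorts.
    p = indicator_test(55, 157, 120, 195)
    print(p, p < 0.05)  # reject at the 0.05 significance level?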
However, the frequencies of each key indicator in Table IV indicate a more
intricate pattern of differences; it appears that the magnitude of the effect of each
module change is different for each key indicator. These differences generate richer
questions with regard to which module changes influence and bring a greater
benefit to the use and understanding of a particular programming concept. Thus,
further analysis was undertaken in order to tease apart the individual effects on the
learning of programming concepts and to understand the underlying reasons behind
these differences.
The analysis involved calculating, for each key indicator, the difference in frequency values between two groups. This analysis aimed to elucidate
the individual effects on each key concept of: (i) programming language and (ii)
problem-solving. Since the module change implemented in G2 (formative feedback)
was not found to produce a significant effect, only groups G0, G1 and G3 were
considered in this analysis. The differences in the frequencies of each key indicator
between these groups (G0 vs. G1 and G1 vs. G3) were calculated. Each observed
frequency from Table IV above was normalized by turning the value into a
percentage of the total. Table VII shows the percentage difference for each key
indicator using the observed frequency values from Table IV.
Table VII. Calculated Percentage Differences. Columns show the percentage difference for each key indicator
using the observed frequency values from Table IV. Data from Table IV was normalized by turning each
frequency value into a percentage of the total.
Groups Loop(%) Conditional(%) Import(%) Bugs(%)
G1-G0 37 4 15 -24
G3-G1 14 33 16 -16
Table VII suggests that the impact of the programming language used is not the
same for each key indicator. In particular, when Python was introduced, replacing
Java as the teaching language, variations in the frequencies related to the use of
loops were higher than the variations related to conditionals. At the same time,
when problem-solving training was introduced, the difference in the use of
conditionals was the most pronounced.
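To make the normalization behind Table VII concrete, the sketch below mirrors the computation with purely hypothetical counts (the per-group frequencies are not reproduced in this text):

    def pct(count, total):
        # Normalize an observed frequency to a percentage of the cohort size.
        return 100.0 * count / total

    # Hypothetical: 55 of 157 G0 programs and 120 of 195 G1 programs contain a loop.
    difference = pct(120, 195) - pct(55, 157)
    print(round(difference))  # percentage-point difference, as reported in Table VII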
Final grades have been traditionally used in CSE research to measure the success
of an intervention. However, this study proposed and applied an alternative
combination of metrics/learning indicators. The fact that the results of both analyses
(of final grades and the identified set of metrics) coincide supports the validity of the
proposed metrics for evaluating whether learning outcomes have been achieved, but
it may also indicate that these metrics may hold predictive power for students’
overall performance. In this light, the proposed metrics may also be used as
monitoring mechanisms and to trigger intervention during the module.
6. DISCUSSION
There have been suggestions that Higher Education Institutions appear to be unable
to cope with increasing industry demands for graduates with programming skills.
This has been attributed to students’ dissatisfaction with, and failure rates in,
programming modules, which are pronounced in, and after, their first year, and to a
lack of adequate programming skills even after graduation. As such, universities are
impelled to rethink their introductory programming curricula. In order to undertake
such changes on the basis of a sound evidence base, the study reported in this paper
aimed to undertake rigorous research and provide robust evidence to underpin
recommendations with regards to content and methods for an introductory
programming (CS1) module. This section discusses the findings of the study in light
of related literature and distils them into recommendations for the design of CS1
curricula.
Several methodological reviews [Holmboe et al. 2001; Berglund et al. 2006; Pears et al. 2007; Randolph et al. 2008] point out that such experience-based studies are valuable as a medium for educators to share
experiences, but that their results are hard to generalize and use in other
educational contexts because they often lack a sound methodological framework.
These reviews emphasize that it is only by following a clear methodology in applying
and evaluating teaching techniques that research can offer solid and practical
contributions to the field of computer science education.
The study reported in this paper embraced principles of Action Research and
Design-based Research methodologies, which aim to improve practice and to learn
through action; planned change is implemented, monitored and analyzed in cycles.
These approaches recognize that teaching programming is a real-world situation that
contains multiple dependent variables that co-exist, although not all of them need to
be investigated. The study reported in this paper involved a process of progressive
refinement undertaken in three iteration cycles; in each cycle a single change was
identified, implemented and the effects of the change were evaluated. Each proposed
change was formulated as a research hypothesis, and its evaluation was performed
based on statistical analysis of verifiably reliable data on measures associated with
programming ability. Unlike previous small-scale studies, this research was driven
by data obtained during four consecutive years and student cohorts (with 761
participants in total).
By adopting this approach, the study aimed to provide valid conclusions that may
help CS1 curriculum developers in different educational settings. It also attempted
to provide observations and guidelines that may be implemented as part of any
teaching paradigm adopted in an institution (functional, object-oriented, or
imperative). Finally, even if the conclusions are not in line with an institution’s
strategies, this study serves to highlight the importance of approaching the issue of
teaching introductory programming as one would any other research problem; that is,
it should be built on careful consideration of published findings, and suitable, explicit
and rigorous experimentation and evaluation.
Much of the controversy in the field of education research concerns the issue of
generalizability. As argued above, approaches based on quantitative and reliable
measurement, large samples and valid statistical analysis, like the one reported in
this paper, tend to be less susceptible to such criticism. Still, quantitative studies are
associated with many limitations and misconceptions. In relation to educational
research, it may be problematic that quantitative studies do not seek to control or
interpret all variables that operate in the real-world educational setting being
analyzed, which may confound the success or contribution of a particular
intervention [Denzin and Lincoln 1998]. As with qualitative approaches, the involvement of the experimenter in the practice may also introduce bias. As such, it
is important that researchers do not over-interpret results derived from quantitative
analysis, taking them at face value. Instead, such findings should be triangulated
with existing knowledge and, where possible, complemented by qualitative methods.
Finally, studies need to present clear and sufficient information regarding methods,
outcomes, assumptions and situational parameters, in order for educators to assess
the applicability of the observations to their own circumstances and for future
research to reliably review, compare and replicate them.
Java remains one of the most popular programming languages in industry 1, but some educators argue that languages like Java are too
syntactically/semantically rich and complex to serve as a pedagogical and learning
tool, and they instead recommend the use of educational languages in CS1 courses.
However, universities face the pressure and responsibility to select a language with
the greatest practical relevance as well as market appeal. As such, moving away
from such commercially-valuable languages may result in the failure to attract
students as the program looks less relevant from an employability perspective.
Therefore, such decisions should only be made after careful consideration of reliable
data and sources.
The analysis presented in this paper first confirmed the basic premise of this
research: the choice of language has an impact on the development of programming
skills of novices. Moreover, the analysis compared Java and Python, and revealed
that Python facilitated students’ learning of the fundamental programming concepts
and structures.
In effect, this paper makes the case for Python, a syntactically simple language,
which, at the same time, is not a purely educational language, since it is increasingly
being used in real-world applications, currently being ranked among the 10 most
popular languages in industry 2. Python offers the possibility to write and run
programs without the notational overhead imposed by Java, because of the
straightforward syntax and development environment. Selecting Java as the
introductory programming language results in instructors and students spending
more time on the syntax of the language rather than the algorithm. Yet, the aim of
an introductory programming course is not to teach a language per se, but to teach
the basic concepts of programming, improve algorithmic thinking and prepare
students for the remainder of their studies. As such, by no means do we argue
against the value and necessity of teaching Java, but we maintain that Python is
more suited for an introductory programming module and can provide the necessary
foundations for students from which they can move on to Java (or a similarly
complex language) in the second term or year, being better equipped and confident.
Taken together, the recommendation derived from these findings is that Python is
an effective language for teaching introductory programming in the first term or year
of a computer science degree, since it enables solid acquisition of fundamental
concepts and constructs and debugging skills, and, as such, it can be integrated
within the paradigm of choice of an institution (for instance, functional, object-
oriented or imperative paradigms).
1,2 http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
In contrast, the introduction of formative feedback did not yield a measurable improvement, and the analysis failed to confirm the related hypothesis. Similar results were reported by a three-year study
exploring the effect of formative feedback using the exam scores of computer science
students, with the author concluding that there was little or no correlation between
the provision of formative feedback and student achievement [Irons 2010].
There are two possible explanations for the absence of the effect of formative
feedback in our study; the first stems from students and the broader context, and the
second relates to the quality of feedback. First, the participation in the formative
feedback process was lower than expected and this could explain the lack of
significant improvements in student learning for the G2 cohort. This may be related
to the fact that young students appear to be less keen to take advantage of
opportunities unless there is a tangible benefit associated with the task. Indeed,
several studies have found that most undergraduate computer science students are
“externally motivated”, that is, they are driven to work in order to perform well in
summative assessments that contribute to their final grades and degree outcome
rather than from a “thirst to develop knowledge” [Carter and Boyle 2002 p.1; Bostock
2004]. Additional reasons for the low engagement of students with the feedback
process may include fear of participation, seeing feedback as a bad sign, and negative
attitudes towards learning [Black 1999]. Research has also indicated that students
may find feedback difficult to understand [Lillis and Turner 2001], or not even read it
[Ecclestone 1998]. As such, it is argued that simply offering feedback has limited
value; in order for students to make the most of it, a broader change in their
motivations and perceptions should take place [Black 1999], while Rust [2002] also
recommends that students should be required (not encouraged) to actively engage
with it. In addition to providing feedback, then, structured opportunities should be
provided to students to understand, reflect on, and discuss the feedback with tutors.
A second explanation may lie in the nature of the feedback provided in this study.
Research by Corbett and Anderson [1991; 2001] shows that results are mixed with regard to whether introductory programming students benefit from feedback; for instance, whether feedback leads to improvement depends on parameters such as whether it is provided on demand or automatically, at the end of each line or of the whole program, and whether it consists of goal hints or explanations. These parameters may also
interact with the proficiency level and learning style of students. Indeed, such
observations give rise to richer research questions with regard to feedback
parameters that can provide a true advantage to learners of programming.
In light of the findings of this study and those of previous literature, it is advised that, for formative feedback to yield observable benefits, novice programming students may need to be externally motivated and guided. Moreover,
the amount, nature and timing of feedback should be fine-tuned to the particular
characteristics of the task and students in order to be effective and worthwhile.
One interpretation is that Python's simpler loop syntax could have a greater impact, since it makes loops easier to use, and the underlying
concept easier to understand.
On the other hand, in the iteration in which “problem-solving before programming”
was implemented, the benefit in the use of conditionals was more pronounced
compared to the use of loops. Thus, it appears that variations in the use of
conditionals may be less associated with programming language complexity and more
associated with abstract thinking skills. This implies that beginners need to acquire
logical and mathematical structures in order to be more proficient in the use of
conditional statements, a point also noted by Rogalski and Samurçay [1990]. This
seems to be achieved when developing the students’ problem solving skills before
programming is addressed, suggesting that this ordering should be followed in the
design of the module.
It is perhaps to be expected that a more syntactically-complex language will result
in novice programmers making more syntax errors. However, it is only recently that
empirical studies have begun to investigate the extent of the impact. Denny, Luxton-
Reilly and Tempero [2011] found that syntax presents a major barrier for novice
learners; in a large-scale experiment, students with excellent final grades submitted
non-compiling programs almost 50% of the time, even in simple exercises, while low
performing students submitted non-compiling code 73% of the time. As the authors
state, it appears that syntax is a greater challenge for all novice students than
anticipated. The same study found that there is a negative correlation between
syntax error frequency and perceptions and attitudes about programming, while
suggesting that spending excessive amounts of time on correcting syntax errors
hinders learners. Denny, Luxton-Reilly and Tempero [2012] also explored the idea
that “not all syntax errors are equal”. Indeed, while all students make the same types
of syntax error, top students can quickly fix missing semicolons (the most common
syntax error) while less able students take twice as long to correct the problem.
However, students of all abilities find it equally difficult to correct the second and
third most common syntax errors. So, if, indeed, novice programmers largely make
syntax errors which they then repeat or are unable to resolve within a specified time,
it may be argued that a language which is more likely to lead to syntax errors may be
unsuitable as a teaching language for novice programmers.
One avenue for further research would be to explore the correlation between the choice of the programming language used in
the CS1 course and the length of the transient time for the learning curve. Another
interesting, related research area would be to investigate how using Python first has
an impact on learning Java at a later stage (as also addressed in Mannila
et al. [2006]). Finally, a follow-up study about students’ performance in the second
and third years of their studies could help provide an appreciation of the long-term
effects of curriculum changes introduced in the CS1 course.
This paper has argued for the importance of choices and practices informed by
reliable analysis of experimental data. Richer insight and deeper understanding
could, though, be gained if objective measurements were inspected in light of findings
from analysis of qualitative data, such as video recordings, interviews and verbal
protocols [Renumol et al. 2010].
There are also areas of further study associated with extending the variables that
are considered. For example, human factors in the student population, such as
gender and ethnicity, were not considered in the reported study. However, gender
and ethnic background are consistently found to correlate strongly with attrition
rates of first year students in computing, with female and minority-ethnic students
being particularly vulnerable [Talton et al. 2006; Rosson et al. 2011]. Therefore, it is
planned to develop this study further by analyzing the impact of gender and ethnic
background on the strategies adopted to teach programming to beginners. Through
this investigation, it may be possible to identify the teaching methods and curriculum
changes that equally support and are suitable for all students, without adversely
affecting a particular demographic group. For example, it has been consistently
found that pair programming greatly benefits female programmers while not
impairing the performance of male students [Berenson et al. 2004; McDowell et al.
2003].
REFERENCES
Agarwal, K. K., Agarwal, A., and Celebi, M. E. 2008. Python puts a squeeze on Java for CS0 and beyond.
Journal of Computing Sciences in Colleges, 23, 6, 49-57.
Agresti, A., and Liu, I.M. 1999. Modeling a Categorical Variable Allowing Arbitrarily Many Category
Choices. Biometrics, 55, 3, 936–943.
Almstrum, V. L., Hazzan, O., Guzdial, M., and Petre, M. 2005, February. Challenges to computer science
education research. In ACM SIGCSE Bulletin Vol. 37, No. 1, pp. 191-192. ACM.
Beaubouef, T., and Mason, J. 2005. Why the high attrition rate for computer science students: some
thoughts and observations. ACM SIGCSE Bulletin, 37, 2, 103-106.
Beck, L., & Chizhik, A. 2013. Cooperative learning instructional methods for CS1: Design, implementation,
and evaluation. ACM Transactions on Computing Education (TOCE), 13, 3, 10.
Bennedsen, J., & Caspersen, M. E. 2007. Assessing Process and Product: A Practical Lab Exam for an
Introductory Programming Course 1. Innovation in Teaching and Learning in Information and
Computer Sciences, 6, 4, 183-202.
Berenson, S. B., Slaten, K. M., Williams, L., and Ho, C. W. 2004. Voices of women in a software engineering
course: reflections on collaboration. Journal on Educational Resources in Computing (JERIC), 4, 1, 3.
Berglund, A., Daniels, M., and Pears, A. 2006, January. Qualitative research projects in computing
education research: an overview. In Proceedings of the 8th Australasian Conference on Computing
Education-Volume 52, pp. 25-33. Australian Computer Society.
Black, P. 1999. Assessment, learning theories and testing systems. Learners, learning and assessment,
118-134.
Black, P., & Wiliam, D. 1998. Assessment and classroom learning. Assessment in education, 5, 1, 7-74.
Bostock, S. J. 2004. Motivation and electronic assessment. Effective learning and teaching in computing,
86-99.
Braught, G., Wahls, T., & Eby, L. M. 2011. The case for pair programming in the computer science
classroom. ACM Transactions on Computing Education (TOCE), 11, 1, 2.
Brown, A. L. 1992. Design experiments: Theoretical and methodological challenges in creating complex
interventions in classroom settings. The Journal of the Learning Sciences, 2, 2, 141-178.
Bruce, K. B. 2004. Controversy on how to teach CS 1: a discussion on the SIGCSE-members mailing list.
http://www.qaa.ac.uk/Publications/InformationAndGuidance/Documents/learningFromSubjectReview.
pdf
R Development Core Team. 2010. R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Radenski, A. 2006, June. Python First: A lab-based digital introduction to computer science. In ACM SIGCSE Bulletin, 38, 3, 197-201. ACM.
Randolph, J., Julnes, G., Sutinen, E., and Lehman, S. 2008. A methodological review of computer science
education research. Journal of Information Technology Education: Research, 7, 1, 135-162.
Reeves, T. 2011. Can educational research be both rigorous and relevant? Educational Designer, 1, 4, 1-24.
Reges, S. 2006. Back to basics in CS1 and CS2. ACM SIGCSE Bulletin, 38, 1, 293-297.
Renumol, V. G., Janakiram, D., and Jayaprakash, S. 2010. Identification of Cognitive Processes of
Effective and Ineffective Students During Computer Programming. ACM Transactions on Computing
Education (TOCE), 10, 3, 10.
Robins, A., Rountree, J., and Rountree, N. 2003. Learning and teaching programming: A review and
discussion. Computer Science Education, 13, 2, 137-172.
Rogalski, J., and Samurçay, R. 1990. Acquisition of programming knowledge and skills. Psychology of
programming, 18, 157-174.
Rosson, M. B., Carroll, J. M., and Sinha, H. 2011. Orientation of undergraduates toward careers in the
computer and information sciences: Gender, self-efficacy and social support. ACM Transactions on
Computing Education (TOCE), 11, 3, 14.
Rust, C. 2002. The Impact of Assessment on Student Learning: How Can the Research Literature
Practically Help to Inform the Development of Departmental Assessment Strategies and Learner-
Centred Assessment Practices?. Active learning in higher education, 3, 2, 145-158.
Shannon, C. 2003, February. Another breadth-first approach to CS I using python. In ACM SIGCSE
Bulletin. Vol. 35, No. 1, pp. 248-251. ACM.
Shute, V. J. 2008. Focus on formative feedback. Review of educational research, 78, 1, 153-189.
Siegfried, R. M., Chays, D., and Herbert, K. G. 2008, July. Will there ever be consensus on CS1?. In Proc.
2008 International Conference on Frontiers in Education: Computer Science and Computer
Engineering–FECS. Vol. 8, pp. 18-23.
Simon, B., Kohanfars, M., Lee, J., Tamayo, K., and Cutts, Q. 2010, March. Experience report: Peer
instruction in introductory computing. In Proceedings of the 41st ACM technical symposium on
Computer science education. 341-345. ACM.
Slavin, R. E. 2003. A reader's guide to scientifically based research. Educational Leadership, 60, 5, 12-16.
Sloan, R. H., & Troy, P. 2008, March. CS 0.5: a better approach to introductory computer science for
majors. In ACM SIGCSE Bulletin . 40, 1, 271-275. ACM.
Sleeman, D., Putnam, R. T., Baxter, J., & Kuspa, L. 1988. An introductory Pascal class: A case study of
students' errors. Teaching and Learning Computer Programming: Multiple Research Perspectives. RE
Mayer. Hillsdale, NJ, Lawrence Erlbaum Asociates, 237-257.
Stamey, J., and Sheel, S. 2010. A boot camp approach to learning programming in a CS0 course. Journal of
Computing Sciences in Colleges, 25, 5, 34-40.
Stefik, A., & Siebert, S. 2013. An empirical investigation into programming language syntax. ACM
Transactions on Computing Education (TOCE), 13, 4, 19.Stenhouse, L. 1975. An Introduction to
Curriculum Research and Development. London, Heinemann.
Talton, J. O., Peterson, D. L., Kamin, S., Israel, D., and Al-Muhtadi, J. 2006, March. Scavenger hunt:
computer science retention through orientation. In ACM SIGCSE Bulletin. Vol. 38, No. 1, pp. 443-447.
ACM.
Tu, J. J., and Johnson, J. R. 1990. Can computer programming improve problem-solving ability?. ACM
SIGCSE Bulletin, 22, 2, 30-33.
Turner, S., and Hill, G. 2007. Robots in problem-solving and programming. In 8th Annual Conference of
the Subject Centre for Information and Computer Sciences.
U.S. Department of Education. 2001. No Child Left Behind Act, Retrieved from
http://www2.ed.gov/policy/elsec/leg/esea02/index.html
Vilner, T., Zur, E., and Gal-Ezer, J. 2007, June. Fundamental concepts of CS1: procedural vs. object
oriented paradigm-a case study. In ACM SIGCSE Bulletin. Vol. 39, No. 3, pp. 171-175. ACM.
Weaver, M. R. 2006. Do students value feedback? Student perceptions of tutors’ written responses.
Assessment & Evaluation in Higher Education, 31, 3, 379-394.
Zelle, J. M. 1999, March. Python as a first language. In Proceedings of 13th Annual Midwest Computer
Conference. Vol. 2.