
Chapter 16

Learning Analytics in Education for the


Twenty-First Century

Kristof De Witte and Marc-André Chénier

Abstract The online traces that students leave on electronic learning platforms;
the improved integration of educational, administrative and online data sources;
and the increasing accessibility of hands-on software allow the domain of learning
analytics to flourish. Learning analytics, an interdisciplinary domain borrowing from statistics, computer sciences and education, exploits the increased accessibility
of technology to foster an optimal learning environment that is both transparent
and cost-effective. This chapter illustrates the potential of learning analytics to
stimulate learning outcomes and to contribute to educational quality management.
Moreover, it discusses the increasing emergence of large and accessible data sets
in education and compares the cost-effectiveness of learning analytics to that of
costly and unreliable retrospective studies and surveys. The chapter showcases the
potential of methods that permit savvy users to make insightful predictions about
student types, performance and the potential of reforms. The chapter concludes with recommendations and a discussion of challenges to the implementation and growth of learning analytics.

K. De Witte
Leuven Economics of Education Research (LEER), KU Leuven, Leuven, Belgium
Maastricht Economic and Social Research Institute on Innovation and Technology (UNU-MERIT), United Nations University, Maastricht, The Netherlands
e-mail: [email protected]; [email protected]

M.-A. Chénier
Leuven Economics of Education Research (LEER), KU Leuven, Leuven, Belgium
e-mail: [email protected]

© The Author(s) 2023
E. Bertoni et al. (eds.), Handbook of Computational Social Science for Policy,
https://doi.org/10.1007/978-3-031-16624-2_16

16.1 Introduction

Education stakeholders are currently working within an environment where vast quantities of data can be leveraged to gain a deeper understanding of the educational attainment of learners. A growing pool of data is generated through software with which students, teachers and administrators interact (Kassab et al., 2020), through
apps, social networking and the collection of user behaviour on aggregators such as
YouTube and Google (De Wit & Broucker, 2017). Moreover, thanks to the Internet
of Everything phenomenon, stakeholders in the education domain have access to
data in which people, processes, data and things connect to the internet and to
each other (Langedijk et al., 2019). That data takes on non-traditional formats and captures language, location, movement, network, image and video information
(Lazer et al., 2020). Such non-traditional data sets require cutting-edge analytical
techniques in order to be effectively used for learning purposes and to be translated
into succinct policy recommendations.
Learning analytics, as an interdisciplinary domain borrowing from statistics,
computer sciences and education (Leitner et al., 2017), exploits this new data-
rich landscape to improve the learning process and outcomes of current and future
citizens (De Wit & Broucker, 2017). In education, learning analytics is set squarely
within the new computational social sciences, which consist in the “development
and application of computational methods to complex, typically large-scale human
behavioral data” (Lazer et al., 2009). Learning analytics directs these advances
towards the creation of actionable information in education. It applies data analytics
to the field of education, and it attempts to propose ways to explore, analyse and
visualize data from any relevant data source (Vanthienen & De Witte, 2017). An
important role of learning analytics is the exploitation of the traces left by students
on electronic learning platforms (Greller & Drachsler, 2012). As such, learning
analytics allows teachers to maximize the cognitive and non-cognitive education
outcomes of students (Long & Siemens, 2011). In an optimal learning environment,
one would maximally leverage the potential of students to increase their welfare and
performance not only during schooling but also afterwards, across civil society.
As the COVID-19 pandemic induced shifts towards online and home education, there is an increased opportunity for data analytics in general, and in particular to mitigate the crisis' effects both on learning outcomes (Maldonado & De Witte, 2021) and on the well-being of students (Iterbeke & De Witte, 2020). The online traces
that students leave on electronic learning platforms allow teachers, schools and
policy-makers to better tailor targeted remedial teaching interventions to the most
needy students. The closures of schools also showed how unequally digital devices are spread among students, with significant groups of disadvantaged students lacking access to basic digital instruments such as stable broadband and a computer. Similarly, the school closures revealed significant differences between
countries in their readiness for online teaching and in the availability of high-quality
digital instruction. Still, thanks to the unprecedented crisis, multiple countries made
significant investments in the educational ICT infrastructure (De Witte & Smet,
2021). If this coincides with improved training of teachers and school managers; an improved integration of educational, administrative and online data sources; and the improved accessibility of hands-on software, we expect the domain of learning analytics to flourish further in the coming decades.
This chapter aims to contribute to this accelerated use of learning analytics by illustrating its potential in multiple educational domains. We first discuss
the increasing emergence of large and accessible data sets in education and the
associated growth in expertise in educational data collection and analysis. This is
sustained by real-time streamed data and increasingly autonomous administrative
data sets. Section 16.2 compares the cost-effectiveness of learning analytics to
that of costly and unreliable retrospective studies and surveys. Learning analytics may also contribute to improving the quality of the education currently dispensed, for example through fraud detection and student performance prediction.
In Sect. 16.3, three tools of growing popularity and potential for learning analytics are presented: Bayesian Additive Regression Trees (BART), Social Network Analysis (SNA) and Natural Language Processing (NLP). These tools permit savvy users to make insightful predictions about student types, performance and the potential of reforms. The brief description of these techniques aims to familiarize
practitioners and decision-makers with their potential. Finally, alongside recommen-
dations, technical and non-technical challenges to the implementation and growth
of learning analytics and of empirically based education in general are discussed. As the growing possibilities of learning analytics raise sensitive questions regarding data usage and linkage, we discuss the related ethical and legal concerns in the concluding section.

16.2 Potential for Educators and Citizens

16.2.1 Growing Opportunities for Data-Driven Policies in Education

“Students and teachers are leaving large amounts of digital footprints and traces
in various educational apps and learning management platforms, and education
administrators register various processes and outcomes in digital administrative
systems” (Nouri et al., 2019). In this section, we discuss three trends that allow for
growing opportunities in fomenting creative data-driven policies in education: (1)
the development of online teaching platforms, (2) software-oriented administrative
data collection with links between heterogeneous data sets (Langedijk et al., 2019)
and (3) the Internet of Things (Langedijk et al., 2019).
First, consider the online teaching platforms. A prime example is the massive open online course (i.e. MOOC, De Smedt et al., 2017). Institutional
MOOC initiatives have been contributing to making high-quality educational
material accessible to a wide range of students and to maintaining the prestige of
the participating institutions (Dalipi et al., 2018). For adults, MOOC completion has
also been associated with increased resilience to unemployment (Castaño-Muñoz &
Rodrigues, 2021). From a learning analytics perspective, it is interesting to observe
that all student activities can be tracked within the MOOC. This information has
been studied to give empirical grounding to suggestions to reduce course dropout by fostering peer engagement on online forums, team homework and peer evaluation (Dalipi et al., 2018). From a methodological perspective, innovative methodologies exploiting MOOCs' large data sets include K-means clustering, support vector machines and hidden Markov models.1
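As a concrete illustration of the first of these techniques (described further in footnote 1), the minimal sketch below clusters simulated per-student MOOC activity with K-means; the feature names, the choice of K = 3 and the reliance on scikit-learn are assumptions made for the example, not details from the cited studies.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Simulated per-student MOOC activity: weekly logins, videos watched,
# forum posts and quiz attempts (purely illustrative features).
activity = rng.poisson(lam=[5, 8, 2, 3], size=(500, 4)).astype(float)

# Standardize so that no single feature dominates the squared distances.
scaler = StandardScaler().fit(activity)
scaled = scaler.transform(activity)

# Partition students into K = 3 groups with similar activity profiles.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)

# Cluster centres (back in the original units) summarize each student type.
centres = scaler.inverse_transform(kmeans.cluster_centers_)
for k, centre in enumerate(centres):
    print(f"cluster {k}: logins={centre[0]:.1f}, videos={centre[1]:.1f}, "
          f"posts={centre[2]:.1f}, quizzes={centre[3]:.1f}")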
A second trend in data-driven policies in education arises from software-oriented administrative data collections. These refer to the digital warehousing of adminis-
trative data such that this data can be relatively easily linked with other data sets and
easily transformed through, for example, the inclusion of a large quantity of new
observations (e.g. student files) and the ad hoc addition of new variables of interest
(Agasisti et al., 2017). Administrative data sets are built around procedures whose
aims are not primarily to foster data-driven policies (Barra et al., 2017; Rettore &
Trivellato, 2019). In that sense, they can provide rich information about students
and other educational stakeholders while being quicker to gather and significantly
cheaper than retrospective surveys (Figlio et al., 2016).
As a major advantage, software-oriented administrative data collections can
be easily linked to other data sources, such as the wide array of information
surveyed by local governments in their interactions with citizens. Through software
integration, data regarding such diverse domains as public health and agriculture
may be seamlessly captured. To conceptualize the diversity of potential data sources,
Langedijk et al. (2019) describe those data as divided into thematic silos. Each silo
represents an important civil concern, health or education, for example, and within
each silo, stakeholders can define sub-themes to which interesting data sets are attached. For example, in the case of education, some proposed sub-themes are
standardized test results, textbook quality and teacher quality2 (Langedijk et al.,
2019). Through the development of electronic networks, links can be established not only within silos, where policy-makers may, for instance, be interested in the relation between teacher quality and test scores, but also across silos, where
improvement in learning outcomes can be associated with changes in the health of
citizens (Langedijk et al., 2019). The analyses required to measure such associations
can take advantage of the typically long-run collection of administrative data (Figlio
et al., 2016). As an additional advantage, whereas data has traditionally been transmitted in batches (for example, to produce descriptive reports at set time intervals), electronic networks now permit event registration in real time (De Wit & Broucker, 2017; Mukala et al., 2015). The real-time extraction of data benefits teachers and students, who can rely, for example, on automated assignments and online dashboards to improve their learning experience and their learning outcomes (De Smedt et al., 2017).

1 K-means clustering divides the observations (e.g. a sample of teachers) into a quantity K of groups that share similar measured characteristics. That similarity is defined as the squared distance to the mean of the group's characteristics (Bishop, 2006). Support vector machines construct a porous hyperplane that maximally separates the observations closest to it. They are particularly useful for solving classification problems with high-dimensional data (e.g. registered student activity during multiple lectures) (Bishop, 2006). Finally, hidden Markov models assume that measurements are generated by underlying hidden states, which are modelled as a Markov process (Bishop, 2006). That approach is particularly suited to the analysis of sequential data such as the quantity of attempts in an educational game (Tadayon, 2020).

2 Teacher quality is a multi-dimensional concept that is often proxied by teacher value-added scores.
A good example of data set linkages in education is provided by studies with population data that aim to explore education outcomes in specific subgroups. A recent study
by Mazrekaj et al. (2020) made use of the rich micro-data sets made available
to researchers by the Dutch Central Bureau of Statistics (CBS). These micro-data
cover many themes of social life (e.g. the financial, educational, health, environmental and professional silos) and are, though access is restricted for privacy reasons, easy to link together with standard analytics software.
Third, consider the Internet of Things. The Internet of Things denotes the numer-
ous physical devices with integrated internet connectivity (DeNardis, 2020). In
educational settings, these devices are the computers, SNS services, mobile devices, cameras, sensors and software with which students, teachers and administrators interact (Kassab et al., 2020). They are used to monitor student attendance, class behaviour and student interactions with online teaching services and laboratories. On
online platforms, but also through mobile apps and logging platforms (e.g. library
access, blogs, electronic learning environment), students’ and tutors’ behaviours
and opinions can be monitored in real time and passed through automatic analytics
platforms or saved to solve future policy issues (De Smedt et al., 2017; De Wit &
Broucker, 2017). Similarly, RFID (radio-frequency identification) sensors track the
locations and availability of educational appliances such as laboratory equipment
and projectors. Students and tutors can communicate with each other regardless
of location, and assessment feedback can be delivered instantaneously, resulting in
higher-quality education.

16.2.2 Learning Analytics as a Toolset

The toolset of learning analytics can be used for several purposes. We first provide some examples of how it can contribute to improving the cost-effectiveness of education, and next of how it can foster education outcomes on cognitive and non-cognitive scales. Finally, we provide examples of how learning analytics can assist
in educational quality management.

16.2.2.1 Improving Cost-Effectiveness of Education

The increasing public scrutiny and tighter budgets, which are an ever-present reality
of the educational landscape, motivate a double goal for data-driven solutions. These
must improve efficiency and performance with regard to learning outcomes while
also proposing solutions that are competitive in terms of cost (Barra et al., 2017).
There are two poles through which cost-effective learning analytics solutions can be
proposed.

The first pole lies at the level of data collection. Administrative data sets suffer from the high cost of data cleaning and collection. Indeed, although data extraction
is usually native to recent administrative software (King, 2016), administrative data
sets typically require ad hoc linkages and research designs (Agasisti et al., 2017).
In the sense that their inclusion in data-driven decision-making is not their primary
purpose, they constitute an opportunistic data source and thus may occasionally
demand more resource investments than deliberate data collection procedures.
Meanwhile, the omnipresent network of computing devices and the associated
online educational platforms permit data extraction at every step of the learning
process (De Smedt et al., 2017). As previously indicated, this type of unstructured
data can be saved, but the real-time data stream can also be designed in such a
way as to permit automatic analyses. This deliberate pipeline, associating the collected data with useful analyses, can ensure cost-effectiveness through economies of scale. It can also serve as a baseline for future improvements in summarizing data for students, teachers and stakeholders in general. In short, rich data sets and insightful analyses can be produced without requiring recurrent, ad hoc organizational involvement.
In that sense, the environment in which learning analytics is embedded permits
professionals and stakeholders to benefit from opportunistic analyses and from
insights that are delivered efficiently (Barra et al., 2017). For example, during the
COVID-19 crisis, learning analytics was used to monitor how students were reached
by online teaching.
The second pole to achieve cost-effectiveness in the establishment of data-driven
policy-making for education is that of data analytics. Up until now, technologically
able and creative teams have been achieving parity with the expanding volume,
variety and velocity of data by developing and applying advanced analytical
methods (De Wit & Broucker, 2017; King, 2016). One such method is Data
Envelopment Analysis (DEA). It permits the employment of administrative and
learning data in order to directly fulfil goals related to cost minimization (Barra
et al., 2017; De Witte & López-Torres, 2017; Mergoni & De Witte, 2021). The result
of such analyses may be useful in promoting efficient investments in educational
resources (see, e.g. the report by the European Commission Expert Group on
Quality Investment in Education and Training). Well-targeted additional spending can even have the seemingly paradoxical effect of increasing cost-effectiveness in the long run. Advances in the social sciences have already demonstrated the consequences of poor learning outcomes, the principal ones being "lower incomes and economic growth,
lower tax revenues, and higher costs of such public services as health, criminal
justice, and public assistance” (Groot & van den Brink, 2017). Hence, learning
outcomes deserve an important place in discussions around the cost-effectiveness
of education (De Witte & Smet, 2021).
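To fix ideas, a stylized sketch of the input-oriented DEA model referenced above follows; the school data are invented and the linear-programming formulation via scipy is one standard way (among several) to compute DEA efficiency scores.

import numpy as np
from scipy.optimize import linprog

# Made-up data: rows are schools; inputs are teaching hours and budget,
# outputs are a mean test score and a graduation rate.
X = np.array([[40.0, 1.2], [55.0, 1.0], [35.0, 0.9], [60.0, 1.5]])  # inputs
Y = np.array([[70.0, 0.80], [65.0, 0.85], [72.0, 0.75], [80.0, 0.90]])  # outputs

def dea_efficiency(k, X, Y):
    """Input-oriented CCR efficiency of school k: minimize theta such that
    a weighted combination of peers uses at most theta times k's inputs
    while producing at least k's outputs."""
    n = X.shape[0]
    c = np.zeros(n + 1)
    c[0] = 1.0  # variables: [theta, lambda_1, ..., lambda_n]; minimize theta
    A_ub, b_ub = [], []
    for i in range(X.shape[1]):  # sum_j lambda_j * x_ij <= theta * x_ik
        A_ub.append(np.concatenate(([-X[k, i]], X[:, i])))
        b_ub.append(0.0)
    for r in range(Y.shape[1]):  # sum_j lambda_j * y_rj >= y_rk
        A_ub.append(np.concatenate(([0.0], -Y[:, r])))
        b_ub.append(-Y[k, r])
    bounds = [(0, None)] * (n + 1)  # theta and all lambdas non-negative
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds,
                  method="highs")
    return res.x[0]

for k in range(X.shape[0]):
    print(f"school {k}: efficiency = {dea_efficiency(k, X, Y):.3f}")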

16.2.2.2 Improving Learning Outcomes

In terms of directly improving educational quality, three ambitions can be distinguished for learning analytics: making improvements in (non-)cognitive learning outcomes, reducing learning support frictions, and achieving wide deployment and long-term maintenance for each teaching tool (Viberg et al., 2018). These ambitions are discussed in turn.
First, learning outcomes can be interpreted as the academic performance of
students, as measured by quizzes and examinations (Viberg et al., 2018). Learning outcomes can also be defined more broadly than such testable outcomes, for example by including interpersonal skills and civic qualities. However widely defined, it is important that the set of criteria identifying educational success is clearly specified by stakeholders and that it is communicated to, and open to the contributions of, citizens. In that way, educational policy discussions can be centred around transparent and recognized aims.
Although there is a rich literature evaluating learning analytics in higher educa-
tion, the contributions of learning analytics tools to improving the (non-)cognitive
learning outcomes of secondary school students have received relatively little
attention in the empirical literature (Bruno et al., 2021). Nevertheless, clear
improvements in writing and argumentative quality have been associated with the
use of automatic text evaluation software (Lee et al., 2019; Palermo & Wilson, 2020). This software uses Natural Language Processing (NLP) to analyse data
extracted from online learning platforms. Automatic text evaluation has also shown
promising results at higher education levels and with non-traditional adult students
(Whitelock et al., 2015b). There is thus flexibility in terms of the type of students or
teachers to whom learning analytics approaches apply.
Another interesting contribution of learning analytics to the outcomes of sec-
ondary school students has been in improving their computer programming abilities.
This has been accomplished through another advanced data analysis technique,
process mining, which helped teachers in pairing students based on captured
behavioural traces during programming exercises (Berland et al., 2015).
Second, with respect to learning support frictions, there is often a mismatch between the assumptions behind the design of learning platforms and the observed behaviours of students (Nguyen et al., 2018). An example of this mismatch is that students tend to spend less time studying than recommended by their instructors. Less involved students also tend to spend less time preparing assignments (Nguyen et al., 2018). By reducing students' ability to receive feedback in a timely manner, such a mismatch can negatively affect both students' and teachers' involvement in the learning process. Thanks to learning analytics tools, students can receive tailored feedback, rehearse exercises that are particularly difficult for them and receive stimulating examples that fit their interests (Iterbeke et al., 2020). This reduces the learning
support frictions and consequently improves learning outcomes.
Yet the mismatch between desired learning outcomes and student behaviour cannot be corrected simply through the implementation of electronic platforms or through a gamification of the learning process. It is critical that the digital tools being implemented, and those implementing them, take students' feedback into account. Many students are now used to accessing information without encountering many physical or social barriers. For those students, the interactivity and the practicality of the digital learning tools are particularly
important (Pardo et al., 2018; Selwyn, 2019). Other students may not have the same familiarity with online computing devices. For them, accessibility has to be designed into the tools.
Many authors warn of a transfer from lecture-based education to learning platforms in which feedback and exercises may be too numerous, superficial or ill-adapted to students' capabilities or learning ambitions (Lonn et al., 2015; Pardo et al., 2018; Topolovec, 2018). Hence, a hybrid approach to learning support is suggested, wherein technologies such as the automatic text analysis and process mining techniques touched upon above are combined with personalized feedback from teachers and tutors.
Indeed, classroom teaching is often characterized by a lack of personalization and by biases in the dispensation of feedback and exercises. For example, low-performing students are over-represented among the receivers of teacher feedback. Additionally, given the same learning objectives, feedback may be administered differently to students of different genders and origins. Teachers may find learning analytics tools
useful in helping their students attain the desired learning outcomes while fostering
their personal learning ambitions and their self-confidence (Evans, 2013; Hattie &
Timperley, 2007).
Third, learning analytics can provide additional value to students and teachers. In that sense, we observe several clearly advantageous applications of learning analytics.

• Learning analytics could contribute to non-cognitive skills, as collaboration is an area where non-cognitive skills play an important role. Identifying collaboration and the factors that incite it can improve learning outcomes and even help in preventing fraud. The implementation of analytics methods such as Social Network Analysis (SNA) in learning platforms may allow teachers to prevent or foster such collaborations (De Smedt et al., 2017). Simple indicators like the time of assignment submission can be treated as proxies for collaboration. We discuss SNA in more depth in Sect. 16.3.
• Another computational approach, process mining, can exploit real-time behavioural data to summarize the interactions of students with a given course's material. Students can then be distinguished based on their mode of involvement in the course (Mukala et al., 2015). Process mining allows teachers to learn how the teaching method translates into behavioural actions. These insights can be incorporated into the course design and, upon the detection of inefficient behaviour, allow fast and personalized intervention (De Smedt et al., 2017); a minimal sketch of the underlying idea follows this list.
• A related method to generate value from learning analytics is to implement text analyses directly on the learning platforms. Natural Language Processing (NLP) is a text analysis method that has been shown to greatly improve the performance of students on assignments such as the writing of essays (Whitelock et al., 2015a). Generally, text analysis can provide automated feedback shared with the students and their teachers (De Smedt et al., 2017). Providing automated feedback makes another argument for the cost-effectiveness of learning analytics. By giving course providers the ability to score large student bodies, it allows teachers to put more focus on providing adapted support to their students (De Smedt et al., 2017). We discuss NLP in more depth in Sect. 16.3.
• Not the least advantage of online learning is that it allows asynchronous and synchronous interactions and communications among the participants in a course (Broadbent & Poon, 2015). These interactions can be logged as unstructured data and incorporated into useful text, process and social network analyses.
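The process mining sketch promised above: dedicated tools such as pm4py implement full process discovery, but the core building block, counting which activity directly follows which in each student's trace, fits in a few dependency-free lines. The event log below is invented for illustration.

from collections import Counter, defaultdict

# Hypothetical event log: (student, activity) pairs, ordered in time.
events = [
    ("ann", "watch_video"), ("ann", "read_text"), ("ann", "try_quiz"),
    ("bob", "try_quiz"), ("bob", "watch_video"), ("bob", "try_quiz"),
    ("ann", "try_quiz"),
]

# Group each student's activities into an ordered trace.
traces = defaultdict(list)
for student, activity in events:
    traces[student].append(activity)

# Count directly-follows relations: the building block of process discovery.
dfg = Counter()
for trace in traces.values():
    for a, b in zip(trace, trace[1:]):
        dfg[(a, b)] += 1

for (a, b), n in dfg.most_common():
    print(f"{a} -> {b}: {n}")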

16.2.2.3 Educational Quality Management

A key component of quality improvement in education is the creation of quality and performance indicators related to teachers and schools (Vanthienen & De Witte, 2017). Learning analytics' contribution to educational quality improvement lies in providing data sources and computational methods and combining them in order to produce actionable summaries of teaching and schooling quality (Barra et al., 2017). Whereas, traditionally, data analyses have required ad hoc involvement and costly time investments from stakeholders, learning analytics can rely on computational power and dense networks of computational devices to automatically propose real-time reports to policy-makers. Below, contributions in terms of quality measurement and predictions are introduced.

16.2.2.4 Underlying Data for Quality Measurement

Through the exploitation of unstructured, streamed, behavioural data and pre-existing administrative data sets, analytical reports can be updated in real time to reflect the state of education at any desired level, from the individual student and classroom to the country as a whole. That information is commonly organized in online dashboards (De Smedt et al., 2017). Analysts and programmers can even allow the user to customize the presented summary in real time, for example by applying filters on maps and subgroups of students.
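As a toy illustration, the sketch below shows the kind of grouped summary such a dashboard would recompute as new events stream in; the column names and aggregation choices are assumptions made for the example, using pandas.

import pandas as pd

# Hypothetical streamed event log, one row per logged learning event.
log = pd.DataFrame({
    "school": ["A", "A", "B", "B", "B", "A"],
    "week": [1, 1, 1, 2, 2, 2],
    "minutes_active": [30, 45, 20, 60, 15, 50],
    "exercises_done": [3, 5, 1, 7, 2, 6],
})

# A dashboard would repeatedly recompute summaries like this one,
# optionally after user-chosen filters (e.g. a single school or subgroup).
summary = (log.groupby(["school", "week"])
              .agg(total_minutes=("minutes_active", "sum"),
                   mean_exercises=("exercises_done", "mean"))
              .reset_index())
print(summary)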

16.2.2.5 Efficiency Measurement

An aspect of the quality measurements provided by learning analytics is efficiency research, in which inputs and outputs are compared against a best practice frontier (see the Data Envelopment Analysis model discussed earlier). In this branch of the literature, schools are, for instance, compared based on their ability to maximize learning outcomes given a set of educational inputs (De Witte & López-Torres, 2017; e Silva & Camanho, 2017; Mergoni & De Witte, 2021). The outcome of such an analysis might be used for quality assessment purposes.

16.2.2.6 Predictions

When discussing the potential of learning analytics for educators and stakeholders,
the ability to make predictions about learning outcomes is an unavoidable point
of interest. In quantitative analyses, predictions are generated by translating latent
patterns in historical data, be it structured or unstructured, in order to identify likely
future outcomes (De Witte & Vanthienen, 2017).
Predictions can be produced using, for example, the Bayesian Additive Regres-
sion Trees (BART) model (see Sect. 16.3), as applied in Stoffi et al. (2021). There,
linked administrative and PISA data available only in Flanders is used to distinguish
a group of overwhelmingly under-performing Walloon students and explain their
situation. Typically, such a technique uses administrative data that is available for
both endowment groups in order to make a sensible generalization from one to the
other.
Alternatively, process mining can be used to identify clusters of students and
distinguish successful interaction patterns with a course’s material (Mukala et al.,
2015). Similar applications can be imagined for Social Network Analysis (De
Smedt et al., 2017), through the evaluation of collaborative behaviour, and Natural
Language Processing. These techniques are usually perceived as descriptive, but
their output may very well be included in a predictive framework by education
professionals and researchers.
Learning analytics has initiated a shift from using purely predictive analytics as a means to identify student retention probabilities and grades towards the application of a wider set of methods (Viberg et al., 2018). In return, cutting-edge exploratory and descriptive methods can improve traditional predictive pipelines.

16.3 An Array of Policy-Driving Tools

It is one thing to comb over the numerous contributions and potential of learning
analytics to data-informed decision-making; it is yet another to actually take the
plunge and settle on tools for problem-solving in education. In what follows, a brief
introduction to distinct methods from the field of computational social sciences is
provided. In that way, the reader can get acquainted with the intuition of the methods
and how they can be used to improve learning outcomes and quality measurement in
education. To set the scene, we also illustrate how these approaches expand the range of innovative educational questions that can be answered through learning analytics.

16.3.1 Bayesian Additive Regression Trees

The Bayesian Additive Regression Trees (BART) model stems from machine learning and probabilistic programming. It is a prediction and classification algorithm that makes solving complex prediction problems simple by relying on a set of sensible default parameter configurations. Earlier comparable algorithms such as the Gradient Boosting Machine (GBM) and the Random Forest (RF) require repeated adjustments, making the quality of their predictions hinge on an analyst's programming ability and on limited computational resources. By contrast, BART incorporates prior knowledge about educational science problems in order to produce competitive predictions and measures of uncertainty after a single estimation run (Dorie et al., 2019). This contributes to the accessibility of knowledge discovery and the credibility of policy statements in education.
As with the GBM and the RF, the essential and most basic component of the BART algorithm is the decision or prediction tree. The prediction tree is a classic predictive method that, unlike traditional regression methods, does not assume linear associations between sets of variables. It is robust to outlying variable values, such as those due to measurement error, and can accommodate a large quantity of data and high-dimensional data sets.
Their accuracy and relative simplicity have made regression trees popular
diagnostic and prediction tools in medicine and public health (Lemon et al.,
2003; Podgorelec et al., 2002). In education, a recent application of regression
trees has been to explore dropout motivations and predictors in tertiary education
(Alfermann et al., 2021). The regression tree algorithm (i.e. CART or classification and regression trees, Breiman et al., 2017) performs variable selection automatically, so researchers are able to distinguish a few salient motivations, such as the perceived usefulness of the work, from a vast pool of possible predictors.
To predict quantities such as test scores or dropout risk, regression trees separate
the observations into boxes associating a set of characteristics with an outcome. The
trees are created in multiple steps. In each of these steps, all observations comprised
in a box of characteristics are split into two new boxes. Each split is selected by the algorithm to maximize the accuracy of the desired predictions. The end result of this division of observations into smaller and smaller boxes is a set of branches through which each individual observation descends into a leaf. That leaf is the final box that assigns a single prediction value (e.g. a student's well-being score) to the set of observations sharing its branch. Graphically, the end result is a binary decision tree where each split is illustrated by a programmatic if statement leading to either the next binary split or a leaf.
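The following sketch, assuming scikit-learn's CART implementation, fits a single shallow regression tree on simulated student data and prints its if/else structure, making the boxes and leaves described above visible; the features and outcome model are invented.

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)

# Simulated students: hours studied and a prior score drive the test score.
hours = rng.uniform(0, 10, 300)
prior = rng.uniform(40, 90, 300)
score = 40 + 3 * hours + 0.4 * prior + rng.normal(0, 5, 300)
X = np.column_stack([hours, prior])

# A shallow tree keeps the boxes (leaves) large and easy to inspect.
tree = DecisionTreeRegressor(max_depth=2).fit(X, score)
print(export_text(tree, feature_names=["hours", "prior"]))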
The Bayesian Additive Regression Trees (BART) algorithm is the combination
of many such small regression trees (Kapelner & Bleich, 2016). Each regression tree
adds to the predictive performance of the algorithm by picking up on the mistakes
and leftover information from the previously estimated trees. After hundreds or
possibly thousands of such trees are estimated, complex and subtle associations can
be detected in the data. This makes the BART algorithm particularly competitive in
areas of learning analytics where a large quantity of data are collected and there is
little existing theory as to how interesting variables may be related to the outcome of
interest, be it some aspect of the well-being of students or their learning outcomes.
The specific characteristic of the BART algorithm is its underlying Bayesian probability model (Kapelner & Bleich, 2016). By using prior probabilistic knowledge to restrict estimation possibilities to realistic prediction scenarios, the algorithm can avoid detecting spurious associations between variables. Each data set, unless it constitutes a perfect survey of the entire population of interest, contains variable associations that are present purely due to chance. Such coincidental associations reduce the ability to predict true outcomes when they are included in predictive models. Thus, each regression tree estimated by the BART algorithm is kept relatively small. Because each tree tends to assign predictions to larger sets of observations (i.e. large boxes), the predictive ability of individual trees is poor. This is why analysts call them weak learners. However, by combining many such weak learners, a flexible, precise and accurate prediction function can be generated (Hill et al., 2020).
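Full BART requires a Bayesian backfitting sampler (available, for example, in R packages such as bartMachine or BART), but the sum-of-weak-learners logic can be sketched without it. The loop below is a boosting-flavoured simplification rather than BART itself: each small tree is fit to the residuals left by its predecessors, exactly the "mistakes and leftover information" mentioned above. The data and settings are invented.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 400)

prediction = np.zeros_like(y)
trees, shrinkage = [], 0.1

# Each weak learner (a tiny tree) picks up the residuals, i.e. the
# mistakes and leftover information of the trees estimated before it.
for _ in range(200):
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += shrinkage * tree.predict(X)
    trees.append(tree)

print("in-sample RMSE:", float(np.sqrt(np.mean((y - prediction) ** 2))))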
The BART algorithm has already been presented earlier in this chapter as a flexible technique to detect and explain learning outcome inequalities (Stoffi et al., 2021). A refinement of the algorithm also permits the detection of heterogeneous policy effects on the learning outcomes of students. This is shown in Bargagli-Stoffi et al. (2019), who find that Flemish schools with a young and less experienced school director benefit most from a certain public funding policy. The large administrative data sets provided by educational institutions and governments are well suited to the application of rewarding but computationally demanding techniques such as BART (Bargagli-Stoffi et al., 2019).

16.3.2 Social Network Analysis

The aim of Social Network Analysis (SNA) is to study the relations between individuals or organizations belonging to the same social networks (Wasserman & Faust, 1994). Relations between these actors are defined by nodes and ties. The nodes are points of observation, which can be students, schools, administrations and more. The ties indicate a relationship between nodes and can contain additional information about the intensity of various components of that relationship (e.g. the time spent collaborating, the type of communication; Grunspan et al., 2014).
Specifically for education, SNA aims to describe the networks of students and staff
and make that information actionable to stakeholders. Applications of SNA include
the optimization of learning design, the reorganization of student groups and the
identification of at-risk clusters of students (Cela et al., 2015). Through text analysis
and other advanced analytics methods, SNA can handle unstructured data from
school blogs, wikis, forums, etc. (Cela et al., 2015). We discuss five examples in more detail next and refer the interested reader to the review by Cela et al. (2015), which provides many other concrete applications of SNA in education.
As a first example, the recognized importance of peer effects, both within and
outside the classroom, makes Social Network Analysis (SNA) a particularly useful
tool in education (Agasisti et al., 2017; Cela et al., 2015; Iterbeke et al., 2020).
Applications of SNA model peer effects indirectly, as a component of unobserved school or classroom effects that influence (non-)cognitive skills (Cooc & Kim, 2017). As a second example, SNA has been applied to describe and explain a
multiplicity of phenomena in schools. In a study of second- and third-grade primary school pupils from 41 schools in North Carolina, Cooc and Kim (2017) found that pupils with low reading ability who associated with higher-ability peers for guidance significantly improved their reading scores over a summer. Third, other relevant
applications of SNA have been in assessing the role of peers in the well-being, be it mental or physical, of students. Surveying 1458 Belgian teenagers, Wegge et al. (2014) showed that the perpetrators of cyber-bullying were often also responsible for physically bullying a student. Additionally, it was observed that a
majority of bullies were in the same class as the bullied students. Moreover, a map
of bullying networks isolated some students as being perpetrators of the bullying of
multiple students. In cases of intimidation and bullying, a clear advantage of SNA
over the usual approaches is that the data does not depend on isolated denunciations
from victims and peers. The analysis of Wegge et al. (2014) simultaneously
identifies culprits and victims, suggesting a course of action that does not focus
attention on an isolated victim of bullying. A fourth example application of SNA
is in improving the managerial efficacy and the performance of employees within
educational organizations. One way to do this is by identifying bottlenecks in the
transmission of information through the mapping of social networks. This can take
two forms in the language of SNA: brokerage and structural holes (Everton, 2012).
In a brokerage situation, a single agent or node controls the passing of information
from one organizational sub-unit to the other. Meanwhile, structural holes identify
absent ties between sub-units in the network. In a school, an important broker may
be the principal’s secretary, whereas structural holes may be present if teachers or
staff do not communicate well with one another (Hawe & Ghali, 2008). As a fifth
illustration, the SNA method has been used to propose a typology of teachers based
on the nature of their ties with students and to identify clusters of students more
likely to be plagiarising with each other (Chang et al., 2010; Merlo et al., 2010;
Ryymin et al., 2008). The ability to cluster students based on the intensity of their
collaborations in a course has also been distinguished as a way to prevent fraud.
Detecting cooperation between students is one of the key applications of SNA in learning analytics (De Smedt et al., 2017).
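As a small illustration of the brokerage idea from the fourth example, the sketch below computes betweenness centrality (the share of shortest paths passing through each node) on an invented staff network, assuming the networkx library. A high-scoring node is a candidate broker, while missing ties between sub-units point to structural holes.

import networkx as nx

# Toy school network: the secretary links teachers to the administration.
G = nx.Graph()
G.add_edges_from([
    ("teacher_1", "teacher_2"), ("teacher_2", "teacher_3"),
    ("teacher_2", "secretary"), ("secretary", "principal"),
    ("principal", "admin_1"), ("admin_1", "admin_2"),
])

# Betweenness centrality: the share of shortest paths through each node.
for node, score in sorted(nx.betweenness_centrality(G).items(),
                          key=lambda item: -item[1]):
    print(f"{node}: {score:.2f}")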

16.3.3 Natural Language Processing

Natural Language Processing (NLP) refers to the ability of computing machines to communicate in human languages (Smith et al., 2020). Some NLP applications can be achieved with relatively simple sets of rules or heuristics (e.g. word counts, word matching), without applying cutting-edge machine learning techniques (Smith et al., 2020). When NLP does rely on machine learning techniques, it is better able to understand context and reveal hidden meanings in communications (e.g. irony) (Smith et al., 2020).
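A heuristic application of the first kind can be sketched in a few lines; the keyword list and scoring rule below are invented for the example and are far cruder than the systems cited in this section.

# A minimal keyword-matching heuristic: score an answer by how many
# expected concepts it mentions (no machine learning involved).
expected = {"photosynthesis", "chlorophyll", "sunlight", "glucose"}

def heuristic_score(answer: str) -> float:
    words = {w.strip(".,;:!?") for w in answer.lower().split()}
    return len(words & expected) / len(expected)

print(heuristic_score("Plants use sunlight and chlorophyll to make glucose."))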
318 K. De Witte and M.-A. Chénier

In education, the use of NLP has been shown to improve students' learning outcomes (Whitelock et al., 2015a) and to promote student engagement. Moreover, NLP systems have the potential to provide one-on-one tutoring and personalized study material (Litman, 2016). The automatic grading of complex assignments is a precious feature of NLP models in education. These may eventually become a cost-effective solution that facilitates the evaluation of deeper learning skills than those evaluated through answers to multiple-choice questions (Smith et al., 2020).
By efficiently adjusting the evaluation of knowledge to the learning outcomes desired by stakeholders, NLP can contribute to educational performance. External and open data sets have allowed NLP solutions to achieve better accuracy in tasks such as grading. Such data sets can situate words within commonly invoked themes or contexts, for example, allowing the NLP model to make a more nuanced analysis of language data (Smith et al., 2020). Access to rich language data sets and algorithmic improvements may even allow NLP solutions to produce course assessment material automatically (Litman, 2016). However, an open issue with machine learning implementations of NLP is that the features used in grading by the computer may not provide useful feedback to the student or the teacher (e.g. when the grade is based on word counts) (Litman, 2016). Reasonable feedback may still require human input.
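As a slightly richer sketch, the snippet below scores answers by their TF-IDF similarity to a model answer, assuming scikit-learn; note that such a similarity score is precisely the kind of opaque grading feature that, as just remarked, offers students little actionable feedback on its own.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

model_answer = ("Photosynthesis converts sunlight, water and carbon dioxide "
                "into glucose.")
answers = [
    "Plants turn sunlight, water and carbon dioxide into glucose.",
    "Mitochondria are the powerhouse of the cell.",
]

# Represent all texts in a shared TF-IDF space and compare each answer
# with the model answer via cosine similarity (a crude grading proxy).
tfidf = TfidfVectorizer().fit_transform([model_answer] + answers)
scores = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
for answer, score in zip(answers, scores):
    print(f"{score:.2f}  {answer}")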

16.4 Issues and Recommendations

Despite the outlined benefits and contributions of learning analytics, there are still some issues and limitations. A clear distinction can be made between
issues belonging to the technical and non-technical parts of learning analytics (De
Wit & Broucker, 2017). In the first case, there are the issues related to platform
and analytics implementations, data warehousing, device networking, etc. With
regard to the non-technical issues, there are concerns over the public acceptance and
involvement in learning analytics, private and public regulations, human resources
acquisition and the enthusiasm of stakeholders as to the technical potential of
learning analytics. We summarize these challenges and propose a nuanced policy
pathway to learning analytics implementation and promotion.

16.4.1 Non-technical Issues

Few learning analytics papers mention ethical and legal issues linked to the appli-
cations of their recommendations (Viberg et al., 2018). Clearly, developments in learning analytics contribute to and benefit from the expansion of behavioural data
collection. The spread and depth of data collection are generating new controversies
around data privacy and security. These have an important place in public discourse
and, if mishandled by stakeholders, could contribute to further limiting the potential
of data availability and computational power in learning analytics and similar disciplines (Langedijk et al., 2019). Scientists are currently complaining about the
restrictions put upon their research by rules and accountability procedures. Such
rules curtail data-driven enterprises and may be detrimental to improvements in learning outcomes (Groot & van den Brink, 2017). To facilitate collaboration between researchers and decision-makers, it is important that the administrative procedures related to learning analytics be seen by researchers as contributing to a healthy professional environment (Groot & van den Brink, 2017).
Additionally, public accountability and policies promoting organizational trans-
parency may be a proper counter-balance to privacy concerns among citizens (e
Silva & Camanho, 2017). The transparency and accessibility of information, by
making relevant educational data sets public, for example, can involve citizens
in the knowledge discovery related to education and foster enthusiasm for data-
driven inference in that domain (De Smedt et al., 2017). It is also important that
the concerned parties, including civil society, are interested in applying data-driven
decision-making (Agasisti et al., 2017). It can be difficult to convince leaders in
education to shift to data-driven policies since, for them, “experience and gut-
instinct have a stronger pull” (Long & Siemens, 2011).
Just as necessary as political commitment, the acquisition of a skilled workforce
is another sizeable non-technical issue (Agasisti et al., 2017). The growth of data-
driven decision-making has yielded an increase in the demand for higher-educated
workers while reducing the employment of unskilled workers (Groot & van den
Brink, 2017). In other words, there is a gap between the growing availability of
large, complex data sets and the pool of human resources that is necessary to clean
and analyse those data (De Smedt et al., 2017). This invokes the problem, shared
across the computational social sciences, of the double requirement of technical
and analytical skills. Often, even domain-specific knowledge is an unavoidable
component of useful policy insights (De Smedt et al., 2017). That multiplicity of professional requirements has led certain authors to describe the desirable modern data analyst as a scholar-practitioner (Streitwieser & Ogden, 2016).

16.4.2 Technical Issues

Many technical problems must be tackled before data-driven educational policies become a gold standard. Generally, there is a need for additional research regarding the effects of online educational software and of digital data collection pipelines on student and teacher outcomes. Additionally, inequalities in terms of access to online education and its usage are an ever-present challenge (Jacob et al., 2016; Robinson et al., 2015).
There is as yet relatively little evidence indicating that learning analytics improves the learning outcomes of students (Alpert et al., 2016; Bettinger et al., 2017; Jacob et al., 2016; Viberg et al., 2018). For example, less sophisticated correction algorithms may be exploited by students who tailor their solutions to obtain maximal scores without acquiring the desired knowledge (De Wit & Broucker, 2017). This is a question of adjustment between the spirit and the letter of the learning process.
Additionally, although the combination of administrative and streamed data is in
many ways advantageous compared to survey data (Langedijk et al., 2019), the fast
collection and analysis of data create issues of data accuracy. With real-time data
analyses and reorientations of the learning process, accessible computing power
becomes an issue.
Meanwhile, unequal access to online resources and devices plainly excludes a section of the student and teacher population from the reach of the digital tools of education. In part, this creates issues of under-representation in educational
studies that increasingly rely on data obtained online (Robinson et al., 2015). It
also creates a divide between those stakeholders that can make an informed choice between using and developing digital tools or face-to-face education and those that cannot access digital education or for whom it has a prohibitive cost (Bettinger et al., 2017; Di Pietro et al., 2020; Robinson et al., 2015).
Lack of access to digital or hybrid learning tools (i.e. a mix of face-to-face and
digital education) may directly impede the learning and well-being of students.
Indeed, students with access to online and hybrid education can access resources
independently to enhance their educational pathway (Di Pietro et al., 2020). In
a sense, a larger range of choices makes better educational outcomes attainable.
For example, students at a school within a neighbourhood of low socio-economic
standing may access a diverse network of students and teachers on electronic
platforms (Jacob et al., 2016). In times of crisis such as with the COVID-19 school
lockdowns, ready access to online educational platform also reduces the opportunity
cost of education (Chakraborty et al., 2021; Di Pietro et al., 2020).
However, access is not a purely technical challenge. There are also noted gaps
between populations in terms of the usage that is made of educational platforms
and internet resources more generally (Di Pietro et al., 2020; Jacob et al., 2016).
Students participating in MOOCs, for example, are overwhelmingly highly educated professionals (Zafras et al., 2020). Online education may also leave more discretion
to students. This discretion has proven to be a disadvantage to those who perform
less well and are less motivated in face-to-face classes (Di Pietro et al., 2020).

16.4.3 Recommendations

Data-driven policies will require vast investments in information technology systems, towards both data centres and highly skilled human resources. Therefore, additional data warehouses need to be built and maintained. These require strong engineering capabilities (De Smedt et al., 2017). The integration of teaching and peer collaborations within computer systems promises to accelerate innovations in education. One can imagine that, in the future, administrative and real-time learning data will be updated and analysed in real time. The analyses will also benefit from
combining data from other areas of interest such as health or finance. Additionally,
the reach of analytics programs could be international, allowing for the shared
integration and advancement of knowledge systems across countries (Langedijk
et al., 2019).
Although there is a large practical potential of data-driven policies and educa-
tional tools, it is important that an educational data strategy not be developed in and
of itself. Unlike what some big data enthusiasts have claimed, the data does not "speak for itself" in education (Anderson, 2008). The teachers, administrators and policy-makers who are working to better educate our children will still face complicated dilemmas appealing to their professional expertise, regardless of the level of integration of data analytics in education.
Furthermore, to ensure political willingness, it is critical that work teams and stakeholders profit from the collected and analysed data (De Smedt et al., 2017).
This contributes to the transparency of data use. Finally, although the evidence is
still quite thin regarding the benefits of learning analytics, it must be noted that
only a small quantity of validated instruments are actually being used to measure
the quality and transmission of knowledge through learning platforms (Jivet et al.,
2018).
Despite this scarcity of evidence pertaining to education, the exploitation of
data through learning analytics can be linked to the recognized advantages of
big data in driving public policy. Namely, it can facilitate a differentiation of
services, increased decisional transparency, needs identification and organizational
efficiency (Broucker, 2016). Generally, the lack of available data backing a decision
is an indication of a lack of information and, thus, sub-optimal decision-making
(Broucker, 2016).
Policies can be better implemented through quick and vast access to information
about students and other educational stakeholders. In other words, the needs
of students and other educational stakeholders can be more efficiently satisfied
with evidence obtained from data collection (e.g. lower cost, higher speed of
implementation). Such evidence-based education is a rational response to the so-
called fetishization of change that has been plaguing educational reforms (Furedi,
2010; Groot & van den Brink, 2017).
It follows that data analytics should not become a new object for the fetishization
of change in educational reforms. Indeed, quantitative goals (e.g. quantity of sensors
in a classroom) should not be confounded with educational attainments (Long &
Siemens, 2011; Mandl et al., 2008). Rather, data analytics should be developed and
motivated as an approach that ensures that there are opportunities to use data in
order to sustain mutually agreeable educational objectives.
These objectives may pertain to the lifetime health, job satisfaction, time
allocation and creativity of current students (Oreopoulos & Salvanes, 2011). In other
words, learning analytics pipelines must be carefully implemented in order to ensure
that they are a rational response to contemporary challenges in education.

Acknowledgments The authors are grateful for valuable comments and suggestions from the
participants of the Education panel of the CSS4P workshop, particularly Federico Biagi and Zsuzsa
Blaskó. Moreover, they wish to thank Alexandre Leroux of GERME, Francisco do Nascimento
Pitthan, Willem De Cort, Silvia Palmaccio and the members of the LEER and CSS4P team for the
rewarding discussions and suggestions.

References

Agasisti, T., Ieva, F., Masci, C., Paganoni, A. M., & Soncin, M. (2017). Data analytics applications
in education. Auerbach Publications. https://doi.org/10.4324/9781315154145-8
Alfermann, D., Holl, C., & Reimann, S. (2021). Should i stay or should i go? indicators of dropout
thoughts of doctoral students in computer science. International Journal of Higher Education,
10(3), 246–258. https://doi.org/10.5430/ijhe.v10n3p246
Alpert, W. T., Couch, K. A., & Harmon, O. R. (2016). A randomized assessment of online learning.
American Economic Review, 106(5), 378–82. https://doi.org/10.1257/aer.p20161057
Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete.
Wired Magazine, 16(7), 16–07.
Bargagli-Stoffi, F. J., De Witte, K., & Gnecco, G. (2019). Heterogeneous causal effects with imper-
fect compliance: A novel bayesian machine learning approach. Preprint arXiv:1905.12707.
Barra, C., Destefanis, S., Sena, V., & Zotti, R. (2017). Disentangling faculty efficiency from stu-
dents’ effort. Data Analytics Applications in Education (pp. 105–128). Auerbach Publications.
https://doi.org/10.4324/9781315154145-5
Berland, M., Davis, D., & Smith, C. P. (2015). Amoeba: Designing for collaboration in computer
science classrooms through live learning analytics. International Journal of Computer-
Supported Collaborative Learning, 10(4), 425–447. https://doi.org/10.1007/s11412-015-9217-
z
Bettinger, E. P., Fox, L., Loeb, S., & Taylor, E. S. (2017). Virtual classrooms: How online college
courses affect student success. American Economic Review, 107(9), 2855–75. https://doi.org/
10.1257/aer.20151193
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Breiman, L., Friedman, J. H., Olshen R. A., & Stone, C. J. (2017). Classification and regression
trees. Routledge.
Broadbent, J., & Poon, W. L. (2015). Self-regulated learning strategies & academic achievement in
online higher education learning environments: A systematic review. The Internet and Higher
Education, 27, 1–13. https://doi.org/10.1016/j.iheduc.2015.04.007
Broucker, B. (2016). Big data governance; een analytisch kader. Bestuurskunde, 25(1), 24–28.
Bruno, E., Alexandre, B., Ferreira Mello, R., Falcão, T. P., Vesin, B., & Gašević, D. (2021).
Applications of learning analytics in high schools: A systematic literature review. Frontiers
in Artificial Intelligence, 4, 132.
Castaño-Muñoz, J., & Rodrigues, M. (2021). Open to MOOCs? Evidence of their impact on labour
market outcomes. Computers & Education, 173, 104289. https://doi.org/10.1016/j.compedu.
2021.104289
Cela, K. L., Sicilia, M. Á., & Sánchez, S. (2015). Social network analysis in e-learning
environments: A preliminary systematic review. Educational Psychology Review, 27(1), 219–
246. https://doi.org/10.1007/s10648-014-9276-0
Chakraborty, P., Mittal, P., Gupta, M. S., Yadav, S., & Arora, A. (2021). Opinion of students on
online education during the COVID-19 pandemic. Human Behavior and Emerging Technolo-
gies, 3(3), 357–365. https://doi.org/10.1002/hbe2.240
Chang, W.-C., Lin, H.-W., & Wu, L.-C. (2010). Applied social network analysis to project
curriculum. In The 6th International Conference on Networked Computing and Advanced
Information Management (pp. 710–715).
Cooc, N., & Kim, J. S. (2017). Peer influence on children’s reading skills: A social network analysis
of elementary school classrooms. Journal of Educational Psychology, 109(5), 727. https://doi.
org/10.1037/edu0000166
Dalipi, F., Imran, A. S., & Kastrati, Z. (2018). MOOC dropout prediction using machine learning
techniques: Review and research challenges. In 2018 IEEE Global Engineering Education
Conference (EDUCON) (pp. 1007–1014).
De Smedt, J., vanden Broucke, S. K., Vanthienen, J., & De Witte, K. (2017). Improved student
feedback with process and data analytics. In Data analytics applications in education (pp. 11–
36). Auerbach Publications. https://doi.org/10.4324/9781315154145-2
De Wit, K., & Broucker, B. (2017). The governance of big data in higher education. In Data
analytics applications in education (pp. 213–234). Auerbach Publications. https://doi.org/10.
4324/9781315154145-9
De Witte, K., & Vanthienen, J. (2017). Data analytics applications in education. Auerbach
Publications. https://doi.org/10.1201/b20438
De Witte, K., & López-Torres, L. (2017). Efficiency in education: A review of literature and a way
forward. Journal of the Operational Research Society, 68(4), 339–363. https://doi.org/10.1057/
jors.2015.92
De Witte, K., & Smet, M. (2021). Financing Education in the Context of COVID-19 (Ad hoc report
No. 3/2021). European Expert Network on Economics of Education (EENEE).
DeNardis, L. (2020). The cyber-physical disruption. In The internet in everything (pp. 25–56). Yale
University Press.
Di Pietro, G., Biagi, F., Costa, P., Karpiński, Z., & Mazza, J. (2020). The likely impact of COVID-
19 on education: Reflections based on the existing literature and recent international datasets
(Vol. 30275). Publications Office of the European Union.
Dorie, V., Hill, J., Shalit, U., Scott, M., & Cervone, D. (2019). Automated versus do-it-yourself
methods for causal inference: Lessons learned from a data analysis competition. Statistical
Science, 34(1), 43–68. https://doi.org/10.1214/18-STS667
e Silva, M. C. A., & Camanho, A. S. (2017). Using data analytics to benchmark schools: The case
of Portugal. In Data analytics applications in education (pp. 129–162). Auerbach Publications.
https://doi.org/10.4324/9781315154145-6
Evans, C. (2013). Making sense of assessment feedback in higher education. Review of Educational
Research, 83(1), 70–120. https://doi.org/10.3102/0034654312474350
Everton, S. F. (2012). Disrupting dark networks. Cambridge University Press. https://doi.org/10.
1017/CBO9781139136877
Figlio, D., Karbownik, K., & Salvanes, K. G. (2016). Education research and administrative data.
In Handbook of the economics of education (pp. 75–138). Elsevier.
Furedi, F. (2010). Wasted: Why education isn’t educating. Bloomsbury Publishing.
Greller, W., & Drachsler, H. (2012). Translating learning into numbers: A generic framework for
learning analytics. Educational Technology & Society, 15(3), 42–57.
Groot, W., & van den Brink, H. M. (2017). Evidence-based education and its implications
for research and data analytics with an application to the overeducation literature. In Data
analytics applications in education (pp. 235–260). Auerbach Publications. https://doi.org/10.
4324/9781315154145-10
Grunspan, D. Z., Wiggins, B. L., & Goodreau, S. M. (2014). Understanding classrooms through
social network analysis: A primer for social network analysis in education research. CBE—Life
Sciences Education, 13(2), 167–178. https://doi.org/10.1187/cbe.13-08-0162
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1),
81–112. https://doi.org/10.3102/003465430298487
Hawe, P., & Ghali, L. (2008). Use of social network analysis to map the social relationships of staff
and teachers at school. Health Education Research, 23(1), 62–69. https://doi.org/10.1093/her/
cyl162
Hill, J., Linero, A., & Murray, J. (2020). Bayesian additive regression trees: A review and look
forward. Annual Review of Statistics and Its Application, 7, 251–278. https://doi.org/10.1146/
annurev-statistics-031219-041110
Iterbeke, K., & De Witte, K. (2020). Helpful or harmful? The role of personality traits in student
experiences of the COVID-19 crisis and school closure. FEB Research Report Department of
Economics. https://doi.org/10.1177/01461672211050515
Iterbeke, K., De Witte, K., Declercq, K., & Schelfhout, W. (2020). The effect of ability
matching and differentiated instruction in financial literacy education: Evidence from two
randomised control trials. Economics of Education Review, 78, 101949. https://doi.org/10.1016/
j.econedurev.2019.101949
Jacob, B., Berger, D., Hart, C., & Loeb, S. (2016). Can technology help promote equality of
educational opportunities? RSF: The Russell Sage Foundation Journal of the Social Sciences,
2(5), 242–271. https://doi.org/10.7758/rsf.2016.2.5.12
Jivet, I., Scheffel, M., Specht, M., & Drachsler, H. (2018). License to evaluate: Preparing learning
analytics dashboards for educational practice. In Proceedings of the 8th International Con-
ference on Learning Analytics and Knowledge (pp. 31–40). https://doi.org/10.1145/3170358.
3170421
Kapelner, A., & Bleich, J. (2016). bartMachine: Machine learning with Bayesian additive
regression trees. Journal of Statistical Software, 70(4), 1–40. https://doi.org/10.18637/
jss.v070.i04
Kassab, M., DeFranco, J., & Laplante, P. (2020). A systematic literature review on internet of
things in education: Benefits and challenges. Journal of Computer Assisted Learning, 36(2),
115–127. https://doi.org/10.1111/jcal.12383
King, G. (2016). Big data is not about the data! In R. M. Alvarez (Ed.), Computational social
science: Discovery and prediction. Cambridge University Press.
Langedijk, S., Vollbracht, I., & Paruolo, P. (2019). The potential of administrative microdata for
better policy-making in Europe. In Data-driven policy impact evaluation (p. 333). Springer. https://doi.
org/10.1007/978-3-319-78461-8_20
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., Christakis, N.,
Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., & Van Alstyne,
M. (2009). Computational social science. Science, 323(5915), 721–723. https://
doi.org/10.1126/science.1167742
Lazer, D., Pentland, A., Watts, D. J., Aral, S., Athey, S., Contractor, N., Freelon, D., Gonzalez-
Bailon, S., King, G., Margetts, H., Nelson, A., Salganik, M. J., Strohmaier, M., Vespignani,
A., & Wagner, C. (2020). Computational social science: Obstacles and opportunities. Science,
369(6507), 1060–1062. https://doi.org/10.1126/science.aaz8170
Lee, H.-S., Pallant, A., Pryputniewicz, S., Lord, T., Mulholland, M., & Liu, O. L. (2019).
Automated text scoring and real-time adjustable feedback: Supporting revision of scientific
arguments involving uncertainty. Science Education, 103(3), 590–622. https://doi.org/10.1002/
sce.21504
Leitner, P., Khalil, M., & Ebner, M. (2017). Learning analytics in higher education—a literature
review. In Learning analytics: Fundaments, applications, and trends (pp. 1–23). https://doi.
org/10.1007/978-3-319-52977-6_1
Lemon, S. C., Roy, J., Clark, M. A., Friedmann, P. D., & Rakowski, W. (2003). Classification
and regression tree analysis in public health: Methodological review and comparison with
logistic regression. Annals of Behavioral Medicine, 26(3), 172–181. https://doi.org/10.1207/
S15324796ABM2603_02
Litman, D. (2016). Natural language processing for enhancing teaching and learning. In Thirtieth
AAAI Conference on Artificial Intelligence.
Long, P., & Siemens, G. (2011). Penetrating the fog: Analytics in learning and education.
EDUCAUSE Review, 46(5), 30.
Lonn, S., Aguilar, S. J., & Teasley, S. D. (2015). Investigating student motivation in the context
of a learning analytics intervention during a summer bridge program. Computers in Human
Behavior, 47, 90–97. https://doi.org/10.1016/j.chb.2014.07.013
Maldonado, J., & De Witte, K. (2021). The effect of school closures on standardised student test
outcomes. British Educational Research Journal, 48(1), 49–94. https://doi.org/10.1002/berj.3754
Mandl, U., Dierx, A., & Ilzkovitz, F. (2008). The effectiveness and efficiency of public spending
(Technical Report). Directorate General Economic and Financial Affairs (DG ECFIN).
Mazrekaj, D., De Witte, K., & Cabus, S. (2020). School outcomes of children raised by same-
sex parents: Evidence from administrative panel data. American Sociological Review, 85(5),
830–856. https://doi.org/10.1177/0003122420957249
Mergoni, A., & De Witte, K. (2021). Policy evaluation and efficiency: A systematic literature
review. International Transactions in Operational Research. https://doi.org/10.1111/itor.13012
Merlo, E., Ríos, S. A., Álvarez, H., L’Huillier, G., & Velásquez, J. D. (2010). Finding inner copy
communities using social network analysis. In International Conference on Knowledge-Based
and Intelligent Information and Engineering Systems (pp. 581–590).
Mukala, P., Buijs, J. C., Leemans, M., & van der Aalst, W. M. (2015). Learning analytics on
Coursera event data: A process mining approach. In SIMPDA (pp. 18–32).
Nguyen, Q., Huptych, M., & Rienties, B. (2018). Linking students’ timing of engagement to learn-
ing design and academic performance. In Proceedings of the 8th International Conference on
Learning Analytics and Knowledge (pp. 141–150). https://doi.org/10.1145/3170358.3170398
Nouri, J., Ebner, M., Ifenthaler, D., Saqr, M., Malmberg, J., Khalil, M., Bruun, J., Viberg, O.,
Conde González, M. Á., Papamitsiou, Z., & Berthelsen, U. D. (2019). Efforts in Europe for
data-driven improvement of education: A review of learning analytics research in seven
countries. International Journal of Learning Analytics and Artificial Intelligence for Education
(iJAI), 1(1), 8–27. https://doi.org/10.3991/ijai.v1i1.11053
Oreopoulos, P., & Salvanes, K. G. (2011). Priceless: The nonpecuniary benefits of schooling.
Journal of Economic Perspectives, 25(1), 159–84. https://doi.org/10.1257/jep.25.1.159
Palermo, C., & Wilson, J. (2020). Implementing automated writing evaluation in different
instructional contexts: A mixed-methods study. Journal of Writing Research, 12(1), 63–108.
Pardo, A., Bartimote, K., Shum, S. B., Dawson, S., Gao, J., Gašević, D., Leichtweis, S., Liu, D.,
Martínez-Maldonado, R., Mirriahi, N., Moskal, A. C. M., Schulte, J., Siemens, G., & Vigentini,
L. (2018). OnTask: Delivering data-informed, personalized learning support actions. Journal of
Learning Analytics, 5(3), 235–249.
Podgorelec, V., Kokol, P., Stiglic, B., & Rozman, I. (2002). Decision trees: An overview and
their use in medicine. Journal of Medical Systems, 26(5), 445–463. https://doi.org/10.1023/
A:1016409317640
Rettore, E., & Trivellato, U. (2019). The use of administrative data to evaluate the impact of active
labor market policies: The case of the Italian Liste di Mobilità. In Data-driven policy impact
evaluation (pp. 165–182). Springer. https://doi.org/10.1007/978-3-319-78461-8_11
Robinson, L., Cotten, S. R., Ono, H., Quan-Haase, A., Mesch, G., Chen, W., Schulz, J., Hale, T. M.,
& Stern, M. J. (2015). Digital inequalities and why they matter. Information, Communication
& Society, 18(5), 569–582. https://doi.org/10.1080/1369118X.2015.1012532
Ryymin, E., Palonen, T., & Hakkarainen, K. (2008). Networking relations of using ICT within
a teacher community. Computers & Education, 51(3), 1264–1282. https://doi.org/10.1016/j.
compedu.2007.12.001
Selwyn, N. (2019). What’s the problem with learning analytics? Journal of Learning Analytics,
6(3), 11–19.
Smith, G. G., Haworth, R., & Žitnik, S. (2020). Computer science meets education: Natural
language processing for automatic grading of open-ended questions in ebooks. Journal of Edu-
cational Computing Research, 58(7), 1227–1255. https://doi.org/10.1177/0735633120927486
Stoffi, F. J. B., De Beckker, K., Maldonado, J. E., & De Witte, K. (2021). Assessing sensitivity of
machine learning predictions: A novel toolbox with an application to financial literacy. Preprint
arXiv:2102.04382.
Streitwieser, B., & Ogden, A. C. (Eds.). (2016). International higher education’s scholar-practitioners:
Bridging research and practice. Symposium Books.
Tadayon, M., & Pottie, G. J. (2020). Predicting student performance in an educational game using
a hidden Markov model. IEEE Transactions on Education, 63(4), 299–304. https://doi.org/10.
1109/TE.2020.2984900
Topolovec, S. (2018). A comparison of self-paced and instructor-paced online courses: The
interactive effects of course delivery mode and student characteristics.
Vanthienen, J., & De Witte, K. (2017). Data analytics applications in education. Auerbach
Publications. https://doi.org/10.4324/9781315154145
Viberg, O., Hatakka, M., Bälter, O., & Mavroudi, A. (2018). The current landscape of learning
analytics in higher education. Computers in Human Behavior, 89, 98–110. https://doi.org/10.
1016/j.chb.2018.07.027
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge
University Press.
Wegge, D., Vandebosch, H., & Eggermont, S. (2014). Who bullies whom online: A social network
analysis of cyberbullying in a school context. Communications, 39(4), 415–433. https://doi.org/
10.1515/commun-2014-0019
Whitelock, D., Twiner, A., Richardson, J. T., Field, D., & Pulman, S. (2015a). Feedback on
academic essay writing through pre-emptive hints: Moving towards ‘advice for action’. European
Journal of Open, Distance and E-learning, 18(1), 1–15.
Whitelock, D., Twiner, A., Richardson, J. T., Field, D., & Pulman, S. (2015b). OpenEssayist: A
supply and demand learning analytics tool for drafting academic essays. In Proceedings of the
Fifth International Conference on Learning Analytics and Knowledge (pp. 208–212).
Zafras, I., Kostas, A., & Sofos, A. (2020). MOOCs & participation inequalities in distance education:
A systematic literature review 2009–2019. European Journal of Open Education and E-
learning Studies, 5(1), 68–89.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
