A Deep Learning Approach Towards Student Performance Prediction in Online Courses: Challenges Based on a Global Perspective
† Electrical and Computer Engineering Department, Lebanese American University, Byblos, Lebanon;
e-mail: [email protected]
‡ Data Science and Artificial Intelligence Department, Faculty of Information Technology, Zarqa University, Zarqa, Jordan;
e-mail: [email protected]
¶ Faculty of Science and Information Technology, Jadara University, Irbid, Jordan;
e-mail: [email protected]
∥ Computer Science Department, Faculty of Information Technology, Zarqa University, Zarqa, Jordan;
e-mail: [email protected]
∗∗ Software Engineering Department, Faculty of Information Technology, Zarqa University, Zarqa, Jordan;
e-mail: [email protected]
Abstract—Analyzing and evaluating students’ progress in any learning environment is stressful and time-consuming if done using traditional analysis methods. This is further exacerbated by the increasing number of students due to the shift of focus toward integrating Internet technologies in education and the focus of academic institutions on moving toward e-Learning, blended, or online learning models. As a result, the topic of student performance prediction has become a vibrant research area in recent years. To address this, machine learning and data mining techniques have emerged as a viable solution. To that end, this work proposes the use of deep learning techniques (CNN and RNN-LSTM) to predict the students’ performance at the midpoint stage of the online course delivery using three distinct datasets collected from three different regions of the world. Experimental results show that deep learning models have promising performance as they outperform other optimized traditional ML models in two of the three considered datasets while also having comparable performance for the third dataset.

Index Terms—Deep Learning, e-Learning, Online Courses, Student Performance Prediction

I. INTRODUCTION

With the continued growth of Internet users, the demand for new learning paradigms that rely on e-Learning environments is increasing rapidly [1], [2]. Different studies have been conducted in the e-Learning environment in order to improve the benefits of e-Learning courses’ delivery [3], [4], [5]. Ongoing research in this area is trying to address certain objectives that enhance the e-Learning environment from a certain point of view. One of those important aspects is offering personalized learning [6]. Researchers focus on personalized learning because of the importance of this aspect in improving the e-Learning contents and delivery methods to satisfy the learner’s needs [7]. Another important aspect is adaptive learning, which concentrates on the learning style of the learner [8]. Both aspects will encourage the e-Learning environment’s parties to integrate the required features and technologies in the e-Learning environment in order to satisfy the individual learner’s needs [9]. Consequently, this requires research in learning analytics to analyze the learners’ needs. However, conducting traditional analytical research with the large number of learners nowadays is a complex and time-consuming process.

As a solution, this paper proposes a predictive model to predict the learner’s final grade in the course at earlier stages during the course. The predictive model implements deep learning (DL) models to classify the students and predict their final grades at the midway point of the online course. More specifically, a convolutional neural network (CNN) and a recurrent neural network with long short-term memory (RNN-LSTM) are proposed, as they have the potential to accurately predict student performance given their promising results in other applications such as pancreas and breast tumor detection [10], [11]. As a result, this will help instructors at earlier stages of the course delivery to take care of the students who may need help.

To evaluate the performance of the proposed deep learning model, three distinct datasets from three different universities located in three different regions of the world are used. More specifically, the first dataset is for a first year engineering course at a European University. The second dataset is for a third year engineering course at a North American University. Finally, the third dataset is for a first year undergraduate course in Information Technology (IT) at a Middle Eastern university. This provides us with a more global model that is applicable to different courses and student demographics.

The rest of the paper is organized as follows: Section II describes the previous research and related work. Section III presents the proposed approach. Section IV describes the datasets used. Section V evaluates the performance of the proposed framework in comparison with other works from the literature. Finally, Section VI concludes the paper and discusses the future work.

II. RELATED WORKS

Due to the fact that this research area has been gaining significant attention and growing in popularity, several research works have investigated the use of machine learning and data mining techniques in educational settings. As shown in Figure 1, these works can be categorized as either being year-to-year prediction frameworks (such as [12], [13], [14], [15]) or course-to-course prediction frameworks (such as [16], [17]). Year-to-year prediction frameworks aim to predict the performance of the students in the courses/classes of a year based on their performance in the previous year. Alternatively, course-to-course prediction frameworks aim to predict the performance of students in a course based on their performance in previous similar courses.

Figure 1. Student Performance Prediction Frameworks’ Categorization

For example, the authors in [12] proposed the use of classification models to predict the final grades of students. This was done by analyzing data from a course program across 6 years.

Similarly, the authors of [13] proposed using a decision tree algorithm, namely the J48 (also known as C4.5) algorithm, to predict student performance. Their results emphasized the potential of such algorithms for student performance prediction as they had high accuracy and speed.

The authors of [14] also proposed using the J48 algorithm by comparing its performance with that of the k-means algorithm. The results reiterated the superiority of the J48 algorithm in predicting student performance.

In contrast, the authors of [16] investigated the performance of different clustering algorithms such as k-means and hierarchical clustering in accurately predicting student performance. The experimental results showed that the k-means algorithm outperformed other algorithms as it had better performance and a faster building time.

The authors in [17] proposed the use of a deep neural network model for an e-learning recommendation framework. Experimental results illustrated that the proposed model improved the learning experience of the students.

In a similar fashion, the authors of [18] also proposed using deep learning to predict student performance. Their distinctive characteristic was that they used both academic and non-academic subjects. Experimental results showed that deep learning models are a viable potential solution for such problems as they can offer high accuracy.

On the other hand, the authors in [15] compared the performance of four tree-based models in accurately predicting students’ performance. Using a Portuguese high school dataset, their experimental results showed that tree-based models achieved high accuracy and fast execution time.

Finally, the authors in [19], [20] used unsupervised learning models to cluster users into various engagement levels. Based on these levels, they used Apriori association rules to relate academic performance to student engagement. Their experimental results showed that there exists a positive correlation between students’ engagement level and their academic performance in an e-learning environment.

Despite the promising results presented in previous related works, their main limitation lies in the fact that they rely on courses/classes that have already been completed to predict the performance in subsequent courses/classes. As mentioned earlier, this is done either on a course-to-course basis or a year-to-year basis. However, very few previous works aimed at predicting the student performance in a course during its delivery. Hence, this work aims at filling this gap by trying to accurately predict the performance of the students at the midpoint stage of the course delivery to ensure satisfactory completion of the course itself rather than in subsequent courses.

III. PROPOSED FRAMEWORK

This work proposes using deep learning-based models to predict the student performance at the midpoint stage of the course. As is shown in Figure 2, the proposed framework includes two main portions. The first portion is the data pre-processing and [...]

[...] from three different universities located in three separate regions are used. In what follows, a brief description and visualization of these datasets is provided.

A. Dataset 1: First Year Engineering Course at a European University

The first dataset was collected for a first year engineering course at a European University [22]. It consists of a set of tasks of different weights and difficulty levels that were completed using a dedicated simulation environment. Note that the data was collected for 115 students originally. However, only 52 students completed the course and thus were used as part of this dataset. Figure 3 plots the first and second principal components for Dataset 1.
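
As an illustration of the projection behind such a plot, the following is a minimal sketch of computing the first and second principal components with NumPy. It is not the authors' code: the feature matrix is synthetic, the choice of 8 features per student is a placeholder assumption, and the helper name `pca_2d` is hypothetical. Only the number of rows (52 students) comes from the dataset description.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their first two principal components.

    Returns the 2-D scores and the fraction of variance explained by
    each of the two components.
    """
    # Center each feature so the components describe variance, not offsets.
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:2]

# Synthetic placeholder data: 52 students, 8 hypothetical task scores each.
rng = np.random.default_rng(0)
X = rng.normal(size=(52, 8))

scores, ratio = pca_2d(X)
print(scores.shape)  # one (PC1, PC2) point per student
print(ratio)         # variance fraction captured by PC1 and PC2
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by class label, would give a scatter plot of the same kind as the figure described above.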
A. Experiment Setup

MATLAB 2022a is used to plot the PCA figures for the different considered datasets. Additionally, Python is used to develop and evaluate the performance of the proposed framework.

Table II. Performance Evaluation of Traditional ML and DL Models for Dataset 2

Algorithm        Accuracy  Precision  Recall  F-score
Optimized SVM    0.87      0.88       0.88    0.87
Optimized K-NN   0.94      0.95       0.94    0.94
Optimized RF     0.91      0.91       0.91    0.91
Optimized NB     0.91      0.91       0.92    0.91
CNN              0.91      0.91       0.91    0.91
RNN-LSTM         0.88      0.88       0.88    0.88

B. Performance Metrics

Four main metrics are used to evaluate the performance of the proposed deep learning-based frame-