Assignment 4
📊 Worth: 9%
📅 Due: December 14 at midnight
🕑 Late submissions: 5% penalty per late day. Maximum of 5 late days allowed.
Feel free to add more functions to avoid code repetition.
⚠ What to Submit
One team member should submit the following:
Modules
Ensure that the following modules are installed:
matplotlib
numpy
(Optional read)
In this assignment, you will study the impact of various factors on student performance in exams. "Student Performance Factors" is a synthetic dataset created for educational purposes only. It can be found on Kaggle, a very popular data science website where many AI competitions are hosted.
The dataset includes various factors that may affect student performance, such as study habits, attendance, and parental involvement. Your goal is to identify which factors have the greatest impact on student exam scores.
Dataset
The data contains 6608 lines and 20 columns. Each line represents the data for one student and has the following columns:
Column             Type          Description                                           Index
Hours_Studied      int or float  Number of hours spent studying per week.              0
Attendance         int or float  Percentage of classes attended.                       1
Sleep_Hours        int or float  Average number of hours of sleep per night.           5
Previous_Scores    int or float  Scores from previous exams.                           6
Tutoring_Sessions  int or float  Number of tutoring sessions attended per month.       9
Exam_Score         int or float  Final exam score. This is the dependent variable.     19
Columns of interest
Focus on the following factors:
Hours_Studied
Teacher_Quality
School_Type
Two additional numeric factors of your choice.
PART I
Input parameters:
the file_name
Returns:
Task
teacher_list = []
# ... code
for line in csv_reader:
    # ... code
    if line[TEACHER_INDEX] == '':  # missing categorical value
        teacher_list.append(None)
    else:
        teacher_list.append(line[TEACHER_INDEX])
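The same missing-value idea can be applied to a numeric column. The following is only a minimal sketch, not the required solution: the helper name read_numeric_column, the in-memory sample rows, and the assumption that the file starts with a header row are all made up for illustration.

```python
import csv
import io


def read_numeric_column(csv_file, index):
    """Return one column as floats, using None for missing or non-numeric cells."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumption: the file has one)
    values = []
    for line in reader:
        cell = line[index].strip()
        try:
            values.append(float(cell))
        except ValueError:  # empty string or text that is not a number
            values.append(None)
    return values


# Tiny in-memory stand-in for the real file (made-up rows, not the real dataset).
sample = io.StringIO("Hours_Studied,Attendance\n23,84\n,64\n19,98\n")
hours = read_numeric_column(sample, 0)
print(hours)  # [23.0, None, 19.0]
```

With the real dataset, the file would be opened with open(file_name, newline='') and the column index taken from the table in the Dataset section.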
Input parameter
Return
None
Task
Gather some statistics on the student scores: the minimum, maximum, average, standard deviation, and median, as well as the count of students.
Calculate min_score, max_score, avg_score, the median med_score, and the standard deviation std.
Hint: You can use NumPy's functions np.median(scores_list) to calculate the median, np.mean(scores_list) to calculate the average, and np.std(scores_list) to calculate the standard deviation.
Call the function in main(). Copy the results into your report.
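As a rough illustration of the hint above, the sketch below computes the requested statistics on a short made-up list. The function name summarize_scores and the choice to skip None entries are assumptions, not part of the assignment specification. Note that np.std computes the population standard deviation by default (ddof=0).

```python
import numpy as np


def summarize_scores(scores_list):
    """Compute basic statistics, ignoring None (missing) entries."""
    scores = np.array([s for s in scores_list if s is not None], dtype=float)
    return {
        "count": int(scores.size),
        "min_score": float(scores.min()),
        "max_score": float(scores.max()),
        "avg_score": float(np.mean(scores)),
        "med_score": float(np.median(scores)),
        "std": float(np.std(scores)),  # population standard deviation (ddof=0)
    }


# Made-up scores for illustration only; one value is missing.
stats = summarize_scores([67, 61, 74, None, 71, 70])
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Returning a dictionary keeps the printing separate from the computation, so the same values can be reused in the report.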
Example of output
PART II
In this section, we will focus on plotting and analysing the trends. You are free to design the function in whichever way you want, as long as the graphs and the values are saved properly.
This step uses np.polyfit() to find a linear function that approximates the relationship between student scores and other columns.
Task:
Fit a linear polynomial function to the data using np.polyfit(), where score_list is a function of study_hours_list.
Plot both the original data and the model on the same graph.
Print the equation. Copy the results into your report.
Repeat the previous steps with the two other lists choice1_list and choice2_list.
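The steps above could be sketched as follows. This is one possible layout under stated assumptions, not the expected solution: the function name fit_and_plot, the output file name, and the five data points are invented for illustration. np.polyfit(x, y, 1) returns the coefficients from the highest degree down, so the slope comes first and the intercept second.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: the figure is only saved to a file
import matplotlib.pyplot as plt
import numpy as np


def fit_and_plot(x_list, score_list, x_label, out_path):
    """Fit score = slope * x + intercept, save a plot, and return the coefficients."""
    x = np.array(x_list, dtype=float)
    y = np.array(score_list, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree 1 gives a straight line

    fig, ax = plt.subplots()
    ax.scatter(x, y, label="data")
    ax.plot(x, np.polyval([slope, intercept], x), color="red", label="linear model")
    ax.set_xlabel(x_label)
    ax.set_ylabel("Exam_Score")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
    return slope, intercept


# Made-up points for illustration only (not taken from the dataset).
study_hours_list = [10, 15, 20, 25, 30]
score_list = [62, 65, 67, 70, 73]
slope, intercept = fit_and_plot(
    study_hours_list, score_list, "Hours_Studied", "hours_vs_score.png"
)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

Calling the same function with choice1_list and choice2_list, with a different label and output path each time, avoids repeating the plotting code.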
Example of graph
Happy Holidays! 🎄🎉✨ Wishing you a good end of semester and a restful break 🌟