AML PRG Assign I
Programming Assignment-I
(EC-2 Regular)
Dataset: student.csv
The final deliverables of Programming Assignment-I are
i) a Word file documenting the findings of every stage
ii) Python code in ipynb format
Save both files in a folder, zip the folder, and upload the archive.
1) Descriptive Statistics
The data in the dataset has to be understood, and every feature must be explained by the student.
The datatypes present in the dataset must be identified. The measures of central tendency (mean,
median and mode) should be computed and explained. Based on these values, the student should
draw a few critical insights that lead to the problem statement. Data cleaning should also be
performed, with appropriate techniques suggested for handling missing data and outliers.
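As a starting point for the datatype and central-tendency steps described above, a minimal pandas sketch might look like the following. The column names (`age`, `score`, `school`) and the values are made up for illustration and are not taken from student.csv; in the notebook the frame would come from the real file instead.

```python
import pandas as pd

# Hypothetical stand-in for student.csv; the real columns may differ.
df = pd.DataFrame({
    "age": [18, 19, 19, 20, 22],
    "score": [55.0, 70.0, 70.0, 85.0, 90.0],
    "school": ["A", "B", "A", "A", "B"],
})
# In the notebook, load the real file instead:
# df = pd.read_csv("student.csv")

# Datatype of every column
dtypes = df.dtypes

# Measures of central tendency for the numeric columns only
numeric = df.select_dtypes(include="number")
summary = pd.DataFrame({
    "mean": numeric.mean(),
    "median": numeric.median(),
    "mode": numeric.mode().iloc[0],  # first mode if a column has several
})

print(dtypes)
print(summary)
```

Comparing the mean and the median of a column is one quick way to spot skew, which can feed directly into the insights the section asks for.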
Note: Exploratory Data Analysis (EDA) is used to tackle specific tasks such as:
i. Spotting mistakes and missing data;
ii. Mapping out the underlying structure of the data;
iii. Identifying the most important variables;
iv. Listing anomalies and outliers.
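Two of the EDA tasks listed above, spotting missing data and listing outliers, can be sketched as follows. The `score` column and its values are hypothetical. Median imputation and the 1.5 * IQR rule are only one possible choice of techniques, not a prescribed method for this assignment.

```python
import pandas as pd

# Hypothetical data; "score" is an assumed column name, not from student.csv.
df = pd.DataFrame({"score": [52.0, 55.0, 58.0, None, 60.0, 61.0, 63.0, 250.0]})

# Spot missing data: number of missing values per column
missing_counts = df.isna().sum()

# One possible treatment: fill missing values with the median,
# which is less sensitive to extreme values than the mean.
filled = df["score"].fillna(df["score"].median())

# List outliers using the 1.5 * IQR rule
q1 = filled.quantile(0.25)
q3 = filled.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = filled[(filled < lower) | (filled > upper)]

print(missing_counts)
print(outliers)
```

Whether a flagged value is a genuine outlier or a data-entry mistake is a judgment call that should be justified in the report.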
2) Data Visualization
Data should be visualized using the various types of charts and graphs that the student has learnt.
Every visualization that is submitted should yield an insight, and together these insights
should help frame the problem statement that is intended to be solved.
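A labelled chart makes the insight easier to state. The sketch below assumes matplotlib is available and uses made-up score values rather than anything from student.csv. It draws a histogram and marks the mean so that the accompanying insight can refer to it directly.

```python
import matplotlib.pyplot as plt

# Hypothetical values; replace with a real numeric column from student.csv.
scores = [52, 55, 58, 60, 60, 61, 63, 70, 72, 85]
mean_score = sum(scores) / len(scores)

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(scores, bins=5, edgecolor="black")
# Mark the mean so the reader can relate the shape to a summary statistic
ax.axvline(mean_score, linestyle="--", color="red",
           label=f"mean = {mean_score:.1f}")
ax.set_xlabel("Score")
ax.set_ylabel("Number of students")
ax.set_title("Distribution of scores (illustrative data)")
ax.legend()
```

In the notebook, follow each figure with a short markdown cell stating what the chart shows and how it relates to the problem statement.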
Presentation
Presentation is key. Ensure that your notebook is capable of explaining your insights and
visualizations by itself. Section your questions and emphasize your results. Do not hide your
final result in a sea of code or debugging cells.
Examples:
If your question is on data cleaning, highlight the rows which need to be cleaned and
show those rows before and after your data cleaning has been applied to
them.
If your question asks you to prove a statement using visualizations, ensure that you
actually have a concluding statement after your graphs. Do not leave the conclusion
unstated after visualizing the data in your notebook.
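The data-cleaning example above can be sketched as follows. The column names (`student_id`, `score`) are hypothetical and median imputation is only an example cleaning step. The point is to show the affected rows side by side, before and after.

```python
import pandas as pd

# Hypothetical data; column names are assumptions, not from student.csv.
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "score": [70.0, None, 85.0, None],
})

# Rows that need cleaning: those with a missing score
needs_cleaning = df["score"].isna()
before = df[needs_cleaning]

# Example cleaning step: fill missing scores with the median
cleaned = df.copy()
cleaned["score"] = cleaned["score"].fillna(df["score"].median())
after = cleaned[needs_cleaning]

# Put the affected rows side by side so the change is obvious
comparison = before.merge(after, on="student_id",
                          suffixes=("_before", "_after"))
print(comparison)
```

Displaying only the affected rows keeps the result visible instead of burying it in the full table.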
It is recommended to have short bullet points explaining what you have done before each task,
especially for non-visualization tasks. This will help us understand your approach to the
problem and can help with partial marks even if you are unable to solve the entire question.
Prioritise interpretability over design. While it is encouraged to have visually appealing graphs,
make sure that you do not lose interpretability of the data in the pursuit of aesthetic
visualizations.
Insights
The last section of your report should be dedicated to an out-of-the-box pursuit. If you think
you have a better way of cleaning the dataset or visualizing a question, or if you believe that you
have noticed an interesting insight that can be gleaned from the data, add it at the end of your
notebook, explain in your report or notebook why you think you are right, and make sure
you mention it in your recorded video. This section carries weight toward your final score.