Data Mining Project

The project aims to develop a classical machine learning classification model using a provided training dataset, focusing on data preprocessing, model training without deep learning techniques, and achieving high accuracy on unseen data. Students must submit their code by December 28, 2024, and adhere to strict guidelines regarding code execution, dataset handling, and evaluation criteria. The evaluation will be based on classification accuracy, with penalties for improper submissions or tampering with the evaluation process.

Uploaded by

Ädëm Lë Røï

Instructions for the Final Project of Compilation

December 5, 2024

1. Objective of the Project

The purpose of this project is to demonstrate your ability to develop a classical machine
learning model for classification using fundamental concepts. You will be provided only
with a training dataset (train_data.csv) to perform the following tasks:

1. Preprocessing the Data:

• Handle missing values, scale/normalize features, encode categorical variables, and
perform any feature selection or transformation necessary to improve model performance.

2. Training a Machine Learning Model:

• Use classical algorithms (e.g., those implemented in sklearn or xgboost) to train a model
based on the preprocessed data.
• Deep learning techniques (e.g., TensorFlow, PyTorch) are not allowed.

3. Proposing the Most Accurate Model:

• Your goal is to develop the model that achieves the highest classification accuracy
on unseen data.
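
The three tasks above can be chained into a single pipeline. The following is only a minimal sketch under stated assumptions: the column layout of train_data.csv is not described here, so the data is synthetic (two numeric columns with gaps and one integer-coded categorical column), and the choice of median imputation, standard scaling, one-hot encoding and a random forest is illustrative rather than prescribed.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for train_data.csv (the real columns are unknown).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(size=n),            # numeric feature 0
    rng.normal(size=n),            # numeric feature 1
    rng.integers(0, 3, size=n),    # categorical feature, coded 0/1/2
]).astype(float)
y = (X[:, 0] + (X[:, 2] == 1) > 0.5).astype(int)

# Inject missing values into the numeric columns only.
mask = rng.random(size=(n, 2)) < 0.1
X[:, :2][mask] = np.nan

# Numeric columns: impute, then scale. Categorical column: one-hot encode.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, [0, 1]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), [2]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
model.fit(X, y)
train_accuracy = model.score(X, y)   # accuracy on the training data only
print(f"training accuracy: {train_accuracy:.3f}")
```

Keeping preprocessing inside the pipeline means the same fitted transformations are applied to any later data passed to `model.predict`. Training accuracy is usually optimistic and says little about unseen data.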

Once you submit your code, I will test your proposed model on a separate test dataset
(test_data.csv), which you will not have access to during the development phase. This
ensures a fair evaluation of your model’s performance on completely unseen data, replicating
real-world scenarios.
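
Since the test set is not available during development, accuracy on unseen data has to be estimated from the training data alone. One common way is a hold-out split combined with cross-validation; the sketch below assumes synthetic numeric data and a logistic regression model, neither of which comes from the project files.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Keep 20% of the training data aside to play the role of unseen data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
holdout_accuracy = model.score(X_val, y_val)

# 5-fold cross-validation on the remaining data gives a second estimate.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

print(f"hold-out accuracy: {holdout_accuracy:.3f}")
print(f"cross-validation accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```
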

Submission Deadline: The submission deadline is 28-12-2024 at 23:59. Any
student who fails to submit their code before this date and time will be excluded from
the evaluation process.

2. Explanation of the Provided Code
You will receive a Python script (project_code.py) along with the datasets (train_data.csv
and test_data.csv). Both datasets will be located in the same folder, inside your working
directory. The script is structured to include the following key sections:

1. Preprocessing Section:

• This is where you can modify the code to clean and preprocess the data.
• Examples of valid modifications include:
– Handling missing values.
– Encoding categorical features.
– Scaling or normalizing features.
– Applying basic feature selection or dimensionality reduction.

2. Training Section:

• Select and implement a classical machine learning algorithm for classification (e.g.,
Random Forest, Logistic Regression, XGBoost, etc.).
• Only modify the model implementation.
• Hyperparameter tuning must not appear in the code at this step; do any
tuning separately and keep it out of the final submitted code.

3. Evaluation Section (Do Not Modify):

• The evaluation section computes key metrics (e.g., accuracy, precision, recall,
F1-score, ROC-AUC, confusion matrix).
• You are not allowed to modify this section or print results fraudulently. Any
tampering will result in strict disciplinary actions.

4. Submission Requirements: You are required to submit a folder named after your
Student Code (e.g., IA20, RSI12) containing:

• A Python file named project_code.py.

• Any additional files needed for your implementation.
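
The metrics listed for the evaluation section are all available in sklearn.metrics. The provided script's actual evaluation code is not reproduced here, so the following is only an illustration of what those metrics compute, on a small hand-made binary example (the labels and scores are invented for the demonstration).

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Invented binary labels, hard predictions, and predicted probabilities.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3])

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC-AUC is computed from scores, not from hard predictions.
    "roc_auc": roc_auc_score(y_true, y_score),
}
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(cm)
```

Here 6 of the 8 predictions are correct, so the accuracy is 0.75. The project ranking uses accuracy only; the other metrics are reported alongside it.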

3. Important Execution Requirements

To ensure a smooth and fair evaluation process, adhere to the following guidelines:

1. Code Submission:

• Include your Student Code in the script (e.g., IA20, RSI12, etc.).
• Specify the model used in the code (e.g., RandomForestClassifier, XGBoost,
etc.).
• Ensure your code executes without errors and generates the required results.csv
file.

2. Dataset Locations:

• Both train_data.csv and test_data.csv will be placed in the same folder.
Your code should load them accordingly.

3. Execution Speed:

• Ensure preprocessing and training steps are efficient. Long execution times may
negatively impact evaluation.

4. Hyperparameter Tuning:

• Hyperparameter tuning is not allowed in the provided code. If you wish to
explore tuning, document it separately as an appendix.
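
One way to keep tuning out of the submitted script is to run it in a separate file and copy only the selected values into project_code.py. The sketch below assumes a hypothetical standalone tuning script, synthetic data, and a small illustrative grid for RandomForestClassifier; none of these come from the project files.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# Small illustrative grid; a real search would be chosen per model.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
print("best parameters:", best_params)
print(f"cross-validated accuracy: {search.best_score_:.3f}")

# In the submitted script the chosen values would be typed in as constants.
# They are unpacked here only to show the resulting model.
final_model = RandomForestClassifier(random_state=0, **best_params)
final_model.fit(X, y)
```

The printed parameters and score are the kind of material that can go into the appendix, while the submitted code trains a single model with fixed values and stays fast.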

4. Evaluation Criteria
Your performance in this project will be evaluated as follows:

Participation
• 5 points: Awarded to every student who attends the practical session (TP).

Theoretical Component
• 5 points: Based on theoretical exercises proposed during the partial examination.

Project Scoring
Your models will be ranked based on their classification accuracy on the test dataset. Scores
will be distributed as follows:

• 10 points: Top 10 most accurate models.

• 8 points: Next 10 most accurate models.

• 6 points: Next 10 most accurate models.

• 4 points: Next 30 most accurate models.

• 2 points: Next 10 most accurate models.

• 0 points: All remaining submissions, including those that fail to produce valid
results or perform poorly.
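
The tiers above translate into rank boundaries of 10, 20, 30, 60 and 70. As a small worked example, the mapping from rank to project points, and the resulting total, could be written as follows. The function names are illustrative, and the sketch assumes a valid submission, ranks starting at 1, and a theory mark between 0 and 5.

```python
def project_points(rank):
    """Map a 1-based accuracy rank to project points, following the tiers."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    # (last rank included in the tier, points awarded)
    tiers = [(10, 10), (20, 8), (30, 6), (60, 4), (70, 2)]
    for last_rank, points in tiers:
        if rank <= last_rank:
            return points
    return 0


def total_score(attended_tp, theory_points, rank):
    """Participation (5) + theory (0 to 5) + project points (0 to 10)."""
    participation = 5 if attended_tp else 0
    return participation + theory_points + project_points(rank)


print(project_points(10))          # last rank of the top tier
print(project_points(31))          # first rank of the 30-model tier
print(total_score(True, 4, 12))    # 5 + 4 + 8
```

With these rules the maximum total is 20 points: 5 for participation, 5 for theory, and 10 for a top-10 model.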

Disqualification
• Submissions that fail to execute properly or do not generate the required results.csv
file will be excluded from evaluation.

• Students who modify the evaluation function or tamper with the test process will face
serious penalties, including potential academic consequences.
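
Because a missing results.csv leads to exclusion, it may be worth checking the submission folder before handing it in. The helper below is a hypothetical pre-submission check, not part of the provided script: it only verifies that project_code.py is present and that a non-empty results.csv exists after a run. The expected contents of results.csv are not specified here, so they are not inspected.

```python
import tempfile
from pathlib import Path


def submission_problems(folder):
    """Return a list of human-readable problems found in a submission folder."""
    folder = Path(folder)
    problems = []
    if not (folder / "project_code.py").is_file():
        problems.append("project_code.py is missing")
    results = folder / "results.csv"
    if not results.is_file():
        problems.append("results.csv was not generated")
    elif results.stat().st_size == 0:
        problems.append("results.csv is empty")
    return problems


# Demonstration on a throwaway folder named after a student code.
with tempfile.TemporaryDirectory() as tmp:
    submission = Path(tmp) / "IA20"
    submission.mkdir()
    (submission / "project_code.py").write_text("# student script\n")
    before_run = submission_problems(submission)

    # Stand-in for what a successful run of the script would leave behind.
    (submission / "results.csv").write_text("placeholder\n")
    after_run = submission_problems(submission)

print(before_run)
print(after_run)
```

In practice the check would be run on the real folder after executing project_code.py from inside it, so that the file is produced by the unmodified evaluation section.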

5. Final Reminders
• Use only the following libraries: numpy, matplotlib, seaborn, sklearn, xgboost.

• Ensure your code is properly commented and organized.

• Respect the project rules, as this is not just a test of your technical skills but also your
integrity and adherence to guidelines.
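
The library restriction can be checked mechanically before submitting. The following sketch uses Python's ast module to list top-level imports that fall outside the allowed set. It is a hypothetical helper with one caveat: standard-library modules are flagged too, since the list above does not say whether they are accepted, so the `allowed` argument can be extended if they are.

```python
import ast

ALLOWED = {"numpy", "matplotlib", "seaborn", "sklearn", "xgboost"}


def disallowed_imports(source, allowed=ALLOWED):
    """Return the top-level package names imported outside the allowed set."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found - set(allowed)


example_script = """
import numpy as np
from sklearn.ensemble import RandomForestClassifier
import torch
import pandas as pd
"""

flagged = disallowed_imports(example_script)
print(sorted(flagged))
```

For the example script the flagged packages are pandas and torch. To check a real file, its text can be read with `open("project_code.py").read()` and passed to the same function.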

This project provides a valuable opportunity to demonstrate your machine learning
proficiency and compete for top scores. Put in your best effort, and good luck!
