
Data Mining Project

The project aims to develop a classical machine learning classification model using a provided training dataset, focusing on data preprocessing, model training without deep learning techniques, and achieving high accuracy on unseen data. Students must submit their code by December 28, 2024, and adhere to strict guidelines regarding code execution, dataset handling, and evaluation criteria. The evaluation will be based on classification accuracy, with penalties for improper submissions or tampering with the evaluation process.

Uploaded by

Ädëm Lë Røï
Copyright © All Rights Reserved

Instructions for the Final Compilation Project

December 5, 2024

1. Objective of the Project


The purpose of this project is to demonstrate your ability to develop a classical machine
learning model for classification using fundamental concepts. You will be provided only
with a training dataset (train_data.csv) to perform the following tasks:

1. Preprocessing the Data:

• Handle missing values, scale/normalize features, encode categorical variables, and
perform any feature selection or transformation necessary to improve model performance.

2. Training a Machine Learning Model:

• Use classical algorithms (e.g., from sklearn or xgboost) to train a model on the
preprocessed data.
• Deep learning techniques (e.g., TensorFlow, PyTorch) are not allowed.

3. Proposing the Most Accurate Model:

• Your goal is to develop the model that achieves the highest classification accuracy
on unseen data.

Once you submit your code, I will test your proposed model on a separate test dataset
(test_data.csv), which you will not have access to during the development phase. This
ensures a fair evaluation of your model’s performance on completely unseen data, replicating
real-world scenarios.
Submission Deadline: The submission deadline is set to 28-12-2024 at 23:59. Any
student who fails to submit their code before this date and time will be excluded from
the evaluation process.
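Because test_data.csv is withheld until after submission, k-fold cross-validation on the training set is one common way to estimate accuracy on unseen data during development. This is an illustrative sketch only. Synthetic data from make_classification stands in for train_data.csv, and the model and its settings are arbitrary choices, not requirements of this project.

```python
# Sketch: estimate accuracy on unseen data with stratified k-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in for the features and labels read from train_data.csv (assumption).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Arbitrary example model; any classical sklearn/xgboost classifier would do.
model = RandomForestClassifier(n_estimators=200, random_state=0)

# Stratified folds keep class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Some preprocessing steps learn statistics from the data, such as a scaler's mean or an imputer's median. Wrapping them with the model in an sklearn Pipeline makes cross_val_score refit them inside each fold. This keeps information from the validation folds out of the estimate.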

2. Explanation of the Provided Code
You will receive a Python script (project_code.py) along with the datasets (train_data.csv
and test_data.csv). Both datasets will be located in the same folder, within your working
directory. The script is structured to include the following key sections:

1. Preprocessing Section:

• This is where you can modify the code to clean and preprocess the data.
• Examples of valid modifications include:
– Handling missing values.
– Encoding categorical features.
– Scaling or normalizing features.
– Applying basic feature selection or dimensionality reduction.

2. Training Section:

• Select and implement a classical machine learning algorithm for classification (e.g.,
Random Forest, Logistic Regression, XGBoost, etc.).
• Only modify the model implementation.
• Hyperparameter tuning must not be included in the code at this step; perform it
separately and leave it out of the final code you submit.

3. Evaluation Section (Do Not Modify):

• The evaluation section computes key metrics (e.g., accuracy, precision, recall,
F1-score, ROC-AUC, confusion matrix).
• You are not allowed to modify this section or print results fraudulently. Any
tampering will result in strict disciplinary actions.

4. Submission Requirements: You are required to submit a folder named after your
Student Code (e.g., IA20, RSI12) containing:

• A Python file named project_code.py.

• Any additional files needed for your implementation.
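The following minimal sketch shows how the preprocessing, training and evaluation sections can fit together. It is not the provided project_code.py. The synthetic mixed-type data, the column positions, the STUDENT_CODE and MODEL_NAME variables, and the choice of RandomForestClassifier are all assumptions for illustration.

The metrics block only mimics what the evaluation section reports; in the real script that section is provided and must not be modified. The metric calls assume a binary target. For a multiclass target, precision_score, recall_score and f1_score need an `average=` argument, and roc_auc_score needs `multi_class=` with the full probability matrix.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

STUDENT_CODE = "IA20"                  # hypothetical: replace with your own code
MODEL_NAME = "RandomForestClassifier"  # hypothetical: name of the model you use

# --- Stand-in data (assumption): two numeric columns and one categorical ---
rng = np.random.default_rng(0)
n = 400
age = rng.normal(40, 10, n)
income = rng.normal(3000, 800, n)
city = rng.choice(["A", "B", "C"], n)
y = ((age > 40) ^ (city == "A")).astype(int)  # label computed before NaNs exist
age[rng.random(n) < 0.1] = np.nan             # inject missing values
X = np.empty((n, 3), dtype=object)
X[:, 0], X[:, 1], X[:, 2] = age, income, city
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# --- Preprocessing section: impute + scale numbers, one-hot encode categories ---
NUMERIC_COLS = [0, 1]     # hypothetical column positions
CATEGORICAL_COLS = [2]
preprocessor = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), NUMERIC_COLS),
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
])

# --- Training section: fixed hyperparameters, no tuning here ---
model = Pipeline([
    ("preprocess", preprocessor),
    ("classifier", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(X_train, y_train)

# --- Stand-in for the evaluation section (binary target assumed) ---
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"{STUDENT_CODE} / {MODEL_NAME}")
print("Accuracy :", accuracy)
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_score))
print("Confusion matrix:\n", cm)
```

Putting preprocessing and the classifier in one Pipeline means every transformation is fitted on the training data only. The same fitted steps are then applied unchanged to the test data.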

3. Important Execution Requirements


To ensure a smooth and fair evaluation process, adhere to the following guidelines:

1. Code Submission:

• Include your Student Code in the script (e.g., IA20, RSI12, etc.).
• Specify the model used in the code (e.g., RandomForestClassifier, XGBoost,
etc.).
• Ensure your code executes without errors and generates the required results.csv
file.

2. Dataset Locations:

• Both train_data.csv and test_data.csv will be placed in the same folder. Your code
should load them from that folder.

3. Execution Speed:

• Ensure preprocessing and training steps are efficient. Long execution times may
negatively impact evaluation.

4. Hyperparameter Tuning:

• Hyperparameter tuning is not allowed in the provided code. If you wish to explore
tuning, document it separately as an appendix.
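Tuning can therefore live in a separate script, with its result copied into project_code.py as fixed constructor arguments. The sketch below makes several assumptions: a hypothetical file name (tune.py), synthetic stand-in data, and an arbitrary small grid for RandomForestClassifier. GridSearchCV is only one option; RandomizedSearchCV is another. The script also times a single fit with the selected parameters, which is one way to keep an eye on the execution-speed requirement.

```python
# tune.py (hypothetical name): run separately; NOT part of project_code.py.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in for the preprocessed training data (assumption).
X, y = make_classification(n_samples=300, n_features=15, random_state=0)

# Small, arbitrary grid: 2 x 2 x 2 = 8 candidates, each scored with 3-fold CV.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")

# Time one fit with the chosen parameters, as the final script would run it.
final_model = RandomForestClassifier(random_state=0, **search.best_params_)
start = time.perf_counter()
final_model.fit(X, y)
elapsed = time.perf_counter() - start
print(f"Single fit with best parameters: {elapsed:.2f} s")
```

The printed best_params_ would then be hard-coded in the training section, and the grid, scores and timing reported in the appendix. Passing n_jobs=-1 to GridSearchCV can parallelise the search across CPU cores if it is slow.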

4. Evaluation Criteria
Your performance in this project will be evaluated as follows:

Participation
• 5 points: Awarded to every student who attends the practical session (TP).

Theoretical Component
• 5 points: Based on the theoretical exercises given in the partial (midterm) examination.

Project Scoring
Your models will be ranked based on their classification accuracy on the test dataset. Scores
will be distributed as follows:

• 10 points: Top 10 most accurate models.

• 8 points: Next 10 most accurate models.

• 6 points: Next 10 most accurate models.

• 4 points: Next 30 most accurate models.

• 2 points: Next 10 most accurate models.

• 0 points: All remaining submissions that fail to produce valid results or perform
poorly.
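Read cumulatively, the tiers award points by rank as follows:

- Ranks 1-10: 10 points
- Ranks 11-20: 8 points
- Ranks 21-30: 6 points
- Ranks 31-60: 4 points
- Ranks 61-70: 2 points
- Every rank after 70: 0 points

With participation and the theoretical component added, the maximum total is 5 + 5 + 10 = 20 points. The hypothetical helper below, which is not part of the project code, encodes that mapping. The instructions do not say how tied accuracies are ranked, so it simply takes a rank as given.

```python
# Project-score tiers from the instructions: (number of ranks in tier, points).
TIERS = [(10, 10), (10, 8), (10, 6), (30, 4), (10, 2)]


def project_points(rank: int) -> int:
    """Return project points for a 1-based accuracy rank of a valid submission."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    upper = 0
    for size, points in TIERS:
        upper += size          # last rank covered by this tier
        if rank <= upper:
            return points
    return 0                   # every rank after 70


for r in (1, 10, 11, 30, 31, 60, 61, 70, 71):
    print(r, project_points(r))
```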

Disqualification
• Submissions that fail to execute properly or do not generate the required results.csv
file will be excluded from evaluation.

• Students who modify the evaluation function or tamper with the test process will face
serious penalties, including potential academic consequences.

5. Final Reminders
• Use only the following libraries: numpy, matplotlib, seaborn, sklearn, xgboost.

• Ensure your code is properly commented and organized.

• Respect the project rules, as this is not just a test of your technical skills but also your
integrity and adherence to guidelines.

This project provides a valuable opportunity to demonstrate your machine learning
proficiency and compete for top scores. Put in your best effort, and good luck!
