Assignment-2 IDS

Uploaded by boony862000
Copyright © All Rights Reserved

Computer Science & Information Systems

Assignment-2: Introduction to Data Science (Maximum Marks: 10)

Problem Statement
Formulate a classification problem that closely mirrors a real-life system in your workplace. The problem should be significant and likely to benefit from machine learning classification techniques. It is essential to choose a problem that is genuinely relevant to your work environment.
Dataset Requirements
1. The dataset should be relevant to the chosen problem and reflect its real-life aspects.
2. The dataset should contain a large number of records, approximately 5,000 or more, to support a robust analysis.
3. It should have a sufficient number of attributes (10 or more) of various types, including numerical, nominal, and categorical.
4. You may reuse the dataset chosen for Assignment-1, provided it is suitable for classification.
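Before committing to a dataset, you can check requirements 2 and 3 by counting its rows and inspecting its column dtypes with pandas. The sketch below assumes the data is already loaded into a DataFrame with a single label column. The helper `check_dataset` and all column names are hypothetical, and the random table only stands in for real data. Dtypes are only a heuristic: an integer-coded nominal attribute (for example, a numeric region ID) will be reported as numeric, so review such columns by hand.

```python
import numpy as np
import pandas as pd

MIN_RECORDS = 5000     # requirement 2: approximately 5,000 records or more
MIN_ATTRIBUTES = 10    # requirement 3: 10 or more attributes


def check_dataset(df: pd.DataFrame, target: str) -> dict:
    """Summarize whether df meets the size and attribute-type requirements (hypothetical helper)."""
    features = df.drop(columns=[target])
    numeric = features.select_dtypes(include="number").columns.tolist()
    categorical = features.select_dtypes(exclude="number").columns.tolist()
    return {
        "n_records": len(df),
        "n_attributes": features.shape[1],
        "numeric": numeric,
        "categorical": categorical,
        "enough_records": len(df) >= MIN_RECORDS,
        "enough_attributes": features.shape[1] >= MIN_ATTRIBUTES,
        "mixed_types": bool(numeric) and bool(categorical),
    }


# Hypothetical stand-in table: 8 numeric and 2 nominal columns plus a label.
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({f"num_{i}": rng.normal(size=n) for i in range(8)})
df["region"] = rng.choice(["north", "south", "east", "west"], size=n)
df["plan"] = rng.choice(["basic", "standard", "premium"], size=n)
df["churned"] = rng.integers(0, 2, size=n)

report = check_dataset(df, target="churned")
print(report)
```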
Write Python scripts for the following tasks.
Decision Tree Implementation
• Split the dataset into training and testing sets.
• Implement a decision tree classifier using scikit-learn's DecisionTreeClassifier class.
• Train the decision tree classifier on the training set.
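The three implementation steps above might look like the following sketch. It assumes a pandas DataFrame with a binary label. The churn-style columns and the label rule are invented stand-ins, not a recommended problem. scikit-learn's DecisionTreeClassifier expects numeric input, so the sketch one-hot encodes the nominal columns inside a Pipeline, which applies the same encoding to the test set.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data; replace with your own DataFrame and target column.
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, size=n),
    "monthly_charge": rng.normal(60, 15, size=n).round(2),
    "plan": rng.choice(["basic", "standard", "premium"], size=n),
    "region": rng.choice(["north", "south", "east", "west"], size=n),
})
# Toy label rule so the tree has a pattern to learn.
df["churned"] = ((df["tenure_months"] < 12) & (df["plan"] == "basic")).astype(int)

X = df.drop(columns="churned")
y = df["churned"]
categorical = ["plan", "region"]

# Hold out 20% for testing; stratify keeps the class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# One-hot encode the nominal columns and pass the numeric ones through unchanged.
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough")

model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(random_state=42)),
])
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```

Because the encoder and the tree share one Pipeline, `fit` learns the categories from the training split only. `OneHotEncoder(handle_unknown="ignore")` keeps prediction from failing if a category appears only in the test split.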
Model Evaluation
• Evaluate the performance of the trained decision tree classifier using appropriate metrics such as accuracy, precision, recall, F1-score, and the confusion matrix. Interpret what each metric reveals about the model.
• Visualize the decision tree using Graphviz or another suitable visualization tool or library.
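Here is a sketch of the evaluation and visualization steps. It uses a synthetic, deliberately imbalanced dataset from `make_classification` in place of real data, so swap in your own split. `export_text` prints the learned rules as text. `export_graphviz(out_file=None)` returns DOT source that Graphviz can render. `sklearn.tree.plot_tree` with matplotlib is an alternative that needs no Graphviz installation.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

# Synthetic stand-in data (80/20 class imbalance); substitute your own split.
X, y = make_classification(n_samples=5000, n_features=12, n_informative=6,
                           weights=[0.8, 0.2], random_state=42)
feature_names = [f"f{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# For binary labels, precision/recall/F1 are computed for the positive class (1).
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(metrics)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))

# Text view of the top of the tree, depth-limited so it stays readable.
print(export_text(clf, feature_names=feature_names, max_depth=2))

# DOT source for Graphviz; save it to a .dot file and render it with the
# Graphviz `dot` command (e.g. dot -Tpng tree.dot -o tree.png).
dot_source = export_graphviz(clf, out_file=None, feature_names=feature_names,
                             class_names=["0", "1"], filled=True, max_depth=3)
```

When interpreting the output, read each metric as follows:
• Accuracy is the share of all test records classified correctly. It can look high on imbalanced data even when the minority class is mostly missed.
• Precision is TP / (TP + FP): the share of predicted positives that are real.
• Recall is TP / (TP + FN): the share of real positives the model finds.
• F1 is the harmonic mean of precision and recall.
The confusion matrix shows where the errors fall. Weigh its FP and FN cells against the relative cost of each mistake in your workplace problem.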
Hyperparameter Tuning
• Explore different hyperparameters of the decision tree classifier (e.g., max_depth, min_samples_split, min_samples_leaf).
• Evaluate the performance of the tuned model and compare it with the untuned model.
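One common way to explore these hyperparameters is an exhaustive grid search with cross-validation, sketched below with `GridSearchCV`. The grid values are illustrative, and so is the synthetic data. Using F1 as the selection score is an assumption, though a reasonable one for an imbalanced label.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with an 80/20 class imbalance.
X, y = make_classification(n_samples=5000, n_features=12, n_informative=6,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Baseline: default hyperparameters, so the tree grows until its leaves are
# pure (or cannot be split further).
untuned = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Illustrative search space; adjust the ranges to your dataset.
param_grid = {
    "max_depth": [3, 5, 8, None],
    "min_samples_split": [2, 20],
    "min_samples_leaf": [1, 10],
    "criterion": ["gini", "entropy"],
}
# 5-fold cross-validation on the training split only; the test split is
# kept aside for the final untuned-vs-tuned comparison.
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid,
                      scoring="f1", cv=5)
search.fit(X_train, y_train)
tuned = search.best_estimator_
print("Best params:", search.best_params_)

results = {}
for name, model in [("untuned", untuned), ("tuned", tuned)]:
    y_pred = model.predict(X_test)
    results[name] = f1_score(y_test, y_pred)
    print(f"{name}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
          f"test F1={results[name]:.3f}")
```

Comparing depth and leaf counts alongside test F1 shows whether the tuned tree is also simpler. A fully grown default tree often overfits the training data, and limits such as `max_depth` or `min_samples_leaf` trade some training fit for better generalization. Report the comparison on the held-out test split, because `search.best_score_` is a cross-validated estimate from the training data.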

Instructions
• This is a group assignment with 3 members per group.
• Choose a unique problem statement and dataset for your analysis.
• Use a Jupyter notebook for scripting and documentation.
• Include visuals where applicable.
• Document the code well, with clear explanations for each step.
• Submit the assignment as a single document in PDF or Jupyter notebook format.

Deliverables

1. The source of the chosen dataset.
2. A well-documented Jupyter notebook with Python code for the data analysis.
3. The output of each code snippet, included in the notebook.
4. A summary of the interpretations and findings from your dataset.

How to Submit

1. Combine all deliverables into one document.
2. Convert it to PDF or keep it in Jupyter notebook format.
3. Name the file "Group-[number]" before uploading it to the e-learn portal.

Evaluation Criteria

1. The relevance and scope of the chosen problem statement.
2. Fully functional Python scripts demonstrating the entire data analysis process.
3. Clear interpretations and findings, communicated effectively.
4. The overall presentation of the submission document.

Timeline for Submission

Submit the assignment no later than Nov 17, 2024.

Note: Plagiarism will be strictly penalized.
