PROJECT

This document outlines the specifications for a mini-project assignment in a data mining course. Students work in groups of 2-3 to complete one elective problem and one mandatory problem. The elective problems apply data mining techniques (association rules, collaborative filtering, k-means clustering, Naive Bayes classification, and decision trees) to various datasets; the mandatory problem applies latent semantic analysis. By the specified deadline, groups must submit three deliverables: the data processing implementation, a report describing their solution, and a demonstration.

Uploaded by

de santos

Data Mining Course: MAFE312IU

HCMC International University
Assoc. Prof. Dr. Tran Manh Ha

Date Start: 2021-05-03
Date End: 2021-06-03

Mini-Project
(Individual/Group submission)

Specification

This mini-project gives students practical experience in designing and implementing data mining
solutions. Through the mini-project, students learn data mining techniques and propose and implement
solutions to data mining problems. Students form groups of 2-3; each group chooses one elective
problem and the mandatory problem below, then submits a report with the deliverables by the deadline.
Groups must confirm their members and chosen problems by 2021-05-06.

Problem 1.1: Association Rules (elective) (50 points)

Each group uses the Frequent Itemsets method (the Apriori algorithm) and the lessons learned in the
course to mine the Online Retail dataset available at http://archive.ics.uci.edu/ml/datasets/Online+Retail.
The solution must be evaluated on both the test dataset and a real, self-generated dataset.
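The level-wise Apriori search, followed by rule generation, can be sketched in plain Python as below. This is a minimal illustration, not a reference implementation: it assumes each transaction has already been reduced to a set of item identifiers (for Online Retail, grouping StockCode values by InvoiceNo is one plausible pre-processing choice), and the names `apriori`, `association_rules`, `min_support` and `min_confidence` are hypothetical. Support is an absolute count here; a fraction of the number of transactions would work equally well.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    transactions: iterable of item collections (hypothetical pre-processed baskets).
    min_support: minimum absolute support count.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items in one pass.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemset candidates.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            # Prune step: keep only candidates whose (k-1)-subsets are all frequent.
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count candidate support with one more pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules X -> Y above the threshold."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # confidence(X -> Y) = support(X u Y) / support(X);
                # X is present in `frequent` because every subset of a frequent set is frequent.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted only if all of its smaller subsets already passed the support threshold.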

Problem 1.2: Collaborative Filtering (elective) (50 points)

Each group uses the Collaborative Filtering method and the lessons learned in the course to mine the
MovieLens 100K movie dataset available at https://grouplens.org/datasets/movielens/100k/. The solution
must be evaluated on both the test dataset and a real, self-generated dataset.
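A minimal sketch of user-based collaborative filtering with cosine similarity follows. It assumes the ratings have been loaded into a nested dict `{user_id: {item_id: rating}}` (in MovieLens 100K, the `u.data` file lists user id, item id, rating and timestamp on each line). The function names and the neighborhood size `k` are illustrative assumptions; item-based filtering or matrix factorization would be equally valid choices for the assignment.

```python
import math

def cosine_similarity(r_u, r_v):
    """Cosine similarity between two users' rating dicts over their co-rated items.

    Assumes ratings are positive (MovieLens uses 1-5), so norms are never zero.
    """
    common = set(r_u) & set(r_v)
    if not common:
        return 0.0
    dot = sum(r_u[i] * r_v[i] for i in common)
    norm_u = math.sqrt(sum(r_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(r_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item, k=2):
    """Predict `user`'s rating for `item` as a similarity-weighted average over
    the k most similar users who rated `item`; returns None if nobody qualifies."""
    neighbors = sorted(
        ((cosine_similarity(ratings[user], ratings[v]), ratings[v][item])
         for v in ratings if v != user and item in ratings[v]),
        reverse=True,
    )[:k]
    # Discard neighbors with no positive similarity.
    neighbors = [(s, r) for s, r in neighbors if s > 0]
    if not neighbors:
        return None
    return sum(s * r for s, r in neighbors) / sum(s for s, _ in neighbors)
```

To evaluate on the test split, one option is to predict each held-out rating from the training ratings and report RMSE or MAE.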

Problem 1.3: K-means Clustering (elective) (50 points)

Each group uses the K-means Clustering method and the lessons learned in the course to mine the MNIST
handwritten digit dataset available at http://yann.lecun.com/exdb/mnist/. The solution must be
evaluated on both the test dataset and a real, self-generated dataset.
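A minimal sketch of Lloyd's k-means algorithm in plain Python follows. It assumes each image has already been flattened into a tuple of pixel intensities (784 values for a 28x28 MNIST digit). The names `kmeans`, `max_iter` and `seed` are illustrative; k-means++ initialization or a vectorized NumPy version would be natural refinements for the full dataset.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means. points: list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Because k-means is unsupervised, the cluster indices it returns are arbitrary. One common way to evaluate against the MNIST labels is to map each cluster to the majority digit among its members and measure accuracy under that mapping.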

Problem 1.4: Naive Bayes Classifier (elective) (50 points)

Each group uses the Naive Bayes Classifier method and the lessons learned in the course to mine the
Ling-Spam spam filtering dataset available at https://aclweb.org/aclwiki/Spam_filtering_datasets. The
solution must be evaluated on both the test dataset and a real, self-generated dataset.
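A minimal sketch of a multinomial Naive Bayes classifier with add-alpha (Laplace) smoothing follows. It assumes each message has already been tokenized into a list of lowercase words and labelled `"spam"` or `"ham"`. Ling-Spam separates spam from legitimate messages, but these label strings, along with the names `train_nb`, `predict_nb` and `alpha`, are assumptions for illustration. Scores are summed in log space so that long messages do not underflow to zero probability.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Train multinomial Naive Bayes. docs: list of token lists; labels: class names."""
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    log_prior, log_likelihood = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in d)
        total = sum(counts.values())
        # Add-alpha smoothing so a word unseen in one class never yields log(0).
        log_likelihood[c] = {
            tok: math.log((counts[tok] + alpha) / (total + alpha * len(vocab)))
            for tok in vocab
        }
    return classes, log_prior, log_likelihood

def predict_nb(model, doc):
    """Return the class with the highest posterior log-probability for a token list."""
    classes, log_prior, log_likelihood = model

    def score(c):
        # Words outside the training vocabulary are ignored.
        return log_prior[c] + sum(
            log_likelihood[c][t] for t in doc if t in log_likelihood[c]
        )

    return max(classes, key=score)
```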

Problem 1.5: Decision Trees (elective) (50 points)

Each group uses the Decision Trees method and the lessons learned in the course to mine the Zoo
animal dataset available at http://archive.ics.uci.edu/ml/datasets/Zoo. The solution must be evaluated
on both the test dataset and a real, self-generated dataset.
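A minimal sketch of the ID3 decision-tree algorithm (entropy and information gain over categorical attributes) follows. It assumes each Zoo record has been loaded as a dict of attribute values with the animal type as its label. Most Zoo attributes, such as hair or feathers, are Boolean, and legs takes only a few integer values, so treating every attribute as categorical is a reasonable simplification. ID3 is one choice among several; C4.5 or CART with Gini impurity would also fit the assignment. All function and attribute names here are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the categorical attribute `attr`."""
    n = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Build an ID3 tree. rows: list of dicts.

    Returns either a class label (leaf) or a tuple (attr, {value: subtree}).
    """
    if len(set(labels)) == 1:
        return labels[0]  # pure node
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority vote
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return (best, branches)

def classify(tree, row, default=None):
    """Walk the tree for one record; returns `default` for an unseen attribute value."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row[attr] not in branches:
            return default
        tree = branches[row[attr]]
    return tree
```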

Problem 1.6: Latent Semantic Analysis (mandatory) (50 points)

Each group uses the Latent Semantic Analysis method and the lessons learned in the course to retrieve
documents from the text dataset that are similar to the provided queries. The field-specific dataset
is provided by the Instructor. The solution must include text data pre-processing, term-document
matrix generation, and the application of singular value decomposition (SVD).
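A minimal NumPy sketch of the three required stages (pre-processing, term-document matrix, SVD) follows. It assumes plain-text documents and raw term counts. The regex tokenizer, the tiny stop-word list, the default `k` and all function names are illustrative assumptions; the Instructor's dataset may call for stemming, TF-IDF weighting or a different `k`. A query vector q is folded into the reduced space as q^T U_k Sigma_k^{-1} and compared with the rows of V_k by cosine similarity.

```python
import re
import numpy as np

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller list).
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "in", "to", "is"})

def tokenize(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def term_document_matrix(docs):
    """Build a terms x documents matrix of raw term counts."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            A[index[t], j] += 1
    return A, index

def lsa_rank(docs, query, k=2):
    """Return document indices ordered by cosine similarity to `query` in the
    rank-k LSA space (k must not exceed the number of non-zero singular values)."""
    A, index = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) Vt, s descending
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T       # keep the k largest singular values
    q = np.zeros(len(index))
    for t in tokenize(query):
        if t in index:  # ignore out-of-vocabulary query terms
            q[index[t]] += 1
    q_k = (q @ U_k) / s_k  # fold-in: q^T U_k Sigma_k^{-1}
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k)
    sims = (V_k @ q_k) / np.where(norms == 0, 1.0, norms)
    return sorted(range(len(docs)), key=lambda j: -sims[j])
```

The truncation to k dimensions is what lets LSA match documents that share related vocabulary rather than exact query terms; choosing k is an empirical trade-off between noise removal and loss of detail.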

Deliverable Requirements

D1: Data pre-processing, data processing (algorithm and implementation), and data post-processing
(50%)
D2: A report describing D1. Note that the solutions, including design, algorithm, implementation, ...
must be presented as diagrams, flowcharts, pseudo-code, ..., while code snippets should be included
in the appendices. (35%)
D3: A demonstration presenting D1 (15%)
