PROJECT
Mini-Project
(Individual/Group submission)
Specification
This mini-project gives students hands-on practice in designing and implementing data mining
solutions. By completing it, students learn data mining techniques and how to design and implement
solutions to data mining problems. Students form groups of 2-3; each group chooses one elective
problem and one mandatory problem from the list below, then submits a report with the deliverables
by the deadline. Groups must confirm their members and chosen problems by 06/05/2021.
Each group uses the Frequent Itemsets method (the Apriori algorithm) and the lessons learned to
mine the Online Retail dataset available at http://archive.ics.uci.edu/ml/datasets/Online+Retail.
The solution must be evaluated on the test dataset and on a real, self-generated dataset.
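As a starting point, the level-wise search of Apriori can be sketched as follows. This is a minimal illustration, not a required design: the function name `apriori`, the absolute `min_support` count, and the toy `baskets` are assumptions made for the example and do not come from the Online Retail data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, min_support=3)
```

For the real dataset, each invoice would presumably become one transaction (the set of items on that invoice), and the support threshold would need tuning.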
Each group uses the Collaborative Filtering method and the lessons learned to mine the movie
dataset available at https://grouplens.org/datasets/movielens/100k/. The solution must be
evaluated on the test dataset and on a real, self-generated dataset.
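One common variant of this method is user-based collaborative filtering with Pearson similarity. The sketch below assumes that variant; item-based or model-based approaches are equally valid choices. The names `pearson` and `predict` and the ratings dictionary are hypothetical and are not taken from MovieLens.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two users over their co-rated items."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = sqrt(sum((u[i] - mu) ** 2 for i in common)) * \
        sqrt(sum((v[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict a rating as the user's mean plus the similarity-weighted
    average of the neighbours' deviations from their own means."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        sim = pearson(ratings[user], r)
        mean_o = sum(r.values()) / len(r)
        num += sim * (r[item] - mean_o)
        den += abs(sim)
    return mean_u + num / den if den else mean_u


# Made-up ratings on a 1-5 scale, for illustration only.
ratings = {
    "alice": {"m1": 5, "m2": 1, "m3": 4},
    "bob":   {"m1": 5, "m2": 1, "m3": 5, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}
prediction = predict(ratings, "alice", "m4")
```

Here alice agrees with bob and disagrees with carol, so the predicted rating for `m4` lands above alice's own mean. On the real data, prediction quality could be measured with an error metric such as RMSE on held-out ratings.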
Each group uses the K-means Clustering method and the lessons learned to mine the handwritten
digit dataset available at http://yann.lecun.com/exdb/mnist/. The solution must be evaluated
on the test dataset and on a real, self-generated dataset.
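The two alternating steps of K-means (assign each point to its nearest centroid, then recompute the centroids) can be sketched in plain Python as below. The function name `kmeans`, the random initialisation from the data points, and the 2-D toy points are assumptions of this example; with MNIST, each image would instead be a flattened pixel vector.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initial centroids: k random points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (squared distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster
        if new_labels == labels and new_centroids == centroids:
            break                                    # converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Two well-separated toy groups, invented for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.6),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
```

Because K-means is unsupervised, evaluating it against digit labels needs an extra step, for example mapping each cluster to its majority digit; that mapping is a design choice left to the group.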
Each group uses the Naive Bayes Classifier method and the lessons learned to mine the Ling-Spam
spam filtering dataset available at https://aclweb.org/aclwiki/Spam_filtering_datasets. The
solution must be evaluated on the test dataset and on a real, self-generated dataset.
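A multinomial Naive Bayes classifier with Laplace (add-one) smoothing is one reasonable reading of this problem; the sketch below assumes it. The function names `train` and `predict` and the four training messages are made up for the example and are not from Ling-Spam.

```python
from collections import Counter, defaultdict
from math import log


def train(docs):
    """docs: list of (tokens, label) pairs. Returns the counts NB needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior of the class.
        score = log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                # Laplace-smoothed log likelihood of the word.
                score += log((word_counts[label][w] + 1)
                             / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny invented training set, for illustration only.
training = [
    ("win free prize now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting schedule for project".split(), "ham"),
    ("project report due monday".split(), "ham"),
]
model = train(training)
```

Working in log space avoids numeric underflow when messages are long. A call such as `predict(model, "free prize".split())` then returns a label for a new message.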
Each group uses the Decision Trees method and the lessons learned to mine the Zoo animal dataset
available at http://archive.ics.uci.edu/ml/datasets/Zoo. The solution must be evaluated on the
test dataset and on a real, self-generated dataset.
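The core of building a decision tree is choosing which attribute to split on at each node. The sketch below assumes an ID3-style criterion (information gain based on entropy); other criteria such as the Gini index are equally acceptable. The helper names and the four toy rows are hypothetical and only loosely modelled on Boolean animal attributes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute `attr`."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Attribute with the highest information gain (the ID3 split rule)."""
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))


# Invented rows, for illustration only.
rows = [
    {"milk": 1, "tail": 1},
    {"milk": 1, "tail": 0},
    {"milk": 0, "tail": 1},
    {"milk": 0, "tail": 1},
]
labels = ["mammal", "mammal", "bird", "bird"]
split = best_attribute(rows, labels)
```

A full tree would apply `best_attribute` recursively to each subset of rows until the labels in a subset are pure or no attributes remain.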
Each group uses the Latent Semantic Analysis method and the lessons learned to retrieve the
documents in a text dataset that are most similar to the provided queries. The field-specific
dataset is provided by the Instructor. The solution must include text data pre-processing,
term-document matrix generation, and application of singular value decomposition.
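The three required stages (pre-processing, term-document matrix, singular value decomposition) can be sketched with NumPy as follows. This is only an outline under stated assumptions: the stop-word list, the raw-count weighting, the rank `k=2`, the query folding-in formula, and the four sample sentences are all choices made for the example, not requirements of the assignment.

```python
import re

import numpy as np

# Assumed, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "for"}


def tokenize(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS]


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in tokenize(d):
            A[index[w], j] += 1
    return A, index


def lsa_rank(docs, query, k=2):
    """Return document indices ordered from most to least similar."""
    A, index = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    doc_vecs = Vt[:k, :].T              # documents in the k-dim latent space
    q = np.zeros(A.shape[0])
    for w in tokenize(query):
        if w in index:
            q[index[w]] += 1
    q_hat = (q @ U_k) / s_k             # fold the query into the same space
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat) + 1e-12
    sims = (doc_vecs @ q_hat) / norms   # cosine similarities
    return [int(i) for i in np.argsort(-sims)]


# Invented documents, for illustration only.
docs = [
    "Clustering groups similar data points",
    "K-means clustering partitions data points into clusters",
    "Spam filtering classifies email messages",
    "Naive Bayes classifies spam email",
]
ranking = lsa_rank(docs, "clustering data")
```

In practice a weighting such as TF-IDF is often applied to the matrix before the decomposition, and `k` is chosen experimentally; both are left open here.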
Deliverables Requirements
D1: Data pre-processing, data processing (algorithm and implementation), and data
post-processing (50%)
D2: A report describing D1. Note that the solution (design, algorithm, implementation, ...)
must be presented using diagrams, flowcharts, pseudo-code, ...; code snippets belong in the
appendices. (35%)
D3: A demonstration presenting D1 (15%)