
PROJECT

This document outlines the specifications for a mini-project assignment in a data mining course. Students will work in groups of 2-3 people to complete one elective and one mandatory problem. The elective problems involve applying different data mining techniques like association rules, collaborative filtering, k-means clustering, naive bayes classification, and decision trees to various datasets. The mandatory problem involves applying latent semantic analysis. Groups must submit deliverables including data processing documentation, a report describing their solution, and a demonstration. Deliverables are due by the specified deadline.

Uploaded by

de santos

Data Mining                                        Course: MAFE312IU
HCMC International University                      Date Start: 2021-05-03
Assoc. Prof. Dr. Tran Manh Ha                      Date End: 2021-06-03

Mini-Project
(Individual/Group submission)

Specification

This mini-project gives students hands-on practice in designing and implementing data mining
solutions. By completing it, students learn data mining techniques and learn to propose and
implement solutions to data mining problems. Students form groups of 2-3; each group chooses
one elective problem and the mandatory problem below, then submits a report with the
deliverables by the deadline. Groups confirm their members and chosen problems by 06/05/2021.

Problem 1.1: Association Rules (elective) (50 points)

Each group uses the Frequent Itemsets method (the Apriori algorithm) and the lessons learned in
class to mine the Online Retail dataset available at http://archive.ics.uci.edu/ml/datasets/Online+Retail.
The solution must be evaluated on both the test dataset and a real, self-generated dataset.
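As a rough illustration of the frequent-itemset step, the following is a minimal sketch of Apriori on a handful of made-up baskets. The function name apriori, the toy baskets, and the 0.6 support threshold are assumptions chosen for illustration; they are not taken from the Online Retail data or from the course material.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items
    items = {i for t in transactions for i in t}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Toy baskets (hypothetical; not from the Online Retail dataset)
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=0.6)

# Confidence of the rule {beer} -> {diapers}
confidence = frequent[frozenset({"beer", "diapers"})] / frequent[frozenset({"beer"})]
```

Association rules are then derived from the frequent itemsets by dividing supports, as in the last line. A real solution would first group the retail invoice lines into one basket per invoice.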

Problem 1.2: Collaborative Filtering (elective) (50 points)

Each group uses the Collaborative Filtering method and the lessons learned in class to mine the
movie dataset available at https://grouplens.org/datasets/movielens/100k/. The solution must be
evaluated on both the test dataset and a real, self-generated dataset.
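One common variant is user-based collaborative filtering, which predicts a rating as a similarity-weighted average of other users' ratings. The sketch below assumes cosine similarity over co-rated items; the function names, user names, and ratings are hypothetical and only illustrate the idea.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    # None signals that no other user has rated the item
    return num / den if den else None

# Toy ratings on a 1-5 scale (hypothetical; not from MovieLens)
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 5, "m2": 3, "m3": 4, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}
prediction = predict(ratings, "alice", "m4")
```

Because alice agrees with bob more than with carol, the prediction for m4 lands closer to bob's rating of 5 than to carol's rating of 2.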

Problem 1.3: K-means Clustering (elective) (50 points)

Each group uses the K-means Clustering method and the lessons learned in class to mine the
handwritten digit dataset available at http://yann.lecun.com/exdb/mnist/. The solution must be
evaluated on both the test dataset and a real, self-generated dataset.
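The two alternating steps of K-means (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched as follows. The example uses made-up 2-D points and takes the initial centroids as an argument, which is an assumption made to keep the run deterministic. For MNIST, each point would instead be a flattened image vector, and several random initializations are usually tried because the result depends on the starting centroids.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centroids, max_iters=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = []
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for each point
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Two well-separated toy blobs (hypothetical data)
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, initial_centroids=points[:2])
```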

Problem 1.4: Naive Bayes Classifier (elective) (50 points)

Each group uses the Naive Bayes Classifier method and the lessons learned in class to mine the
Ling-Spam spam filtering dataset available at https://aclweb.org/aclwiki/Spam_filtering_datasets.
The solution must be evaluated on both the test dataset and a real, self-generated dataset.
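For spam filtering, a multinomial Naive Bayes classifier with add-one (Laplace) smoothing is a common starting point. The sketch below assumes that variant; the function names and the four training messages are made up for illustration and are not drawn from Ling-Spam.

```python
from collections import Counter, defaultdict
from math import log

def train(docs):
    """docs: list of (tokens, label) pairs. Returns the fitted model."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log-posterior score."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_docs.items():
        score = log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for w in tokens:
            if w not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids zero probabilities
            score += log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training messages (hypothetical)
training = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting schedule tomorrow".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
model = train(training)
label_a = classify("free money".split(), model)
label_b = classify("meeting tomorrow".split(), model)
```

Working in log space keeps the product of many small probabilities from underflowing on long messages.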

Problem 1.5: Decision Trees (elective) (50 points)

Each group uses the Decision Trees method and the lessons learned in class to mine the Zoo animal
dataset available at http://archive.ics.uci.edu/ml/datasets/Zoo. The solution must be evaluated
on both the test dataset and a real, self-generated dataset.
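The core of decision tree induction is choosing which attribute to split on. The sketch below assumes an ID3-style criterion (information gain based on entropy); the course may use a different criterion such as the Gini index. The four rows are made up in the style of the Zoo attributes and are not copied from the dataset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on attr."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Attribute with the largest information gain."""
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))

# Toy rows (hypothetical values)
rows = [
    {"feathers": 0, "aquatic": 0},
    {"feathers": 0, "aquatic": 1},
    {"feathers": 1, "aquatic": 0},
    {"feathers": 1, "aquatic": 1},
]
labels = ["mammal", "mammal", "bird", "bird"]
root = best_attribute(rows, labels)
gain_feathers = information_gain(rows, labels, "feathers")
gain_aquatic = information_gain(rows, labels, "aquatic")
```

A full tree is built by applying the same selection recursively to each subset produced by the chosen split, stopping when a subset contains a single class.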

Problem 1.6: Latent Semantic Analysis (mandatory) (50 points)

Each group uses the Latent Semantic Analysis method and the lessons learned in class to retrieve
the documents in the text dataset that are most similar to the provided queries. The
field-specific dataset is provided by the Instructor. The solution must include text data
pre-processing, term-document matrix generation, and application of singular value decomposition.
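The three required steps (pre-processing, term-document matrix, SVD) can be sketched end to end with NumPy. Everything specific here is an assumption for illustration: the four toy documents, the query, the choice of k = 2 latent dimensions, raw term counts as weights, and pre-processing reduced to lowercasing and whitespace splitting. A real solution would typically add stop-word removal, stemming, and a weighting scheme such as TF-IDF.

```python
import numpy as np

# Toy corpus (hypothetical; the real dataset is provided by the Instructor)
docs = [
    "clustering groups similar data points",
    "k-means clustering assigns points to centroids",
    "spam filters classify email messages",
    "naive bayes can classify spam email",
]

def tokenize(text):
    # Minimal pre-processing: lowercase and split on whitespace
    return text.lower().split()

vocab = sorted({w for d in docs for w in tokenize(d)})
index = {w: i for i, w in enumerate(vocab)}

def to_vector(text):
    """Raw term-count vector over the corpus vocabulary."""
    v = np.zeros(len(vocab))
    for w in tokenize(text):
        if w in index:
            v[index[w]] += 1
    return v

# Term-document matrix: one row per term, one column per document
A = np.column_stack([to_vector(d) for d in docs])

# Truncated SVD: keep the k largest singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k = U[:, :k]
doc_vecs = (U_k.T @ A).T  # each row is a document in the latent space

def rank(query):
    """Document indices sorted by cosine similarity to the query."""
    q = U_k.T @ to_vector(query)  # project the query into the same space
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-12
    sims = doc_vecs @ q / norms
    return [int(i) for i in np.argsort(-sims)]

ranking = rank("spam email")
```

Projecting both documents and queries with the same truncated matrix U_k places them in one latent space, so cosine similarity between them is meaningful.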

Deliverables Requirements

• D1: Data pre-processing, data processing with algorithm and implementation, and data
  post-processing (50%)
• D2: A report describing D1. Note that the solution (design, algorithm, implementation, ...)
  must be presented using diagrams, flowcharts, pseudo-code, ..., while code snippets should be
  placed in the appendices. (35%)
• D3: A demonstration presenting D1 (15%)
