LDATS2470 Project
Form groups of two or three students and register your group on Moodle before Thursday, March 16, 2023, at 6 pm. Write a report of at most 10 pages in total. Deposit a zip file before the deadline, Friday, May 12, 2023, at 3 pm. The file must be deposited via your group on Moodle and must contain your report in PDF format, the data file, and the source code of your programs in separate files. The first page of the report should state the group name and its members. You may write in English or in French.
1. Choose your own dataset, which must satisfy the following criteria:
   - Formulate an objective of your analysis. What is the research question that you would like to answer? Be original. Ideally, your dataset is original. If it has been used elsewhere, make sure that your analysis of that dataset is original. To check this, Compilatio will be used.
   - Send the title, description, and dataset (or a link) of your project to the teacher before March 23, 2023, for approval.
2. Structure: In the introduction, explain the idea and objectives of your project. Give a short graphical or numerical summary and overview of your dataset, including an explanation of the different variables (a small sketch of such a summary follows this item). After the main part of the analysis, summarize your main results in the conclusions and state whether you have attained your objectives.
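A minimal sketch of such a data overview, assuming a tabular dataset read with pandas; the file name data.csv and the variable choices are placeholders for your own data:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Placeholder file name: replace with your own data file.
    df = pd.read_csv("data.csv")

    # Numerical overview: dimensions, variable types, summary statistics.
    print(df.shape)
    print(df.dtypes)
    print(df.describe(include="all"))

    # Graphical overview: histograms of the numerical variables and a
    # scatter-plot matrix to see pairwise relationships.
    num = df.select_dtypes(include="number")
    num.hist(bins=20, figsize=(10, 8))
    pd.plotting.scatter_matrix(num, figsize=(10, 10), diagonal="kde")
    plt.tight_layout()
    plt.show()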
3. Depending on your objective and dataset, choose one of the following methodologies:
(a) clustering: k-means, kernel k-means, and EM clustering
(b) classification: linear and nonlinear SVM
(c) dimension reduction: linear and nonlinear PCA
The tasks for each methodology are explained below.
4. Clustering: Use your numerical variables to perform a clustering for a given number of clusters, using k-means, kernel k-means, and a Gaussian mixture estimated by EM. For the latter, first assume diagonal covariance matrices and, if possible (depending on your dataset), also obtain results for EM clustering with full covariance matrices. Compare your results with at least one other number of clusters k. There is no simple statistical criterion, so argue which clustering method and which k you prefer, depending on your dataset and the interpretability of the clusters. Visualize the results by projecting the data onto the plane spanned by the first two principal components, and indicate the different clusters in the graph. Try to interpret the results (a sketch of these steps follows this item).
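A minimal sketch of these clustering steps, assuming standardized numerical data in an array X (make_blobs is only a stand-in for your own dataset). The small kernel k-means routine is a hand-rolled illustration, since scikit-learn does not ship one; the choices k = 3 and gamma = 0.5 are placeholders:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture
    from sklearn.decomposition import PCA
    from sklearn.metrics.pairwise import rbf_kernel

    # Stand-in data: replace with your own numerical variables.
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
    X = StandardScaler().fit_transform(X)
    k = 3

    # k-means.
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

    # Gaussian mixtures estimated by EM, diagonal and full covariances.
    gm_diag = GaussianMixture(n_components=k, covariance_type="diag", random_state=0).fit_predict(X)
    gm_full = GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit_predict(X)

    # Simple kernel k-means (RBF kernel): assign each point to the cluster
    # whose centre in feature space is closest, using only the kernel matrix.
    def kernel_kmeans(K, k, n_iter=50, seed=0):
        rng = np.random.default_rng(seed)
        labels = rng.integers(k, size=K.shape[0])
        for _ in range(n_iter):
            dist = np.zeros((K.shape[0], k))
            for c in range(k):
                mask = labels == c
                n_c = max(mask.sum(), 1)
                dist[:, c] = (np.diag(K)
                              - 2 * K[:, mask].sum(axis=1) / n_c
                              + K[np.ix_(mask, mask)].sum() / n_c**2)
            new_labels = dist.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        return labels

    kk_labels = kernel_kmeans(rbf_kernel(X, gamma=0.5), k)

    # Visualize each clustering in the plane of the first two principal components.
    Z = PCA(n_components=2).fit_transform(X)
    fig, axes = plt.subplots(1, 4, figsize=(16, 4))
    for ax, lab, title in zip(axes,
                              [km_labels, kk_labels, gm_diag, gm_full],
                              ["k-means", "kernel k-means", "EM (diag)", "EM (full)"]):
        ax.scatter(Z[:, 0], Z[:, 1], c=lab, s=15)
        ax.set_title(title)
    plt.show()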
5. Classification: Begin with a linear SVM, i.e. an SVM with a linear kernel. Is the data linearly separable? Report the number of support vectors, the value of the objective function, and the proportion of correct classifications. Then fit a nonlinear SVM using a kernel such as the RBF kernel. Optimize the parameters (the capacity C and the kernel parameter) by cross-validation, e.g. five- or ten-fold. Report the number of support vectors, the value of the objective function, and the proportion of correct classifications. Visualize the results by projecting the data onto the plane spanned by the first two principal components. Interpret your results (a sketch follows this item).
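A minimal sketch of the SVM steps, assuming a binary classification problem with standardized features X and labels y (the breast-cancer data is only a stand-in for your own dataset, and the grids over C and gamma are placeholders). The dual objective is reconstructed from dual_coef_, which stores y_i * alpha_i for the support vectors, so this reconstruction is only valid for two classes:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.decomposition import PCA
    from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

    # Stand-in data: replace with your own variables and class labels.
    data = load_breast_cancer()
    X = StandardScaler().fit_transform(data.data)
    y = data.target

    def dual_objective(clf, K_sv):
        # Dual SVM objective: sum(alpha) - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij,
        # for binary problems; dual_coef_ holds y_i * alpha_i.
        a = clf.dual_coef_.ravel()
        return np.abs(a).sum() - 0.5 * a @ K_sv @ a

    # Linear SVM.
    lin = SVC(kernel="linear", C=1.0).fit(X, y)
    print("linear: #SV =", lin.n_support_.sum(),
          "objective =", dual_objective(lin, linear_kernel(lin.support_vectors_, lin.support_vectors_)),
          "CV accuracy =", cross_val_score(SVC(kernel="linear", C=1.0), X, y, cv=10).mean())

    # Nonlinear SVM with an RBF kernel; tune C (capacity) and gamma by 10-fold CV.
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]},
                        cv=10).fit(X, y)
    best = grid.best_estimator_
    print("RBF:", grid.best_params_,
          "#SV =", best.n_support_.sum(),
          "objective =", dual_objective(best, rbf_kernel(best.support_vectors_,
                                                         best.support_vectors_,
                                                         gamma=best.gamma)),
          "CV accuracy =", grid.best_score_)

    # Visualize true classes and RBF-SVM predictions in the first two PCs.
    Z = PCA(n_components=2).fit_transform(X)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(Z[:, 0], Z[:, 1], c=y, s=10); ax1.set_title("true classes")
    ax2.scatter(Z[:, 0], Z[:, 1], c=best.predict(X), s=10); ax2.set_title("RBF SVM predictions")
    plt.show()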
6. Dimension reduction: Using the numerical variables, perform a linear PCA, selecting the number of dimensions and representing the data graphically in the principal component space. Show the duality with MDS by calculating the Euclidean distances between the data points and performing MDS on this distance matrix, using the same number of dimensions as for PCA. Finally, perform a kernel PCA with a suitable kernel. Show the projection of the data onto the kernel principal components graphically. Compare PCA and kernel PCA, e.g. in terms of percentage of variance explained. Interpret the results (a sketch follows this item).
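A minimal sketch of the dimension-reduction steps, again with stand-in data (load_wine) in place of your own numerical variables; the RBF kernel, its gamma, and the choice d = 2 are placeholders, and measuring the "variance explained" by kernel PCA through the share of retained kernel eigenvalues (eigenvalues_, scikit-learn >= 1.0) is an assumption about how to compare the two methods:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_wine
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA, KernelPCA
    from sklearn.manifold import MDS
    from sklearn.metrics import pairwise_distances

    # Stand-in data: replace with your own numerical variables.
    X = StandardScaler().fit_transform(load_wine().data)
    d = 2  # chosen number of dimensions

    # Linear PCA.
    pca = PCA(n_components=d)
    Z_pca = pca.fit_transform(X)
    print("PCA explained variance:", pca.explained_variance_ratio_.sum())

    # Duality with MDS: metric MDS on the Euclidean distance matrix yields
    # essentially the same configuration as PCA, up to rotation and reflection
    # (exactly so for classical scaling).
    D = pairwise_distances(X, metric="euclidean")
    Z_mds = MDS(n_components=d, dissimilarity="precomputed", random_state=0).fit_transform(D)

    # Kernel PCA with an RBF kernel (gamma is a placeholder to be tuned).
    kpca = KernelPCA(n_components=d, kernel="rbf", gamma=0.05)
    Z_kpca = kpca.fit_transform(X)
    # Rough analogue of "variance explained": share of the retained kernel eigenvalues.
    full = KernelPCA(kernel="rbf", gamma=0.05).fit(X)
    print("kernel PCA eigenvalue share:", kpca.eigenvalues_.sum() / full.eigenvalues_.sum())

    # Side-by-side projections.
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    for ax, Z, title in zip(axes, [Z_pca, Z_mds, Z_kpca], ["PCA", "MDS", "kernel PCA"]):
        ax.scatter(Z[:, 0], Z[:, 1], s=15)
        ax.set_title(title)
    plt.show()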