Descriptive Statistics
Dr. Lázaro Bustio Martínez
[email protected]
Fall 2023
Agenda
• Elements of descriptive statistics
• Exploratory data analysis
• Objective: identify the elements of descriptive statistics and apply them to a small dataset.
Introduction
• In all research, and before drawing conclusions about the proposed objectives and hypotheses, it is necessary to carry out a preliminary, exploratory analysis of the data in order to detect errors in the coding of the variables, eliminate inconsistencies, evaluate the magnitude and type of missing data, learn about the basic characteristics of the distribution of the variables (normality, equality of variances, presence of outliers, linearity, etc.), and gain a first view of the relationships between them.
Introduction
• Most of these objectives are achieved by performing a descriptive analysis of the variables. Specifically, measures of central tendency and dispersion are used to describe the characteristics of quantitative variables, and tables of frequencies and percentages are used for qualitative variables.

Variable type   Analytical indices                                          Graphical representations
Quantitative    mean, median, mode, standard deviation, range,              histogram, box plot
                interquartile range, normality test
Qualitative     frequencies, percentages, mode, etc.                        bar chart, line chart, pie chart
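As a minimal, illustrative sketch of these indices in Python (the DataFrame, its column names, and the values below are hypothetical, not taken from the course material):

```python
import pandas as pd
from scipy import stats

# Hypothetical example data; column names and values are illustrative only.
df = pd.DataFrame({
    "income": [23.1, 19.8, 30.5, 27.2, 22.9, 35.4, 28.8, 22.9],                        # quantitative
    "region": ["north", "south", "north", "east", "south", "north", "east", "south"],  # qualitative
})

# Quantitative variable: central tendency and dispersion
print(df["income"].mean(), df["income"].median(), df["income"].mode().iloc[0])
print(df["income"].std())                                          # standard deviation
print(df["income"].max() - df["income"].min())                     # range
print(df["income"].quantile(0.75) - df["income"].quantile(0.25))   # interquartile range
print(stats.shapiro(df["income"]))                                 # Shapiro-Wilk normality test

# Qualitative variable: frequencies and percentages
print(df["region"].value_counts())
print(df["region"].value_counts(normalize=True) * 100)
```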
Exploratory Data Analysis (1977)
• Based on insights developed at Bell Labs in the 60’s.
• Technique for visualizing and summarizing data.
• What can the data tell us? (in contrast to “confirmatory” data analysis)
• Introduced many basic techniques:
  • 5-number summary, box plots, stem and leaf diagrams.
• 5-number summary:
  • Extremes (min and max)
  • Median and quartiles
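A minimal sketch of computing the 5-number summary (the sample values are made up):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([3, 7, 8, 5, 12, 14, 21, 13, 18])   # made-up sample

# 5-number summary: min, Q1, median, Q3, max
five_num = np.percentile(x, [0, 25, 50, 75, 100])
print(five_num)   # [ 3.  7. 12. 14. 21.]

# A box plot is drawn from exactly these five numbers (plus outlier rules)
plt.boxplot(x)
plt.show()
```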
Aim of the EDA
1. Maximize insight into a dataset
2. Uncover underlying structure
3. Extract important variables
4. Detect outliers and anomalies
5. Test underlying assumptions
6. Develop valid models
7. Determine optimal factor settings (Xs)
Aim of the EDA
• The goal of EDA is to open-mindedly explore data.
• Tukey: EDA is detective work… Unless detective finds the clues, judge or jury
has nothing to consider.
• Here, the judge or jury is confirmatory data analysis.
• Tukey: Confirmatory data analysis goes further, assessing the strengths of the
evidence.
• With EDA, we can examine the data and try to understand the meaning of the
variables, e.g., what the abbreviations stand for.
Exploratory vs Confirmatory Data Analysis

EDA:
• No hypothesis at first
• Generate hypothesis
• Uses graphical methods (mostly)
• Descriptive statistics
• Graphical
• Data driven

CDA:
• Start with hypothesis
• Test the null hypothesis
• Uses statistical models
• Inferential statistics
• EDA and theory driven
Pipeline of EDA
1. Generate good research questions.
2. Data restructuring: you may need to make new variables from the existing ones, e.g., instead of using two variables, obtain rates or percentages from them, or create dummy variables for categorical variables (see the sketch after this list).
3. Based on the research questions, use appropriate graphical tools and obtain descriptive statistics. Try to understand the data structure, relationships, anomalies, and unexpected behaviors.
4. Try to identify confounding variables, interaction relations, and multicollinearity, if any.
5. Handle missing observations.
6. Decide on the need for transformation (of the response and/or explanatory variables).
7. Decide on the hypothesis based on your research questions.
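As referenced in step 2, a minimal sketch of deriving rates and dummy variables with pandas (the DataFrame and its column names are hypothetical):

```python
import pandas as pd

# Hypothetical data; columns are illustrative only.
df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "crimes": [120, 45, 300],
    "population": [10_000, 2_500, 40_000],
    "region": ["north", "south", "north"],
})

# A rate instead of two raw counts
df["crime_rate"] = df["crimes"] / df["population"]

# Dummy (indicator) variables for a categorical variable
df = pd.concat([df, pd.get_dummies(df["region"], prefix="region")], axis=1)
print(df)
```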
After EDA
• Confirmatory data analysis: verify the hypothesis by statistical analysis.
• Get conclusions and present your results nicely.
Classification of EDA*
• Exploratory data analysis is generally cross-classified in two ways. First, each method is
either non-graphical or graphical. And second, each method is either univariate or
multivariate (usually just bivariate).
• Non-graphical methods generally involve calculation of summary statistics, while graphical
methods obviously summarize the data in a diagrammatic or pictorial way.
• Univariate methods look at one variable (data column) at a time, while multivariate
methods look at two or more variables at a time to explore relationships. Usually, our
multivariate EDA will be bivariate (looking at exactly two variables), but occasionally it will
involve three or more variables.
• It is almost always a good idea to perform univariate EDA on each of the components of a
multivariate EDA before performing the multivariate EDA.
*Seltman, H.J. (2015). Experimental Design and Analysis. http://www.stat.cmu.edu/~hseltman/309/Book/Book.pdf
Exploratory data analysis tools
• Specific statistical functions and techniques you can perform with EDA tools include:
• Clustering and dimension reduction techniques, which help create graphical displays of high-
dimensional data containing many variables.
• Univariate visualization of each field in the raw dataset, with summary statistics.
• Bivariate visualizations and summary statistics that allow you to assess the relationship between
each variable in the dataset and the target variable you’re looking at.
• Multivariate visualizations, for mapping and understanding interactions between different fields in
the data.
• K-means Clustering is a clustering method in unsupervised learning where data points are assigned
into K groups, i.e., the number of clusters, based on the distance from each group’s centroid. The
data points closest to a particular centroid will be clustered under the same category. K-means
Clustering is commonly used in market segmentation, pattern recognition, and image compression.
• Predictive models, such as linear regression, use statistics and data to predict outcomes.
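For example, a minimal linear-regression sketch with made-up numbers (scikit-learn is just one possible choice of library):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up predictor and outcome values
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # shape (n_samples, n_features)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)       # fitted slope and intercept
print(model.predict(np.array([[6.0]])))    # predicted outcome for a new observation
```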
Types of exploratory data analysis
• Other common types of multivariate graphics include:
• Scatter plot, which is used to plot data points on a horizontal and a vertical axis to
show how much one variable is affected by another.
• Multivariate chart, which is a graphical representation of the relationships between
factors and a response.
• Run chart, which is a line graph of data plotted over time.
• Bubble chart, which is a data visualization that displays multiple circles (bubbles) in a
two-dimensional plot.
• Heat map, which is a graphical representation of data where values are depicted by
color.
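A minimal matplotlib sketch of two of these graphics, using random data (all names and values are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
data = np.column_stack([x, y, rng.normal(size=100)])

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: how much one variable is affected by another
axes[0].scatter(x, y)
axes[0].set_xlabel("x")
axes[0].set_ylabel("y")

# Heat map: values (here, a correlation matrix) depicted by color
im = axes[1].imshow(np.corrcoef(data, rowvar=False), cmap="viridis")
fig.colorbar(im, ax=axes[1])
plt.show()
```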
Example
• Data from the Places Rated Almanac (Boyer and Savageau, 1985): 9 variables from 329 metropolitan areas in the USA:
  • Climate mildness
  • Housing cost
  • Health care and environment
  • Crime
  • Transportation supply
  • Educational opportunities and effort
  • Arts and culture facilities
  • Recreational opportunities
  • Personal economic outlook
  • + latitude and longitude of each city

Questions:
1. How is climate related to location?
2. Are there clusters in the data (excluding location)?
3. Are nearby cities similar?
4. Any relation between economic outlook and crime?
5. What else???
What is data?
• Categorical (Qualitative)
• Nominal scales – number is just a symbol that identifies a quality
• 0=male, 1=female
• 1=green, 2=blue, 3=red, 4=white
• Ordinal – rank order
• Quantitative (continuous and discrete)
• Interval – units are of identical size (e.g., years)
• Ratio – distance from an absolute zero (e.g., age, reaction time)
What is a measurement?
• Every measurement has 2 parts:
• The True Score (the actual state of things in the world)
• and
• ERROR! (mistakes, bad measurement, report bias, context effects, etc.)
• X=T+e
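A tiny simulation of this model (the true score and the error distribution below are arbitrary choices, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(42)
true_score = 100.0                               # T: the actual state of the world
error = rng.normal(loc=0.0, scale=5.0, size=20)  # e: measurement error
observed = true_score + error                    # X = T + e

print(observed.mean())   # with more measurements, the mean gets closer to T
```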
Organizing your data in a spreadsheet

• Stacked data: multiple cases (rows) for each subject

Subject  condition  score
1        before     3
1        during     2
1        after      5
2        before     3
2        during     8
2        after      4
3        before     3
3        during     7
3        after      1

• Unstacked data: only one case (row) per subject

Subject  before  during  after
1        3       2       5
2        3       8       4
3        3       7       1
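A minimal pandas sketch for converting between the two layouts shown above:

```python
import pandas as pd

unstacked = pd.DataFrame({
    "subject": [1, 2, 3],
    "before": [3, 3, 3],
    "during": [2, 8, 7],
    "after":  [5, 4, 1],
})

# Unstacked -> stacked: one row per subject/condition pair
stacked = unstacked.melt(id_vars="subject", var_name="condition", value_name="score")

# Stacked -> unstacked: one row per subject
back = stacked.pivot(index="subject", columns="condition", values="score")
print(stacked)
print(back)
```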
Variable Summaries
Indices of central tendency:
• Mean – the average value
• Median – the middle value
• Mode – the most frequent value
Indices of Variability:
• Variance – the spread around the mean
• Standard deviation
• Standard error of the mean (estimate)
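A minimal pandas sketch of these indices, using the nine scores from the small example above:

```python
import pandas as pd

scores = pd.Series([3, 2, 5, 3, 8, 4, 3, 7, 1])   # the nine scores from the example above

print(scores.mean())     # average value
print(scores.median())   # middle value
print(scores.mode())     # most frequent value(s)
print(scores.var())      # spread around the mean (sample variance, divides by n-1)
print(scores.std())      # standard deviation
print(scores.sem())      # standard error of the mean
```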
The Mean

Subject  before  during  after
1        3       2       7
2        3       8       4
3        3       7       3
4        3       2       6
5        3       8       4
6        3       1       6
7        3       9       3
8        3       3       6
9        3       9       4
10       3       1       7
Sum =    30      50      50
n =      10      10      10
Mean =   3       5       5

• Mean = sum of all scores divided by the number of scores:
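In symbols (the standard formula this slide refers to):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

For the "during" column, for example: (2 + 8 + 7 + 2 + 8 + 1 + 9 + 3 + 9 + 1) / 10 = 50 / 10 = 5.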
The Variance: sum of the squared deviations divided by the number of scores
• In probability theory and statistics, variance is the
expectation of the squared deviation of a random
variable from its mean. Variance is a measure of
dispersion, meaning it is a measure of how far a set
of numbers is spread out from their average value.
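In symbols (dividing by n, as the worked example on the next slide does):

$$\operatorname{Var}(x) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

For the "during" column, for example, the sum of squared deviations is 108, so the variance is 108 / 10 = 10.8.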
The Variance: sum of the squared deviations divided by the number of scores
(b-mean = before − mean of before, and similarly for during and after)

Subject  before  during  after   b-mean  b-mean²  d-mean  d-mean²  a-mean  a-mean²
1        3       2       7       0       0        -3      9        2       4
2        3       8       4       0       0        3       9        -1      1
3        3       7       3       0       0        2       4        -2      4
4        3       2       6       0       0        -3      9        1       1
5        3       8       4       0       0        3       9        -1      1
6        3       1       6       0       0        -4      16       1       1
7        3       9       3       0       0        4       16       -2      4
8        3       3       6       0       0        -2      4        1       1
9        3       9       4       0       0        4       16       -1      1
10       3       1       7       0       0        -4      16       2       4
Sum =    30      50      50      0       0        0       108      0       22
n =      10      10      10              10               10               10
Mean =   3       5       5
VAR =                                    0                10.8             2.2
Variance continued
[Figure: three panels plotting the before, during, and after scores for subjects 1-10, each with the group mean overlaid; the scores spread around the mean by a different amount in each panel.]
Distribution
• Means and variances are ways to describe a distribution of scores.
• Knowing about your distributions is one of the best ways to understand your data.
• A NORMAL (aka Gaussian) distribution is the most common assumption of statistics; thus, it is often important to check if your data are normally distributed.
What is “normal” anyway?
• With enough measurements, most variables are distributed normally.
• But in order to fully describe data we need to introduce the idea of a standard deviation.
[Figure: a normal curve compared with leptokurtic (more peaked) and platykurtic (flatter) distributions.]
Standard deviation
• Variance, as calculated earlier, is arbitrary.
• What does it mean to have a variance of 10.8? Or 2.2? Or 1459.092? Or 0.000001?
• Nothing. But if you could “standardize” that value, you could talk about any variance (i.e.,
deviation) in equivalent terms.
• The standard deviation is simply the square root of the variance.
Standard deviation
• The process of standardizing deviations goes like this:
1. Start with the scores (in their meaningful units)
2. Compute the mean
3. Compute each score’s deviation from the mean
4. Square each deviation
5. Sum all the squared deviations (the Sum of Squares)
6. Divide by n (if population) or n-1 (if sample)
7. Take the square root – now the value is back in the units we started with!
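A minimal sketch of these seven steps in Python, using the "during" scores from the earlier example:

```python
import numpy as np

scores = np.array([2, 8, 7, 2, 8, 1, 9, 3, 9, 1])  # step 1: the scores, in meaningful units
mean = scores.mean()                                # step 2: the mean (= 5.0)
deviations = scores - mean                          # step 3: each score's deviation from the mean
squared = deviations ** 2                           # step 4: square each deviation
sum_of_squares = squared.sum()                      # step 5: Sum of Squares (= 108.0)
variance = sum_of_squares / len(scores)             # step 6: divide by n (population); use n-1 for a sample
sd = np.sqrt(variance)                              # step 7: square root -> back in the original units
print(variance, sd)                                 # 10.8  ~3.29
```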
Interpreting standard deviation (SD)
• First, the SD will let you know about the distribution of scores around the mean.
• High SDs (relative to the mean) indicate the scores are spread out
• Low SDs tell you that most scores are very near the mean.
High SD Low SD
Interpreting standard deviation (SD)
• Second, you can then interpret any individual score in terms of the SD.
• For example:
• mean = 50, SD = 10
• versus mean = 50, SD = 1
• A score of 55 is:
• 0.5 Standard deviation units from the mean (not much)
• OR
• 5 standard deviation units from mean (a lot!)
Standardized scores (Z)
• Third, you can use SDs to create standardized scores
• Force the scores onto a normal distribution by putting each score into units of SD.
• Subtract the mean from each score and divide by SD:
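In symbols:

$$z = \frac{x - \bar{x}}{s}$$

For example, with a mean of 50 and an SD of 10, a score of 55 gives z = (55 − 50) / 10 = 0.5, matching the interpretation on the previous slide.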
Standardized normal distribution
• ALL Z-scores have a mean of 0 and an SD of 1. Nice and simple.
• From this we can get the proportion of scores anywhere in the distribution.
The trouble with normal
• We violate assumptions about statistical tests if the distributions of our variables are not
approximately normal.
• Thus, we must first examine each variable’s distribution and adjust when necessary, so
that assumptions are met.
Following
• Examine every variable for:
  • Out of range values
  • Normality
  • Outliers
Checking data
• It is necessary to get a table of each variable with each value and its frequency of occurrence.
• The best way to examine categorical variables is by checking their frequencies, as in the sketch below.
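A minimal pandas sketch of such a frequency check (the column and its values are hypothetical; note how a typo and a missing value stand out immediately):

```python
import pandas as pd

# Hypothetical categorical column with a typo ("durng") and a missing value
df = pd.DataFrame({"condition": ["before", "during", "after", "before", "durng", None]})

# One row per distinct value with its frequency of occurrence
print(df["condition"].value_counts(dropna=False))
```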
Visual display of univariate data

Subject  before  during  after
1        3.1     2.3     7
2        3.2     8.8     4.2
3        2.8     7.1     3.2
4        3.3     2.3     6.7
5        3.3     8.6     4.5
6        3.3     1.5     6.6
7        2.8     9.1     3.4
8        3       3.3     6.5
9        3.1     9.5     4.1
10       3       1       7.3

• Now the example data from before has decimals.
• What kind of data is that?
• Precision has increased.
Visual display of univariate data
(same data as on the previous slide)
• Histograms
• Stem and leaf plots
• Boxplots
• QQ plots
• …and many, many more.
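A minimal sketch of three of these displays for the "during" column above (matplotlib and scipy):

```python
import matplotlib.pyplot as plt
from scipy import stats

during = [2.3, 8.8, 7.1, 2.3, 8.6, 1.5, 9.1, 3.3, 9.5, 1.0]   # the "during" column

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

axes[0].hist(during, bins=5)                        # histogram
axes[0].set_title("Histogram")

axes[1].boxplot(during)                             # box plot (5-number summary + outlier rules)
axes[1].set_title("Boxplot")

stats.probplot(during, dist="norm", plot=axes[2])   # QQ plot against a normal distribution
plt.show()
```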
So… what do you do?
• If you find a mistake, fix it.
• If you find an outlier, trim it or delete it.
• If your distributions are skewed, transform the data.
Dealing with Outliers
• First, try to explain it.
• In a normal distribution 0.4% are outliers (>2.7 SD) and 1 in a million is an extreme outlier
(>4.72 SD).
• For analyses you can:
• Delete the value – crude but effective
• Change the outlier to value ~3 SD from mean
• “Winsorize” it (make = to next highest value)
• “Trim” the mean – recalculate mean from data within interquartile range
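A minimal sketch of winsorizing and trimming with scipy (the sample values are made up; scipy's trim_mean cuts a fixed proportion from each tail, a close relative of the interquartile-range version described above):

```python
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

x = np.array([4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.1, 42.0])   # 42.0 is a suspicious outlier

# Winsorize: replace the most extreme 12.5% on each tail with the next value inward
print(winsorize(x, limits=[0.125, 0.125]))

# Trimmed mean: recompute the mean after discarding 12.5% from each tail
print(stats.trim_mean(x, proportiontocut=0.125))

# Or cap values at roughly 3 SD above the mean of the non-outlying data
print(np.clip(x, None, x[:-1].mean() + 3 * x[:-1].std()))
```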
Scales of Graphs
• It is very important to pay attention to the scale that you are using when you are plotting.
• Compare the following graphs, created from identical data.
[Figure: the same mean scores for before, during, after, and follow-up plotted three times with different y-axis ranges (roughly -2 to 18, -20 to 30, and 2 to 3); the identical data look very different depending on the scale.]
Steps in Data Exploration and Processing
1. Identification of variables and data types
2. Analyzing the basic metrics
3. Non-Graphical Univariate Analysis
4. Graphical Univariate Analysis
5. Bivariate Analysis
6. Variable transformations
7. Missing value treatment
8. Outlier treatment
9. Correlation Analysis
10. Dimensionality Reduction
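A minimal pandas/scikit-learn skeleton touching several of these steps (the DataFrame is randomly generated; this is a sketch, not a complete pipeline):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Randomly generated stand-in dataset
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])

print(df.dtypes)           # 1. identify variables and data types
print(df.describe())       # 2-3. basic metrics / non-graphical univariate analysis
df.hist(bins=15)           # 4. graphical univariate analysis
print(df.corr())           # 5, 9. bivariate / correlation analysis
print(df.isnull().sum())   # 7. missing value treatment (here: just counting missing values)

z = (df - df.mean()) / df.std()                     # 6. a possible variable transformation (z-scores)
print(PCA(n_components=2).fit_transform(z).shape)   # 10. dimensionality reduction
```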
Summary
1. Examine all your variables thoroughly and carefully before you begin analysis.
2. Use visual displays whenever possible.
3. Transform each variable as necessary to deal with mistakes, outliers, and distributions.
Assignment
• Work through the example described in the article “Exploratory data analysis in Python”, by Tanu N. Prabhu:
• https://towardsdatascience.com/exploratory-data-analysis-in-python-c9a77dfa39ce
• Document the completed exercise on GitHub.