Foundations or Research Analysis

This document defines data and provides an overview of how to classify and analyze different types of data. It discusses: 1) Two broad classifications of data based on source: primary data collected directly and secondary data collected from other sources. 2) Statistical classifications of categorical and measurement data, and how each is measured. 3) Scaling theory classifications of nominal, ordinal, interval, and ratio data based on the type of information and mathematical operations they allow. 4) Descriptive statistics measures used to analyze data, including measures of central tendency (mean, median, mode), dispersion (range, quartile deviation, mean absolute deviation, standard deviation), and skewness.

Uploaded by

Dr. Arunava Mookherjee


What is Data?

• Observations of a set of variables

• Lowest level of abstraction from which information is derived

• Each discipline has evolved its own method of classifying data

• Two Broad Classifications of Data Based on Source

– Primary Data:
• Data collected first-hand by the researcher, directly from the original source
– Secondary Data:
• Data already collected by someone else and obtained from an existing source

Classification :: Statistics
• Categorical Data
– The Objects are grouped into categories based on some Qualitative Trait
– The resultant data are merely labels or categories
– Example:
• Hair Color: Brown / Black / Red
• Attitude Toward Smoking: Favor / Neutral / Against
• Measurement Data
– The Objects are “measured” on some Quantitative Trait
– The resultant data is a set of numbers
– Example:
• Age of the Students
• JEMAT Score
• Number of Students Not Attending Class

Categorical Data
• Nominal Data
– A type of categorical data in which numbers act as a label without having
any specific meaning
– Example:
• Male : 1
• Female : 2
• Ordinal Data
– A type of categorical data in which numbers act as a guide to the rank
order of the objects
– Example:
• Mild
• Moderate
• Severe

Measurement Data
• Discrete Data
– Only Certain Values are Possible
– There are gaps between the possible values
– Are generated through the process of Counting
– Example:
• Number of students in the class
• Number of Employees Absent from Work
• Continuous Data
– Any value within an interval is possible with a suitable measuring device
– Theoretically, the number can be accurate to any desired number of
decimal places
– Are generated through the process of Measurement
– Example:
• Height in cm
• Time to complete the assignment

Classification :: Scaling Theory
• Nominal Data (Order: No, Distance: No, Origin: No)
– A type of categorical data in which numbers act as labels without having
any specific meaning
– Example:
• Male : 1
• Female : 2
• Ordinal Data (Order: Yes, Distance: No, Origin: No)
– A type of categorical data in which numbers act as a guide to the rank
order of the objects
– Example:
• Mild
• Moderate
• Severe

Classification :: Scaling Theory
• Interval Data (Order: Yes, Distance: Yes, Origin: No)
– Quantitative data with no true zero point
– Allows comparison of differences within the scale, but ratios of values are
not meaningful (e.g. 20 °C is not twice as hot as 10 °C)
– Widely used in social research, although many researchers are unclear about
what an interval scale is
– Example:
• Definitely Will Buy / Probably Will Buy / May or May Not Buy / Probably Will Not
Buy / Definitely Will Not Buy
• Ratio Data (Order: Yes, Distance: Yes, Origin: Yes)
– Quantitative data with a true zero point
– Allows conversion to another scale while preserving the ratios of magnitudes
– Example:
• Distance in km

Why understand Data?
• The type of Analysis depends on the Type of data you
have collected
• The general guideline is as follows (each scale also allows everything
listed for the scales above it):

– Nominal Data: Mode, Chi-Square
– Ordinal Data: + Median / Percentiles
– Interval Data: + Mean / SD / Correlation / Regression / ANOVA
– Ratio Scale: + Geometric Mean / Harmonic Mean / Coefficient of Variation / Logarithms
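The guideline can be sketched as a small lookup. This is only an illustration: it assumes the "+" on the slide means that each scale inherits the statistics of the scales before it, and the names `SCALES`, `ADDED_AT_LEVEL`, and `permitted_statistics` are hypothetical.

```python
# Scales ordered from least to most informative.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that become available at each scale (taken from the guideline).
ADDED_AT_LEVEL = {
    "nominal": ["mode", "chi-square"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "standard deviation", "correlation", "regression", "ANOVA"],
    "ratio": ["geometric mean", "harmonic mean", "coefficient of variation", "logarithms"],
}


def permitted_statistics(scale):
    """Return the statistics allowed at `scale`, including inherited ones."""
    level = SCALES.index(scale)  # raises ValueError for an unknown scale
    allowed = []
    for name in SCALES[: level + 1]:
        allowed.extend(ADDED_AT_LEVEL[name])
    return allowed


print(permitted_statistics("ordinal"))
```

For ordinal data this returns the nominal statistics followed by the median and percentiles, but not the mean.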

Some Points to Remember
• Tend to use Interval Scales
• Data need not be comparable with other studies
• Data has to make sense in your context
• Students fail to understand the importance of Data
– Wrong Approach
• “I have collected the data… now what do I do?”
– Right Approach
• “What data do I need? Why do I need it? Where will I get it? How will I
analyze it to get the answer?”

Descriptive Statistics
:: A Quick Review

Measures of Central Tendency
• Central tendency is “loosely” defined as the concept of
location of the center of a distribution of data
• Three basic measures
– Arithmetic Mean
– Median
– Mode
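As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The `scores` values are made up for the example.

```python
import statistics

scores = [4, 7, 7, 8, 9, 10, 11]  # hypothetical quiz scores

mean = statistics.mean(scores)      # sum of values divided by their count
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)
```

Here the mean and the median are both 8, while the mode is 7.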

Arithmetic Mean
• Advantages:
– Easy to Compute
– Affected by every value in the set of observations
– Defined by rigid mathematical formulation
– It is relatively reliable
– It represents the “center of gravity” of the data
• Disadvantages:
– Unduly affected by extremely small and/or large values
– Cannot be calculated for data with open-ended classes
– Is a good measure only when the distribution is fairly symmetric

Median
• Advantages
– Refers to the “Middle Value” of the distribution
– It is a “positional measure”
– Useful in the case of open-ended classes
– Not seriously affected by Extreme Values
– Most appropriate for dealing with Qualitative Rank Data
– Has a series of related positional measures like Quartiles, Deciles,
Percentiles
• Disadvantages:
– It does not take every value into consideration
– It is not capable of algebraic treatment
– It is erratic if the number of items is small
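The different sensitivity of the mean and the median to extreme values can be seen in a small sketch. The income figures are hypothetical and chosen only to make the effect visible.

```python
import statistics

# Hypothetical monthly incomes (in thousands).
incomes = [50, 55, 55, 55, 60, 62, 65, 68, 70]
# The same data with one extreme value added.
with_outlier = incomes + [300]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

A single extreme value moves the mean from 60 to 84, while the median only moves from 60 to 61.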

Mode
• Advantages:
– It is the most typical or representative value of a distribution
– Not unduly affected by extreme values
– It can be used to describe qualitative phenomena
• Disadvantages:
– A distribution may have no mode, or may have more than one mode
– Not capable of algebraic treatment
– It is not rigidly defined for calculation

Relation Between the 3 Measures
• In a moderately skewed distribution, approximately:
Mode ≈ 3 Median – 2 Mean
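A minimal sketch of the empirical relation as a helper, useful when the mode cannot be read directly from the data. The function name `estimated_mode` and the input values are assumptions for illustration, and the result is only an approximation.

```python
def estimated_mode(mean, median):
    """Empirical relation for moderately skewed data: Mode ~ 3*Median - 2*Mean."""
    return 3 * median - 2 * mean


# Hypothetical summary values: mean 60, median 58.
print(estimated_mode(mean=60.0, median=58.0))  # 3*58 - 2*60 = 54.0
```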

Measures of Dispersion
• Dispersion is defined as the degree to which data tends to
spread about a central value
• Four absolute measures, each with a corresponding relative measure:
– Range: Coefficient of Range
– Quartile Deviation: Coefficient of Quartile Deviation
– Mean Absolute Deviation: Coefficient of MAD
– Standard Deviation: Coefficient of Variation

• Range and QD are positional measures of dispersion

• MAD and SD are calculated measures of dispersion

Range
• Range = $x_{\max} - x_{\min}$ (largest value minus smallest value)

• Advantages
– Simplest to understand and compute
• Disadvantages:
– Not based on each and every item in the data
– Does not take into account the shape of distribution
– Cannot be computed in case of open ended classes

Quartile Deviation
• Inter Quartile Range (IQR): $\text{IQR} = Q_3 - Q_1$

• Quartile Deviation (Semi IQR): $\text{QD} = \dfrac{Q_3 - Q_1}{2}$

• Coefficient of QD: $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
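The positional measures can be sketched as follows. The formulas for the coefficient of range, (max − min) / (max + min), and the coefficient of QD, (Q3 − Q1) / (Q3 + Q1), are the usual textbook definitions and are assumed here; the data are hypothetical. Quartile conventions differ between textbooks and software, so other tools may give slightly different quartiles.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14]  # hypothetical observations

largest, smallest = max(data), min(data)
value_range = largest - smallest
coefficient_of_range = value_range / (largest + smallest)

# statistics.quantiles with n=4 returns the three quartiles; it uses the
# "exclusive" method by default.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                              # inter quartile range
quartile_deviation = iqr / 2               # semi inter quartile range
coefficient_of_qd = (q3 - q1) / (q3 + q1)  # relative measure

print(value_range, coefficient_of_range)
print(q1, q3, iqr, quartile_deviation, coefficient_of_qd)
```

For this data the range is 12, the quartiles are 4 and 12, the IQR is 8, and the quartile deviation is 4.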

Quartile Deviation
• Advantages:
– Can measure variation in open ended distributions
– It is extremely useful in case of erratic or badly skewed data
– It is not affected by extreme values
• Disadvantages:
– Ignores 50% of the data
– Is not capable of mathematical manipulation
– Is not, strictly speaking, a measure of dispersion about a central value:
• It only shows the distance between two positional points

Mean Absolute Deviation
• Mean Absolute Deviation (MAD) defined as:

$\text{MAD} = \dfrac{1}{n} \sum_{i=1}^{n} |x_i - A|$, where $A$ is the mean or the median

• Coefficient of MAD defined as:

= MAD / Median or MAD / Mean
• Advantages:
– Simple to understand and compute
– Based on each and every item in the data
– Less affected by extreme values than the standard deviation
• Disadvantage:
– It is not capable of further mathematical treatment
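A minimal sketch of the MAD and its coefficient, assuming deviations are taken from the mean (the median could be used instead). The data are hypothetical.

```python
import statistics

data = [2, 4, 6, 8, 10]  # hypothetical observations

center = statistics.mean(data)  # deviations are measured from the mean here

# Average of the absolute deviations from the chosen center.
mad = sum(abs(x - center) for x in data) / len(data)

# Relative measure: MAD divided by the same center.
coefficient_of_mad = mad / center

print(mad, coefficient_of_mad)
```

The absolute deviations are 4, 2, 0, 2, 4, so the MAD is 12 / 5 = 2.4.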

Standard Deviation
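• Defined as “Root Mean Squared Deviation from Mean”: $\sigma = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

• Coefficient of Variation: $\text{CV} = \dfrac{\sigma}{\bar{x}} \times 100$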
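A short sketch of both quantities. It assumes the population form of the standard deviation (dividing by n), which matches the "root mean squared deviation" wording; `statistics.stdev` would give the sample form (dividing by n − 1) instead. The data are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = statistics.mean(data)
sd = statistics.pstdev(data)  # population SD: root mean squared deviation

# Coefficient of variation, expressed as a percentage of the mean.
cv = 100 * sd / mean

print(mean, sd, cv)
```

For this data the mean is 5, the standard deviation is 2, and the coefficient of variation is 40%.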

Standard Deviation
• Advantages:
– Generally regarded as the best measure of dispersion: it uses every
observation and is capable of algebraic treatment
– Possible to calculate the combined standard deviation of two or
more groups
– Chebycheff’s Theorem (Chebycheff, 1821-1894)
• Whatever the distribution, at least 75% of the values fall
within +/- 2 SD of the mean and at least 89% fall within
+/- 3 SD of the mean (in general, at least 1 − 1/k² within +/- k SD)
– Has approximate relations with other measures (for a normal distribution):
• QD ≈ 0.667 SD
• MAD ≈ 0.80 SD
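The theorem can be checked numerically. The sketch below uses the general bound 1 − 1/k² for k > 1; the function names and the deliberately skewed sample are assumptions for illustration.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2


def share_within(data, k):
    """Observed share of values strictly within k population SDs of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mean) < k * sd) / len(data)


skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]  # hypothetical, strongly skewed sample

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), share_within(skewed, k))
```

Even for this skewed sample, the observed shares (0.9 and 1.0) are at least as large as the guaranteed minimums (0.75 and about 0.89).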

Skewness
• Refers to the asymmetry in the shape of the distribution

• Important to test skewness in data analysis, as skewed data suggest that
the assumption of normality is violated

Skewness - Measures
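• Karl Pearson’s Measure of Skewness:
$\dfrac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$ OR $\dfrac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$
- Skewness coefficient > 0: positively skewed
- Skewness coefficient < 0: negatively skewed
- Skewness coefficient = 0: symmetrical

• Bowley’s Measure: $\dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

• Moments Measure: $\gamma_1 = \dfrac{\mu_3}{\sigma^3}$, where $\mu_3 = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3$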
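The measures can be compared on one data set. This sketch assumes the usual textbook forms of Bowley's measure, (Q3 + Q1 − 2·Median) / (Q3 − Q1), and of the moment measure, the third central moment divided by the cube of the population SD. The data are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
sd = statistics.pstdev(data)  # population SD (divides by n)

# Karl Pearson's two forms.
pearson_mode = (mean - mode) / sd
pearson_median = 3 * (mean - median) / sd

# Bowley's quartile-based measure; q2 is the median.
q1, q2, q3 = statistics.quantiles(data, n=4)
bowley = (q3 + q1 - 2 * q2) / (q3 - q1)

# Moment measure: third central moment over the cube of the SD.
mu3 = sum((x - mean) ** 3 for x in data) / len(data)
moment_skewness = mu3 / sd**3

print(pearson_mode, pearson_median, bowley, moment_skewness)
```

All four coefficients are positive for this data, so each measure classifies it as positively skewed, although the magnitudes differ.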

Kurtosis
• Kurtosis means “Bulginess”
• Refers to the degree of flatness or peaked-ness in the
region about the mode of the distribution:
– Lepto-Kurtic : If the curve is more peaked than Normal Curve
– Meso-Kurtic : If the curve is the same as the Normal Curve
– Platy-Kurtic : If the curve is less peaked than Normal Curve

• Kurtosis different from that of the Normal Curve is also a departure from
normality, even when the distribution is symmetric

• Important to check Kurtosis because it shows the
distribution of data around the mode

KURTOSIS - Measures

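• Kurtosis: $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$, where $\mu_k = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^k$

• Excess Kurtosis: $\gamma_2 = \beta_2 - 3$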

Interpretation
• A normal distribution has kurtosis exactly 3 (excess
kurtosis exactly 0). Any distribution with kurtosis ≈3
(excess ≈0) is called mesokurtic.
• A distribution with kurtosis <3 (excess kurtosis <0) is
called platykurtic. Compared to a normal distribution, its
central peak is lower and broader, and its tails are shorter
and thinner.
• A distribution with kurtosis >3 (excess kurtosis >0) is
called leptokurtic. Compared to a normal distribution, its
central peak is higher and sharper, and its tails are longer
and fatter.
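The interpretation above can be sketched in code, assuming the moment definition of kurtosis (fourth central moment divided by the squared second central moment). The `tolerance` used to decide what counts as "≈3" is an arbitrary assumption, and both data sets are hypothetical.

```python
import statistics


def kurtosis(data):
    """Moment kurtosis: fourth central moment over squared second central moment."""
    mean = statistics.mean(data)
    n = len(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2**2


def classify(data, tolerance=0.1):
    """Label a data set by its excess kurtosis (kurtosis minus 3)."""
    excess = kurtosis(data) - 3
    if abs(excess) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if excess > 0 else "platykurtic"


flat = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical observations
peaked = [-5, 0, 0, 0, 0, 0, 0, 0, 0, 5]     # most values at the center, two far out

print(kurtosis(flat), classify(flat))
print(kurtosis(peaked), classify(peaked))
```

The first data set has kurtosis below 3 and is labelled platykurtic; the second has kurtosis 5 and is labelled leptokurtic.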
[Figures: example curves for leptokurtic, mesokurtic, and platykurtic distributions]
Uses of Skewness and Kurtosis
• Most stock prices and asset returns are positively or
negatively skewed. The direction of the skew indicates
whether a given or future data point is more likely to fall
above or below the mean. Skewness is basically related to
asymmetries (or risks) in information. Higher risk is
generally associated with higher expected returns.
• Kurtosis is used to describe how returns are distributed between
the centre and the tails, rather than the variance itself. For
example, if past data yield a leptokurtic distribution, most
returns lie close to the mean, but the fatter tails mean that
extreme gains or losses occur more often than under a normal
distribution. A platykurtic distribution has thinner tails, so
extreme returns are less likely.
What is Descriptive Statistics?
• The following need to be reported:
– Arithmetic Mean
– Median
– Mode
– Standard Deviation
– Variance
– Kurtosis
– Skewness
– Range
– Minimum
– Maximum
– Sum
– Count
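A minimal sketch of a report covering the items listed above, using only the standard library. It assumes population formulas (dividing by n) and moment-based skewness and kurtosis; statistical packages often report sample versions, so their figures may differ slightly. The function name `describe` and the data are hypothetical, and the function fails for constant data because the standard deviation is then zero.

```python
import statistics


def describe(data):
    """Return the descriptive statistics listed on the slide as a dictionary."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population SD (divides by n)
    mu3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    mu4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "standard_deviation": sd,
        "variance": statistics.pvariance(data),
        "kurtosis": mu4 / sd**4,
        "skewness": mu3 / sd**3,
        "range": max(data) - min(data),
        "minimum": min(data),
        "maximum": max(data),
        "sum": sum(data),
        "count": n,
    }


report = describe([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical observations
for name, value in report.items():
    print(f"{name}: {value}")
```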
