
Statistics Day 1a - Types of Data, Graphical Representation, Correlation, Data Modeling & Index Numbers

Uploaded by Neeraj Nagaraj
Copyright © All Rights Reserved

Welcome to the Course: Business Statistics
Why Business Statistics?
Introduction to Probability & Statistics
Why should we care about Probability & Statistics? - I
Why should we care about Probability & Statistics?
Why should we care about Probability & Statistics? - II
Why should we care about Probability & Statistics? - III
What will you learn in this course?

Examples:
• If you flip a coin 100 times, what is the probability of getting at most 10 heads?
• What is the probability of drawing the four of hearts from a deck of cards?
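Both examples can be worked out directly. The sketch below is a minimal Python illustration, assuming a fair coin with independent flips (so the number of heads follows a binomial distribution with n = 100 and p = 1/2) and a single card drawn at random from a standard 52-card deck. The helper name `prob_at_most` is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_at_most(k_max: int, n: int, p: Fraction) -> Fraction:
    """Exact P(X <= k_max) for X ~ Binomial(n, p)."""
    return sum(
        (comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1)),
        Fraction(0),
    )


# At most 10 heads in 100 flips of a fair coin
p_at_most_10_heads = prob_at_most(10, 100, Fraction(1, 2))
print(float(p_at_most_10_heads))  # a tiny probability, on the order of 1e-17

# The four of hearts is exactly 1 of the 52 equally likely cards
p_four_of_hearts = Fraction(1, 52)
print(p_four_of_hearts)  # 1/52
```

Exact fractions avoid floating-point round-off while summing the eleven binomial terms; the result shows that 10 or fewer heads is practically impossible with a fair coin.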
Today’s Lecture
• By the end of this session, students should be able to:
  • Define data
  • Know the different types of data
  • Know the different ways to present data systematically using diagrammatic and graphic representation
Definition of Data

Any observation collected in respect of any characteristic or event is called data.
Information
• Raw data convey little meaning when considered alone.
• Data are condensed, processed/analyzed and then presented systematically so that they are converted into information.
• It is important to note that data that are not converted into information are of little value for evaluation and planning, and cannot be used by those involved in decision making.
Types of Data
Two broad kinds of data are qualitative data and quantitative data:

Data
  Categorical (Qualitative)
  Numerical (Quantitative)
    Discrete
    Continuous
Discrete and Continuous Data
• Numerical data can be either discrete or continuous.
• Continuous data can take any numerical value within a range, for example weight, height, etc.
• There can be an infinite number of possible values in continuous data.
• Discrete data can take only certain values, in finite 'jumps': it 'jumps' from one value to another but does not take any intermediate value between them (for example, the number of students in a class).
Comparison of continuous and discrete data
• Continuous data are more precise than discrete data.
• Continuous data are more informative than discrete data.
• Continuous data can reduce the estimation and rounding of measurements.
• Continuous data are often more time-consuming to obtain.
• Discrete data should be converted to continuous data when possible, so as to obtain a higher level of information and detail.
Examples of conversion of discrete to continuous data
Types of Data
• Based on their mathematical properties, data are divided into four groups, NOIR:
  • Nominal
  • Ordinal
  • Interval
  • Ratio
• The groups are listed in order of increasing:
  • accuracy
  • power of measurement
  • precision
  • range of applicable statistical techniques
Types of Data
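One way to make the NOIR ordering concrete is to record which properties each level adds. The sketch below is a hypothetical Python lookup table: the property names, the `supports_ratios` and `typical_average` helpers, and the example variables in the comments are illustrations, not from the source; the properties follow the usual textbook description of the four scales.

```python
# Each level keeps the properties of the previous one and adds one more,
# which is why more statistical techniques apply further down the list.
NOIR = {
    "nominal":  {"categories": True, "order": False, "equal_intervals": False, "true_zero": False},  # e.g. blood group
    "ordinal":  {"categories": True, "order": True,  "equal_intervals": False, "true_zero": False},  # e.g. satisfaction rating
    "interval": {"categories": True, "order": True,  "equal_intervals": True,  "true_zero": False},  # e.g. temperature in °C
    "ratio":    {"categories": True, "order": True,  "equal_intervals": True,  "true_zero": True},   # e.g. weight in kg
}


def supports_ratios(scale: str) -> bool:
    """Statements like 'twice as much' are meaningful only with a true zero."""
    return NOIR[scale]["true_zero"]


def typical_average(scale: str) -> str:
    """Most informative average usually recommended for the scale."""
    props = NOIR[scale]
    if props["equal_intervals"]:
        return "mean"
    if props["order"]:
        return "median"
    return "mode"


for name in NOIR:
    print(name, typical_average(name), supports_ratios(name))
```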
Data Presentation
• Principles of data presentation:
a) To arrange the data in such a way that it creates interest in the reader's mind at first sight.
b) To present the information in a compact and concise form without losing important details.
c) To present the data in a simple form so that conclusions can be drawn directly by viewing the data.
d) To present it in such a way that it can help in further statistical analysis.
Presentation of data
Tabular
  Simple table
  Complex table
Graphical
  For quantitative data: 1. Histogram, 2. Frequency polygon, 3. Frequency curve, 4. Line chart, 5. Scatter diagram
  For qualitative data: 1. Bar chart, 2. Pictogram, 3. Pie chart
Tabulation
Tables are devices used to present data in a simple form. Tabulation is probably the first step before the data are used for analysis or interpretation.

General principles of designing tables

a) Tables should be numbered, e.g. Table 1, Table 2, etc.
b) Each table must be given a title, which should be brief and self-explanatory.
c) The headings of columns or rows should be clear and concise.
d) The data must be arranged according to size or importance, chronologically, alphabetically, or geographically.
e) If percentages or averages are to be compared, they should be placed as close to each other as possible.
f) No table should be too large.
g) Most people find a vertical arrangement better than a horizontal one, because it is easier to scan data from top to bottom than from left to right.
h) Footnotes may be given where necessary, providing explanatory notes or additional information.

Types of tables
Simple Table
When a characteristic and its values are presented in the form of a table, it is known as a simple table, e.g.:

Table 4.4: Infant mortality rate of selected countries in 2004

Name of country    Infant mortality rate (per 1,000 live births)
Pakistan           90
Bangladesh         60
Sri Lanka          26
India              60
Complex table - Frequency distribution table

• In a frequency distribution table, the data are first split into convenient groups (class intervals), and the number of items (frequency) that occur in each group is shown in an adjacent column.
• Hence it is a table showing the frequency with which the values are distributed among different groups or classes with some defined characteristics.
Rules for construction of frequency table
1) The class interval should be neither too large nor too small.
2) The number of classes formed should be more than 8 and less than 15.
3) The class intervals should be equal and uniform throughout the classification.
4) After construction of the table, a proper and clear heading should be given to it.
5) The base or source of the data should be mentioned, along with the pattern of analysis, in a footnote at the end of the table.
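The grouping step can be sketched in Python as follows. This is a minimal illustration, assuming whole-number values (such as age in completed years), equal class widths as rule 3 requires, and class labels in the same "lower-upper" style as the age table in this section; the function name `frequency_table` and the list of ages are hypothetical.

```python
from collections import Counter


def frequency_table(values, start, width, n_classes):
    """Count whole-number values in equal-width classes: start to start+width-1, and so on.

    Values below `start` or beyond the last class are ignored in this sketch.
    """
    upper_bound = start + width * n_classes
    counts = Counter((v - start) // width for v in values if start <= v < upper_bound)
    table = []
    for i in range(n_classes):
        lower = start + i * width
        table.append((f"{lower}-{lower + width - 1}", counts[i]))  # Counter gives 0 for empty classes
    return table


# Hypothetical raw ages in completed years, for illustration only
ages = [1, 3, 4, 7, 12, 2, 18, 9, 23, 0]
for label, freq in frequency_table(ages, start=0, width=5, n_classes=5):
    print(f"{label:>6}  {freq}")
```

With a real data set you would choose `n_classes` between 8 and 15, per rule 2; five classes are used here only to keep the output short.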
Frequency distribution table
Table 3: Age distribution of polio patients
Age      Number of Patients
0-4      35
5-9      18
10-14    11
15-19    8
20-24    6
Relative Frequency, Cumulative Frequency table
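Relative and cumulative frequencies follow directly from a frequency table. As a sketch, the Python below applies them to the polio age distribution in Table 3: the relative frequency is each class's share of the total (as a percentage), and the cumulative frequency is the running total up to and including that class. The function name `relative_cumulative` is hypothetical.

```python
def relative_cumulative(table):
    """Return (class, frequency, relative %, cumulative, cumulative %) rows."""
    total = sum(freq for _, freq in table)
    rows, running = [], 0
    for label, freq in table:
        running += freq
        rows.append((label, freq, 100 * freq / total, running, 100 * running / total))
    return rows


# Table 3: Age distribution of polio patients
table3 = [("0-4", 35), ("5-9", 18), ("10-14", 11), ("15-19", 8), ("20-24", 6)]
for label, freq, rel, cum, cum_pct in relative_cumulative(table3):
    print(f"{label:>6} {freq:>4} {rel:6.1f}% {cum:>4} {cum_pct:6.1f}%")
```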
Charts and Diagrams
Charts and diagrams are useful methods of presenting simple data.

• They have a powerful impact on people's imagination.
• They give information at a glance.
• Diagrams are better retained in memory than statistical tables.
• However, graphs cannot be substituted for statistical tables, because graphs cannot be treated mathematically, whereas tables can.
• Whenever graphs are compared, the difference in their scales should be noted.
• A lot of the detail and accuracy of the original data is lost in charts and diagrams; for a real study, we have to go back to the original data.
Common diagrams
• Pie chart
• Simple bar diagram
• Multiple bar diagram
• Histogram
• Frequency polygon
• Frequency curve
• Scatter diagram
• Line diagram
• Pictogram
Bar charts
The data presented are categorical.
• Data are presented in the form of rectangular bars of equal breadth.
• Each bar represents one variant/attribute.
• A suitable scale should be indicated, and the scale should start from zero.
• The width of the bars and the gaps between the bars should be equal throughout.
• The length of each bar is proportional to the magnitude/frequency of the variable.
• The bars may be vertical or horizontal.
Bar charts
[Figure: Bar chart titled "Year Wise Enrollment of students in Government school"; horizontal axis: One to Nine; vertical axis: No. of Students]
Multiple Bar Charts
• Also called compound bar charts.
• More than one sub-attribute of a variable can be expressed.
[Figure: Multiple bar chart of Population and Land as a percentage of the world total for Asia, Europe, Africa, Latin America, USSR, North America and Oceania]
Histogram
• Used for quantitative continuous variables.
• It is used to present variables that have no gaps, e.g. age, weight, height, blood pressure, blood sugar, etc.
• It consists of a series of blocks. The class intervals are given along the horizontal axis and the frequency along the vertical axis.
Histogram
Frequency polygon
• A frequency polygon is an area diagram of a frequency distribution drawn over a histogram.
• It is a linear representation of a frequency table and histogram, obtained by joining the midpoints of the tops of the histogram blocks.
• Frequency is plotted at the central point of each group.

[Figure: Frequency polygon of percentage total frequency; horizontal axis classes 59-69, 69-79, 79-89, 89-99, 99-109, 109-119, 119-129, 129-139]
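The plotting points of a frequency polygon are simply (central point, frequency) pairs. Below is a minimal Python sketch using the classes of Table 3, assuming each class such as 0-4 covers ages from 0 up to (but not including) 5, so its central point is 2.5; if the stated class limits are averaged instead, the central point would be 2. The function name `polygon_points` is hypothetical.

```python
def polygon_points(table, width):
    """(central point, frequency) pairs for a frequency polygon.

    Assumes each 'lower-upper' class spans [lower, lower + width).
    """
    points = []
    for label, freq in table:
        lower = int(label.split("-")[0])
        points.append((lower + width / 2, freq))
    return points


# Classes and frequencies from Table 3 (age distribution of polio patients)
table3 = [("0-4", 35), ("5-9", 18), ("10-14", 11), ("15-19", 8), ("20-24", 6)]
print(polygon_points(table3, width=5))
```

Joining consecutive points with straight lines gives the polygon.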
Line diagram
• Line diagrams are used to show the trend of events with the passage of
time.
Pie charts
• The most common way of presenting data.
• The value of each category is divided by the total and then multiplied by 360; each category is then allocated the resulting angle (in degrees) to show the proportion it represents.
• It is often necessary to indicate percentages in the segments, as it is not always easy to compare the areas of segments visually.
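The angle rule can be checked with a short Python sketch. As an illustration, it uses the class frequencies of Table 3 as the pie's categories, and returns each slice's percentage as well, since labelling segments with percentages is recommended above. The function name `pie_angles` is hypothetical.

```python
def pie_angles(table):
    """Return (category, angle in degrees, percentage) for each slice."""
    total = sum(value for _, value in table)
    return [(label, value / total * 360, value / total * 100) for label, value in table]


# Frequencies from Table 3 (age distribution of polio patients)
table3 = [("0-4", 35), ("5-9", 18), ("10-14", 11), ("15-19", 8), ("20-24", 6)]
for label, angle, pct in pie_angles(table3):
    print(f"{label:>6} {angle:6.1f} degrees {pct:5.1f}%")
```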
Pie Charts
Pictogram

• A popular method of presenting data to those who cannot understand orthodox charts.
• Small pictures or symbols are used to present the data, e.g. a picture of a doctor to represent the physician population.
• A fraction of a picture can be used to represent numbers smaller than the value of the whole symbol.
Scatter diagram
• Scatter diagrams show the relationship between two variables, e.g. a positive correlation/association between the intake of fat and sugar in the average diets of 41 countries.
• If the dots cluster around a straight line, this is evidence of a linear relationship.
• If there is no such cluster, there is probably no linear relationship between the variables.
Scatter Graphs
Scatter graphs are used to show whether there is a relationship between two sets
of data. The relationship between the data can be described as either:

1. A positive correlation. As one quantity increases so does the other.

2. A negative correlation. As one quantity increases the other decreases.

3. No correlation. Both quantities vary with no clear relationship.

A positive correlation is characterised by a straight line with a positive gradient.
A negative correlation is characterised by a straight line with a negative gradient.

[Figure: three example scatter graphs: Height against Shoe Size; Soup Sales against Temperature; Shoe Size against Annual Income]
Correlation
• State in each case whether you would expect to find a positive correlation, a negative correlation or no correlation:
1. Ages of husbands and wives
2. Shoe size & intelligence
3. Insurance companies' profit & the number of claims they have to pay
4. Years of education & income
5. Amount of rainfall & yield of crop
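Simple Correlation coefficient (r)
• The value of r ranges between -1 and +1.
• The magnitude of r denotes the strength of the association and its sign the direction (direct or indirect), as illustrated by the following scale:

r = -1             perfect indirect correlation
-1 to -0.75        strong (indirect)
-0.75 to -0.25     intermediate (indirect)
-0.25 to 0         weak (indirect)
r = 0              no relation
0 to 0.25          weak (direct)
0.25 to 0.75       intermediate (direct)
0.75 to 1          strong (direct)
r = +1             perfect direct correlation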
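A minimal Python sketch of computing r and labelling it with the bands shown above. It assumes r is Pearson's product-moment coefficient (the usual "simple correlation coefficient"); whether a boundary value such as 0.75 counts as "strong" or "intermediate" is an assumption of this sketch. The names `pearson_r` and `describe` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: sum of co-deviations over sqrt of the product of squared deviations.

    Undefined (division by zero) if either variable is constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def describe(r):
    """Label r by strength band and direction (boundary handling is an assumption)."""
    if r == 0:
        return "no relation"
    direction = "direct" if r > 0 else "indirect"
    size = abs(r)
    if size == 1:
        band = "perfect"
    elif size >= 0.75:
        band = "strong"
    elif size >= 0.25:
        band = "intermediate"
    else:
        band = "weak"
    return f"{band} {direction}"


print(describe(pearson_r([1, 2, 3, 4], [2, 4, 6, 8])))  # perfect direct
```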
Exercise
Exercise in Excel
• Using the CORREL function
• Using a correlation matrix
Data Modeling
Data Modeling
• Data modeling is the analysis of data objects and their relationships to other data objects. Data modeling is often the first step in database design and object-oriented programming, as the designers first create a conceptual model of how data items relate to each other.

Data Modeling – Entities or Concepts, Attributes & Relationships
• A simple example: an employee of a company is paid on a monthly basis for the particular job role they perform. The company wants to capture all the information relating to the employee, their salary and their payment details, and needs a database to hold all of this data.
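The entities, attributes and relationships in this example can be sketched with Python dataclasses. This is one possible model, not the source's: the entity names (`JobRole`, `Employee`, `Payment`), their attributes and the sample values are all hypothetical. Each class is an entity, each field an attribute, and a field that holds another entity expresses a relationship.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class JobRole:              # entity
    role_id: int            # attributes
    title: str
    monthly_salary: Decimal


@dataclass
class Employee:
    employee_id: int
    name: str
    role: JobRole           # relationship: each employee performs one job role


@dataclass
class Payment:
    payment_id: int
    employee: Employee      # relationship: one employee receives many monthly payments
    pay_date: date
    amount: Decimal


# Hypothetical sample records
analyst = JobRole(1, "Analyst", Decimal("4000.00"))
emp = Employee(101, "A. Employee", analyst)
payments = [
    Payment(i, emp, date(2024, month, 28), analyst.monthly_salary)
    for i, month in enumerate([1, 2, 3], start=1)
]
print(sum(p.amount for p in payments))  # total paid to this employee so far
```

In a relational database the same relationships would typically become foreign keys (for example, a role identifier stored on each employee row).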

Data Modeling – Meta Data

• Finally, the last step in data modeling is to define the meta data. Meta data is data that describes data. Examples of meta data include usage information about your data. For example, when a record is added to your newly created database, the meta data would describe when the record was added, by which user, and perhaps a flag confirming that it is free from errors.
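As a minimal sketch of the idea, the Python below keeps meta data alongside each record: when it was added, by which user, and a flag confirming it is free from errors. Every name here (`RecordMeta`, `Record`, `add_record`, the `verified` flag, the sample user) is hypothetical, and recording the timestamp in UTC is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RecordMeta:               # data that describes the data
    added_at: datetime          # when the record was added
    added_by: str               # which user added it
    verified: bool = False      # flag: confirmed free from errors?


@dataclass
class Record:
    data: dict                  # the business data itself
    meta: RecordMeta


def add_record(store, data, user):
    """Append a record and capture its meta data automatically."""
    record = Record(data, RecordMeta(added_at=datetime.now(timezone.utc), added_by=user))
    store.append(record)
    return record


database = []
rec = add_record(database, {"employee_id": 101, "amount": "4000.00"}, user="payroll_clerk")
rec.meta.verified = True  # set after an (assumed) validation check passes
print(rec.meta)
```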

Index numbers
Index Number

• An index number is a statistical value that measures the change in a variable with respect to time.
• Two variables that are often considered in this analysis are price and quantity.
• With the aid of index numbers, the average price of several articles in one year may be compared with the average price of the same quantity of the same articles in a number of different years.
• There are several sources of 'official' statistics that contain index numbers for quantities such as food prices, clothing prices, housing, wages and so on.
Simple index numbers
• We will examine index numbers that are constructed from a single item only.
• Such indexes are called simple index numbers.
• Current period = the period for which you wish to find the index number.
• Base period = the period with which you wish to compare prices in the current period.
• The choice of the base period should be considered very carefully.
• The choice itself often depends on economic factors:
1. It should be a 'normal' period with respect to the relevant index.
2. It should not be chosen too far in the past.
Simple Price Index
• To form a price index, one time period is chosen as
a base, and the price for every period is expressed
as a percentage of the base period price

$I_i = \frac{P_i}{P_{base}} \times 100$

where
$I_i$ = index number for year $i$
$P_i$ = price for year $i$
$P_{base}$ = price for the base year
Index Numbers: Example
• Airplane ticket prices from 1995 to 2003:

Year    Price    Index (base year = 2000)
1995    272      85.0
1996    288      90.0
1997    295      92.2
1998    311      97.2
1999    322      100.6
2000    320      100.0
2001    348      108.8
2002    366      114.4
2003    384      120.0

Base year: 2000
$I_{1996} = \frac{P_{1996}}{P_{2000}} \times 100 = \frac{288}{320}(100) = 90$
$I_{2000} = \frac{P_{2000}}{P_{2000}} \times 100 = \frac{320}{320}(100) = 100$
$I_{2003} = \frac{P_{2003}}{P_{2000}} \times 100 = \frac{384}{320}(100) = 120$
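The index column can be reproduced with a few lines of Python. This sketch applies the simple price index formula (price for year i divided by the base-year price, times 100) to the airplane ticket prices, with 2000 as the base year; the function name `simple_price_index` is hypothetical. Printing each value to one decimal place should match the Index column of the table.

```python
def simple_price_index(prices, base_year):
    """I_i = P_i / P_base * 100 for every period i."""
    base_price = prices[base_year]
    return {year: 100 * price / base_price for year, price in prices.items()}


# Airplane ticket prices, 1995 to 2003
fares = {1995: 272, 1996: 288, 1997: 295, 1998: 311, 1999: 322,
         2000: 320, 2001: 348, 2002: 366, 2003: 384}
index = simple_price_index(fares, base_year=2000)
for year, value in index.items():
    print(year, fares[year], f"{value:.1f}")
```

Changing `base_year` re-expresses every price relative to a different base period, which is why the choice of base period matters.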
Index Numbers: Interpretation

• Prices in 1996 were 90% of base year prices: $I_{1996} = \frac{P_{1996}}{P_{2000}} \times 100 = \frac{288}{320}(100) = 90$
• Prices in 2000 were 100% of base year prices (by definition, since 2000 is the base year): $I_{2000} = \frac{P_{2000}}{P_{2000}} \times 100 = \frac{320}{320}(100) = 100$
• Prices in 2003 were 120% of base year prices: $I_{2003} = \frac{P_{2003}}{P_{2000}} \times 100 = \frac{384}{320}(100) = 120$
