
1-Introduction Statistics and Queuing Theory

The document provides an overview of statistics, emphasizing its importance in data collection, analysis, and decision-making. It distinguishes between descriptive and inferential statistics, detailing their applications across various fields such as business, health, and social sciences. Additionally, it covers concepts related to data types, variables, sampling methods, and the significance of representative samples in research.

Uploaded by eurokhan0
Copyright © All Rights Reserved

Statistics & Queuing Theory
Course No: MAT0541202

Topic 1: Introduction

Tariq Bin Amir


Statistics and its importance
• Statistics is the study of the collection, organization,
summarization, analysis, interpretation, and presentation
of data, as well as drawing valid conclusions and making
reasonable decisions on the basis of such analysis.

• A general process of investigation consists of the following steps:
1. Research question identification.
2. Data collection on the topic.
3. Data analysis.
4. Derive a conclusion based on the analysis of data.
Statistics is the study of how best to collect data (related to
step 2), analyze data (related to step 3), and draw
conclusions from data (related to step 4).
Statistics and its importance

1. Collecting Data (e.g., Survey)
2. Presenting Data (e.g., Charts & Tables)
3. Characterizing Data (e.g., Average)
[Figure: collecting, presenting, and characterizing data make up data analysis. Why? For decision-making.]


Statistics and its importance
Two areas of statistics:
1. Descriptive statistics
• Collecting, presenting, and describing data
• Involves
  - Collecting data
  - Presenting data
  - Characterizing data
• Purpose
  - Describe data
[Figure: bar chart of quarterly values in $ (Q1 to Q4), summarized as X̄ = 30.5 and S² = 113]
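As a small illustration of characterizing data, the sketch below computes the two summaries shown on the slide, the mean (X̄) and the sample variance (S², with an n - 1 denominator), using Python's standard `statistics` module. The function name `describe` and the five data values are hypothetical; they are not the values behind the slide's chart.

```python
import statistics


def describe(data):
    """Return the mean and the sample variance (n - 1 denominator) of data."""
    return {
        "mean": statistics.mean(data),
        "variance": statistics.variance(data),
    }


# Hypothetical observations, chosen only for illustration.
values = [12, 18, 25, 31, 14]
summary = describe(values)
print(summary)  # mean 20, sample variance 62.5
```

`statistics.pvariance` would divide by n instead of n - 1, which describes a complete population rather than a sample.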
Statistics and its importance
Two areas of statistics (cont.):
2. Inferential statistics
• Drawing conclusions and/or making decisions concerning a
population based only on sample data
• It allows us to make predictions (“inferences”) from data
• Two main areas of inferential statistics
  - Estimating parameters: taking a statistic from sample data
and using it to say something about a population parameter.
  - Hypothesis testing: using sample data to
answer research questions. For example, you might be interested
in knowing if a new cancer drug is effective.
• Purpose
  - Make decisions about population characteristics
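Parameter estimation can be sketched in a few lines: a statistic (the sample mean) computed from a random sample is used as an estimate of a population parameter (the population mean). The population here is simulated with a fixed seed, so the ages and sizes are hypothetical; in a real study the population mean is unknown, which is the whole reason for estimating it.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 people, simulated with a fixed seed.
rng = random.Random(42)
population = [rng.randint(25, 65) for _ in range(10_000)]

# Draw a simple random sample of 100 people (without replacement).
sample = rng.sample(population, 100)

# The sample mean is a statistic; it estimates the population mean, a parameter.
estimate = statistics.mean(sample)

# Only possible here because the population is simulated; in practice it is unknown.
parameter = statistics.mean(population)
print(f"estimate = {estimate:.2f}, parameter = {parameter:.2f}")
```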
Statistics and its importance
Areas where we use statistics:
• Business and Industry
  - Statistics to start a business
  - Statistics in manufacturing
  - Statistics in marketing
  - Statistics in engineering
• Statistical Computing
• Health and Medicine
• Learning
  - Statistics for teachers
  - Results
Statistics and its importance
Areas where we use statistics (cont.):
• Research
• Social Statistics
  - Child-bearing, child and elderly populations, population
    - health, nutrition, and educational level in the country
    - to identify the strength of the working population
    - to plan for the future
  - Housing, human settlements
    - to identify problems in housing planning
    - to settle the problems in slums
Statistics and its importance
Areas where we use statistics (cont.):
• Social Statistics (cont.)
  - Education, literacy
    - to study the current education system in the country
    - to develop subject planning
    - to plan future employment
  - Health
    - to provide health facilities
  - Income and economic activity, unemployment
    - to understand savings and investment
    - to introduce future investment systems
  - Natural resources
Statistical data
• Data are the facts and figures collected, summarized,
analyzed, and interpreted.
  - Example: IBM’s sales revenue is $100; its stock price is $80.
• The data collected in a particular study are referred to
as the data set.
  - Example: The sales revenue and stock price data
for a number of firms including IBM, Dell, Apple, etc.
Statistical data
• The elements are the entities on which data are collected.
  - Example: IBM, Dell, Apple, etc. in the previous setting
• A variable is a characteristic of each individual
element.
  - Example: Sales revenue, stock price (of a company)
• The set of measurements collected for a particular
element is called an observation.
  - Example: The sales revenue and stock price of one company for 2003
• The total number of data values in a data set is the
number of elements multiplied by the number of variables.
Statistical data

Data set (element names in the first column; variables in the remaining columns):

Company        Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram        AMEX             73.10               0.86
EnergySouth    OTC              74.00               1.67
Keystone       NYSE             365.70              0.86
LandCare       NYSE             111.40              0.33
Psychemedics   AMEX             17.60               0.13
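The rule "number of data values = number of elements × number of variables" can be checked directly on the company data set above. This is a minimal sketch: each tuple is one observation, and the first column holds the element name, so it is not counted as a variable.

```python
# The company data set: one row (observation) per element (company).
columns = ["Company", "Stock Exchange", "Annual Sales ($M)", "Earn/Share ($)"]
rows = [
    ("Dataram", "AMEX", 73.10, 0.86),
    ("EnergySouth", "OTC", 74.00, 1.67),
    ("Keystone", "NYSE", 365.70, 0.86),
    ("LandCare", "NYSE", 111.40, 0.33),
    ("Psychemedics", "AMEX", 17.60, 0.13),
]

n_elements = len(rows)          # number of companies
n_variables = len(columns) - 1  # the first column names the element; it is not a variable
total_values = n_elements * n_variables

# Cross-check by counting the data values actually stored in the rows.
counted = sum(len(row) - 1 for row in rows)
print(n_elements, n_variables, total_values)  # 5 3 15
```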
Statistical data
Types of Data:
• Quantitative data are measurements that are recorded
on a naturally occurring numerical scale.
• Qualitative data are measurements that cannot be
measured on a natural numerical scale; they can only be
classified into one of a group of categories.
Statistical data
Types of Data (cont.):
• Quantitative data:
  - Measured on a numeric scale.
  - Number of defective items in a lot.
  - Salaries of CEOs of oil companies.
  - Ages of employees at a company.

Statistical data
Types of Data (cont.):
• Qualitative data:
  - Classified into categories.
  - College major of each student in a class.
  - Gender of each employee at a company.
  - Method of payment (cash, check, credit card).

Variable and Constants
Variable:
• A characteristic of each individual element of a
population or sample.
• Varies from person to person, object to object.
• Example: A college dean is interested in learning
about the average age of faculty. Identify the basic
terms in this situation.
The population is the ages of all faculty members at the college. A
sample is any subset of that population, for example, the ages of 10
faculty members.
The variable is the “age” of each faculty member.
One data value would be the age of a specific faculty member.
Variable and Constants
Variable Types:

Variable
  - Qualitative: Nominal, Ordinal
  - Quantitative: Discrete, Continuous
Variable and Constants
Variable Types (cont.):
• Numerical/Quantitative variable:
  - A variable that quantifies an element of a population.
  - Accepts numerical values.
  - Arithmetic operations like addition, subtraction, average,
etc. are meaningful.
• Qualitative, or Attribute, or Categorical Variable:
  - A variable that categorizes or describes an element of a
population.
  - Arithmetic operations are not meaningful.
  - Categories may be represented by numbers, e.g., male = 0,
female = 1.
Variable and Constants
Variable Types (cont.):
• Example:
  - The residence hall for each student in a statistics class.
(Categorical)
  - The amount of gasoline pumped by the next 10
customers at the local Unimart. (Numerical)
  - The amount of radon in the basement of each of 25
homes in a new development. (Numerical)
  - The color of the baseball cap worn by each of 20
students. (Categorical)
  - The length of time to complete a mathematics homework
assignment. (Numerical)
Variable and Constants
Numerical/Quantitative Variable Types:
• Discrete Variable:
  - It is possible to count/enumerate all possible values, e.g.
number of rooms in a house.
  - In general, countable data are an example of a discrete
variable, e.g. population in each division in Bangladesh.
  - Typically takes non-negative whole numbers (counts).
• Continuous Variable:
  - A quantitative variable that can assume an uncountable
number of values.
  - Usually associated with measurements, e.g. height.
  - Can take any of infinitely many values within a given range.
Variable and Constants
Qualitative, or Attribute, or Categorical Variable
Types:
• Nominal Variable:
  - Categories do not have an inherent ordering.
  - e.g. do you prefer to write early in the morning or before
going to bed at night? The answers can be {morning,
night}.
• Ordinal Variable:
  - Categories have an inherent ordering.
  - e.g. how satisfied are you with customer service? The
answers can be {very satisfied, satisfied, neutral,
unsatisfied}.
Data
Level/scale of Measurement
• The scale determines the amount of information
contained in the data.
• There are various ways to measure a variable. We
classify them into four scales of measurement:
  - Nominal scale
  - Ordinal scale
  - Interval scale
  - Ratio scale
Data
Level/scale of Measurement (cont.)

Nominal scale – numbers are assigned to the
categories or variable values for identification only.
We can assign numerical values to the names but cannot
order them meaningfully. Examples of nominal
variables:
  - Where a person lives in the U.S. (Northeast, South,
Midwest, etc.)
  - Gender (Male, Female)
  - Nationality (American, Mexican, French)
  - Race/ethnicity (African American, Hispanic, White,
Asian American)
Data
Level/scale of Measurement (cont.)

Ordinal scale – numbers are assigned to the
categories or variable values for identification as
well as ranking. The magnitude is used only for
comparison and not for any mathematical operation.
Examples of ordinal variables:
  - Agreement (strongly disagree, disagree, neutral,
agree, strongly agree)
  - Rating (excellent, good, fair, poor)
  - Frequency (always, often, sometimes, never)
Data
Level/scale of Measurement (cont.)
• Interval scale – numbers are assigned to the variable
values so that the scale is divided into equal units,
but the zero value on the scale is not an absolute zero.
  - The difference between a temperature of 100 degrees and
90 degrees is the same difference as between 90 degrees and
80 degrees.
  - IQ score
Data
Level/scale of Measurement (cont.)
• Ratio scale data have all the properties of interval
data; in addition, the ratio of two values is meaningful and
0.0 has a clear definition.
• Ratio scale has an absolute zero.
  - Variables like height, weight, enzyme activity are ratio
variables.
  - Example: Econ & Finance majors' salaries are 1.24 times
History majors' salaries and 1.46 times Psychology
majors' salaries.
Comparison

• Ratios are meaningful only on an absolute scale, where 0 is
meaningful.
• Ratios also depend on the starting point of the scale: moving
the zero point changes the ratio of two values but not their difference.

Interval Scales vs. Ratio Scales
• Temperature, expressed in °F or °C, is not a ratio variable. A
temperature of 0.0 on either of those scales does not mean
'no heat'.
• However, temperature in kelvin is a ratio variable, as 0.0
kelvin really does mean 'no heat'.
• Another counterexample is pH. It is not a ratio variable, as
pH = 0 just means 1 molar of H+, and the definition of molar is
fairly arbitrary. A pH of 0.0 does not mean 'no acidity' (quite
the opposite!).
Interval Scales vs. Ratio Scales
• When working with ratio variables, but not interval
variables, you can look at the ratio of two measurements.
• A weight of 4 grams is twice a weight of 2 grams, because
weight is a ratio variable. A temperature of 100 °C is
not twice as hot as 50 °C, because temperature in °C is
not a ratio variable.
• A pH of 3 is not twice as acidic as a pH of 6, because pH is
not a ratio variable.
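The temperature example can be verified numerically. This sketch converts 100 °C and 50 °C to kelvin (K = °C + 273.15): the difference is the same on both scales, but the ratio is 2 only on the Celsius scale, whose zero is arbitrary. The helper name `celsius_to_kelvin` is illustrative.

```python
import math


def celsius_to_kelvin(c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return c + 273.15


hot_c, warm_c = 100.0, 50.0
hot_k, warm_k = celsius_to_kelvin(hot_c), celsius_to_kelvin(warm_c)

# Differences agree on both scales, because both use equal units.
diff_c = hot_c - warm_c
diff_k = hot_k - warm_k
same_units = math.isclose(diff_c, diff_k)

# Ratios do not: the Celsius ratio depends on where 0 was placed.
ratio_c = hot_c / warm_c  # 2.0, but not a meaningful "twice as hot"
ratio_k = hot_k / warm_k  # about 1.155, meaningful on an absolute scale
print(same_units, ratio_c, round(ratio_k, 3))
```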
Population
• A population is the collection or set of all items or
individuals of interest
Or
• A population is the entire set of individuals or objects
having some common characteristics selected for a
research study
• Examples: All likely voters in the next election
All patients admitted to the ICU
All tax receipts for this year
• Two kinds of populations: finite or infinite.
Sample
• A sample is a subset of the population
Or
• A sample is the selected group of people or elements
from which data are collected for a study
• Sample:
  - It consists of units selected from the population
  - Represents the whole population
  - Purpose: to draw inferences about the population
• Examples: 1000 voters selected at random for interview
A few heart-disease patients selected for study
Random receipts selected for audit
• Sampling Frame – Listing of the population from which a
sample is chosen
Population vs. Sample

[Figure: a population of items labeled a to z, from which a sample of items (b, c, g, i, n, o, r, u, y) is drawn]
Population vs. Sample
Why Sample?
• Less time-consuming than a census
• Less costly to administer than a census
• It is possible to obtain statistical results of
sufficiently high precision based on samples.
Example:
  - Testing a new vaccine on the entire population would be risky.
Instead, trials are conducted on a sample group.
  - The daily users of a website keep changing, so web analysts
take a sample from a specific period.
Sample
• 3 factors that influence sample representativeness
  - Sampling procedure
  - Sample size
  - Participation (response)
• When might you sample the entire population?
  - When your population is very small
  - When you have extensive resources
  - When you don’t expect a very high response
Sample
• What is a good sample?
The sample must be:
1. representative of the population;
2. appropriately sized (the larger the better);
3. unbiased;
4. random (selections occur by chance).
Sample

• A representative sample exhibits characteristics
typical of those possessed by the population of
interest.
• A random sample of n experimental units is a
sample selected from the population in such a way
that every different sample of size n has an equal
chance of selection.
Sample
Random sampling vs. selective sampling
• Sampling randomly helps resolve the bias problem.
  - Each case in the population has an equal chance of being
included.
  - The most basic random sample is called a simple random
sample, which is equivalent to using a raffle to select
cases.
Non-response bias
• Even if we sample randomly, e.g. for surveys, non-response
may be high.
• If only 30% of randomly chosen people provide a
response, it is unclear whether the results are
representative of the entire population.
Sample

Simple Random Sample:
• Every sample of size n has an equal chance of
selection.
Simple Random Samples
• Applicable when the population is small, homogeneous, and
readily available
• Every object in the population has an equal chance/
probability of being selected
• Objects are selected independently
• Samples can be obtained from a table of random
numbers or computer random number generators
• A table of random numbers or a lottery system is used to
determine which units are to be selected.
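A computer random number generator can play the role of the lottery. In this minimal sketch, Python's `random.sample` draws units from a sampling frame without replacement, so every subset of size n is equally likely. The frame of 500 voter ID numbers and the helper name `simple_random_sample` are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement,
    so that every subset of size n is equally likely."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: ID numbers of 500 registered voters.
frame = list(range(1, 501))
chosen = simple_random_sample(frame, 10, seed=7)
print(sorted(chosen))
```

Fixing the seed is only for reproducibility; omitting it gives a fresh draw each run.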
Simple Random Samples
• Advantages:
  - Minimal knowledge of population needed
  - Easy to analyze data
• Disadvantages:
  - Low frequency of use
  - Does not use researchers’ expertise
  - Larger risk of random error
  - If the sampling frame is large, this method is impracticable
  - Minority subgroups of interest in the population may not
be present in the sample in sufficient numbers for study
Convenience Sampling
• A type of nonprobability sampling in which the sample is
drawn from the part of the population that is close to
hand, that is, readily available and convenient.
• A researcher using such a sample cannot scientifically make
generalizations about the total population from this sample,
because it would not be representative enough.
• For example, if the interviewer were to conduct a survey at
a shopping center early in the morning on a given day, the
people he/she could interview would be limited to those
present there at that time. Their views would not represent
those of other members of society in that area as well as a
survey conducted at different times of day and several times
per week would.
Convenience Sampling
Use results that are easy to get
Data Collection
Obtaining Data:
• Data from a published source
  - book, journal, newspaper, Web site
• Data from a designed experiment
  - researcher exerts strict control over units
• Data from a survey
  - a group of people are surveyed and their responses are
recorded
• Data collected observationally
  - units are observed in their natural setting and variables of interest
are recorded
Data Collection
Source of Data:
• Primary Sources: The data collector is the one using
the data for analysis.
  - Data from a political survey
  - Data collected from an experiment
  - Observed data
• Secondary Sources: The person performing data
analysis is not the data collector.
  - Analyzing census data
  - Examining data from print journals or data published on the
internet
Data Presentation
Data Presentation
Tabulation:
• Tables are devices used to present the data in
a simple form. Tabulation is probably the first step before the data are
used for analysis or interpretation.
• General principles of designing tables
  - The tables should be numbered, e.g. table 1, table 2, etc.
  - A title must be given to each table.
  - The headings of columns or rows should be clear and concise.
  - The data must be presented according to size or importance, or
chronologically, alphabetically, or geographically.
  - If percentages or averages are to be compared, they should
be placed as close as possible.
  - No table should be too large.
Data Presentation
Tabulation (cont.):
  - Most people find a vertical arrangement better than a
horizontal one, because it is easier to scan the data from top to
bottom than from left to right.
  - Footnotes may be given, where necessary, providing
explanatory notes or additional information.
• Types of tables
  - Simple tables: measurements of a single set are presented
  - Complex tables: measurements of multiple sets are
presented
Data Presentation
• Simple tables
  - When characteristics with values are presented in the
form of a table, it is known as a simple table
  - Example:
Table 1: Infant mortality rate of selected countries in 2004
Data Presentation
• Complex tables
Frequency Distribution Table
• In a frequency distribution table, the data are first split
up into convenient groups (class intervals), and the
number of items (frequency) that occur in each group
is shown in adjacent columns.
• Hence it is a table showing the frequency with which the
values are distributed in different groups or classes with
some defined characteristics.
Frequency Distribution Table
Rules for construction of a frequency table
• The class interval should not be too large or too small.
• The number of classes formed should be more than 8 and
less than 15.
• The class intervals should be equal and uniform throughout
the classification.
• After construction of the table, a proper and clear heading
should be given to it.
• The base or source of the data should be mentioned, with
the pattern of analysis, in a footnote at the end of the table.
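Following the rules above (equal class intervals, between 8 and 15 classes), a frequency table can be built by counting how many values fall into each class. This is a sketch under two assumptions of ours: each class includes its lower limit and excludes its upper limit, and the 20 patient ages are hypothetical (they are not the data of Table 2).

```python
def frequency_table(data, start, width, n_classes):
    """Count values in equal-width classes [lower, upper).

    Each class includes its lower limit and excludes its upper limit.
    """
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)  # index of the class that contains x
        if not 0 <= k < n_classes:
            raise ValueError(f"value {x} lies outside the classes")
        counts[k] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical ages (years) of 20 patients.
ages = [1, 3, 4, 7, 8, 12, 13, 15, 18, 22,
        24, 27, 31, 33, 38, 41, 44, 47, 49, 2]

# 10 equal classes of width 5 covering 0 to 50.
table = frequency_table(ages, start=0, width=5, n_classes=10)
for lower, upper, freq in table:
    print(f"{lower:>2}-{upper:<2} {freq}")
```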
Frequency Distribution Table
• Example:
Table 2: Age distribution of polio patients
Frequency Distribution Table
• Example:
Graphical Presentation of Data
Charts and Diagrams
• Charts and diagrams are useful methods of presenting
simple data.
• They have a powerful impact on people's imagination.
• They give information at a glance.
• Diagrams are better retained in memory than statistical tables.
• However, graphs cannot be substituted for statistical tables,
because graphs cannot be treated mathematically, whereas
tables can.
• Whenever graphs are compared, the difference in scale
should be noted.
• Much of the detail and accuracy of the original data is lost in charts and
diagrams.
Graphical Presentation of Data
Common diagrams
• Pie chart
• Simple bar diagram
• Multiple bar diagram
• Component bar diagram or subdivided bar diagram
• Histogram
• Frequency polygon
• Frequency curve
• Scatter diagram
• Statistical maps
Graphical Presentation of Data
Histogram
• A way to visualize the distribution of a numerical
variable.
• Used for quantitative, continuous variables.
• Data are "binned" into intervals, and the heights of the bars
represent the number of cases that fall into each
interval.
• Provides a view of data density.
• Higher bars mean data are relatively common there,
e.g. the majority of countries have an average life
expectancy between 65 and 85 years.
Graphical Presentation of Data
Histogram (cont.)
[Figure: life expectancy histogram, left skewed (long left tail) and unimodal; income histogram, right skewed (long right tail) and unimodal]
Graphical Presentation of Data
Histogram (cont.)
• Distributions are said to be skewed to the side of the long tail.
• If no skewness is apparent, the distribution is said to be symmetric.
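Skewness can also be measured rather than judged by eye. One common measure (our choice here; other adjusted formulas exist) is the moment coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. Its sign points to the side of the long tail. The three small data sets are hypothetical.

```python
def skewness(data):
    """Moment coefficient of skewness, g1 = m3 / m2 ** 1.5.

    Positive values indicate a long right tail, negative values a long
    left tail, and values near 0 a roughly symmetric distribution.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


right_tail = [1, 1, 2, 2, 3, 10]      # hypothetical, income-like: one large value
left_tail = [-x for x in right_tail]  # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_tail), ("left", left_tail), ("symmetric", symmetric)]:
    print(name, round(skewness(data), 3))
```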
Graphical Presentation of Data
Histogram (cont.)
• Modality
  - Like skewness, modality is also related to the shape of a histogram.
  - It refers to the number of prominent peaks.
  - You may also hear the phrase “number of pronounced modes”.
  - Unimodal: one prominent peak
  - Bimodal: two prominent peaks
  - Multimodal: more than two prominent peaks
  - Uniform: no prominent peak
Graphical Presentation of Data
Histogram (cont.)
• Bin width
  - The chosen bin width can alter the story that the histogram is telling.
  - Too large a bin width → we may lose interesting details.
  - Too narrow a bin width → it is difficult to get the overall picture of the distribution.
  - The ideal bin width depends on the data you are working with.
[Figure: the same data with bins that are too wide, too narrow, and “just” right]
Graphical Presentation of Data
Frequency polygon
• A frequency polygon is an area diagram of a frequency
distribution drawn over a histogram.
• It is a linear representation of a frequency table and
histogram, obtained by joining the midpoints of the
tops of the histogram blocks.
• Frequency is plotted at the central point of each group.
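The vertices of a frequency polygon follow directly from a frequency table: each frequency is placed at the midpoint of its class. The sketch below also closes the polygon with a zero-frequency point one class width beyond each end; that closing step is a common convention we assume here, not something stated above. The classes are hypothetical.

```python
def polygon_vertices(classes):
    """Vertices (midpoint, frequency) of a frequency polygon.

    classes: list of (lower, upper, frequency) with equal class widths.
    Assumed convention: the polygon is closed by adding a zero-frequency
    point one class width beyond each end.
    """
    width = classes[0][1] - classes[0][0]
    points = [((lower + upper) / 2, freq) for lower, upper, freq in classes]
    first_mid, last_mid = points[0][0], points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


# Hypothetical classes: (lower limit, upper limit, frequency).
classes = [(10, 20, 3), (20, 30, 7), (30, 40, 5)]
print(polygon_vertices(classes))
# [(5.0, 0), (15.0, 3), (25.0, 7), (35.0, 5), (45.0, 0)]
```

Joining these points with straight lines in order gives the polygon.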
Graphical Presentation of Data
Normal frequency distribution curve (cont.)
• Example
Graphical Presentation of Data
Scatterplot
• Used to visualize the relationship between two
numerical variables.
• The variable we suspect might be affecting the other
(the explanatory variable) is placed on the X axis, and
the response variable is placed on the Y axis.
• To understand the relationship between the two variables,
we need to visualize a line or a curve going through the
points.
Graphical Presentation of Data
Scatterplot (cont.)
• Example:
Graphical Presentation of Data
Scatterplot (cont.)
[Figure: scatterplots contrasting a strong relationship (little scatter) with a weak relationship (lots of scatter)]
Unusual points can be individual cases or a group of cases. Make sure they are not data entry errors.
Graphical Presentation of Data
Dot plot
• A one-variable scatter plot.
• Useful if you want to investigate each variable
separately.
[Figure: a dot plot and its stacked version]
Graphical Presentation of Data
Bar charts
• The data presented are categorical.
• Data are presented in the form of rectangular bars of
equal breadth.
• Each bar represents one variant/attribute.
• A suitable scale should be indicated, and the scale should start
from zero.
• The width of the bars and the gaps between the bars
should be equal throughout.
• The length of each bar is proportional to the
magnitude/frequency of the variable.
• The bars may be vertical or horizontal.
Graphical Presentation of Data
Bar charts (cont.)
Graphical Presentation of Data
Multiple Bar charts
• Also called compound bar charts
• More than one sub-attribute of a variable can be
expressed
Graphical Presentation of Data
Pie Charts
• Most common way of presenting data
• The value of each category is divided by the total of all
values and multiplied by 360; each category is then
allocated the resulting angle (in degrees) to represent
the proportion it has.
• It is often necessary to indicate percentages in the
segments, as it is not always easy to compare the
areas of segments visually.
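The angle rule for pie charts (value / total × 360) and the percentage labels can be computed together. In this sketch the categories reuse the method-of-payment example from the qualitative data slide, but the counts are hypothetical. Multiplying before dividing keeps these whole-number angles exact in floating point.

```python
import math


def pie_angles(counts):
    """Angle in degrees for each category: value / total * 360."""
    total = sum(counts.values())
    # Multiply before dividing to keep whole-number angles exact.
    return {category: value * 360 / total for category, value in counts.items()}


def percentages(counts):
    """Share of each category as a percentage, for labelling the segments."""
    total = sum(counts.values())
    return {category: value * 100 / total for category, value in counts.items()}


# Hypothetical method-of-payment counts for 40 customers.
payments = {"cash": 20, "check": 6, "credit card": 14}
angles = pie_angles(payments)
shares = percentages(payments)
print(angles)  # cash 180, check 54, credit card 126 degrees
print(shares)  # cash 50, check 15, credit card 35 percent
assert math.isclose(sum(angles.values()), 360)  # the segments fill the circle
```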
Graphical Presentation of Data
Pie Charts (cont.)
• Example
Graphical Presentation of Data
Map Diagram
• When statistical data refer to geographic or administrative
areas, they are presented either as a statistical map or a dot map.
• Useful for highlighting the spatial distribution.
[Figure: map showing that life expectancy and income are lower in Africa but both are higher in Europe]
