MTH302 Short Notes Lec 23 To 45 VUAnswer - Com-1
MTH302 Short Notes for Final Term (Chapter 23-45)
STATISTICAL DATA:
TYPES OF CLASSIFICATION:
METHODS OF PRESENTATION:
• Text
“The majority of the population of Punjab lives in rural areas.”
• Semi-tabular
Data in rows
• Tabular
Tables with rows and columns
• Graphic
Charts and graphs
TYPES OF GRAPHS:
• Column Graphs
• Line Graphs
• Circle Graphs (Sector Graphs)
• Conversion Graphs
• Travel Graphs
• Statistical Graphs
• Frequency Tables
For More Visit VUAnswer.com
• Histograms
• Frequency distributions
• Cumulative Distributions
LINE GRAPHS:
Line graphs are the most commonly used graphs. Here the data of one variable
(say Height) is plotted against data of the other variable (say Age).
MEAN:
The most common average is the mean. The mean is used for things like marks
and scores (e.g. sports scores), and is found by adding all the scores and dividing by the
number of scores.
Example:
Marks
58, 69, 73, 67, 76, 88, 91 and 74 (8 marks).
Sum = 596
Mean = 596/8 = 74.5
Please note that the mean is affected by extreme values.
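A minimal Python sketch of the calculation above, using the eight marks from the example. The function name `arithmetic_mean` is illustrative only. The last line adds a hypothetical extreme mark of 0 to show how strongly one extreme value pulls the mean.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)

marks = [58, 69, 73, 67, 76, 88, 91, 74]
print(sum(marks))                     # 596
print(arithmetic_mean(marks))         # 74.5
print(arithmetic_mean(marks + [0]))   # 596 / 9, about 66.22
```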
MEDIAN:
Another typical value is the median. The median is the middle value when the
data are arranged in order. The median is easier to find than the mean, and unlike
the mean it is not affected by values that are unusually high or low.
Example:
Data
3 6 11 14 19 19 21 24 31 (9 values)
The median is the middle score, or the mean of the two middle scores,
when the scores are placed in order. In the above data there are 9 values,
so the median is the middle (5th) value, 19.
When there is no middle value, the median is obtained by taking the average of
the two middle values.
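The rule above (middle value for an odd count, average of the two middle values for an even count) can be sketched in Python as follows. The function name `median` is our own; the last call replaces 31 with a hypothetical extreme value 310 to show that the median does not move.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even count: average of two

data = [3, 6, 11, 14, 19, 19, 21, 24, 31]
print(median(data))                                   # 19
print(median(data[:-1]))                              # 8 values: (14 + 19) / 2 = 16.5
print(median([3, 6, 11, 14, 19, 19, 21, 24, 310]))    # still 19
```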
MODE:
• Contingency Tables
• Side by Side Bar charts
Data: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46,
53, 58
First class: Lower limit is 10. Upper limit is 20. We read it as “10 but
under 20”, so a value of exactly 20 belongs to the next class. If values are rounded
to whole numbers, a measured value of 19.5 or more is recorded as 20 and therefore
also falls in the next class.
Frequency: Looking through the data shows that there are three values (12, 13 and 17)
from 10 up to, but not including, 20. Hence the frequency is 3. Similarly, the
frequencies of the other intervals are:
20 - 30 : 6
30 - 40 : 5
40 - 50 : 4
50 - 60 : 2
Total : 20
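A short Python sketch of the counting step, assuming classes of equal width that include the lower limit and exclude the upper limit ("10 but under 20"). The helper name `frequency_table` and its parameters are hypothetical; values outside the classes are simply ignored.

```python
def frequency_table(values, start, width, classes):
    """Count values in the classes [start, start+width), [start+width, start+2*width), ..."""
    counts = [0] * classes
    for v in values:
        k = int((v - start) // width)   # index of the class that contains v
        if 0 <= k < classes:            # values outside all classes are skipped
            counts[k] += 1
    return counts

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
freq = frequency_table(data, start=10, width=10, classes=5)
for k, f in enumerate(freq):
    lower = 10 + 10 * k
    print(f"{lower} - {lower + 10} : {f}")
print("Total :", sum(freq))             # Total : 20
```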
Relative frequency:
Relative frequency is the frequency of a class divided by the total number of
observations. There are 3 observations in class interval 10 – 20, so its relative
frequency is 3/20 = 0.15.
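Percentage Frequency:
Percentage frequency is the relative frequency multiplied by 100. For class interval
10 – 20 it is 0.15 × 100 = 15%.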
Cumulative Frequency:
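If we add the frequency of the first interval to the frequency of the second interval,
the cumulative frequency for the second interval is obtained; continuing in this way
gives the cumulative frequency of every interval. The cumulative frequency of the last
interval equals the total number of observations (20 here), i.e. 100% in percentage
terms, as all observations have been added.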
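The relative, percentage and cumulative frequencies for the 20-value example can be computed together. This is a small Python sketch; the class frequencies are taken from the table above and `itertools.accumulate` supplies the running totals.

```python
from itertools import accumulate

freq = [3, 6, 5, 4, 2]   # classes 10-20, 20-30, 30-40, 40-50, 50-60
n = sum(freq)

relative = [f / n for f in freq]             # class frequency / total
percentage = [100 * f / n for f in freq]     # relative frequency x 100
cumulative = list(accumulate(freq))          # running total of frequencies
cumulative_pct = [100 * c / n for c in cumulative]

print(relative)        # [0.15, 0.3, 0.25, 0.2, 0.1]
print(cumulative)      # [3, 9, 14, 18, 20]
print(cumulative_pct)  # [15.0, 45.0, 70.0, 90.0, 100.0]
```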
GRAPHING NUMERICAL DATA: THE HISTOGRAM:
When the frequency of each class interval is plotted in the form of a bar or column,
a histogram is obtained.
• Arithmetic Mean
• Arithmetic Mean for Grouped Data
• Weighted Mean
• Median
• Median for Grouped Data
• Median for Discrete Data
• Graphic Location of Median
• Quantiles (Quartiles, Deciles, Percentiles)
• Quantiles from Grouped Data
• Quantiles from Discrete Data
• Graphic Location of Quantiles
• Mode
• Mode from Grouped Data
• Mode from Discrete Data
• Empirical Relation Between Mean, Median and Mode
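AVERAGE:
Returns the average (arithmetic mean) of the arguments. Text, logical values and
empty cells in a range or cell reference are ignored, but cells containing zero are included.
AVERAGE(number1,number2,...)
AVERAGEA:
Calculates the average (arithmetic mean) of the values in the list of arguments. In
addition to numbers, text and logical values such as TRUE and FALSE are
included in the calculation (TRUE counts as 1; FALSE and text in a range count as 0).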
AVERAGEA(value1,value2,...)
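To contrast AVERAGE and AVERAGEA, here is a simplified Python model of how each treats a range of cells. This is not Excel itself: the function names are our own, `None` stands for an empty cell, and edge cases such as error values or text typed directly as an argument are not modelled.

```python
def average(cells):
    """Simplified AVERAGE: only numeric cells count; text, TRUE/FALSE and empty cells are skipped."""
    nums = [c for c in cells if isinstance(c, (int, float)) and not isinstance(c, bool)]
    return sum(nums) / len(nums)

def averagea(cells):
    """Simplified AVERAGEA: TRUE -> 1, FALSE -> 0, text -> 0; empty cells (None) are skipped."""
    vals = []
    for c in cells:
        if c is None:
            continue                       # empty cell: ignored
        if isinstance(c, bool):            # check bool before int (bool is an int subclass)
            vals.append(1 if c else 0)
        elif isinstance(c, (int, float)):
            vals.append(c)
        else:
            vals.append(0)                 # text counts as zero
    return sum(vals) / len(vals)

row = [10, 20, "absent", True, None]
print(average(row))    # (10 + 20) / 2 = 15.0
print(averagea(row))   # (10 + 20 + 0 + 1) / 4 = 7.75
```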
MEDIAN:
The median is the number in the middle of a set of numbers; that is, half the
numbers have values that are greater than the median, and half have values that
are less.
MEDIAN(number1,number2,...)
MODE:
Returns the most frequently occurring, or repetitive, value in an array or range of
data. Like MEDIAN, MODE is a location measure.
MODE(number1,number2,...)
COUNT FUNCTION:
Counts the number of cells that contain numbers, and counts numbers within the list
of arguments. Use COUNT to get the number of entries in a number field that is in
a range or array of numbers.
COUNT(value1,value2,...)
FREQUENCY:
Calculates how often values occur within a range of values, and then returns a
vertical array of numbers. For example, use FREQUENCY to count the number
of test scores that fall within ranges of scores. Because FREQUENCY returns
an array, it must be entered as an array formula.
FREQUENCY(data_array,bins_array)
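A Python sketch of what FREQUENCY computes, under our reading of Excel's behaviour: the first count is for values up to and including the first bin, each later count is for values above the previous bin and up to the current one, and one extra count holds values above the last bin. The function name is ours and the bins are assumed to be sorted in ascending order.

```python
from bisect import bisect_left

def frequency(data_array, bins_array):
    """Counts per bin: v <= bins[0], bins[i-1] < v <= bins[i], plus one count for v > last bin."""
    counts = [0] * (len(bins_array) + 1)
    for v in data_array:
        counts[bisect_left(bins_array, v)] += 1   # index of the first bin >= v
    return counts

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
print(frequency(data, [19, 29, 39, 49]))   # [3, 6, 5, 4, 2]
```

With whole-number data, bins 19, 29, 39 and 49 reproduce the class counts of the frequency distribution given earlier.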
Class    f    x (midpoint)    f·x
20-24    1    22               22
25-29    4    27              108
30-34    8    32              256
35-39   11    37              407
40-44   15    42              630
45-49    9    47              423
50-54    2    52              104
TOTAL   50                   1950
Mean = Σf·x / Σf = 1950 / 50 = 39
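The grouped-data mean in the table (each class represented by its midpoint) can be reproduced with a few lines of Python. The second class is taken as 25-29, which matches its listed midpoint of 27; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [(20, 24, 1), (25, 29, 4), (30, 34, 8), (35, 39, 11),
           (40, 44, 15), (45, 49, 9), (50, 54, 2)]

total_f = 0
total_fx = 0
for lower, upper, f in classes:
    x = (lower + upper) / 2      # class midpoint
    total_f += f
    total_fx += f * x
print(total_f, total_fx, total_fx / total_f)   # 50 1950.0 39.0
```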
CUMULATIVE % POLYGON-OGIVE:
A % cumulative frequency polygon can be drawn by plotting cumulative percentages
against the class limits, starting from the lower limit of the first class (not from the
midpoints, as in the case of relative frequency polygons). Such a polygon is called an
ogive. The maximum value of an ogive is always 100%. Ogives are used for determining
cumulative frequencies at different values (not only at the class limits).
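Reading an ogive at a value between class limits amounts to straight-line interpolation between the plotted points. The Python sketch below uses the cumulative percentages of the 20-value frequency distribution from earlier (classes 10-20 up to 50-60); the function name `ogive` is hypothetical.

```python
from bisect import bisect_right

# Class limits and cumulative % frequencies for the classes 10-20, ..., 50-60
limits = [10, 20, 30, 40, 50, 60]
cum_pct = [0, 15, 45, 70, 90, 100]   # the ogive starts at 0% at the first lower limit

def ogive(x):
    """Cumulative % at x, by straight-line interpolation between plotted points."""
    if x <= limits[0]:
        return 0.0
    if x >= limits[-1]:
        return 100.0
    i = bisect_right(limits, x)       # limits[i-1] <= x < limits[i]
    x0, x1 = limits[i - 1], limits[i]
    y0, y1 = cum_pct[i - 1], cum_pct[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

print(ogive(35))   # 45 + 25 * 0.5 = 57.5
```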
GEOMETRIC MEAN:
Geometric mean is defined as the nth root of the product of n individual values. It is
written as:
G = (x1 · x2 · x3 · ... · xn)^(1/n)
Example:
Find GM of 130, 140, 160
GM = (130 × 140 × 160)^(1/3)
= 2912000^(1/3)
≈ 142.8
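A Python sketch of the geometric mean. Taking the average of logarithms and exponentiating is mathematically the same as the nth root of the product, and avoids overflow for long lists; this choice of method is ours. Python 3.8+ also ships `statistics.geometric_mean`, which gives the same result.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(round(geometric_mean([130, 140, 160]), 1))   # 142.8
```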
HARMONIC MEAN:
Harmonic mean is defined as under:
HM = n / (1/x1 + 1/x2 + ... + 1/xn)
= n / Σ(1/xi)
Example:
Find HM of 10, 8, 6
HM = 3 / (1/10 + 1/8 + 1/6)
≈ 7.66
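The harmonic mean formula translates directly into Python; this minimal sketch assumes all values are non-zero. The function name is illustrative.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values (values must be non-zero)."""
    return len(values) / sum(1 / v for v in values)

print(round(harmonic_mean([10, 8, 6]), 2))   # 7.66
```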
QUARTILES:
Quartiles divide data into 4 equal parts
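For ungrouped data arranged in ascending order:
1st Quartile Q1 = value of the [(n+1)/4]th observation
2nd Quartile Q2 = value of the [2(n+1)/4]th observation
3rd Quartile Q3 = value of the [3(n+1)/4]th observation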
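The position rule i(n+1)/4 can be sketched in Python as follows, applied to the 9 values from the median example. The notes only give the position; interpolating linearly between the two neighbouring observations when the position is fractional is our assumption (one common convention). The function name `quartile` is hypothetical.

```python
def quartile(values, i):
    """i-th quartile from the position i(n+1)/4 in the sorted data,
    interpolating linearly when the position falls between two observations."""
    ordered = sorted(values)
    n = len(ordered)
    pos = i * (n + 1) / 4          # 1-based position
    lower = int(pos)               # whole part of the position
    frac = pos - lower             # fractional part
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

data = [3, 6, 11, 14, 19, 19, 21, 24, 31]
print([quartile(data, i) for i in (1, 2, 3)])   # [8.5, 19.0, 22.5]
```

For Q1 the position is 10/4 = 2.5, halfway between the 2nd value (6) and the 3rd value (11), giving 8.5.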
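Grouped data
Qi = ith quartile = l + (h/f)(i·Σf/4 − cf)
l = lower class boundary of the quartile class
h = width of the class interval
f = frequency of the quartile class
cf = cumulative frequency of the class preceding the quartile class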
DECILES:
Deciles divide data into 10 equal parts
PERCENTILES:
Percentiles divide data into 100 equal parts
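For grouped data, deciles and percentiles use the same formula as quartiles with 4 replaced by 10 or 100. The Python sketch below implements that general form and applies it to the 20-value frequency distribution from earlier (classes 10-20 up to 50-60). The function name and the (boundary, width, frequency) input layout are our own choices.

```python
def grouped_quantile(classes, i, parts):
    """i-th quantile of grouped data: l + (h / f) * (i * N / parts - cf).
    parts = 4 for quartiles, 10 for deciles, 100 for percentiles.
    classes: list of (lower_boundary, width, frequency), in ascending order."""
    n = sum(f for _, _, f in classes)
    target = i * n / parts
    cf = 0                              # cumulative frequency before the current class
    for l, h, f in classes:
        if f > 0 and cf + f >= target:  # first class whose cumulative frequency reaches target
            return l + (h / f) * (target - cf)
        cf += f
    raise ValueError("i must be between 0 and parts")

table = [(10, 10, 3), (20, 10, 6), (30, 10, 5), (40, 10, 4), (50, 10, 2)]
print(round(grouped_quantile(table, 1, 4), 2))   # Q1 = 23.33
print(grouped_quantile(table, 2, 4))             # Q2 = 32.0
print(grouped_quantile(table, 5, 10))            # D5 = 32.0, the same as the median
print(grouped_quantile(table, 90, 100))          # P90 = 50.0
```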
EMPIRICAL RELATIONSHIPS:
Symmetrical Distribution
mean = median = mode
Moderately Skewed Distribution
mode ≈ 3 median − 2 mean, i.e. median ≈ 1/3[mode + 2 mean]
Example:
mode = 15, mean = 18, median = ?
Median = 1/3[mode + 2 mean]
= 1/3[15 + 2(18)]
= [15+36]/3 = 51/3 = 17
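The empirical relation mode ≈ 3 median − 2 mean, solved for the median, as a one-line Python function (the name is illustrative). It only holds approximately, for moderately skewed distributions.

```python
def median_from_mode_mean(mode, mean):
    """Empirical relation for moderately skewed data: median = (mode + 2 * mean) / 3."""
    return (mode + 2 * mean) / 3

print(median_from_mode_mean(15, 18))   # 17.0
```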
WINSORIZED MEAN:
Replace each observation below the first quartile with the value of the first quartile,
and replace each observation above the third quartile with the value of the third quartile.
The mean of the modified data is the winsorized mean.
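A minimal Python sketch of the winsorized mean. We assume quartiles follow the (n+1) position rule with linear interpolation, which is what `statistics.quantiles(..., method="exclusive")` computes; a different quartile convention would give a slightly different result. The data are the 9 values from the median example.

```python
from statistics import mean, quantiles

def winsorized_mean(values):
    """Raise values below Q1 to Q1, lower values above Q3 to Q3, then average."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    clamped = [min(max(v, q1), q3) for v in values]
    return mean(clamped)

data = [3, 6, 11, 14, 19, 19, 21, 24, 31]
print(round(winsorized_mean(data), 2))   # 16.22 (ordinary mean is about 16.44)
```

Here Q1 = 8.5 and Q3 = 22.5, so 3 and 6 become 8.5 and 24 and 31 become 22.5.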
DISPERSION OF DATA:
The degree to which numerical data tend to spread about an average is called the
dispersion of data.
MEANS:
The most common measure of central tendency is the mean.
THE MEAN:
It is the sum of all values divided by the number of values. In the case of the mean
of a sample, this number n is the total sample size.
EXTREME VALUES:
An important point to remember is that arithmetic mean is affected by extreme values.
THE MEDIAN:
The median is found after arranging the data in ascending order. If the number
of observations is odd, it is the middle value; otherwise it is the average of the
two middle values. It is not affected by extreme values.
THE MODE:
The mode is the value that occurs most frequently.
MIDRANGE:
Midrange is the average of the smallest and largest values, i.e. (smallest + largest)/2.
In other words, it is the midpoint of the range. Midrange is affected by extreme values
as it is based only on the smallest and largest values.
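A quick Python illustration of the midrange, using the eight marks from the mean example. The second call appends a hypothetical extreme mark of 200 to show how strongly a single extreme value moves the midrange.

```python
def midrange(values):
    """Average of the smallest and the largest value."""
    return (min(values) + max(values)) / 2

marks = [58, 69, 73, 67, 76, 88, 91, 74]
print(midrange(marks))           # (58 + 91) / 2 = 74.5
print(midrange(marks + [200]))   # (58 + 200) / 2 = 129.0
```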
QUARTILES:
Quartiles are not exclusively measures of central tendency. However, they are
useful for dividing the data into four equal parts.
QUARTILE DEVIATION:
Quartile deviation (semi-interquartile range) is half the interquartile range:
QD = (Q3 − Q1)/2.
MEASURES OF VARIATION:
Among the measures of variation, the sample and population standard deviation and
variance are the most important. The coefficient of variation is the ratio of the standard
deviation to the mean, expressed as a percentage.
INTERQUARTILE RANGE:
Interquartile range is the difference between the 3rd and 1st quartiles: IQR = Q3 − Q1.
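A Python sketch of the interquartile range for the 9 values from the median example, with half of it giving the quartile deviation (semi-interquartile range). As an assumption, quartiles use the (n+1) position rule via `statistics.quantiles(..., method="exclusive")`.

```python
from statistics import quantiles

data = [3, 6, 11, 14, 19, 19, 21, 24, 31]
q1, q2, q3 = quantiles(data, n=4, method="exclusive")   # (n+1) position rule
iqr = q3 - q1             # interquartile range
qd = iqr / 2              # quartile deviation
print(q1, q3, iqr, qd)    # 8.5 22.5 14.0 7.0
```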
VARIANCE:
Variance is one of the most important measures of dispersion. Variance gives
the average squared deviation from the mean. In the case of a population, the
sum of squared deviations is divided by N, the number of values in the
population: σ² = Σ(x − μ)²/N. In the case of a sample, the number of observations
less 1 is used: s² = Σ(x − x̄)²/(n − 1).
STANDARD DEVIATION:
Standard deviation is the most important and widely used measure of dispersion.
It is the square root of the variance: the square root of the sum of squared deviations
divided by the number of values (N) for a population, or by the number of observations
less 1 (n − 1) for a sample.
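A Python sketch of population and sample variance and standard deviation, applied to the eight marks from the mean example. The function names and the `sample` flag are our own; Python's `statistics` module provides the same quantities as `pvariance`, `variance`, `pstdev` and `stdev`.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)
    return ss / (n - 1) if sample else ss / n

def std_dev(values, sample=True):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))

marks = [58, 69, 73, 67, 76, 88, 91, 74]
print(variance(marks, sample=False))   # 818 / 8 = 102.25
print(variance(marks))                 # 818 / 7, about 116.86
print(std_dev(marks, sample=False))    # about 10.11
```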
COEFFICIENT OF VARIATION:
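Coefficient of variation (CV) expresses the standard deviation as a percentage of
the mean: CV = (standard deviation / mean) × 100%. Because it has no units, it can
compare the variability of data sets with different means or units. In the slide you see
two stocks A and B with CV = 10% and 5% respectively; stock A is relatively more
variable than stock B.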
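A Python sketch of the coefficient of variation. We assume the sample standard deviation is used (the notes do not say which), and we apply it to the eight marks from the mean example since the stock data behind the slide are not given.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)

marks = [58, 69, 73, 67, 76, 88, 91, 74]
print(round(coefficient_of_variation(marks), 1))   # about 14.5 (%)
```

Multiplying every value by the same constant leaves the CV unchanged, which is why it can compare data measured in different units.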
MEAN DEVIATION:
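Other useful measures are the mean deviation about the mean and about the median,
i.e. the average of the absolute deviations from that value. The formulas for ungrouped
(raw) and grouped data are as follows:
Ungrouped data: MD = Σ|x − A| / n
Grouped data: MD = Σf|x − A| / Σf
where A is the mean or the median, and for grouped data x is the class midpoint.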
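A Python sketch of the mean deviation for raw data (about the mean or the median) and for grouped data (about the mean, using class midpoints). The function names are hypothetical; the raw data are the 9 values from the median example and the grouped data are the 10-20 ... 50-60 distribution from earlier.

```python
from statistics import mean, median

def mean_deviation(values, about="mean"):
    """Average absolute deviation of the values from their mean or median."""
    centre = mean(values) if about == "mean" else median(values)
    return sum(abs(x - centre) for x in values) / len(values)

def grouped_mean_deviation(midpoints, freqs):
    """Grouped data: sum of f*|x - mean| over sum of f, with x the class midpoints."""
    n = sum(freqs)
    m = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * abs(x - m) for x, f in zip(midpoints, freqs)) / n

data = [3, 6, 11, 14, 19, 19, 21, 24, 31]
print(round(mean_deviation(data), 2))             # about the mean: 7.06
print(round(mean_deviation(data, "median"), 2))   # about the median: 61 / 9, about 6.78
print(grouped_mean_deviation([15, 25, 35, 45, 55], [3, 6, 5, 4, 2]))   # 10.2
```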
CORRELATION:
SCATTER DIAGRAM:
The first step in regression analysis is to plot the values of the dependent and
independent variables in the form of a scatter diagram.
INTERCEPT:
Use the INTERCEPT function when you want to determine the value of the
dependent variable when the independent variable is 0 (zero). For example, you
can use the INTERCEPT function to predict a metal's electrical resistance at 0°C
when your data points were taken at room temperature and higher.
There are different types of regression models. The simplest is the Simple Linear
Regression Model, which describes a relationship between two variables that can be
represented by a straight-line equation.
REGRESSION EQUATION
The formula for the regression equation is as under:
Equation of Least Squares Regression line
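y − ym = r · (s(y)/s(x)) · (x − xm)
where xm and ym are the means of x and y, s(x) and s(y) their standard deviations,
and r the correlation coefficient.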
Example
Based on analysis of data the following values have been worked out:
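xm = 4;
ym = 80;
s(x) = √2;
s(y) = √200;
r = 0.8
Slope = r · s(y)/s(x) = 0.8 × √200/√2 = 0.8 × 10 = 8
Regression line: y − 80 = 8(x − 4), i.e. y = 8x + 48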
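The least-squares line can be built directly from the summary values in the example. This Python sketch (function name hypothetical) returns the slope r·s(y)/s(x) and the intercept ym − slope·xm, which is the value of y at x = 0.

```python
import math

def regression_line(x_mean, y_mean, s_x, s_y, r):
    """Least-squares line y - y_mean = r * (s_y / s_x) * (x - x_mean), as (slope, intercept)."""
    slope = r * s_y / s_x
    intercept = y_mean - slope * x_mean
    return slope, intercept

b, a = regression_line(4, 80, math.sqrt(2), math.sqrt(200), 0.8)
print(round(b, 6), round(a, 6))   # 8.0 48.0, i.e. y = 8x + 48
```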
SAMPLING DISTRIBUTION:
It is possible to construct a sampling distribution for r similar to those for sampling
distributions for means and percentages.
SEASONAL VARIATIONS:
Seasonal variations are regarded as a constant amount added to or subtracted from the
trend. This is a reasonable assumption when seasonal peaks and troughs are roughly of
constant size.
PERMUTATIONS
An arrangement of all or some of a set of objects in a definite order is called permutation.
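The number of permutations of r objects chosen from n is n!/(n − r)!. A Python sketch (the helper name is ours), checked against the built-in `math.perm` available since Python 3.8:

```python
import math

def permutations_count(n, r):
    """Number of ordered arrangements of r objects chosen from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(permutations_count(5, 3))   # 5 * 4 * 3 = 60
print(math.perm(5, 3))            # 60
```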
BINOMDIST
Returns the individual term binomial distribution probability. Use BINOMDIST
in problems with a fixed number of tests or trials.
BINOMDIST(number_s,trials,probability_s,cumulative)
Number_s is the number of successes in trials.
Trials is the number of independent trials.
Probability_s is the probability of success on each trial.
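Cumulative is a logical value that determines the form of the function. If cumulative
is TRUE, BINOMDIST returns the cumulative distribution function (the probability of at
most number_s successes); if FALSE, it returns the probability mass function (the
probability of exactly number_s successes).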
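A Python re-implementation of what BINOMDIST computes, using the binomial formula C(n, k)·p^k·(1 − p)^(n − k). It mirrors the Excel argument order but is our own sketch, not Excel's code.

```python
from math import comb

def binomdist(number_s, trials, probability_s, cumulative):
    """P(X = number_s) if cumulative is False, P(X <= number_s) if True, for X ~ Binomial."""
    def pmf(k):
        return comb(trials, k) * probability_s ** k * (1 - probability_s) ** (trials - k)
    if cumulative:
        return sum(pmf(k) for k in range(number_s + 1))
    return pmf(number_s)

print(round(binomdist(6, 10, 0.5, False), 6))   # 210 / 1024 = 0.205078
print(round(binomdist(6, 10, 0.5, True), 6))    # 848 / 1024 = 0.828125
```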
NEGBINOMDIST:
Returns the negative binomial distribution. NEGBINOMDIST returns the
probability that there will be number_f failures before the number_s-th
success, when the constant probability of a success is probability_s.
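The negative binomial probability of exactly f failures before the s-th success is C(f + s − 1, s − 1)·p^s·(1 − p)^f. The Python sketch below follows the argument order of Excel's NEGBINOMDIST(number_f,number_s,probability_s); it is a re-implementation for illustration, not Excel's code.

```python
from math import comb

def negbinomdist(number_f, number_s, probability_s):
    """Probability of exactly number_f failures before the number_s-th success."""
    return (comb(number_f + number_s - 1, number_s - 1)
            * probability_s ** number_s
            * (1 - probability_s) ** number_f)

print(round(negbinomdist(10, 5, 0.25), 4))   # 0.055
```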
CRITBINOM:
Returns the smallest value for which the cumulative binomial distribution is
greater than or equal to a criterion value.
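CRITBINOM can be sketched in Python by accumulating binomial probabilities until the running total reaches the criterion. The argument order follows Excel's CRITBINOM(trials,probability_s,alpha); the implementation itself is our own illustration.

```python
from math import comb

def critbinom(trials, probability_s, alpha):
    """Smallest k such that P(X <= k) >= alpha, for X ~ Binomial(trials, probability_s)."""
    cdf = 0.0
    for k in range(trials + 1):
        cdf += comb(trials, k) * probability_s ** k * (1 - probability_s) ** (trials - k)
        if cdf >= alpha:
            return k
    return trials   # guards against floating-point round-off when alpha is 1

print(critbinom(6, 0.5, 0.75))   # 4, since P(X <= 3) = 42/64 < 0.75 <= P(X <= 4) = 57/64
```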
POISSON DISTRIBUTION:
STANDARDISATION:
Process of calculating z from x is called standardisation: z = (x − μ)/σ, where μ is the
mean and σ the standard deviation. z indicates how many standard deviations the point
is from the mean.
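Standardisation in Python is a single expression. In this sketch the mean 74.5 matches the marks example, while the standard deviation of 10 is a hypothetical round value chosen for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

marks_mean, marks_sd = 74.5, 10.0          # sd is a hypothetical value
print(z_score(91, marks_mean, marks_sd))   # 1.65
```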
SAMPLING DISTRIBUTION: