Data Analysis
\( \text{Average} = \dfrac{\text{sum of } n \text{ numbers}}{n} \)
{5, 9, 1, 7, 0, 9, 2}: \( \text{Average} = \dfrac{5 + 9 + 1 + 7 + 0 + 9 + 2}{7} = \dfrac{33}{7} \approx 4.71 \)
To determine the mean number of children per
household in a community, Tabitha surveyed 20
families at a playground. For the 20 families surveyed,
the mean number of children per household was 2.4.
Which of the following statements must be true?
C) The sampling method is flawed and may produce a biased estimate of the
mean number of children per household in the community.
\( \dfrac{x_1 + \cdots + x_n}{n} = \bar{x} \)
\( \dfrac{(x_1 + c) + \cdots + (x_n + c)}{n} = \dfrac{x_1 + \cdots + x_n + nc}{n} = \bar{x} + c \)
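The effect of adding a constant to every value can be checked numerically. The sketch below reuses the example set {5, 9, 1, 7, 0, 9, 2}; the constant c = 10 is an arbitrary value chosen only for illustration.

```python
from statistics import mean

data = [5, 9, 1, 7, 0, 9, 2]   # example set used for the average
c = 10                          # hypothetical constant, not from the slides

original_mean = mean(data)                    # 33 / 7
shifted_mean = mean([x + c for x in data])    # every value increased by c

# Adding c to each value should raise the mean by exactly c.
print(original_mean, shifted_mean, shifted_mean - original_mean)
```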
{6, 2, 4, 5, 1, 5}
In order: {1, 2, 4, 5, 5, 6}, so \( M = \dfrac{4 + 5}{2} = 4.5 \)
Consider the set {4,x,9,15,15,27,32}.
Since \( n = 7 \), the median is the 4th value in order, and that value is 15 no matter what \( x \) is.
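A quick way to convince yourself is to try several values of x. The sketch below is only a spot check, not a proof; the trial values are arbitrary choices.

```python
from statistics import median

def median_with(x):
    """Median of the set {4, x, 9, 15, 15, 27, 32} for a given x."""
    return median([4, x, 9, 15, 15, 27, 32])

# Arbitrary trial values: below, inside, and above the known values.
trial_values = [-100, 0, 4, 9, 12, 15, 20, 27, 40, 1000]
medians = [median_with(x) for x in trial_values]
print(medians)
```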
The median of the numbers in list R is the middle number when the numbers are listed in order
from least to greatest, that is, the 8th number. Since the median of the numbers in list R is equal to
the least integer in list T, the 8 greatest integers in R are the 8 least integers in T, and the number
of different integers in the combined list is 15 + 21 – 8, or 28.
In evenly spaced sets: mean = median.
The mean and median of the set are equal to the average of the FIRST and LAST terms.
(Or any two terms that are symmetric about the center of the set)
The average of the set {5, 10, 15, 20, 25, 30} is \( \dfrac{5 + 30}{2} = \dfrac{15 + 20}{2} = \dfrac{10 + 25}{2} = 17.5 \)
The average of the set {101, 111, 121, ..., 581, 591, 601} is equal to \( \dfrac{601 + 101}{2} = 351 \)
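The shortcut for evenly spaced sets can be compared with a direct computation. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import mean, median

# 101, 111, 121, ..., 591, 601 (evenly spaced, step 10)
evenly_spaced = list(range(101, 602, 10))

direct_mean = mean(evenly_spaced)
direct_median = median(evenly_spaced)
shortcut = (evenly_spaced[0] + evenly_spaced[-1]) / 2   # (first + last) / 2

print(direct_mean, direct_median, shortcut)
```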
{5, 6, 1, 9, 7, 4, 6, 3} Mode = 6
{6, 2, 4, 2, 9, 7, 7, 9, 5, 2, 7} Mode = 2, 7
{8, 1, 5, 4, 9, 2} Mode = 8, 1, 5, 4, 9, 2
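The three mode examples can be reproduced with Python's standard library. The sketch assumes Python 3.8 or later, where `statistics.multimode` returns every most frequent value.

```python
from statistics import multimode

one_mode = multimode([5, 6, 1, 9, 7, 4, 6, 3])
two_modes = multimode([6, 2, 4, 2, 9, 7, 7, 9, 5, 2, 7])
all_modes = multimode([8, 1, 5, 4, 9, 2])   # every value occurs once

print(one_mode, two_modes, all_modes)
```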
The modes of a set of 9 numbers are x, y, and z, and the average
(arithmetic mean) of the 9 numbers is 20. Three of the 9 numbers
are 2x + 5, 2y, and 2z - 3. What is the value of 4(x + y + z)?
Weighted average (Average of 2 sets)
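If set A has \( n \) values with average \( a \) and set B has \( m \) values with average \( b \):
\( \text{Total average of A, B} = \dfrac{na + mb}{n + m} \)
[Figure: number line marking \( a \), the midpoint \( \dfrac{a + b}{2} \), and \( b \), with the average of A, B shown on either side of the midpoint]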
Each employee of a certain company is in either Department X or
Department Y, and there are more than twice as many employees in
Department X as in Department Y. The average (arithmetic mean) salary
is $25,000 for the employees in Department X and $35,000 for the
employees in Department Y. Which of the following amounts could be
the average salary for all of the employees of the company?
A) $26,000
B) $29,000
C) $30,000
D) $31,000
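Since \( X > 2Y \), more than two-thirds of the employees are in the department with the lower average salary. The average salary of all employees is
\( \dfrac{25{,}000X + 35{,}000Y}{X + Y} = 25{,}000 + 10{,}000 \cdot \dfrac{Y}{X + Y} \), where \( \dfrac{Y}{X + Y} < \dfrac{1}{3} \).
So the average is more than $25,000 and less than about $28,333, which is stricter than being below the midpoint of $30,000. Therefore the answer is A.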
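The bound on the combined average can be explored numerically. In the sketch below the head counts are hypothetical; only the two department averages and the condition X > 2Y come from the problem.

```python
def combined_average(n, a, m, b):
    """Average of n values with mean a together with m values with mean b."""
    return (n * a + m * b) / (n + m)

# Hypothetical head counts satisfying X > 2Y.
averages = [
    combined_average(x, 25_000, y, 35_000)
    for y in range(1, 51)
    for x in range(2 * y + 1, 6 * y + 1)
]

upper_bound = (2 * 25_000 + 35_000) / 3   # value approached as X nears 2Y

print(min(averages), max(averages), upper_bound)
print(combined_average(9, 25_000, 1, 35_000))   # one way to reach $26,000
```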
Quartiles and Percentiles
Like the median M, quartiles and percentiles are numbers that divide the data into roughly equal
groups after the data have been ordered from the least value L to the greatest value G.
There are three quartile numbers that divide the data into four roughly equal groups.
The first quartile 𝑸𝟏 , the second quartile 𝑸𝟐 (which is simply the median M), and the third
quartile 𝑸𝟑 divide a group of data into four roughly equal groups as follows.
\( Q_1 = \dfrac{3 + 2}{2} = 2.5 \), \( Q_2 = M \), \( Q_3 = \dfrac{8 + 10}{2} = 9 \)
\( Q_2 \) is the median of all the data; \( Q_1 \) is the median of the numbers less than M, and \( Q_3 \) is the median of the numbers greater than M.
Sometimes a data value is unusually small or unusually large in comparison with the rest of the data. Such data are called outliers.
The middle half of the data lies between \( Q_1 \) and \( Q_3 \); its width, the interquartile range, is defined as \( Q_3 - Q_1 \).
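The quartile rule above can be sketched in a few lines. The data list is hypothetical, chosen so that the results match the values 2.5 and 9 shown earlier, and the function splits the ordered data into a lower and an upper half, which is one common convention and may differ from other textbook or software definitions.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # values below the median position
    upper = ordered[half + (n % 2):]    # values above the median position
    return median(lower), median(ordered), median(upper)

data = [1, 2, 3, 5, 6, 8, 10, 12]       # hypothetical data, not from the slides
q1, q2, q3 = quartiles(data)
interquartile_range = q3 - q1

print(q1, q2, q3, interquartile_range)
```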
There are 99 percentile numbers that divide the data into 100 roughly equal groups.
Percentiles are mostly used for very large lists of numerical data ordered from least to greatest.
The 99 percentiles \( P_1, P_2, P_3, \ldots, P_{99} \) divide the data into 100 groups. Consequently, \( Q_1 = P_{25} \), \( M = Q_2 = P_{50} \), and \( Q_3 = P_{75} \).
A survey was taken to find the number of children in each of 25 families. A list of the values collected
in the survey follows.
1, 2, 0, 4, 1, 3, 3, 1, 2, 0, 4, 5, 2, 3, 2, 3, 2, 4, 1, 2, 3, 0, 2, 3, 1
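Frequency is the number of times that the category or value appears in the data.

Number of children   Frequency   Relative frequency
0                    3           12%
1                    5           20%
2                    7           28%
3                    6           24%
4                    3           12%
5                    1           4%
Total                25          100%

[Figure: bar graph of frequency against number of children (0 to 5); the height of each bar is proportional to the corresponding frequency or relative frequency]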
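The frequency and relative frequency columns can be computed directly from the 25 survey values. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from collections import Counter

# The 25 survey values, one digit per family.
raw = "1204133120452323241230231"
values = [int(ch) for ch in raw]

frequency = Counter(values)
total = len(values)

# Print one row per number of children: value, frequency, relative frequency.
for children in sorted(frequency):
    count = frequency[children]
    print(f"{children}  {count:2d}  {count / total:.0%}")
```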
The chart above depicts the number of electoral votes assigned to each
of the six New England states. What is the average (arithmetic mean)
number of electoral votes, to the nearest tenth, assigned to these states?
segmented bar graph
Bar graphs are also used to compare different groups using the same categories.
[Figure: two bar graphs titled "Fall 2009 Enrollment at Five Colleges", showing enrollment (0 to 8000) for Colleges A to E; the second graph segments each bar into full-time and part-time]
The chart shows year‐end values for Darnella’s
investments. For just the stocks, what was the
increase in value from year‐end 2000 to year‐end
2003 ?
(A) $1,000
(B) $2,000
(C) $3,000
(D) $4,000
(E) $5,000
Using bar graph to compare numerical data
[Figure: bar graph of enrollment (0 to 7000) for Colleges A to E, with separate bars for Fall 2009 and Spring 2010]
Circle Graphs (pie charts)
They illustrate how a whole is separated into parts.
They are usually used to show relative frequency.
The area of each sector is proportional to the percent of the whole that the sector represents, and the measure of the central angle of a sector is proportional to the percent of 360 degrees that the sector represents.
[Figure: circle graph with sectors College A 15%, College B 16%, College C 18%, College D 23%, College E 28%]
\( 25\% \text{ of } 1200 = \dfrac{1}{4} \text{ of } 1200 = 300 \)
𝑂 = 2𝐺
⇒ 𝑂 = 200, 𝐺 = 100
The annual budget of a certain college is to be shown
on a circle graph. If the size of each sector of the
graph is to be proportional to the amount of the
budget it represents, how many degrees of the circle
should be used to represent an item that is 15 percent
of the budget?
(A) 15°
(B) 36°
(C) 54°
(D) 90°
(E) 150°
Histograms
When a list of data is large, it is useful to organize it by grouping the values into intervals, often called
classes.
To do this, divide the entire interval of values into smaller intervals of equal length and then count the values that fall into each interval.
Histograms are useful for identifying the general shape of a distribution of data.
Scatterplots
A scatterplot has points that show the relationship between two sets of data.
Such data are called bivariate data.
[Figure: scatterplot titled "Sales and temperature for 12 Ice creams", with sales from $400 to $700 on the vertical axis and the point (25, 610) labeled]
A scatterplot makes it possible to observe an overall pattern, or trend. The closer the points lie to the trend line, the more reliably a relation between the two sets of data can be found and used to make predictions.
A time plot (sometimes called a time series) is a graphical display useful for showing
changes in data collected at regular intervals of time.
[Figure: time plot of enrollment (0 to 4000) by year, 2001 to 2009]
In what year was the percent increase in the value of a share of stock B
the greatest?
Since the slope of the graph B is steepest in 2007 (between January 1, 2007 and January 1,
2008), the rate of growth was greatest then.
What was the average yearly increase in the value of a share of stock A from
2005 to 2010?
Over the 5-year period from January 1, 2005, to January 1, 2010, the value of a share of stock A rose
from $30 to $45, an increase of $15. The average yearly increase was $15 ÷ 5 years or $3 per year.
boxplots or box-and-whisker plots
[Figure: boxplot marking L, \( Q_1 \), \( Q_2 \), \( Q_3 \), and G]
\( SD = \sqrt{\dfrac{(a_1 - m)^2 + (a_2 - m)^2 + \cdots + (a_n - m)^2}{n}} \), where \( m \) is the mean of \( a_1, a_2, \ldots, a_n \).
The process of subtracting the mean from each value and then dividing the result by the
standard deviation is called standardization.
In any group of data, most of the data are within about 3 standard deviations above or below the mean.
\( \text{Variance} := \sigma^2 = SD^2 \)
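The standard deviation and standardization steps can be sketched as follows. The data list is hypothetical, picked so the arithmetic is easy to follow, and the function divides by n as in the formula for SD given here.

```python
from math import sqrt

def population_sd(values):
    """Square root of the average squared distance from the mean (divide by n)."""
    m = sum(values) / len(values)
    return sqrt(sum((a - m) ** 2 for a in values) / len(values))

def standardize(values):
    """Subtract the mean from each value, then divide by the standard deviation."""
    m = sum(values) / len(values)
    sd = population_sd(values)
    return [(a - m) / sd for a in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, SD 2
sd = population_sd(data)
variance = sd ** 2
z_scores = standardize(data)

print(sd, variance, z_scores)
```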
Informally, a set is a collection of objects that have some property; its members cannot be repeated.
If A and B are sets and all of the members of A are also members of B, then A is a subset of B.
A list is like a finite set, except that its members are ordered and can be repeated.
The subsets of the set {𝒘, 𝒙, 𝒚} are {𝒘}, {𝒙}, {𝒚}, {𝒘, 𝒙},
{𝒘, 𝒚}, {𝒙, 𝒚}, {𝒘, 𝒙, 𝒚}, and { } (the empty subset). How
many subsets of the set {𝒘, 𝒙, 𝒚, 𝒛} contain 𝒘 ?
(A) Four
(B) Five
(C) Seven
(D) Eight
(E) Sixteen
If S and T are sets, then the intersection of S and T is the set of all elements that are in both S
and T and is denoted by S ∩ T.
The union of S and T is the set of all elements that are in S or T, or both, and is denoted by S ∪ T.
If sets S and T have no elements in common, they are called disjoint or mutually exclusive.
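Venn diagram
[Figure: Venn diagram of two overlapping sets A and B inside the universal set U]
U = universal set
N = number of elements outside both A and B
\( |A \cup B| = |A| + |B| - |A \cap B| \)
\( |U| = |A \cup B| + N \)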
Each of 25 people is enrolled in history, mathematics, or both. If 20
are enrolled in history and 18 are enrolled in mathematics, how
many are enrolled in both history and mathematics?
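The counting formula for a union can be rearranged to find the overlap. The sketch below applies it to the enrollment numbers in the question; the function name is illustrative, and the set construction is just one hypothetical assignment of people that is consistent with the counts.

```python
def count_in_both(size_a, size_b, size_union):
    """Rearranged |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return size_a + size_b - size_union

both = count_in_both(20, 18, 25)

# One hypothetical assignment of 25 people consistent with these counts.
history = set(range(0, 20))        # people 0..19
mathematics = set(range(7, 25))    # people 7..24

print(both, len(history & mathematics), len(history | mathematics))
```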
There are 87 balls in a jar. Each ball is painted with at least one of two colors, red or green. It is
observed that 2/7 of the balls that have red color also have green color, while 3/7 of the balls
that have green color also have red color. What fraction of the balls in the jar have both red and
green colors?
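(A) 6/14
(B) 2/7
(C) 6/35
(D) 6/29
(E) 6/42
[Figure: Venn diagram of R and G with overlap B]
Let R and G be the numbers of balls that have red and green color, and let B be the number that have both.
\( |R \cup G| = R + G - B \)
\( B = \dfrac{2}{7}R = \dfrac{3}{7}G \Rightarrow R = \dfrac{7}{2}B,\ G = \dfrac{7}{3}B \)
\( \Rightarrow |R \cup G| = \dfrac{7}{2}B + \dfrac{7}{3}B - B \Rightarrow 87 = \dfrac{29}{6}B \)
\( \dfrac{B}{87} = \dfrac{6}{29} \), which is choice (D).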
In a certain production lot, 40 percent of the toys are red and the remaining
toys are green. Half of the toys are small and half are large. If 10 percent of
the toys are red and small, and 40 toys are green and large, how many of the
toys are red and large?
There are 5 possibilities.
[Figure: tree diagram with the three branches bus, train, and plane under each of the 5 possibilities]
There are 15 possibilities.
Multiplication principle
If an operation consists of 𝒌 steps, of which the first can be done in 𝒏𝟏 ways, for each of these the
second step can be done in 𝒏𝟐 ways, for each of the first two the third step can be done in 𝒏𝟑 ways,
and so forth, then the whole operation can be done in 𝒏𝟏 𝒏𝟐 … 𝒏𝒌 ways.
A quality control inspector wishes to select a part for inspection from each of four different bins containing
4, 3, 5, and 4 parts, respectively. In how many different ways can she choose the four parts?
In how many different ways can one answer all the questions of a true–false test
consisting of 20 questions?
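Both questions are direct applications of the multiplication principle. The sketch below computes the products and, for the bins, also enumerates every choice as a cross-check; it assumes Python 3.8 or later for `math.prod`.

```python
from itertools import product
from math import prod

bins = [4, 3, 5, 4]                    # parts available in each of the four bins
ways_parts = prod(bins)                # n1 * n2 * n3 * n4

# Cross-check by listing every possible choice of one part per bin.
enumerated = sum(1 for _ in product(*(range(size) for size in bins)))

ways_test = 2 ** 20                    # two answers for each of 20 questions

print(ways_parts, enumerated, ways_test)
```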
Suppose \( n \) objects are to be ordered from 1st to \( n \)th, and we want to count the number of ways the objects can be ordered.
\( n \times (n - 1) \times (n - 2) \times \cdots \times 2 \times 1 := n! \) (called n factorial)
In how many different ways can the five starting players of a basketball team be introduced to the public?
There are 5! = 5 × 4 × 3 × 2 × 1 = 120 ways in which they can be introduced.
How many different three-digit positive integers can be formed using the digits 1, 2, 3, 4, 5, 6, 7 if none
of the digits can occur more than once in the integer?
\( 7 \times 6 \times 5 = \dfrac{7!}{(7 - 3)!} = 210 \)
Suppose that \( k \) objects will be selected from a set of \( n \) objects, where \( k \le n \), and the \( k \) objects will be placed in order from 1st to \( k \)th.
\( n \times (n - 1) \times (n - 2) \times \cdots \times (n - k + 1) = \dfrac{n!}{(n - k)!} \)
The number of ways to select and order k objects out of n objects is denoted by
\( P(n, k) = \dfrac{n!}{(n - k)!} \)
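The formula for P(n, k) can be checked against a brute-force listing. The sketch uses the two earlier examples (five players, three-digit integers from the digits 1 to 7) and assumes Python 3.8 or later for `math.perm`.

```python
from itertools import permutations
from math import factorial, perm

introductions = factorial(5)            # 5! orderings of five players
three_digit = perm(7, 3)                # P(7, 3) = 7! / (7 - 3)!

# Cross-check: build every three-digit integer with distinct digits from 1..7.
enumerated = {100 * a + 10 * b + c for a, b, c in permutations(range(1, 8), 3)}

print(introductions, three_digit, len(enumerated))
```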
How many different permutations are there of the letters in the word “book”?
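Let us distinguish between the two o's by labeling them \( o_1 \) and \( o_2 \): \( \{b, o_1, o_2, k\} \). The labeled letters have \( 4! = 24 \) orderings, and each arrangement of "book" is counted \( 2! = 2 \) times, so there are \( \dfrac{4!}{2!} = 12 \) different permutations.
The number of permutations of \( n \) objects of which \( n_1 \) are of one kind, \( n_2 \) are of a second kind, ..., \( n_k \) are of a \( k \)th kind, and \( n_1 + n_2 + \cdots + n_k = n \) is
\( \dfrac{n!}{n_1! \, n_2! \cdots n_k!} \)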
How many different anagrams (meaningful or nonsense) are possible for the
word MASSASAVGA?
We have one M, four A’s, three S’s, one V and one G; thus the number of different anagrams is
\( \dfrac{10!}{1! \times 1! \times 1! \times 3! \times 4!} = 25{,}200 \)
In how many ways can two paintings by Monet, three paintings by Renoir, and two paintings
by Degas be hung side by side on a museum wall if we do not distinguish between the
paintings by the same artists?
\( \dfrac{7!}{2! \, 3! \, 2!} = 210 \)
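The formula for permutations with repeated kinds can be sketched as a small function and checked against a brute-force count for a short word. The function name and the letter string standing in for the paintings are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(items):
    """n! divided by the factorial of each kind's count."""
    total = factorial(len(items))
    for count in Counter(items).values():
        total //= factorial(count)
    return total

book = distinct_arrangements("book")
anagrams = distinct_arrangements("MASSASAVGA")
paintings = distinct_arrangements("MMRRRDD")   # 2 Monet, 3 Renoir, 2 Degas

# Brute-force cross-check for the short word only.
book_by_listing = len(set(permutations("book")))

print(book, book_by_listing, anagrams, paintings)
```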
Combinations
A combination is a selection of r objects taken from n distinct objects without regard to the order of
selection.
In how many different ways can a person gathering data for a market research organization select three
of the 20 households living in a certain apartment complex?
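If we care about the order in which the households are selected, the answer is
\( P(20, 3) = 20 \times 19 \times 18 = 6{,}840 \)
Without regard to order, each group of three households is counted \( 3! = 6 \) times, so the number of selections is \( \dfrac{6{,}840}{6} = 1{,}140 \).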
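\( \dbinom{6}{2} = \dfrac{6!}{2! \, (6 - 2)!} = \dfrac{6!}{2! \, 4!} = 15 \)
\( \dbinom{n}{0} = 1 \)
\( \dbinom{n}{1} = n \)
\( \dbinom{n}{n} = 1 \)
\( \dbinom{n}{r} = \dbinom{n}{n - r} \)
\( \dbinom{n}{2} = \dfrac{n(n - 1)}{2} \)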
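The combination values and identities above can be spot-checked in code. The sketch assumes Python 3.8 or later for `math.comb`; the range of n tested is an arbitrary choice.

```python
from itertools import combinations
from math import comb

pairs = list(combinations(range(6), 2))   # every 2-element selection from 6 objects
households = comb(20, 3)                  # selections of 3 of 20 households

# Spot-check the identities for a range of n and r.
identities_hold = all(
    comb(n, 0) == 1
    and comb(n, 1) == n
    and comb(n, n) == 1
    and comb(n, 2) == n * (n - 1) // 2
    and all(comb(n, r) == comb(n, n - r) for r in range(n + 1))
    for n in range(1, 15)
)

print(len(pairs), households, identities_hold)
```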
“or” and “and”
\( \dbinom{4}{2} \dbinom{3}{1} = 6 \times 3 = 18 \)
In how many different ways can the letters of the word 'LEADING' be arranged in such a way that the vowels
always come together?
Out of 7 consonants and 4 vowels, how many words of 3 consonants and 2 vowels can be formed?
In a group of 6 boys and 4 girls, four children are to be selected. In how many different ways can
they be selected such that at least one boy should be there?
How many 3-digit numbers can be formed from the digits 2, 3, 5, 6, 7 and 9,
which are divisible by 5 and none of the digits is repeated?
probability
How likely something is to happen.
\( \text{Probability of an event happening} = \dfrac{\text{\# of ways event can happen}}{\text{\# of total outcomes}} \)
\( P(A) = \dfrac{\#(A)}{\#(S)} \)
-When a coin is tossed, there are two possible outcomes: heads or tails.
\( P(H) = \dfrac{1}{2} \) and \( P(T) = \dfrac{1}{2} \)
-When a single die is thrown, there are six possible outcomes: 1, 2, 3, 4, 5, 6.
\( P(1) = \dfrac{1}{6},\ P(2) = \dfrac{1}{6},\ \ldots,\ P(6) = \dfrac{1}{6} \)
\( P(\text{even}) = P(\{2, 4, 6\}) = \dfrac{3}{6} \)
-There are 5 marbles in a bag: 4 are blue, and 1 is red. What is the probability that a blue marble gets picked?
\( P(\text{blue}) = \dfrac{4}{5} \)
\( 0 \le \text{Probability of an event} \le 1 \)
Sarah cannot completely remember her four-digit ATM pin
number. She does remember the first two digits, and she
knows that each of the last two digits is greater than 5. The
ATM will allow her three tries before it blocks further
access. If she randomly guesses the last two digits, what is
the probability that she will get access to her account?
(A) 1/2 (B) 1/4 (C) 3/16 (D) 3/18 (E) 1/32
\( a \) and \( b \) are two integers such that \( 1 < a < 3 \) and \( -1 < b < 3 \).
What is the probability that \( 3a - 4b \) is less than 1?
Complementary event
There are 5 marbles in a bag: 4 are blue, and 1 is red.
\( P(\text{red}) = \dfrac{1}{5} \), \( P(\text{blue}) = \dfrac{4}{5} \)
The complement of selecting a blue marble is selecting a marble that is not blue.
\( P(\text{red}) + P(\text{blue}) = \dfrac{1}{5} + \dfrac{4}{5} = 1 \)
If in a class the probability of picking men is 0.35, what is the probability of picking women?
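\( \dfrac{\binom{580}{2}}{\binom{700}{2}} = \dfrac{580 \times 579}{700 \times 699} \approx \dfrac{600 \times 600}{700 \times 700} = \dfrac{36}{49} \approx \dfrac{36}{50} = \dfrac{72}{100} \approx 0.7 \)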
There are 27 students in Mr. White’s homeroom. What is the
probability that at least 3 of them have their birthdays in the same
month?
(A) 3/27
(B) 3/12
(C) 1/2
(D) 1
If one number is randomly selected from each of the following sets, what is the probability
that the product of three numbers is even?
{2,3}, {5,8,12,18,20}, {4, 6, 9, 10}
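The product is odd only when all three selected numbers are odd, which happens for exactly one selection: 3, 5, and 9.
\( P(\text{product odd}) = \dfrac{\#(\text{product odd})}{\#(\text{possible outcomes})} = \dfrac{1}{2 \times 5 \times 4} = \dfrac{1}{40} \)
\( P(\text{product even}) = 1 - P(\text{product odd}) = 1 - \dfrac{1}{40} = \dfrac{39}{40} \)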
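The complement argument can be confirmed by listing all outcomes. This is a minimal sketch that treats every selection as equally likely, as the problem implies; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

sets = [[2, 3], [5, 8, 12, 18, 20], [4, 6, 9, 10]]

# Every way to pick one number from each set.
outcomes = list(product(*sets))
odd_count = sum(1 for a, b, c in outcomes if (a * b * c) % 2 == 1)

p_odd = Fraction(odd_count, len(outcomes))
p_even = 1 - p_odd

print(len(outcomes), odd_count, p_odd, p_even)
```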
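Mutually exclusive events: two events that cannot happen together.
A ball is randomly selected from a box that contains red and green balls.
Throw a die.
\( P(E \cup F) = P(E) + P(F) - P(E \cap F) \)
If E and F are mutually exclusive, then \( P(E \cap F) = 0 \) \( \rightarrow P(E \cup F) = P(E) + P(F) \)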
There are \( n \) phone lines, and the probability that each one has a problem is 0.3. If the probability that at least one of them does not have a problem is more than 0.99, what is the least possible value of \( n \)?
A) 2
B) 4
C) 6
D) 8
E) 10
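One way to approach this question is through the complement: "at least one line has no problem" is the opposite of "every line has a problem". The sketch below assumes the lines fail independently, which the problem does not state explicitly, and searches for the smallest n; the function name and parameters are illustrative.

```python
def least_lines(p_problem=0.3, threshold=0.99):
    """Smallest n with 1 - p_problem**n > threshold, assuming independent lines."""
    n = 1
    while 1 - p_problem ** n <= threshold:
        n += 1
    return n

n_min = least_lines()
probabilities = {n: 1 - 0.3 ** n for n in range(1, 6)}

print(n_min, probabilities)
```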
Near a certain exit of I-17, the probabilities are 0.23 and 0.24,
respectively, that a truck stopped at a roadblock will have faulty
brakes or badly worn tires. Also, the probability is 0.38 that a truck
stopped at the roadblock will have faulty brakes and/or badly worn
tires. What is the probability that a truck stopped at this roadblock
will have faulty brakes as well as badly worn tires?
If \( B \) is the event that a truck stopped at the roadblock will have faulty brakes and \( T \) is the event that it will have badly worn tires, we have
\( P(B \cap T) = P(B) + P(T) - P(B \cup T) = 0.23 + 0.24 - 0.38 = 0.09 \)
A survey was taken to find the number of children in each of 25 families. A list of the values collected in the survey follows.
1, 2, 0, 4, 1, 3, 3, 1, 2, 0, 4, 5, 2, 3, 2, 3, 2, 4, 1, 2, 3, 0, 2, 3, 1

# of children   Frequency   Relative frequency
0               3           12%
1               5           20%
2               7           28%
3               6           24%
4               3           12%
5               1           4%
Total           25          100%

Assume we have chosen these 25 families randomly from a wide range of families.
We call the number of children a variable X; since the families are randomly chosen, we call X a random variable.
What is the probability that \( X = 3 \)?
There are 25 families and 6 of them have 3 children. \( \rightarrow P(X = 3) = \dfrac{6}{25} = 24\% \)
What is the probability that \( X > 3 \)?
\( X > 3 \rightarrow X = 4 \text{ or } X = 5 \rightarrow P(X > 3) = P(X = 4) + P(X = 5) = \dfrac{3}{25} + \dfrac{1}{25} = \dfrac{4}{25} \)
What is the probability that \( X < 4 \)?
\( X < 4 \) and \( X \ge 4 \) are mutually exclusive, and one of them must occur. \( \rightarrow P(X < 4) + P(X \ge 4) = 1 \)
\( P(X \ge 4) = P(4) + P(5) = \dfrac{4}{25} = 16\% \rightarrow P(X < 4) = 1 - P(X \ge 4) = 1 - \dfrac{4}{25} = \dfrac{21}{25} = 84\% \)
Let's make a histogram of the number of children in the 25 families.
Relative frequency of X = \( P(X) \)
Probability Distribution of the Random Variable X

X       P(X)
0       0.12
1       0.20
2       0.28
3       0.24
4       0.12
5       0.04
Total   1

[Figure: histogram of relative frequency (probability) against number of children (value of X)]
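The mean of the data is:
\( M = \dfrac{0(3) + 1(5) + 2(7) + 3(6) + 4(3) + 5(1)}{25} = 0\left(\dfrac{3}{25}\right) + 1\left(\dfrac{5}{25}\right) + 2\left(\dfrac{7}{25}\right) + 3\left(\dfrac{6}{25}\right) + 4\left(\dfrac{3}{25}\right) + 5\left(\dfrac{1}{25}\right) \)
\( = 0P(0) + 1P(1) + 2P(2) + 3P(3) + 4P(4) + 5P(5) = 2.16 \)
The mean is sometimes called the expected value; it is the sum of the products \( X \, P(X) \).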
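The expected value can be computed directly from the probability distribution. The sketch below builds P(X) from the survey frequencies and uses exact fractions so that no rounding is involved; the variable names are illustrative.

```python
from fractions import Fraction

# Frequencies of each number of children among the 25 families.
frequency = {0: 3, 1: 5, 2: 7, 3: 6, 4: 3, 5: 1}
total = sum(frequency.values())

# Relative frequency of X = P(X).
distribution = {x: Fraction(f, total) for x, f in frequency.items()}

expected_value = sum(x * p for x, p in distribution.items())
p_more_than_3 = sum(p for x, p in distribution.items() if x > 3)
p_less_than_4 = 1 - p_more_than_3

print(float(expected_value), p_more_than_3, p_less_than_4)
```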