COMP4610 Notes Set 1

This document provides an introduction to statistics and probability theory. It defines statistics as dealing with collecting, organizing, analyzing and presenting data, while data science uses scientific methods to analyze data and solve problems. The document then discusses randomness and probability theory, provides a brief history of statistics and probability, gives examples to illustrate the definition of probability as the number of successful outcomes divided by the total number of possible outcomes, and introduces key terminology like sample space, events, elementary events and compound events.

Uploaded by

G M
Copyright
© © All Rights Reserved

COMP4610 Statistics for Data Science

Notes Set 1: Introduction


Definition 0.1. Statistics is a field of study that deals with the collection, organiza-
tion, analysis, interpretation and presentation of data in a meaningful form.
Definition 0.2. Data science is a composite field of study that uses scientific meth-
ods, processes, algorithms and programming systems to mine information from data
and to solve problems.

1 Randomness
Probability theory is the mathematical theory dealing with the description of exper-
iments involving randomness.
Consider this experiment:

Drop a ball in a vacuum from a height d. How long does it take for the ball to reach the
ground?

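Use
$$d = ut + \frac{1}{2} a t^2$$
where
u: initial velocity (here u = 0, since the ball is dropped from rest)
a: acceleration = g (9.8 m s^{-2}).
So,
$$d = \frac{1}{2} g t^2.$$
So the ball will fall to the ground in
$$t = \sqrt{\frac{2d}{g}} \text{ seconds};$$
this will happen every time.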
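
The formula above can be evaluated directly. The following is a minimal sketch, assuming d is measured in metres and g = 9.8 m/s^2 as in the notes; the function name fall_time is a hypothetical helper chosen for illustration.

```python
import math

G = 9.8  # acceleration due to gravity in m/s^2, the value used in the notes


def fall_time(d: float, g: float = G) -> float:
    """Time in seconds for a ball dropped from rest to fall d metres in a vacuum."""
    if d < 0:
        raise ValueError("height must be non-negative")
    return math.sqrt(2 * d / g)


# The experiment is deterministic: repeating it gives the same answer every time.
times = [fall_time(10.0) for _ in range(3)]
print(times)
```

All three entries of times are identical, which is the sense in which this experiment involves no randomness.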

Consider instead this experiment:


Flip a coin 3 times in succession and observe the outcome of each flip.
No mathematical formula will allow us to predict the result of this experiment with
certainty. We require a new approach to deal with random experiments.

2 A Brief History
2.1 Pre-17th Century
Intuitive concepts of probability have been in use for millennia, motivated by games of chance.
Specific games and problems were considered. Pacioli and Cardano, in the 15th/16th century,
considered dice problems and ‘balla’.

2.2 17th Century


1654: The French nobleman de Méré prompts a correspondence between Fermat and Pascal,
which produces the first general theory of games of chance.

1657: A teacher of Leibniz, Christian Huygens, publishes a treatise on gambling problems.

2.3 18th Century


The link to gambling drives rapid development. Jakob Bernoulli writes “Ars Conjectandi”,
published posthumously in 1713. “Bernoulli trials” are named for him.

Abraham de Moivre publishes “The Doctrine of Chances”. Thomas Bayes originates work
on inverse probability.

2.4 19th Century


1812: Laplace publishes ‘Théorie analytique des probabilités’, the first work to
move probability away from gambling applications.
Actuarial mathematics, the theory of errors and statistical mechanics developed as
applications.

2.5 20th Century


1933: Kolmogorov axiomatises the theory, allowing it to be treated abstractly and
providing the first precise definition of probability after three centuries of effort.
Applications are developed in physics, genetics, finance, psychology, economics,
engineering and other fields.

3 A Working Definition of Probability
Example 3.1. At a football match, the team that wins the coin toss gets the kick off.
Our team captain always chooses “Heads”, and next month we play three matches.
What is the probability that he wins

(a) All 3 tosses?

(b) 2 tosses?

(c) 1 toss?

(d) No tosses?

We draw a tree diagram showing the possible results of three tosses of a coin.
Each path through the tree is listed below with its number of heads.

Toss 1   Toss 2   Toss 3   Number of heads
H        H        H        3
H        H        T        2
H        T        H        2
H        T        T        1
T        H        H        2
T        H        T        1
T        T        H        1
T        T        T        0

Number of possible results: 8

Number of heads (H)   0     1     2     3
Probability           1/8   3/8   3/8   1/8

Question: Does it matter if the captain does not always choose ‘Heads’?
Answer: No. We can replace ‘H’ with ‘success’: since the coin is fair, either call wins
with the same probability.
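
The counts for Example 3.1 can be checked by listing every result of three tosses. This is a small sketch, assuming a fair coin so that all 8 sequences are equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely results of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability = number of favourable outcomes / number of possible outcomes.
distribution = {
    k: Fraction(v, len(outcomes)) for k, v in sorted(heads_counts.items())
}

for k, p in distribution.items():
    print(f"{k} heads: {p}")
```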

Example 3.2. A card is drawn at random from a conventional deck of 52 cards.


What is the probability that the card is a Diamond?
Solution 3.3. Since there are 13 diamonds, the answer is $\frac{13}{52} = \frac{1}{4}$.

Assumptions:
The coin is fair: it is equally likely to produce heads and tails.
All cards in the deck are equally likely to be drawn.
Additionally, in both examples the experiment had a finite number of possible outcomes.
Definition 3.4. If an experiment has n possible outcomes and r of these outcomes
are deemed ‘Success’ then the probability of success in a single run of the experiment
is $\frac{r}{n}$. That is,

$$\text{Probability of success} = \frac{\text{Number of successful outcomes}}{\text{Number of possible outcomes}}$$
Example 3.5. A card is drawn at random from a conventional pack of 52 cards.
What is the probability of drawing
(a) A red card;

(b) An ace;

(c) A picture card.


Solution 3.6. (a) There are 26 red cards: $\frac{26}{52} = \frac{1}{2}$.
(b) There are 4 aces: $\frac{4}{52} = \frac{1}{13}$.
(c) There are 12 picture cards: $\frac{12}{52} = \frac{3}{13}$.
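
The card examples can be reproduced by building the deck and counting. The sketch below assumes the usual 52-card deck with hearts and diamonds red and J, Q, K as the picture cards; the helper name probability is hypothetical.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]

# Each card is a (rank, suit) pair; all 52 are assumed equally likely.
deck = [(rank, suit) for suit in SUITS for rank in RANKS]


def probability(is_success) -> Fraction:
    """Number of successful outcomes divided by number of possible outcomes."""
    successes = sum(1 for card in deck if is_success(card))
    return Fraction(successes, len(deck))


p_diamond = probability(lambda card: card[1] == "diamonds")
p_red = probability(lambda card: card[1] in {"hearts", "diamonds"})
p_ace = probability(lambda card: card[0] == "A")
p_picture = probability(lambda card: card[0] in {"J", "Q", "K"})

print(p_diamond, p_red, p_ace, p_picture)
```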
Example 3.7. Two fair dice are thrown. What is the probability that
(a) The total score is 2;

(b) The total score is 4.


Solution 3.8. The first die can show any of 6 numbers, as can the second. So there
are 36 equally likely results.
(a) A total of 2 arises if and only if both dice show 1. This is just one of the 36
equally likely results. Hence the probability is $\frac{1}{36}$.

(b) A total of 4 arises if the pairs are (1, 3), (2, 2) or (3, 1). Note that (1, 3) is distinct
from (3, 1). Since there are three successful results, the probability is $\frac{3}{36} = \frac{1}{12}$.
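
The two-dice calculation can be sketched by enumerating ordered pairs, which treats (1, 3) and (3, 1) as distinct results. The function name p_total is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered results (first die, second die) of throwing two fair dice.
rolls = list(product(range(1, 7), repeat=2))


def p_total(total: int) -> Fraction:
    """Probability that the two dice sum to the given total."""
    favourable = [roll for roll in rolls if sum(roll) == total]
    return Fraction(len(favourable), len(rolls))


print(p_total(2))
print(p_total(4))
```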
Exercise 3.9. Represent the possible results using a tree diagram.

4 Terminology and Notation
Definition 4.1. An experiment or a trial is any process that, when repeated, gen-
erates a set of results or observations. For example, tossing a coin, drawing a card,
drawing two cards.
Definition 4.2. An outcome is the result of carrying out a trial. For example, Heads,
H, Q, 2♠, 3♣, 4♥.
Definition 4.3. A sample space, denoted by S, is the set of all possible outcomes of a
random experiment.
Definition 4.4. An event is a subset of the sample space S.
We return to Example 3.1, the coin toss at the start of a football match.

The sample space of this experiment is the set of all sequences of length 3 repre-
senting the results of each flip. It can be represented as

HHT THT HTT TTH
HHH THH HTH TTT

If the captain calls heads three times, the following are events:
A: Captain wins three times.
B: Captain wins the second toss.
C: Captain wins exactly one toss.
[Figure: Venn diagram of the sample space S showing all eight outcomes, with event A
enclosing HHH and event C enclosing THT, HTT and TTH.]

Remark 4.5. The precise arrangement of the outcomes in S does not matter. In
set notation
A = {HHH}
B = {HHT, THT, HHH, THH}
C = {THT, HTT, TTH}
Exercise 4.6. Draw the sample space of example 3.7.
Definition 4.7. An elementary event is an event consisting of a single outcome.
For example, set A in remark 4.5.
Definition 4.8. A compound event is an event consisting of more than one outcome.
For example, set B and set C in remark 4.5.
Using the language of sets, we can now make precise our working definition of
probability.

The probability of event A is the number of outcomes in A divided by the total
number of possible outcomes. The latter is equal to the number of outcomes in the sample
space S. We use the notation n(S) to represent the number of outcomes in S, and
similarly n(A) to represent the number of outcomes in A. The probability
of event A is then $\frac{n(A)}{n(S)}$, and we write it as P(A).
Definition 4.9. If an experiment has a finite set of equally likely outcomes S, then
$$P(E) = \frac{n(E)}{n(S)}$$
where E is an event.
Recall remark 4.5. We see that
$$n(B) = 4, \quad P(B) = \frac{n(B)}{n(S)} = \frac{4}{8} = \frac{1}{2}.$$
Similarly,
$$n(C) = 3, \quad P(C) = \frac{n(C)}{n(S)} = \frac{3}{8}.$$

The sample space S is also an event, and


$$P(S) = \frac{n(S)}{n(S)} = 1.$$

The empty set, ∅, is an event. For example, let us consider example 3.1 and define
a new event.
Z : Captain wins four times.

This event contains none of the outcomes of S. So


$$P(Z) = \frac{n(Z)}{n(S)} = \frac{0}{8} = 0.$$
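
Events map naturally onto sets, so Definition 4.9 can be tried out on the coin toss example. The following is a sketch only: outcomes are written as strings such as "HTH", and the names S, A, B, C, Z and P mirror the notation of the notes rather than any library.

```python
from fractions import Fraction
from itertools import product

# Sample space: all sequences of length 3 over H and T.
S = {"".join(tosses) for tosses in product("HT", repeat=3)}

# Events are subsets of S (the captain always calls heads).
A = {s for s in S if s.count("H") == 3}  # wins three times
B = {s for s in S if s[1] == "H"}        # wins the second toss
C = {s for s in S if s.count("H") == 1}  # wins exactly one toss
Z = {s for s in S if s.count("H") == 4}  # wins four times: the empty set


def P(E: set) -> Fraction:
    """P(E) = n(E) / n(S) for equally likely outcomes."""
    return Fraction(len(E), len(S))


for name, event in [("A", A), ("B", B), ("C", C), ("S", S), ("Z", Z)]:
    print(name, sorted(event), P(event))
```

Because every event is a subset of S, n(E) lies between 0 and n(S), so each value of P(E) lies between 0 and 1.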

Definition 4.10. An event of probability 1 is certain. An event of probability 0 is
impossible. For example event S is certain and event Z above is impossible.

A consequence of our definition of probability is that it must lie in the range [0, 1].

For a given experiment, S can be defined in several ways, as long as each outcome
is equally likely. For example, throw a die. Then we can define S = {1, 2, 3, 4, 5, 6}.
The event ‘throw a six’ is {6} and the probability of throwing a six is given by

$$P(\text{throw a six}) = \frac{n(\{6\})}{n(S)} = \frac{1}{6}.$$

The event ‘throw an odd number’ is {1, 3, 5} and the probability of throwing an odd
number is given by

$$P(\text{throw an odd number}) = \frac{n(\{1, 3, 5\})}{n(S)} = \frac{3}{6} = \frac{1}{2}.$$

However, we can classify the outcomes as even (0) and odd (1). Then

S = {0, 1}.

Now, ‘throw an odd number’ is {1}, so

$$P(\text{throw an odd number}) = \frac{n(\{1\})}{n(S)} = \frac{1}{2}.$$
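
The two choices of sample space for the die can be compared side by side. This is an illustrative sketch, assuming a fair die; the two-argument helper P is hypothetical and simply applies n(E)/n(S) to whichever sample space is passed in.

```python
from fractions import Fraction


def P(event: set, sample_space: set) -> Fraction:
    """n(E) / n(S), valid only when the outcomes in S are equally likely."""
    return Fraction(len(event), len(sample_space))


# First description: the six faces of the die.
S_faces = {1, 2, 3, 4, 5, 6}
odd_faces = {face for face in S_faces if face % 2 == 1}

# Second description: even (0) or odd (1). Three faces map to each label,
# so the two labels are still equally likely.
S_parity = {face % 2 for face in S_faces}
odd_parity = {1}

print(P(odd_faces, S_faces), P(odd_parity, S_parity))
```

Both descriptions give the same probability for an odd throw. The coarser one, however, cannot express the event ‘throw a six’.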

5 Counting Methods
To calculate the probability of event E, it is essential to be able to count the number
of outcomes in E, n(E), and the number of outcomes in S, n(S), as straightforwardly
as possible. Permutations and combinations are useful tools for this.
Definition 5.1. (The Fundamental Principle of Counting)
If two operations A and B are carried out and there are m ways of carrying out A
and k ways of carrying out B, then A and
B can be carried out in mk different ways.
Example 5.2. To save time in the mornings a mathematician has only three t-shirts,
red, white, and blue and four pairs of socks: blue, brown, white and black. How many
possible outfits does he have?
Solution 5.3. A red t-shirt can be combined with any of the socks. The same applies
to the white and blue t-shirts. Hence the number of outfits is 3 × 4 = 12.
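
The outfit count can be confirmed by pairing every t-shirt with every pair of socks. A minimal sketch, using the colours from Example 5.2; the list names are illustrative.

```python
from itertools import product

tshirts = ["red", "white", "blue"]
socks = ["blue", "brown", "white", "black"]

# Every (t-shirt, socks) pair is one outfit: m ways times k ways.
outfits = list(product(tshirts, socks))

print(len(outfits))
for outfit in outfits:
    print(outfit)
```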
Exercise 5.4. Construct a tree diagram to illustrate this.
Definition 5.5. A permutation of n distinct objects is a specific ordering of the
objects. For example, ABC, ACB, BAC, BCA, CAB, CBA are all permutations
of the letters A, B and C.
Theorem 5.6. (Permutations of n objects)
The number of permutations of n distinct objects, taken all together, is given by

n × (n − 1) × (n − 2) × · · · × 2 × 1.

Example 5.7. How many ways can 5 books be arranged on a shelf?

Solution 5.8. There are 5 choices for the first position, 4 for the second, and so on
down to 1 for the last:
5 × 4 × 3 × 2 × 1 = 120.
Since n × (n − 1) × (n − 2) × · · · × 2 × 1 is often used, we write it as n!, read as ‘n
factorial’.
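
As a quick check of Theorem 5.6, the arrangements of five books can be listed and compared with 5!. The book labels below are placeholders chosen for this sketch.

```python
import math
from itertools import permutations

books = ["A", "B", "C", "D", "E"]  # five distinct books (placeholder labels)

# Every ordering of the five books on the shelf.
arrangements = list(permutations(books))

print(len(arrangements), math.factorial(len(books)))
```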
Example 5.9. A company assigns identification (ID) numbers to its employees.
The first three characters in this code are A, B, C in any order. This is
followed by the digits 1 to 5 in any order. Each letter and digit can occur only once.
How many employees can the company hire before it must change this system?

Solution 5.10. The first three positions can be filled in 3! ways, the next 5 in 5!
ways. The total number of IDs is

3! × 5! = 6 × 120 = 720.

If the company hires more than this, it will need a different system.
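
The ID count can be verified by generating every code. In this sketch an ID is assumed to be written as a plain string such as "BCA31425"; that format is an illustration, not something the notes specify.

```python
from itertools import permutations

# Letters A, B, C in any order, followed by digits 1 to 5 in any order.
ids = [
    "".join(letters) + "".join(digits)
    for letters in permutations("ABC")
    for digits in permutations("12345")
]

print(len(ids))
print(ids[:3])
```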

In general, given two distinct sets of objects of size m and k respectively, where
each set can be arranged in any order, the total number of arrangements of the first
set followed by the second set is
m! × k!

Theorem 5.11. (Permutations of n objects taken r at a time)


The number of permutations of n distinct objects, taken r at a time is given by

n × (n − 1) × (n − 2) × · · · × (n − r + 1).

Example 5.12. I have 10 books, but my bookshelf will only fit five. How many ways
are there to fill the shelf?

Solution 5.13. There are 10 choices for the first position, 9 for the second, and so on
down to 6 for the fifth:
10 × 9 × 8 × 7 × 6 = 30,240.

Remark 5.14.
$$n \times (n-1) \times \cdots \times (n-r+1) = \frac{n \times (n-1) \times \cdots \times (n-r+1) \times (n-r) \times \cdots \times 2 \times 1}{(n-r) \times (n-r-1) \times \cdots \times 2 \times 1} = \frac{n!}{(n-r)!}.$$

We denote this ${}^{n}P_{r}$. So,

$${}^{n}P_{r} = \frac{n!}{(n-r)!}$$
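
The formula n!/(n − r)! can be sketched in a few lines. The function name n_P_r is a hypothetical helper; math.perm is the standard-library equivalent and is assumed to be available (Python 3.8 or later).

```python
import math


def n_P_r(n: int, r: int) -> int:
    """Number of permutations of n distinct objects taken r at a time."""
    if not 0 <= r <= n:
        raise ValueError("require 0 <= r <= n")
    return math.factorial(n) // math.factorial(n - r)


# Ten books, five places on the shelf, order matters.
print(n_P_r(10, 5))
print(math.perm(10, 5))  # standard-library cross-check
```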

But what if we don’t care about the order of the books on the shelf? How many
ways can the shelf be filled then?

Since 5 books can be arranged in 5! ways, and we no longer care about the
arrangement, the answer is

$$\frac{{}^{10}P_{5}}{5!} = \frac{30,240}{120} = 252.$$

Definition 5.15. The number of combinations of n objects taken r at a time is the
number of ways that r objects can be selected from the set of n objects without regard
for order. It is denoted
$${}^{n}C_{r} = \binom{n}{r} = \frac{{}^{n}P_{r}}{r!}$$
where
$$\frac{{}^{n}P_{r}}{r!} = \frac{n \times (n-1) \times \cdots \times (n-r+1)}{r \times (r-1) \times \cdots \times 2 \times 1} = \frac{n!}{r!(n-r)!}$$
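
The relationship between permutations and combinations can be checked numerically. This sketch assumes Python 3.8 or later for math.perm and math.comb; the function name n_C_r is a hypothetical helper.

```python
import math
from itertools import combinations


def n_C_r(n: int, r: int) -> int:
    """Combinations of n objects taken r at a time, computed as nPr / r!."""
    return math.perm(n, r) // math.factorial(r)


# Choosing 5 of 10 books when the order on the shelf does not matter.
print(n_C_r(10, 5))
print(math.comb(10, 5))  # standard-library cross-check

# Brute-force check: list every unordered selection of 5 books out of 10.
selections = list(combinations(range(10), 5))
print(len(selections))
```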

Remark 5.16. We consider the following.

1. Order matters for permutations.

2. Order does not matter for combinations.

Example 5.17. Take permutations and combinations of size 2 from the letters
A, B, C.

Solution 5.18. We can list the permutations:

Permutations
AB BA
AC CA
BC CB

Here, the total number of permutations is given by

$${}^{3}P_{2} = \frac{3!}{(3-2)!} = \frac{3!}{1!} = 6$$

We can list the combinations:


Combinations
AB
AC
BC

Here, the total number of combinations is given by

$${}^{3}C_{2} = \frac{{}^{3}P_{2}}{2!} = \frac{6}{2} = 3$$
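
Both lists for the letters A, B, C can be generated with the standard library, which makes the role of order visible. A short sketch; the variable names are illustrative.

```python
from itertools import combinations, permutations

letters = "ABC"

# Order matters: AB and BA are different permutations.
perms = ["".join(p) for p in permutations(letters, 2)]

# Order does not matter: AB and BA are the same combination.
combs = ["".join(c) for c in combinations(letters, 2)]

print(perms)
print(combs)
```

Each combination corresponds to 2! = 2 permutations, which is why the permutation count is divided by 2! to obtain the combination count.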

Exercise
1. (a) In how many ways can the letters of the word MATHEMATICS be arranged?
(b) If the letters of the word MATHEMATICS are arranged in a line at
random, what is the probability that the arrangement begins with MM?

2. How many different codes can be formed if each is to contain the four letters
INFO followed by the four digits 2100?

3. The digits 0, 1, 2, 3, and 4 are to be used in a four digit password. How many
different passwords are possible

(a) if repetitions are allowed?


(b) if repetitions are not allowed?

4. An urn contains five balls whose colours are red, blue, black, brown and white.
A ball is selected, its colour is noted, and replaced. Then a second ball is
selected, and its colour is noted.

(a) How many different colour schemes are possible?


(b) If the first ball is not replaced, how many different colour schemes are
possible?

5. How many ways can 5 books be arranged on a shelf if they can be selected
from 10 books?

6. Two letters are chosen at random from the word THEORY.

(a) Find the probability that both letters are consonants.
(b) Find the probability that both letters are vowels.

END OF NOTES SET 1
