UNIT II_ Statistics for Data Science_new (1)

Uploaded by Kranium A


UNIT - II

Statistics for Data Science

Data Science
By Shilpa Sonawani

SCHOOL OF COMPUTER ENGINEERING AND TECHNOLOGY
Basic statistics

• Statistics: “a bunch of mathematics used to summarize, analyze, and interpret a group of numbers or observations.”
– It is a tool.
– It cannot replace your research design, your research questions, or the theory or model you want to use.
Population and sample

• Population: any group of interest, or any group that researchers want to learn more about.
– Population parameters (unknown to us): characteristics of the population
• Sample: a group of individuals or data drawn from the population of interest.
– Sample statistics: characteristics of the sample
Population and sample

• We are much more interested in the population from which the sample was drawn.
– Example: 30 GPAs as a representative sample drawn from the population of GPAs of the freshmen currently in attendance at a certain university, or the population of freshmen attending colleges similar to a certain university.
Population and sample

[Figure: a sample is drawn from the population by sampling; conclusions about the population are drawn from the sample by inference.]
Primary & Secondary Data
• Raw or primary data: data collected with a lot of unnecessary, irrelevant & unwanted information
• Treated or secondary data: data obtained when we treat the raw data & remove this unnecessary, irrelevant & unwanted information
• Cooked data: data that was not collected genuinely and is false and fictitious
Ungrouped & Grouped Data
• Ungrouped data: data presented or observed individually. For example, if we observed the number of children in 6 families:
  2, 4, 6, 4, 6, 4
• Grouped data: identical values grouped by frequency. For example, the above data on children in 6 families can be grouped as:
  No. of children   Families
  2                 1
  4                 3
  6                 2
  or alternatively we can make classes:
  No. of children   Frequency
  2-4               4
  5-7               2
Variable

A variable is something that can be changed, such as a characteristic or value. For example: age, height, weight, blood pressure, etc.
Types of Variable
• Independent variable: typically the variable representing the value being manipulated or changed. For example, smoking.
• Dependent variable: the observed result of the independent variable being manipulated. For example, lung disease.
• Confounding variable: a variable associated with both the exposure and the disease. For example, age is a factor for many events.
Types of measurement

• Discrete: Quantitative data are called discrete if the sample space contains a finite or countably infinite number of values.
– How many days did you smoke during the last 7 days?
Types of measurement

• Continuous: Quantitative data are called continuous if the sample space contains an interval or continuous span of real numbers.
– Weight, height, temperature
– Height: 1.72 meters, 1.7233330 meters
Types of measurement

• Nominal
– Categorical variables. Numbers that are simply used as identifiers or names represent a nominal scale of measurement, such as female vs. male.
Types of measurement

• Ordinal
– An ordinal scale of measurement represents an ordered series of relationships or rank order. Likert-type scales (such as "On a scale of 1 to 10, with one being no pain and ten being high pain, how much pain are you in today?") represent ordinal data.
Types of measurement

• Interval: a scale that represents quantity and has equal units, but for which zero represents simply an additional point of measurement.
– The Fahrenheit scale is a clear example of the interval scale of measurement. Thus, 60 degrees Fahrenheit or -10 degrees Fahrenheit represent interval data.
Types of measurement

• Ratio: the ratio scale of measurement is similar to the interval scale in that it also represents quantity and has equality of units. However, this scale also has an absolute zero (zero means none of the quantity, and no numbers exist below zero). For example, height and weight.
Types of measurement

• Qualitative vs. quantitative variables
– Qualitative variables: values are text (e.g. female, male); we also call them string variables.
– Quantitative variables: numeric variables.
Data Types

For each dimension (variable) of the data:
• Numerical: Continuous, Discrete
• Categorical: Nominal, Ordinal

Basic statistics

• Two types of statistics
– Descriptive statistics
– Inferential statistics
Basic statistics

• Descriptive statistics:
– “are procedures used to summarize, organize, and make sense of a set of scores or observations.”
Basic statistics

• Inferential statistics:
– “are procedures used that allow researchers to infer or generalize observations made with samples to the larger population from which they were selected.”
Why Descriptive statistics?
• Who is the better ODI batsman: Sachin or Muralidharan?
• Batting average
• Who is more reliable: Dhoni or Afridi?
• Score variance
• A triangular series among Aus, Eng & New Zealand: who will win?
• Most number of wins: mode
• I am going to buy shoes. Which brand has more variety: Power or Adidas?
• Price range: range
• We used average, variance, mode and range to make some inferences. These are nothing but descriptive statistics.
• Descriptive statistics tell us what happened in the past.
• Descriptive statistics avoid inferences, but they help us get a feel for the data.
• Sometimes they are good enough to make an inference.
Descriptive Statistics
• A statistic or a measure that describes the data, e.g. the average salary of employees
• Describing data with tables and graphs (quantitative or categorical variables)
• Numerical descriptions
• Center: measures of the center of the data
• Variability: measures of the variability of the data
• Bivariate descriptions (in practice, most studies have several variables)
• Dependency measures (correlation)
Descriptive statistics

• Use descriptive statistics to describe, summarize, and organize a set of measurements.
• Use descriptive statistics to communicate with other researchers and the public.
• Descriptive statistics: central tendency and dispersion
Descriptive statistics

• Measures of central tendency: we use statistical measures to locate a single score that is most representative of all scores in a distribution.
– Mean
– Median
– Mode
Descriptive statistics

• The notations used to represent population parameters and sample statistics are different.
– For example
• Population size: N
• Sample size: n
Descriptive statistics

• Mean
– $\bar{X}$ (or M) for the sample mean and μ for the population mean
– $\bar{X} = \frac{\sum x}{n}$ (read "X bar")
– $\sum x$ means the sum of all individual scores $x_1, \ldots, x_n$
– n means the number of scores
Descriptive statistics

• Example 1: we want to know how 25 students performed in math tests.
• The data are shown in the table below.
Descriptive statistics
Score (X) Frequency (f) fX
60 1 60
65 2 130
70 3 210
75 4 300
80 5 400
85 4 340
90 3 270
95 2 190
100 1 100
Sum 25 2000

Descriptive statistics

• How do we calculate the mean of those 25 scores?
• $\bar{X} = \frac{\sum fx}{n} = \frac{2000}{25} = 80.00$
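The grouped-data calculation above can be checked with a few lines of Python. This is a minimal sketch using only built-ins; the variable names are illustrative and the numbers are the ones in the Example 1 table.

```python
# Frequency table from Example 1: each score with its frequency
scores = [60, 65, 70, 75, 80, 85, 90, 95, 100]
freqs = [1, 2, 3, 4, 5, 4, 3, 2, 1]

n = sum(freqs)                                      # number of students (sum of f)
sum_fx = sum(x * f for x, f in zip(scores, freqs))  # sum of the fX column
mean = sum_fx / n                                   # X-bar = sum(fX) / n

print(n, sum_fx, mean)
```

Multiplying each score by its frequency avoids writing out all 25 individual scores.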
Descriptive statistics

• Distribution of Example 1
[Figure: distribution of the Example 1 scores, with Mean = 80 marked]
Descriptive statistics

• Median
– Data: 2, 3, 4, 5, 7, 10, 80. The mean of those scores is 15.86.
– 80 is an outlier.
– The mean fails to reflect most of the data. We use the median instead of the mean to remove the influence of an outlier.
– The median is the middle value in a distribution of data listed in numeric order.
Descriptive statistics

• Median
– Position of median = $\frac{n+1}{2}$
– For an odd-numbered sample size: 3, 6, 5, 3, 8, 6, 7. First place each score in numeric order: 3, 3, 5, 6, 6, 7, 8. Position = (7 + 1)/2 = 4, so median = 6
Descriptive statistics

• Median
• For an even-numbered sample size: 3, 6, 5, 3, 8, 6. First place each score in numeric order: 3, 3, 5, 6, 6, 8. Position = (6 + 1)/2 = 3.5, so the median is the average of the 3rd and 4th scores: Median = (5 + 6)/2 = 5.5
• Example 2: we want to know the average salary of 36 cases.
Descriptive statistics
Salary Frequency
$20k 1
$25k 2
$30k 3
$35k 4
$40k 5
$45k 6
$50k 5
$55k 4
$200k 3
$205k 2
$210k 1
Total 36
Descriptive statistics

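• Median = ?
• Position = (36 + 1)/2 = 18.5
• Which value is at position 18.5? Counting cumulative frequencies, positions 16 to 21 all hold $45k, so the 18th and 19th values are both $45k.
• Median = $45k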
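A short sketch can expand the salary table into its 36 individual cases and compare the median with the mean. It uses Python's standard `statistics` module; the expansion step is just one convenient way to handle a frequency table.

```python
import statistics

# Salary table from Example 2 (in $k) and the frequency of each salary
salaries = [20, 25, 30, 35, 40, 45, 50, 55, 200, 205, 210]
freqs = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

# Expand the frequency table into the individual cases
cases = [s for s, f in zip(salaries, freqs) for _ in range(f)]

median = statistics.median(cases)  # average of the 18th and 19th ordered values
mean = statistics.mean(cases)      # pulled upward by the $200k+ salaries

print(len(cases), median, round(mean, 2))
```

The mean comes out around $68.3k, well above every salary between $20k and $55k, which is why the median ($45k) is the better "average" here.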
Descriptive statistics

• Mode
– The value in a data set that occurs most often or most frequently.
– Example: 2, 3, 3, 3, 4, 4, 4, 4, 7, 7, 8, 8, 8. Mode = 4
Range
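• Range = Max – Min
• R: range(x) (in R, range(x) returns the minimum and maximum; diff(range(x)) gives Max – Min)
unit 1: 9.7, 11.5, 11.6, 12.1, 12.4, 12.6, 13.1, 13.5, 13.6, 14.8, 16.3, 26.9
unit 2: 9.0, 11.2, 11.3, 11.7, 12.2, 12.5, 13.2, 13.8, 14.0, 15.5, 15.6, 16.2, 16.4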
Exercise

• Find the mean, median, mode, and range for the following list of values: 13, 18, 13, 14, 13, 16, 14, 21, 13
• Create sample data in Python using randint, containing 20 values between 7 and 9. Calculate the mean, median and mode of the data in a Python notebook.
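One possible solution sketch for the exercise uses the standard `statistics` module for the first list and `numpy.random.randint` for the random sample. The seed value is an arbitrary assumption so that the random part is reproducible.

```python
import statistics
import numpy as np

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(values)           # sum / count
median = statistics.median(values)       # middle value of the sorted list
mode = statistics.mode(values)           # most frequent value
value_range = max(values) - min(values)  # Max - Min
print(mean, median, mode, value_range)

# Second part: 20 random integers between 7 and 9 inclusive.
# randint excludes its upper bound, so high=10 is needed to include 9.
np.random.seed(0)
sample = np.random.randint(7, 10, size=20)
print(sample)
print(sample.mean(), np.median(sample), statistics.mode(sample.tolist()))
```

For the first list this gives mean 15, median 14, mode 13 and range 8.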
Descriptive statistics

• Dispersion (variability): a measure of the spread of scores in a distribution.
Descriptive statistics

• Compare different distributions
[Figures: comparisons of different distributions]
Descriptive statistics

• Two sets of data have the same sample size, mean, and median.
• But they are different in terms of variability.
Descriptive statistics

• Dispersion
– Range
– Variance
– Standard deviation
Descriptive statistics
• Range
– It is the difference between the largest value and the smallest value.
– It is informative only for data without outliers.
Descriptive statistics

• Variance
– It measures the average squared distance that scores deviate from their mean.
– Sample variance: s² (population variance: σ², sigma squared)
Descriptive statistics

• How to calculate variance?
– $s^2 = \frac{SS}{n-1} = \frac{\sum (x - \bar{X})^2}{n-1}$, where SS means sum of squares.
– n - 1 is the degrees of freedom: the number of scores in a sample that are free to vary.
Descriptive statistics

• Example: five scores: 5, 10, 7, 8, 15
– Mean = 9
– Let's calculate the variance:
• SS = (5-9)² + (10-9)² + (7-9)² + (8-9)² + (15-9)² = 58
• Sample variance = 58/(5-1) = 14.5
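The variance example can be reproduced step by step and then checked against library functions. This sketch uses `statistics.variance` and `numpy.var(..., ddof=1)`, both of which divide by n - 1 for a sample.

```python
import statistics
import numpy as np

scores = [5, 10, 7, 8, 15]

mean = sum(scores) / len(scores)           # 9.0
ss = sum((x - mean) ** 2 for x in scores)  # sum of squares (SS)
sample_var = ss / (len(scores) - 1)        # divide by degrees of freedom, n - 1

print(mean, ss, sample_var)
print(statistics.variance(scores))         # library check, also n - 1
print(np.var(scores, ddof=1))              # ddof=1 gives the n - 1 denominator
print(np.sqrt(sample_var))                 # sample standard deviation
```

Note that `np.var` without `ddof=1` divides by n and would give the population variance instead.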
Descriptive statistics

• Standard deviation (s, σ)
– It is the square root of the variance.
– It can be read as the typical distance that scores deviate from their mean.
Descriptive statistics

• Summary
– When individual scores are close to the mean, the standard deviation (SD) is smaller. When individual scores are spread out far from the mean, the standard deviation is larger.
– SD is never negative (it is 0 only when all scores are equal).
– It is typically reported with the mean.
Descriptive statistics

• Choosing the proper measure of central tendency depends on:
– the type of distribution
– the scale of measurement
Descriptive statistics

• The mean describes data that are normally distributed and measured on an interval or ratio scale.
• The median is used when the data are not normally distributed (e.g. skewed data or data with outliers).
Standard Deviation

• The standard deviation is a measure of how spread out numbers are.
• Its symbol is σ (the Greek letter sigma).
• The formula is easy: it is the square root of the variance. So now you ask, "What is the variance?"

SD is never negative and is reported with the mean.

Variance

• The variance is defined as:
– The average of the squared differences from the mean.
• To calculate the variance, follow these steps:
– Work out the mean (the simple average of the numbers).
– Then for each number: subtract the mean and square the result (the squared difference).
– Then work out the average of those squared differences.
• Dividing by the number of values N in this way gives the population variance; for a sample, divide the sum of squares by n - 1 instead.
Variance, Standard Deviation

A hen lays eight eggs. Each egg was weighed and recorded as follows:
• 60 g, 56 g, 61 g, 68 g, 51 g, 53 g, 69 g, 54 g.
• Calculate Mean and Standard Deviation.
Example with Solution
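The worked solution for the egg example is not preserved here, so the sketch below computes both versions: the population SD (dividing by N, as in the variance steps above) and the sample SD (dividing by n - 1). Which one is intended is an assumption the reader should check against the original solution.

```python
import statistics

weights = [60, 56, 61, 68, 51, 53, 69, 54]  # egg weights in grams

mean = statistics.mean(weights)             # 472 / 8
pop_var = statistics.pvariance(weights)     # average squared difference (divide by N)
pop_sd = statistics.pstdev(weights)         # square root of pop_var
sample_sd = statistics.stdev(weights)       # divide by n - 1 if the eggs are a sample

print(mean, pop_var, round(pop_sd, 2), round(sample_sd, 2))
```

With these weights the mean is 59 g, the population SD is about 6.32 g and the sample SD is about 6.76 g.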
Coefficient of variation

• The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from one another.
• CV = Standard deviation / mean
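A minimal CV helper can show why the ratio is useful for comparison: it has no units. This sketch assumes the population SD (`statistics.pstdev`); some texts use the sample SD instead. The egg weights are reused and also expressed in milligrams, giving the same CV.

```python
import statistics

def coefficient_of_variation(data):
    """CV = standard deviation / mean (population SD is an assumption here)."""
    return statistics.pstdev(data) / statistics.mean(data)

eggs_g = [60, 56, 61, 68, 51, 53, 69, 54]  # grams
eggs_mg = [w * 1000 for w in eggs_g]       # the same eggs in milligrams

cv_g = coefficient_of_variation(eggs_g)
cv_mg = coefficient_of_variation(eggs_mg)
print(round(cv_g, 4), round(cv_mg, 4))     # identical: the CV is unit-free
```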
Standard Deviation

• Standard deviation is a number that describes how spread out the values are.
• A low standard deviation means that most of the numbers are close to the mean (average) value.
• A high standard deviation means that the values are spread out over a wider range.
• Suppose we have registered the speed of 7 cars:
• speed = [86,87,88,86,87,85,86]  Standard Deviation = 0.9
• This means that most of the values are within 0.9 of the mean value, which is 86.4.
• speed = [32,111,138,28,59,77,97]  Standard Deviation = ???
Example with Solution
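Both speed lists can be checked with NumPy. One assumption matters here: `np.std` divides by N by default (population SD), which is what produces the 0.9 quoted above; pass `ddof=1` for the sample SD.

```python
import numpy as np

speed_a = [86, 87, 88, 86, 87, 85, 86]
speed_b = [32, 111, 138, 28, 59, 77, 97]

for speed in (speed_a, speed_b):
    # np.std defaults to the population SD (divides by N)
    print(round(np.mean(speed), 1), round(np.std(speed), 2))
```

The first list has SD about 0.9; the second is far more spread out, with SD about 37.85.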
Quartiles & Percentiles
• pth percentile: p percent of observations fall below it and (100 - p)% above it.
• For example, a 95th percentile in CAT means → 5% of test takers are above & 95% are below
• 1,2,3,4,5,6,7,8,9,10 - What is the 25th percentile?
• 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 - What is the 25th percentile? What is the 80th percentile?

• p = 50: median
• p = 25: lower quartile (LQ)
• p = 75: upper quartile (UQ)
• Interquartile range IQR = UQ - LQ
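The percentile questions can be explored with `numpy.percentile`. Be aware that several percentile conventions exist: NumPy's default interpolates linearly between neighbouring ranks, so its answers may differ slightly from a textbook or calculator method (NumPy lets you choose another convention through its `method` argument).

```python
import numpy as np

a = np.arange(1, 11)  # 1, 2, ..., 10
b = np.arange(1, 21)  # 1, 2, ..., 20

p25_a = np.percentile(a, 25)               # linear interpolation between ranks
p25_b, p80_b = np.percentile(b, [25, 80])
iqr_b = np.percentile(b, 75) - np.percentile(b, 25)  # UQ - LQ

print(p25_a, p25_b, p80_b, iqr_b)
```

With the default method this gives 3.25 for the first list, 5.75 and 16.2 for the second, and an IQR of 9.5.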
Box Plots
• Quartiles are portrayed graphically by box plots
Boxplots
• For numerical data

• However, boxplots cannot identify modes (e.g. unimodal, bimodal, etc.)

Box Plots Interpretation
• Box plots have a box from LQ to UQ, with the median marked. They portray a five-number summary of the data: Minimum, LQ, Median, UQ, Maximum
• The minimum and maximum exclude outliers, which are identified separately
• Outlier = observation falling below LQ - 1.5(IQR) or above UQ + 1.5(IQR)
• Ex. If LQ = 2 and UQ = 10, then IQR = 8, and outliers are observations above 10 + 1.5(8) = 22 (or below 2 - 1.5(8) = -10)
Box Plot calculation and interpretation
The following data are the heights of 40 students in a statistics class.
59 60 61 62 62 63 63 64 64 64 65 65 65 65 65 65 65 65 65 66 66 67 67 68 68 69 70 70 70 70 70 71 71
72 72 73 74 74 75 77
Construct a box plot with the following properties; the calculator instructions for the minimum and maximum values as well as the quartiles follow the example.
• Minimum value = 59
• Maximum value = 77
• Q1, first quartile = 64.5
• Q2, second quartile or median = 66
• Q3, third quartile = 70
1. Each quarter has approximately 25% of the data.
2. The spreads of the four quarters are 64.5 - 59 = 5.5 (first quarter), 66 - 64.5 = 1.5 (second quarter), 70 - 66 = 4 (third quarter), and 77 - 70 = 7 (fourth quarter). So, the second quarter has the smallest spread and the fourth quarter has the largest spread.
3. Range = maximum value - minimum value = 77 - 59 = 18
4. Interquartile range: IQR = Q3 - Q1 = 70 - 64.5 = 5.5.
5. The interval 59–65 has more than 25% of the data so it has
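The quartiles of the 40 heights can be reproduced with a short function. This sketch uses the median-of-halves convention (Q1 and Q3 are medians of the lower and upper halves), which matches the values listed above; other conventions, such as NumPy's default interpolation, can give a slightly different Q1. It also applies the 1.5 × IQR outlier rule.

```python
import statistics

heights = [59, 60, 61, 62, 62, 63, 63, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65,
           65, 65, 66, 66, 67, 67, 68, 68, 69, 70, 70, 70, 70, 70, 71, 71,
           72, 72, 73, 74, 74, 75, 77]

def quartiles(data):
    """Q1, Q2, Q3 using the median-of-halves convention."""
    x = sorted(data)
    n = len(x)
    lower = x[: n // 2]        # values below the median position
    upper = x[(n + 1) // 2 :]  # values above the median position
    return statistics.median(lower), statistics.median(x), statistics.median(upper)

q1, q2, q3 = quartiles(heights)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # outlier limits
outliers = [h for h in heights if h < low_fence or h > high_fence]
print(min(heights), q1, q2, q3, max(heights), iqr, outliers)
```

The fences are 56.25 and 78.25, so none of the 40 heights is flagged as an outlier.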
Summaries for numerical data

• Center/location: measures the “center” of the data

– Examples: sample mean and sample median

• Spread/Dispersion: measures the “spread” or

“fatness” of the data
– Examples: sample variance, interquartile range

• Order/Rank: measures the ordering/ranking of the

data
– Examples: order statistics and sample quantiles
Sample summary | Type of data | Formula | Notes
Sample mean | Continuous | $\bar{X} = \frac{\sum x}{n}$ | Summarizes the “center” of the data; sensitive to outliers
Sample variance | Continuous | $s^2 = \frac{\sum (x - \bar{X})^2}{n-1}$ | Summarizes the “spread” of the data; outliers may inflate this value
Order statistic | Continuous | ith largest value of the sample | Summarizes the order/rank of the data
Sample median | Continuous | middle value of the ordered sample | Summarizes the “center” of the data; robust to outliers
Sample quantile | Continuous | value below which a given fraction of the sample falls | Summarizes the order/rank of the data; robust to outliers
Sample interquartile range (sample IQR) | Continuous | UQ - LQ | Summarizes the “spread” of the data; robust to outliers
Descriptive statistics

• Normal distribution
– Probability: the number of times an outcome is likely to occur divided by the total number of possible outcomes.
• It varies between 0 and 1.
• Example: see the table below.
Descriptive statistics

• Probability
           Fail   Pass   Total
Male         3      2      5
Female       1      4      5
Total        4      6     10

1. What is the probability of Fail? 4/10 = .4
2. What is the probability of Pass? 6/10 = .6
3. What is the probability of Fail among males? 3/5 = .6
4. What is the probability of Pass among females? 4/5 = .8
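The four answers can be read off the table programmatically. This is a minimal sketch with a plain dictionary; questions 3 and 4 are conditional probabilities, so they divide by a row total instead of the grand total.

```python
# Counts from the Fail/Pass by gender table
counts = {
    "Male": {"Fail": 3, "Pass": 2},
    "Female": {"Fail": 1, "Pass": 4},
}
total = sum(sum(row.values()) for row in counts.values())  # grand total: 10

p_fail = sum(row["Fail"] for row in counts.values()) / total  # 4/10
p_pass = sum(row["Pass"] for row in counts.values()) / total  # 6/10
# Conditional probabilities: restrict to one row, divide by that row's total
p_fail_given_male = counts["Male"]["Fail"] / sum(counts["Male"].values())        # 3/5
p_pass_given_female = counts["Female"]["Pass"] / sum(counts["Female"].values())  # 4/5

print(p_fail, p_pass, p_fail_given_male, p_pass_given_female)
```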
Descriptive statistics

• Normal distribution/Normal curve
– Data are symmetrically distributed around the mean, median, and mode.
– Also called the symmetrical, Gaussian, or bell-shaped distribution.
Descriptive statistics

• Normal curve
[Figures: the normal curve]
Descriptive statistics

• Characteristics of normal distribution
– The normal distribution is mathematically defined.
– The normal distribution is theoretical.
– The mean, median, and mode are all the same value at the center of the distribution.
Descriptive statistics

• Characteristics of normal distribution
– The normal distribution is symmetrical.
– The form of a normal distribution is determined by its mean and standard deviation.
– The standard deviation can be any positive value.
Descriptive statistics

• Characteristics of normal distribution
– The total area under the curve is equal to 1.
– The tails of the normal distribution approach the x-axis but never touch it.
Histograms

• For numerical data
• A method to show the “shape” of the data by tallying frequencies of the measurements in the sample
• Characteristics to look for:
– Modality: uniform, unimodal, bimodal, etc.
– Skew: symmetric (no skew), right/positive-skewed, left/negative-skewed distributions
– Quantiles: fat tails/skinny tails
– Outliers
Pandas/.hist()
1. It does the grouping. When using .hist() there is no need for an initial .groupby() call: .hist() automatically groups your data into bins (by default, into 10 bins). Note: again, “grouping into bins” is not the same as “grouping by unique values”, as a bin usually contains a range of values.
2. It does the counting (no need for the .count() function either).
3. It plots a histogram for each column in your DataFrame that has numerical values in it.
• So plotting a histogram (in Python, at least) is a very convenient way to visualize the distribution of your data.
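A minimal sketch of the three steps above: a single `DataFrame.hist()` call bins, counts and plots every numeric column. The data frame is hypothetical, and the Agg backend is set only so the example runs without a display (in a notebook you can omit it).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import pandas as pd

# Hypothetical data: two numeric columns and one text column
df = pd.DataFrame({
    "height": [150, 160, 165, 170, 172, 175, 180, 185],
    "weight": [50, 58, 61, 66, 70, 72, 80, 90],
    "name": list("ABCDEFGH"),
})

# Bins each numeric column (10 bins by default), counts the values and
# draws one histogram per column; the text column "name" is skipped.
axes = df.hist()
print(axes.shape)
```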
Histogram

• Calculate the mean and median using the formula.
• Plot a histogram of the scores and find the mean and median.
Histogram

• Find the mean and median of the given data using the formula.
• Plot a histogram and find the mean and median.
Sample data for test

• Create an array containing 100000 random floats between 0 and 5:
• numpy.random.uniform(0.0, 5.0, 100000). What does its histogram look like?
• Data from the normal data distribution, or the Gaussian data distribution:
• numpy.random.normal(5.0, 1.0, 100000), with mean 5.0 and standard deviation 1.0. What does its histogram look like?
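The two arrays above can be generated and plotted side by side. This sketch fixes a seed (an arbitrary choice) and uses 100 bins, which is a display choice, not part of the exercise.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() in a notebook instead
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
uniform_data = np.random.uniform(0.0, 5.0, 100000)  # floats in [0, 5)
normal_data = np.random.normal(5.0, 1.0, 100000)    # mean 5.0, SD 1.0

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(uniform_data, bins=100)  # roughly flat
ax1.set_title("uniform(0, 5)")
ax2.hist(normal_data, bins=100)   # bell-shaped, centred on 5
ax2.set_title("normal(5, 1)")

print(normal_data.mean(), normal_data.std())  # close to 5.0 and 1.0
```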
Normal Distribution

• Normal distribution/Normal curve

– Also called the symmetrical, Gaussian, or bell-shaped distribution.
• Characteristics of normal distribution
– The normal distribution is mathematically defined.
– The normal distribution is symmetrical.
– The form of a normal distribution is determined by its mean and standard deviation.
– The standard deviation can be any positive value.
– The tails of the normal distribution approach the x-axis but never touch it.
– The total area under the curve is equal to 1.
– We use the normal distribution to locate probabilities for scores.
– The area under the curve can be used to determine the probabilities at different points.
– Example: about 95% of all scores lie within two standard deviations of the mean (normal scores are close to the mean), so we have a 95% chance of selecting a score that is within 2 standard deviations of the mean. This is part of the 68-95-99.7 rule: about 68%, 95% and 99.7% of scores lie within 1, 2 and 3 standard deviations of the mean.
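The 68-95-99.7 rule can be checked as areas under the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library; the area between -k and +k is the CDF at k minus the CDF at -k.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

areas = {}
for k in (1, 2, 3):
    # Area under the curve between mean - k*SD and mean + k*SD
    areas[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {areas[k]:.4f}")
```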
Normal Distribution to Standard Normal Distribution

• Convert a value to a standard score ("z-score"):
– first subtract the mean,
– then divide by the standard deviation.
Doing that is called "standardizing".
Standard normal distribution or Z distribution

• A normal distribution
with mean = 0, and
standard deviation = 1.
• A Z score is a value on
the x-axis of a standard
normal distribution.
• We can take any Normal
Distribution and convert
it to The Standard
Normal Distribution.
Z Score
• z = (x – μ) / σ
– x: observation
– μ: mean
– σ: standard deviation
What is the probability that an observed value is less than 105? Greater than 105?
x = 105, μ = 100, σ = 5. Find the Z score and the probability under the curve.

Z = (105 - 100)/5 = 1

From the Z table: P(Z < 1) = 0.84134, i.e. about 84%

P(Z > 1) = 1 - 0.84134 = 0.15866, i.e. about 16%

https://www.mathsisfun.com/data/standard-normal-distribution-table.html
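Instead of looking up the Z table, the same probabilities can be computed with `statistics.NormalDist`. This sketch standardizes first and then uses the standard normal CDF; `NormalDist(100, 5).cdf(105)` would give the same result without standardizing.

```python
from statistics import NormalDist

x, mu, sigma = 105, 100, 5
z = (x - mu) / sigma             # (105 - 100) / 5

p_below = NormalDist().cdf(z)    # area to the left of z (the Z-table value)
p_above = 1 - p_below            # area to the right of z
print(z, round(p_below, 5), round(p_above, 5))
```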
Z-score Example

A person has two sons. He wants to know which of them scored better on their standardized test relative to the other test takers: Ram, who earned an 1800 on his SAT, or Sham, who scored a 24 on his ACT exam?

We cannot simply compare the raw scores, as they are measured on different scales.
So the father is interested in how many standard deviations above the mean of their respective distributions Ram and Sham scored.
Ram = (1800 - 1500) / 300 = 1 standard deviation above the mean
Sham = (24 - 21) / 5 = 0.6 standard deviations above the mean
Now the father can conclude that Ram did better than Sham.
Example

• A survey of daily travel time had these results (in minutes):
26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34
• The mean is 38.8 minutes, and the standard deviation is 11.4 minutes.
• Convert the values to the standard normal distribution and plot a histogram.
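A sketch of the conversion: the 11.4 minutes quoted above is the population SD (dividing by N), so `statistics.pstdev` is used; the sample SD of these values would be about 11.7. After standardizing, the z-scores have mean 0 and SD 1.

```python
import statistics

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mean = statistics.mean(times)   # 38.8
sd = statistics.pstdev(times)   # 11.4 (population SD)

# Standardize: subtract the mean, then divide by the standard deviation
z_scores = [(t - mean) / sd for t in times]
print(round(mean, 1), round(sd, 1))
print([round(z, 2) for z in z_scores])
```

The `z_scores` list can then be passed to `matplotlib.pyplot.hist` to draw the requested histogram.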
Why Standardization?

• Datasets often have multiple features spanning varying degrees of magnitude, range, and units.
• This is a significant obstacle, as some machine learning algorithms are highly sensitive to these differences.
• For example, one feature may be entirely in kilograms while another is in grams, another in liters, and so on.
• Feature scaling: normalization and standardization
Feature Scaling

• CGPA scores of students (ranging from 0 to 5)
• Future incomes (in thousands of rupees)
• Since the two features have different scales, there is a chance that higher weightage is given to the feature with the higher magnitude.
– This will impact the performance of the machine learning algorithm.
– Scale your data.
Feature Scaling

• Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as min-max scaling.
• Standardization is another scaling technique, in which the values are centered around the mean (mean 0) with a unit standard deviation.
• Normalization is good to use when you know that the distribution of your data does not follow a Gaussian distribution.
• Standardization, on the other hand, can be helpful in cases where the data follow a Gaussian distribution.
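Both scaling techniques are one line each in NumPy. The CGPA and income values below are hypothetical, chosen only to mirror the example features. scikit-learn's `MinMaxScaler` and `StandardScaler` implement the same ideas for whole feature matrices.

```python
import numpy as np

# Hypothetical values for the two features on the Feature Scaling slide
cgpa = np.array([3.2, 4.5, 2.8, 3.9, 4.1])                # range 0 to 5
income = np.array([250.0, 900.0, 180.0, 600.0, 420.0])  # thousands of rupees

def min_max(x):
    """Normalization (min-max scaling): shift and rescale into [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Standardization: z-scores with mean 0 and (population) SD 1."""
    return (x - x.mean()) / x.std()

print(min_max(cgpa), min_max(income))
print(standardize(cgpa), standardize(income))
```

After either transformation the two features are on comparable scales, so neither dominates just because of its units.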
Skewness

• It is the degree of distortion from the symmetrical bell curve, or the normal distribution. It measures the lack of symmetry in a data distribution.
• A symmetrical distribution has a skewness of 0.
• For a left-skewed (negatively skewed) distribution: Mode >= Median >= Mean
• For a right-skewed (positively skewed) distribution: Mean >= Median >= Mode
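A minimal sketch of one common skewness measure, the moment coefficient m3 / m2^1.5 (this is what `scipy.stats.skew` returns with its default `bias=True`; pandas' `Series.skew()` applies a small-sample correction, so its value differs slightly). The two small data sets are illustrative.

```python
import numpy as np

def skewness(x):
    """Moment coefficient of skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)  # second central moment
    m3 = np.mean(d ** 3)  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]  # long tail to the right

print(skewness(symmetric), skewness(right_skewed))
print(np.mean(right_skewed), np.median(right_skewed))  # mean > median
```

For the right-skewed list the skewness is positive and the mean (3) exceeds the median (1), matching the ordering rule above.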
What to do when data is skewed?

• Skewed data are very difficult to interpret and analyze.
• Transformations are applied to the data so that its information is preserved while, at the same time, the transformed data follow a more symmetrical curve.
• The transformation is decided based on the characteristics of the data.
When is the skewness too much?
Transformation

• Taking the square root of each data point and plotting it again.
• Taking the cube root of each data point and plotting it again.
• Taking the logarithm of each data point and plotting it again.
• Taking the reciprocal of each data point and plotting it again.
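The sketch below applies each transformation to a simulated right-skewed sample and reports the skewness afterwards. The log-normal data, seed and sample size are assumptions for illustration. For this particular data the log works best and the reciprocal does not help at all (the reciprocal of a log-normal sample is still log-normal), which illustrates that the right transformation depends on the data.

```python
import numpy as np

def skewness(x):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

rng = np.random.default_rng(seed=1)
data = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # right-skewed, all positive

transforms = {
    "original": data,
    "square root": np.sqrt(data),
    "cube root": np.cbrt(data),
    "log": np.log(data),
    "reciprocal": 1 / data,
}
skews = {name: skewness(values) for name, values in transforms.items()}
for name, value in skews.items():
    print(f"{name:12s} skewness = {value:.2f}")
```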
Kurtosis

• Kurtosis is all about the tails of the distribution.
• It describes how extreme the values in the tails are, compared with a normal distribution.
• It is effectively a measure of the outliers present in the distribution.
• High kurtosis in a data set is an indicator that the data have heavy tails or outliers.
– Investigate why there are so many outliers.
• Low kurtosis in a data set is an indicator that the data have light tails or a lack of outliers.
– Need to investigate and trim the dataset of unwanted results.
Kurtosis

• Mesokurtic: this distribution has a kurtosis statistic similar to that of the normal distribution (kurtosis = 3).
• Leptokurtic (kurtosis > 3): the distribution is longer and the tails are fatter. The peak is higher and sharper than mesokurtic, which means that the data are heavy-tailed or have a profusion of outliers.
• Platykurtic (kurtosis < 3): the distribution is shorter and the tails are thinner than the normal distribution. The peak is lower and broader than mesokurtic, which means that the data are light-tailed or lack outliers.
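The three kurtosis categories can be seen by simulation. This sketch computes the Pearson kurtosis m4 / m2², which is about 3 for a normal distribution, matching the thresholds above. Note that `scipy.stats.kurtosis` and pandas' `.kurt()` report excess kurtosis (kurtosis minus 3) by default. The distributions, seed and sample size are illustrative choices.

```python
import numpy as np

def kurtosis(x):
    """Pearson kurtosis m4 / m2**2 (about 3 for a normal distribution)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

rng = np.random.default_rng(seed=42)
n = 200_000
samples = {
    "normal (mesokurtic)": rng.normal(size=n),     # theoretical kurtosis 3
    "laplace (leptokurtic)": rng.laplace(size=n),  # theoretical kurtosis 6
    "uniform (platykurtic)": rng.uniform(size=n),  # theoretical kurtosis 1.8
}
results = {name: kurtosis(values) for name, values in samples.items()}
for name, k in results.items():
    print(f"{name:24s} kurtosis = {k:.2f}")
```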
Lab: Histogram
https://www.spss-tutorials.com/skewness/
https://brownmath.com/stat/shape.htm
• Create a histogram of the variable ‘actual’ in the prdsale data
• How many modes?
• What is the skewness?
• What is its kurtosis?
• Create a histogram of the variable ‘msrp’ in the cars data
• How many modes?
• What is the skewness?
• What is its kurtosis?
• Create a histogram of the variable ‘weight’ in the cars data
• How many modes?
• What is the skewness?
• What is its kurtosis?
Compare the above three histograms.
Lab
• What is the mean of ‘msrp’ in the cars data?
• Does it reflect the average value of the price?
• What is the median of ‘msrp’ in the cars data?
• Does it reflect the average value of the price?
• Run PROC UNIVARIATE on the weight variable in the cars data. Find the mean, median & mode.
Exercise
Contingency Tables
• Cross classifications of categorical variables in which rows (typically) represent categories of the explanatory variable and columns represent categories of the response variable.
• Counts in the “cells” of the table give the numbers of individuals at the corresponding combination of levels of the two variables.

Example: Happiness and Family Income of 1993 families (GSS 2008 data: “happy,” “finrela”)
                      Happiness
Income         Very   Pretty   Not too   Total
-----------------------------------------------
Above Aver.     164     233       26       423
Average         293     473      117       883
Below Aver.     132     383      172       687
-----------------------------------------------
Total           589    1089      315      1993
Contingency tables
• Example: the percentage “very happy” is
• 39% for above average income (164/423 = 0.39)
• 33% for average income (293/883 = 0.33)
• What percent for below average income?

                      Happiness
Income      Very        Pretty       Not too      Total
---------------------------------------------------------
Above       164 (39%)   233 (55%)     26 (6%)      423
Average     293 (33%)   473 (54%)    117 (13%)     883
Below       132 (19%)   383 (56%)    172 (25%)     687
---------------------------------------------------------
• What can we conclude? Does happiness depend on income, or is happiness independent of income?
• Inference questions for later chapters?
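The row percentages in the table can be computed with pandas by dividing each row by its total. This sketch starts from the published counts; with raw (unaggregated) data, `pd.crosstab(..., normalize="index")` produces the same row proportions directly.

```python
import pandas as pd

# Counts from the Happiness by Family Income table (GSS 2008)
table = pd.DataFrame(
    {"Very": [164, 293, 132], "Pretty": [233, 473, 383], "Not too": [26, 117, 172]},
    index=["Above average", "Average", "Below average"],
)

row_totals = table.sum(axis=1)           # 423, 883, 687
row_pct = table.div(row_totals, axis=0)  # distribution of happiness within each income row
print(row_pct.round(2))
```

If happiness were independent of income, the three rows would have roughly equal percentages; here "very happy" falls from 39% to 19% as income drops.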
Correlation
• Correlation describes the strength of association between two variables
• It falls between -1 and +1, with the sign indicating the direction of association (formula & other details later)
• The larger the correlation in absolute value, the stronger the association (in terms of a straight-line trend)
• Examples: (positive or negative, how strong?)
• Mental impairment and life events, correlation =
• GDP and fertility, correlation =
• GDP and percent using Internet, correlation =
Calculating Correlation

The most widely used formula to compute the correlation coefficient is Pearson's r:

$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}$

In the above formula,

• $x_i$, $y_i$ are individual elements of the x and y series
• The numerator corresponds to the covariance
• The denominators correspond to the individual standard deviations of x and y
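Pearson's r can be computed directly from the formula above and checked against `numpy.corrcoef`. The x and y values are illustrative, not from the slides.

```python
import math
import numpy as np

def pearson_r(x, y):
    """Pearson's r: sum of co-deviations over the product of root sums of squares."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)) * math.sqrt(sum((b - my) ** 2 for b in y))
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]  # illustrative values
print(pearson_r(x, y), np.corrcoef(x, y)[0, 1])  # both methods agree
```

Here r = 8 / 10 = 0.8, a strong positive association on the scale given under Strength of Association.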
Strength of Association
• Correlation 0 → No linear association
• Correlation 0 to 0.25 → Negligible positive association
• Correlation 0.25 to 0.5 → Weak positive association
• Correlation 0.5 to 0.75 → Moderate positive association
• Correlation > 0.75 → Very strong positive association
• What are the limits for negative correlation?
Regression
• Regression analysis gives a line predicting y using x (algorithm & other details later)
• y = college GPA, x = high school GPA
• Predicted y = 0.234 + 1.002(x)
Calculating Covariance

$\operatorname{cov}(x, y) = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{N - 1}$

In the above formula,

• $x_i$, $y_i$ are individual elements of the x and y series
• $\bar{x}$, $\bar{y}$ are the means of the x and y series
• N is the number of elements in the series
The denominator is N for a whole dataset (population) and N - 1 in the case of a sample. As our dataset is a small sample of the entire Iris dataset, we use N - 1.
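A short sketch of the covariance formula with both denominators, checked against `numpy.cov`, which divides by N - 1 by default. The x and y values are illustrative.

```python
import numpy as np

def covariance(x, y, sample=True):
    """Sum of co-deviations divided by N - 1 (sample) or N (whole population)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return s / (n - 1) if sample else s / n

x = [1, 2, 3]
y = [1, 2, 4]
print(covariance(x, y), covariance(x, y, sample=False))
print(np.cov(x, y)[0, 1])  # numpy's default also uses N - 1
```

The sum of co-deviations is 3, so the sample covariance is 1.5 and the population covariance is 1.0.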
Covariance and Correlation

• Both covariance and correlation are about the relationship between variables.
• Covariance defines the directional association between the variables.
• Covariance values range from -inf to +inf, where a positive value denotes that both variables move in the same direction and a negative value denotes that they move in opposite directions.
Covariance and Correlation

• Correlation is a standardized statistical measure that expresses the extent to which two variables are linearly related (meaning how much they change together at a constant rate).
• Correlation defines both the strength and the direction of the relationship between two variables, and it ranges from -1 to +1.
• Similar to covariance, a positive value denotes that both variables move in the same direction, whereas a negative value tells us that they move in opposite directions.
Covariance and Correlation

• Both covariance and correlation are vital tools used in data exploration for feature selection and multivariate analyses.
• For example, an investor analysing a portfolio might look at the covariance between stocks, since a high positive covariance suggests that their prices tend to move up at the same time (so holding both spreads the risk less).
• However, a similar movement is not enough on its own. The investor would then use the correlation metric to determine how strongly linked those stock prices are to each other.
Lab

• Use the pandas corrwith() function to find the correlation between two DataFrame objects, column by column (columns with matching labels are paired)
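A minimal `DataFrame.corrwith` sketch with two small, made-up data frames. With the default `axis=0`, columns that share a label in both frames are paired, and one Pearson correlation is returned per column.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 3, 2, 4]})
df2 = pd.DataFrame({"a": [2, 4, 6, 8], "b": [4, 3, 2, 1]})

# Pairs df1["a"] with df2["a"] and df1["b"] with df2["b"]
result = df1.corrwith(df2)
print(result)
```

Column "a" is perfectly linearly related (r = 1.0), while column "b" shows a strong negative correlation (r = -0.8).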
Quantile-Quantile Plots (QQ Plots)

• When the quantiles of two variables are plotted against each other, the plot obtained is known as a quantile-quantile plot or qqplot.
• This plot provides a summary of whether the distributions of two variables are similar or not with respect to their locations.
Quantile-Quantile Plots (QQ Plots)

• For numerical data: visually compare collected data with a known distribution
• The most common one is the normal QQ plot
– We check whether the sample follows a normal distribution
– It is a common assumption in statistical inference that your sample comes from a normal distribution
• Summary: if your scatterplot “hugs” the line, there is good reason to believe that your data follow that distribution.
Making a Normal QQ plot
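One way to make a normal QQ plot in Python is `scipy.stats.probplot`, which plots the sorted sample against theoretical normal quantiles and fits a reference line. The simulated data, seed and Agg backend are assumptions for the sketch; the returned `r` measures how closely the points hug the line.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() in a notebook instead
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
data = rng.normal(loc=10, scale=2, size=500)  # illustrative sample

fig, ax = plt.subplots()
# Theoretical normal quantiles (osm) vs. ordered sample values (osr), plus a fitted line
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm", plot=ax)
ax.set_title("Normal QQ plot")
print(f"correlation with the reference line: r = {r:.4f}")
```

For clearly non-normal data (for example a log-normal sample) the points bend away from the line at one end and r drops.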
If your data is not normal…
• You can perform transformations to make it look normal
• For right/positively-skewed data: log/square root
• For left/negatively-skewed data: exponential/square
Comparing the three visual techniques

• Histograms
• Advantages:
– With properly-sized bins, histograms can summarize any shape of the data (modes, skew, quantiles, outliers)
• Disadvantages:
– Difficult to compare side-by-side (takes up too much space in a plot)
– Depending on the size of the bins, interpretation may be different

• Boxplots
• Advantages:
– Don’t have to tweak “graphical” parameters (i.e. bin size in histograms)
– Summarize skew, quantiles, and outliers
– Can compare several measurements side-by-side
• Disadvantages:
– Cannot distinguish modes!

• QQ Plots
• Advantages:
– Can identify whether the data came from a certain distribution
– Don’t have to tweak “graphical” parameters (i.e. bin size in histograms)
– Summarize quantiles
• Disadvantages:
– Difficult to compare side-by-side
– Difficult to distinguish skews, modes, and outliers
Scatterplots

Scatter Plot
• Scatter plot: without grouping variable (Q2)
• Scatter plot by gender
[Figures: scatter plots, without grouping and grouped by gender]

Box Plot
• Box plot of Q6 without Q2
• Box plot of Q6 by Q2
[Figures: box plots of Q6, without grouping and grouped by Q2]
Lab
• Run PROC UNIVARIATE on a variable from the sample data in the SAS sample library (PRDSALE or CARS)
• Run PROC MEANS on the actual and predicted sales variables from the product sales data
• What are the values of the range, variance, and standard deviation?
• What are the first, second, third, and fourth quartile values?
• What is the 95th percentile?
• Use the ALL option to display the box plots
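The lab uses SAS. For comparison, here is a minimal pandas sketch of the same descriptive statistics. It is not the SAS procedure, and it runs on a synthetic stand-in rather than the real product sales data. The column names `ACTUAL` and `PREDICT` are assumed to mirror the SAS dataset, and the helper `describe_column` is hypothetical. SAS and pandas use different default percentile definitions, so quartiles and percentiles may differ slightly from PROC UNIVARIATE output.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the product sales data (hypothetical values)
rng = np.random.default_rng(7)
sales = pd.DataFrame({
    "ACTUAL": rng.uniform(0, 1000, 200).round(),
    "PREDICT": rng.uniform(0, 1000, 200).round(),
})

def describe_column(col: pd.Series) -> dict:
    """Range, variance, SD, quartiles and 95th percentile of one column."""
    return {
        "range": col.max() - col.min(),
        "variance": col.var(),          # sample variance (divisor n - 1)
        "sd": col.std(),                # sample standard deviation
        "Q1": col.quantile(0.25),
        "Q2 (median)": col.quantile(0.50),
        "Q3": col.quantile(0.75),
        "Q4 (max)": col.quantile(1.00),
        "P95": col.quantile(0.95),
    }

for name in ("ACTUAL", "PREDICT"):
    print(name, describe_column(sales[name]))

# Box plots (requires matplotlib): sales.boxplot(column=["ACTUAL", "PREDICT"])
```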

Lab
• Create a contingency table for the product sales data
• Find contingency tables for:
– Region by product type
– Division by product type
• Find the correlation between actual sales and predicted sales.
• Find the correlation between weight and MSRP in the cars data.
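For comparison with the SAS lab, here is a minimal pandas sketch of contingency tables and a correlation. It runs on a synthetic stand-in: the column names and category labels are assumptions modeled on the product sales data, and the "predictions" are simply the actual values plus noise.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the product sales data (hypothetical values)
rng = np.random.default_rng(3)
n = 200
actual = rng.uniform(0, 1000, n)
sales = pd.DataFrame({
    "REGION": rng.choice(["EAST", "WEST"], n),
    "DIVISION": rng.choice(["CONSUMER", "EDUCATION"], n),
    "PRODTYPE": rng.choice(["FURNITURE", "OFFICE"], n),
    "ACTUAL": actual,
    "PREDICT": actual + rng.normal(0, 100, n),  # predictions = actual + noise
})

# Contingency tables: number of rows in each combination of categories
region_by_type = pd.crosstab(sales["REGION"], sales["PRODTYPE"])
division_by_type = pd.crosstab(sales["DIVISION"], sales["PRODTYPE"])
print(region_by_type, division_by_type, sep="\n\n")

# Pearson correlation between actual and predicted sales
r = sales["ACTUAL"].corr(sales["PREDICT"])
print(f"correlation(ACTUAL, PREDICT) = {r:.3f}")
```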

Binomial distribution
• The binomial distribution is a discrete probability distribution. It gives the probabilities of the different values of the binomial random variable X, the number of successes in N repeated independent trials of an experiment.
• For example, in an experiment in which a coin is tossed 10 times (N = 10), the binomial random variable (the number of heads, counted as successes) can take any value from 0 to 10. The binomial probability distribution gives the probability of each of these values.

• The probability that a random variable X with binomial distribution B(n, p) equals the value k, where k = 0, 1, …, n, is given by the following formula:

• $P(X = k) = \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k}$

• For an experiment with n trials and probability of success p in each trial, the mean and variance of the binomial distribution are:
• Mean = np
• Variance = np(1-p)
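The formula and the two moments can be checked directly. The sketch below evaluates the probability mass function with Python's `math.comb` for the die example B(50, 1/6). It then confirms that the probabilities sum to 1, that the mean is np, and that the variance is np(1 − p). The helper name `binomial_pmf` is an assumption of this sketch.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): n!/(k!(n-k)!) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Rolling a die 50 times and counting sixes: X ~ B(50, 1/6)
n, p = 50, 1 / 6
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(probs)                                               # should be 1
mean = sum(k * pk for k, pk in enumerate(probs))                 # should equal n*p
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))  # n*p*(1-p)
most_likely = max(range(n + 1), key=lambda k: probs[k])

print(f"sum = {total:.6f}, mean = {mean:.4f} (np = {n * p:.4f})")
print(f"variance = {variance:.4f} (np(1-p) = {n * p * (1 - p):.4f})")
print(f"most likely number of sixes = {most_likely}")
```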
Binomial distribution

• Rolling a die: the probability of getting a six 0, 1, 2, …, 50 times when a die is rolled 50 times.
• Here, the random variable X is the number of "successes", that is, the number of times a six occurs.
• The probability of getting a six on a single roll is 1/6, so the binomial distribution can be written as B(50, 1/6).
• The diagram below represents the binomial distribution for 100 experiments.
Binomial distribution

• The necessary conditions for using the binomial distribution:
• Rule 1: There are only two possible, mutually exclusive outcomes in each trial (for example, yes/no survey questions).
• Rule 2: A fixed number of repeated trials is conducted (the process must have a clearly defined number of trials).
• Rule 3: All trials are identical and independent. Identical means every trial is performed the same way as the others; independent means the result of one trial does not affect the results of subsequent trials.
• Rule 4: The probability of success is the same in every trial.
Binomial distribution

$P(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$

Where:
• p is the probability of success on any trial.
• q = 1 − p is the probability of failure.
• n is the number of trials.
• x is the number of successes; it can take the values 0, 1, 2, 3, …, n.
• nCx = n!/(x!(n − x)!) denotes the number of combinations of n elements taken x at a time.
Examples of binomial distribution problems:

• The number of defective/non-defective products in a production run.
• Yes/no surveys (such as asking 150 people whether they watch ABC News).
• Vote counts for a candidate in an election.
• The number of successful sales calls.
• The number of male/female workers in a company.
Binomial distribution

• Suppose 80% of all business startups in the IT industry report that they generate a profit in their first year. If a sample of 10 new IT startups is selected, find the probability that exactly seven will generate a profit in their first year.
• First, do we satisfy the conditions of the binomial distribution model?
• There are only two possible, mutually exclusive outcomes: a startup either generates a profit in its first year or it does not (yes or no).
• There is a fixed number of trials (startups): 10.
• It is reasonable to assume that the startups are independent of one another.
• The probability of success for each startup is 0.8.
Binomial distribution

n = 10, p = 0.80, q = 0.20, x = 7

The probability that 7 IT startups generate a profit in their first year is:
$P(7) = {}^{10}C_{7}\,(0.80)^{7}(0.20)^{3}$

This is equivalent to:
$P(7) = 120 \times 0.2097152 \times 0.008 \approx 0.2013$

Interpretation/solution: There is a 20.13% probability that exactly 7 of 10 IT startups will generate a profit in their first year when the probability of profit in the first year for each startup is 80%.
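The arithmetic of the startup example can be reproduced in a few lines. This sketch just plugs n = 10, p = 0.80 and x = 7 into the binomial formula, using Python's standard `math.comb` for the combination count.

```python
from math import comb

n, p, x = 10, 0.80, 7
q = 1 - p

ways = comb(n, x)                  # 10C7 = 120 ways to choose which 7 succeed
prob = ways * p**x * q ** (n - x)  # 120 * 0.8^7 * 0.2^3

print(f"10C7 = {ways}, P(X = 7) = {prob:.4f}")
```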
Poisson distribution
The Poisson distribution is another discrete probability distribution. Unlike the binomial distribution, here we are not given the number of trials or the probability of success on a given trial. Instead, we are given the average number of successes in a certain time interval. This average is called "lambda" and is denoted by the symbol λ.

The formula for the Poisson distribution is:
$P(X = x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$

Here,
λ is the average number of events (successes) per interval
x is a Poisson random variable (the number of events observed), x = 0, 1, 2, …
e is the base of the natural logarithm, e ≈ 2.71828.
Poisson distribution
Question: Only 3 students attended the class today. Taking this as the average daily attendance, find the probability that exactly 4 students attend the class tomorrow.
Solution:
Given,
Average rate (λ) = 3
Poisson random variable (x) = 4
$P(X = x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$
$P(X = 4) = \frac{e^{-3}\,3^{4}}{4!}$
P(X = 4) ≈ 0.1680
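As a check on the worked example, the sketch below evaluates the Poisson formula with the standard library. The helper name `poisson_pmf` is an assumption of this sketch.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lambda) * lambda^x / x! for a Poisson random variable."""
    return exp(-lam) * lam**x / factorial(x)

p4 = poisson_pmf(4, 3)   # lambda = 3 students per day, x = 4
print(f"P(X = 4) = {p4:.4f}")
```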
Poisson probability distribution
• The Poisson probability distribution is used in situations where events occur randomly and independently, a certain number of times on average, during an interval of time or space.
• The random variable X associated with a Poisson process is discrete, and therefore the Poisson distribution is discrete.
• These are examples of events that may be described as Poisson processes:
• My computer crashes on average once every 4 months.
• A hospital emergency department receives on average 5 very serious cases every 24 hours.
• The number of cars passing a point on a small road is on average 4 every 30 minutes.
• I receive on average 10 e-mails every 2 hours.
• Customers make on average 10 calls every hour to the customer help center.
Poisson distribution

• Conditions for a Poisson distribution are:
– Events are discrete, random, and independent of each other.
– The average number of occurrences of the event is constant over time periods of the same length.
– The probabilities of occurrence of the event over fixed time intervals of equal length are equal.
– Two events cannot occur at exactly the same time; they are mutually exclusive.
Poisson distribution
The graphs of P(X) for several values of the average λ show that the probability is largest for x close to λ and decreases as x moves further away from λ, which makes sense.
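The observation that the probability peaks near λ can be checked by finding the most likely value of x for a few averages. The λ values here are arbitrary choices for illustration. Non-integer values are used deliberately: when λ is an integer, x = λ − 1 and x = λ are equally likely.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

modes = {}
for lam in (1.5, 4.5, 9.5):
    probs = [poisson_pmf(x, lam) for x in range(30)]
    modes[lam] = max(range(30), key=lambda x: probs[x])  # most likely x
    print(f"lambda = {lam}: most likely x = {modes[lam]}")
```

For non-integer λ, the most likely value is the integer part of λ.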
Central Limit Theorem
• The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, no matter what the shape of the population distribution (provided the population has a finite variance).
• An essential component of the Central Limit Theorem is that the average of the sample means equals the population mean. In other words, the mean of the sampling distribution of the sample mean is the same as the population mean.
• The standard deviation of the sampling distribution of the sample mean (the standard error) equals the population standard deviation divided by the square root of the sample size. It is the spread of the sample means that is linked to the population standard deviation in this way. Averaging the standard deviations of individual samples does not, in general, give the population standard deviation.
• The Central Limit Theorem is usually applied for sufficiently large sample sizes (a common rule of thumb is n ≥ 30). The formulas of the Central Limit Theorem can be stated as follows:
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Central Limit Theorem

Where,
μ = Population mean
σ = Population standard deviation
$\mu_{\bar{x}}$ = Mean of the sampling distribution of the sample mean
$\sigma_{\bar{x}}$ = Standard deviation of the sample mean (standard error)
n = Sample size
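A quick simulation shows both CLT formulas at work. The sketch below draws many samples of size n = 50 from a strongly skewed population and compares the mean and spread of the sample means with μ and σ/√n. The exponential population with mean 2 and standard deviation 2, the sample size, and the number of repetitions are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Skewed population: exponential with mean 2 and standard deviation 2
pop_mean, pop_sd = 2.0, 2.0
n = 50            # sample size
reps = 10_000     # number of samples drawn

samples = rng.exponential(scale=pop_mean, size=(reps, n))
sample_means = samples.mean(axis=1)   # one mean per sample

print(f"mean of sample means: {sample_means.mean():.3f} (mu = {pop_mean})")
print(f"sd of sample means:   {sample_means.std(ddof=1):.3f} "
      f"(sigma/sqrt(n) = {pop_sd / np.sqrt(n):.3f})")
```

A histogram of `sample_means` would look close to a normal curve, even though the population itself is far from normal.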
Example
Question: The weights of the male population follow a normal distribution with a mean of 70 kg and a standard deviation of 15 kg. If a researcher considers the records of 50 males, what would be the mean and standard deviation of the sampling distribution of the sample mean?

Solution:
Mean of the population μ = 70 kg
Standard deviation of the population σ = 15 kg
Sample size n = 50
The mean of the sample mean is given by:
$\mu_{\bar{x}} = \mu = 70$ kg
The standard deviation of the sample mean is given by:

$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 15/\sqrt{50}$
$\sigma_{\bar{x}} \approx 2.121 \approx 2.1$ kg
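The example's arithmetic can be reproduced with the two CLT formulas. This sketch uses only the given values μ = 70, σ = 15 and n = 50.

```python
from math import sqrt

mu, sigma, n = 70.0, 15.0, 50

mean_of_sample_mean = mu          # mu_xbar = mu
standard_error = sigma / sqrt(n)  # sigma_xbar = sigma / sqrt(n)

print(f"mu_xbar = {mean_of_sample_mean} kg, sigma_xbar = {standard_error:.3f} kg")
```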
Confidence Interval

• Confidence, in statistics, is another way to describe probability. Suppose you construct a confidence interval with a 95% confidence level. If the sampling were repeated many times and an interval built the same way each time, about 95 out of 100 of those intervals would contain the true population value.
• Your desired confidence level is usually one minus the alpha (α) value you used in your statistical test:
• Confidence level = 1 − α
• So if you use an alpha value of 0.05 for statistical significance (p < 0.05), your confidence level is 1 − 0.05 = 0.95, or 95%.
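The "95 out of 100 intervals" interpretation can be checked by simulation. The sketch below repeatedly samples from a population whose mean is known. It builds a 95% interval each time and counts how often the interval contains the true mean. The population values are assumptions, and σ is treated as known so that the z value 1.96 applies.

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma, n = 35.0, 10.0, 100   # assumed "true" population values
z = 1.96                          # critical value for a 95% interval
reps = 2000

covered = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, size=n)
    half_width = z * sigma / np.sqrt(n)     # sigma treated as known here
    lower, upper = sample.mean() - half_width, sample.mean() + half_width
    if lower <= mu <= upper:
        covered += 1

coverage = covered / reps
print(f"{coverage:.1%} of the intervals contain mu")
```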
Confidence Interval

• You can calculate confidence intervals for many kinds of statistical estimates, including:
• Proportions
• Population means
• Differences between population means or proportions
• Estimates of variation among groups
Confidence Interval

• You survey 100 Brits and 100 Americans about their television-watching habits, and find that both groups watch an average of 35 hours of television per week.
• However, the British people surveyed showed wide variation in the number of hours watched, while the Americans all watched similar amounts.
• Even though both groups have the same point estimate (average number of hours watched), the British estimate will have a wider confidence interval than the American estimate because there is more variation in the data.
Confidence Interval
A confidence interval for a mean is a range of values that is likely to contain the population mean with a certain level of confidence.
It is calculated as:
Confidence Interval = $\bar{x} \pm t^{*} \cdot \frac{s}{\sqrt{n}}$
where:
• $\bar{x}$: sample mean
• $t^{*}$: t critical value that corresponds to the confidence level
• s: sample standard deviation
• n: sample size
• For a two-tailed 95% confidence interval, α = 0.05 is split equally between the two tails (0.025 in each). The corresponding z critical value is 1.96; for large samples, the t critical value is close to this.

Confidence level       90%     95%     99%
α (total)              0.10    0.05    0.01
α/2 (each tail)        0.05    0.025   0.005
z critical value       1.645   1.96    2.576
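The z critical values in the table are the (1 − α/2) quantiles of the standard normal distribution. The sketch below reproduces them with Python's standard `statistics.NormalDist`. The helper name `z_critical` is an assumption of this sketch.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-tailed z critical value: the (1 - alpha/2) quantile of N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

For the t critical value used with small samples, `scipy.stats.t.ppf(1 - alpha / 2, df=n - 1)` gives the corresponding quantile.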
Example Confidence Interval

In the survey of Americans' and Brits' television-watching habits, the point estimate is the mean number of hours watched: 35 for both groups, with n = 100 in each. Because the population standard deviations are unknown, we use the sample standard deviations in their place. To calculate the 95% confidence interval, we plug the values into the formula with the critical value 1.96.
For the USA (the bounds below correspond to a sample standard deviation of s = 5 hours):
$35 \pm 1.96 \times \frac{5}{\sqrt{100}} = 35 \pm 0.98$
So for the USA, the lower and upper bounds of the 95% confidence interval are 34.02 and 35.98.
For GB (the bounds below correspond to a sample standard deviation of s = 10 hours):
$35 \pm 1.96 \times \frac{10}{\sqrt{100}} = 35 \pm 1.96$
So for GB, the lower and upper bounds of the 95% confidence interval are 33.04 and 36.96.
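The two intervals can be reproduced with a small helper. The standard deviations used here (5 hours for the USA, 10 for GB) are not stated in the survey description. They are back-calculated from the interval bounds, so treat them as assumptions. The helper `mean_ci` is hypothetical.

```python
from math import sqrt

def mean_ci(mean: float, sd: float, n: int, crit: float = 1.96):
    """Return (lower, upper) = mean -/+ crit * sd / sqrt(n)."""
    margin = crit * sd / sqrt(n)
    return mean - margin, mean + margin

# s = 5 (USA) and s = 10 (GB) are back-calculated from the bounds in the text
usa = mean_ci(35, 5, 100)
gb = mean_ci(35, 10, 100)
print(f"USA: {usa[0]:.2f} to {usa[1]:.2f}")
print(f"GB:  {gb[0]:.2f} to {gb[1]:.2f}")
```

The interval width scales directly with s, which is why the more variable British responses give an interval twice as wide.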
Lab Practice
• Compute the minimum, 25th percentile, median, 75th percentile, and maximum of a numeric series

# Input
import numpy as np
import pandas as pd
state = np.random.RandomState(100)
ser = pd.Series(state.normal(10, 5, 25))

• Reshape the series ser into a dataframe with 7 rows and 5 columns

# Input
ser = pd.Series(np.random.randint(1, 10, 35))

• Import only two columns from a dataframe
• Compute the mean price of every fruit for the given dataframe, keeping fruit as a column instead of an index

# Input
df = pd.DataFrame({'fruit': ['apple', 'banana', 'orange'] * 3,
                   'rating': np.random.rand(9),
                   'price': np.random.randint(0, 15, 9)})

• For the prdsale dataset:
• Plot a line plot, a bar chart, and a pie chart
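Here is one possible pandas solution for the percentile, reshape, and group-mean exercises, using the inputs given above. It is a sketch, not the only approach. For example, `ser.quantile` or `ser.describe()` would also give the five-number summary.

```python
import numpy as np
import pandas as pd

# Task 1: minimum, 25th percentile, median, 75th percentile, maximum
state = np.random.RandomState(100)
ser = pd.Series(state.normal(10, 5, 25))
summary = np.percentile(ser, q=[0, 25, 50, 75, 100])
print(summary)

# Task 2: reshape 35 values into a 7 x 5 dataframe
ser2 = pd.Series(np.random.randint(1, 10, 35))
df_reshaped = pd.DataFrame(ser2.to_numpy().reshape(7, 5))
print(df_reshaped.shape)

# Task 4: mean price per fruit, keeping fruit as a regular column
df = pd.DataFrame({'fruit': ['apple', 'banana', 'orange'] * 3,
                   'rating': np.random.rand(9),
                   'price': np.random.randint(0, 15, 9)})
mean_price = df.groupby('fruit', as_index=False)['price'].mean()
print(mean_price)
```

For the column-import exercise, `pd.read_csv(path, usecols=[...])` reads only the listed columns. The file path and column names depend on your data.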
