Data analytics 1
• Data
• Qualitative & Quantitative data
• Measurement and Scaling
• Descriptive statistical measures
• Normal Distribution curve
• Skewness, kurtosis
• Standard normal distribution
• Sampling Techniques
• Mean and S.D of sample mean
Data
Data | Information
Data is unorganised and unrefined facts. | Information comprises processed, organised data presented in a meaningful context.
Data is an individual unit that contains raw material which does not carry any specific meaning. | Information is a group of data that collectively carries a logical meaning.
Raw data alone is insufficient for decision making. | Information is sufficient for decision making.
An example of data is a student’s test score. | The average score of a class is information derived from the given data.
Structured vs Unstructured data
Structured Data
• The data which is to the point, factual, and highly organised
• It is quantitative in nature
• It is easy to search and analyse
• Structured data exists in a predefined format
• Structured data generally exists in tables, such as Excel files and Google Sheets spreadsheets
• SQL (Structured Query Language) is used for managing structured data
Unstructured Data
• It includes all unstructured files: log files, audio files, and image files
• Unstructured data is data that lacks any predefined model or format
• It requires a lot of storage space, and it is hard to maintain security in it
• It cannot be represented in a data model or schema, which is why managing, analysing, or searching unstructured data is hard
• It is qualitative in nature and is sometimes stored in a non-relational (NoSQL) database
• Examples of human-generated unstructured data include text files, email, social media posts, media, mobile data, business applications, and others
Quantitative vs Qualitative data
Quantitative data
• Surveys and questionnaires: This is an especially useful method for gathering large quantities of
data. If you wanted to gather quantitative data on employee satisfaction, you might send out a
survey asking them to rate various aspects of the organization on a scale of 1-10.
• Analytics tools: Data analysts and data scientists use specialist tools to gather quantitative data
from various sources. For example, Google Analytics gathers data in real-time, allowing you to
see, at a glance, all the most important metrics for your website—such as traffic, number of page
views, and average session length.
• Environmental sensors: A sensor is a device which detects changes in the surrounding
environment and sends this information to another electronic device, usually a computer. This
information is converted into numbers, providing a continuous stream of quantitative data.
• Manipulation of pre-existing quantitative data: Researchers and analysts will also generate new
quantitative data by performing statistical analyses or calculations on existing data. For example,
if you have a spreadsheet containing data on the number of sales and expenditures in USD, you
could generate new quantitative data by calculating the overall profit margin.
• Secondary data: data already collected for purposes other than the problem at hand (e.g. government statistics or industry reports) that is reused for the current analysis.
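The "manipulation of pre-existing quantitative data" idea above can be sketched in a few lines; the sales and expenditure figures below are invented purely for illustration:

```python
# Derive new quantitative data (a profit margin) from existing figures.
# The sales/expenditure numbers below are hypothetical.
sales_usd = [1200.0, 950.0, 1430.0]
expenditure_usd = [800.0, 700.0, 1100.0]

total_sales = sum(sales_usd)
total_costs = sum(expenditure_usd)
profit = total_sales - total_costs
profit_margin = profit / total_sales  # fraction of revenue kept as profit

print(f"Profit margin: {profit_margin:.1%}")
```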
Quantitative data analysis
• How you analyse your quantitative data depends on the kind of data you’ve gathered
and the insights you want to uncover. Statistical analysis can be used to identify trends in
the data, to establish if there’s any kind of relationship between a set of variables (e.g.
does social media spend correlate with sales), to calculate probability in order to
accurately predict future outcomes, to understand how the data is distributed—and
much, much more.
• Some of the most popular methods used by data analysts include:
• Regression analysis
• Monte Carlo simulation
• Factor analysis
• Cohort analysis
• Cluster analysis
• Time series analysis
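As a minimal sketch of the first method in the list, a least-squares regression line can be fitted using the textbook formulas alone; the spend/sales figures here are invented for illustration:

```python
# Simple linear regression (least squares) on toy data:
# does social media spend (x) predict sales (y)?
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # spend, in thousands
y = [2.1, 4.0, 6.2, 7.9, 10.1]  # sales, in thousands

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# slope b = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"sales ≈ {slope:.2f} * spend + {intercept:.2f}")
```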
Qualitative data
• Unlike quantitative data, qualitative data cannot be measured or counted. It’s
descriptive, expressed in terms of language rather than numerical values.
• Researchers will often turn to qualitative data to answer “Why?” or “How?”
questions. For example, if your quantitative data tells you that a certain website
visitor abandoned their shopping cart three times in one week, you’d probably
want to investigate why—and this might involve collecting some form of
qualitative data from the user.
• Qualitative data also refers to the words or labels used to describe certain
characteristics or traits—for example, describing the sky as blue or labeling a
particular ice cream flavour as vanilla.
• Ex - My best friend has curly brown hair, They have green eyes, My best friend
drives a red car
Types of Qualitative data
• Nominal data is used to label or categorize certain variables without giving them any type of
quantitative value. For example, if you were collecting data about your target audience, you
might want to know where they live. Are they based in the UK, the USA, Asia, or Australia? Each
of these geographical classifications counts as nominal data. Another simple example is the use
of labels like “blue,” “brown,” and “green” to describe eye colour.
• Ordinal data is when the categories used to classify your qualitative data fall into a natural order
or hierarchy. For example, if you wanted to explore customer satisfaction, you might ask each
customer to select whether their experience with your product was “poor,” “satisfactory,”
“good,” or “outstanding.” It’s clear that “outstanding” is better than “poor,” but there’s no way
of measuring or quantifying the “distance” between the two categories.
• Ex - Interview transcripts or audio recordings, The text included in an email or social media post,
Product reviews and customer testimonials, Observations and descriptions; e.g. “I noticed that
the teacher was wearing a red jumper.”
How is Qualitative data generated?
• Interviews are a great way to learn how people feel about any given topic—be it
their opinions on a new product or their experience using a particular service.
Conducting interviews will eventually provide you with interview transcripts
which can then be analysed.
• Surveys and questionnaires are also used to gather qualitative data. If you
wanted to collect demographic data about your target audience, you might ask
them to complete a survey where they either select their answers from a
number of different options, or write their responses as freeform text.
• Observations: You don’t necessarily have to actively engage with people in order
to gather qualitative data. Analysts will also look at “naturally occurring”
qualitative data, such as the feedback left in product reviews or what people say
in their social media posts.
Qualitative data analysis
• CODIFICATION - Coding is the formal representation of analytical thinking. Codes are tags or
labels for assigning units of meaning to the descriptive or inferential information compiled
during a study. Codes are usually assigned in the form of letters, words, phrases or sentences.
CODE | MEANING
Prog | Programme
Org | Organization
Ob | Observation
P | Participant
Qualitative data analysis
• Categorization - It is a process of identifying patterns in the data: recurring ideas, themes, perspectives
and descriptions that depict the social world you are studying. As you read through field notes or listen to
interview tapes, your task is to identify salient themes, recurring ideas and patterns of belief that help
you respond to your research questions.
• Content analysis - Documentary data (textual data, records, interview transcripts, excerpts from people's
speech, case histories, field notes or diary, biographies and observation records) have always been central
to social science analysis but modes of analysing them vary within the social sciences. Content analysis is
concerned with the classification, organization and comparison of content of the document or
communication.
Qualitative data analysis
• There are three approaches that a researcher may adopt in content analysis. They are: (i) characteristics of
content, (ii) procedures or causes of content, and (iii) audience or effects of content.
• In the first approach, the researcher is interested primarily in the characteristics of the content itself. He/she
may focus either on the 'substantive nature' of the content (practical aspect) or upon the 'form' of
the content (word composition, technique, etc.). For instance, if you are content analysing a historical
writing, you may concentrate upon the substantive aspect of the writing, whereas if you are content
analysing archival material, you may concentrate both on the substantive dimension and the form of the
content.
• In the second approach, the researcher attempts to draw valid inferences about the nature of the
procedures of the content or the causes of the symbolic material from the characteristics of the material
itself. For instance, if you are content analysing a video recorded situation, you may concentrate upon the
procedures adopted by the participants to create the situation.
• In the third approach to content analysis, the researcher interprets the content so as to reveal something
about the nature of its 'audience' or its 'effects'. He/she takes the content material as the basis for
drawing inferences about the characteristics of the 'audience' for whom the material (content) is designed,
or about the effects of communication which it brings about. For instance, if you are content analysing an
interview transcript of juvenile delinquents, you will be able to interpret from the data the probable
causes that have led the children to become juvenile delinquents.
Qualitative data analysis
• Are you taking on research? You may benefit from a mixed methods approach
to data collection.
Types of data
• Primary data - are originated by a researcher for the specific purpose of addressing the problem at hand.
They are individually tailored for the decision-makers of organisations that pay for well-focused and
exclusive support.
• Secondary data are data that have already been collected for purposes other than the problem at hand.
• Internal data are those generated within the organisation for which the research is being conducted. This
information may be available in a ready-to-use format, such as information routinely supplied by the
management decision support system. Internal data are of two types: ready to use, and requiring further
processing.
• External data, on the other hand, are those generated by sources outside the organisation. These data may
exist in the form of published material, online databases, or information made available by syndicated
services.
Measurement and Scaling
• Ordinal scale - An ordinal scale is a ranking scale in which numbers are assigned to objects to indicate the
relative extent to which the objects possess some characteristic. An ordinal scale allows you to determine
whether an object has more or less of a characteristic than some other object, but not how much more or
less. Thus, an ordinal scale indicates relative position, not the magnitude of the differences between the
objects. Common examples of ordinal scales include quality rankings, rankings of teams in a tournament and
occupational status. Measurements of this type include ‘greater than’ or ‘less than’ judgements from the
respondents. For these reasons, in addition to the counting operation allowable for nominal scale data, ordinal
scales permit the use of statistics based on centiles. It is meaningful to calculate percentile, quartile, median,
rank-order correlation or other summary statistics from ordinal data.
Measurement and Scaling
• Interval scale - In an interval scale, numerically equal distances on the scale represent equal values in the
characteristic being measured. An interval scale contains all the information of an ordinal scale, but it also
allows you to compare the differences between objects. In an interval scale, the differences between consecutive
points on the scale are equal over the entire scale, but there is no true zero point: the zero point (point of
reference) of the scale is chosen conventionally or arbitrarily. The scores on an intelligence test or an attitude
scale are also based on interval scales; they have no real zero point. To illustrate this concept, suppose a student
gets a "zero" score in a test of mathematics; this does not mean that the student has no knowledge of
mathematics. The operations of addition and subtraction can be performed on interval scales. However, since the
operations of multiplication and division assume the existence of an exact zero point, these operations cannot be
used with interval scales. Ex – temperature, SAT score (200–800).
• Ratio scale - A ratio scale possesses all the properties of the nominal, ordinal and interval scales, and, in
addition, an absolute zero point. Thus, in ratio scales we can identify or classify objects, rank the objects, and
compare intervals or differences. It is also meaningful to compute ratios of scale value. Common examples of
ratio scales include height, weight, age and money. All mathematical operations can be conducted in ratio scale.
• Note – A temperature reading doesn’t indicate that Monday is twice as hot as Sunday; moreover, if we change the scale from
Fahrenheit to Celsius the values change, because the zero point is arbitrary.
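The note above can be demonstrated numerically: the ratio of the same two temperatures depends on the unit chosen, which is exactly why ratios are meaningless on an interval scale.

```python
# Why ratios are meaningless on an interval scale:
# the same two temperatures give different ratios in °C and °F.
def c_to_f(c: float) -> float:
    return c * 9.0 / 5.0 + 32.0

monday_c, sunday_c = 20.0, 10.0
ratio_celsius = monday_c / sunday_c                     # 20/10 = 2.0
ratio_fahrenheit = c_to_f(monday_c) / c_to_f(sunday_c)  # 68/50 = 1.36

print(ratio_celsius, ratio_fahrenheit)
```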
Measurement scales
Scale | Basic characteristics | Common examples | Descriptive statistics | Inferential statistics
Nominal | Numbers identify and classify objects | Student registration numbers, numbers on football players' shirts | Counting, percentage, mode | Chi-square, binomial test
Ordinal | Numbers indicate the relative position of the objects but not the magnitude of the difference between them | Ranking of the top 4 teams in the football World Cup | Percentile, quartile, median, rank-order correlation | Rank-order correlation, Friedman ANOVA
Interval | Differences between objects can be compared; zero point is arbitrary | Temperature (Celsius, Fahrenheit) | Range, mean, standard deviation | Product-moment correlation, t-test, ANOVA, regression, factor analysis
• Statistical methods are classified into 'descriptive statistics' and 'inferential statistics'.
Descriptive statistics are computed to describe the characteristics (attributes) of a sample or
population in its totality, and thus limit generalization to that particular group (sample): no
conclusions can be extended beyond it. Inferential statistical methods, by contrast, are used to
draw generalizations beyond the sample with a known degree of accuracy.
• Descriptive Statistics
• Measures of Central Tendency or Averages
• Measures of Variability
Measures of Central Tendency or Averages
Mean and Median
• The Mean of a distribution is the arithmetic average. If x1, x2, x3, ..., xn are the n observations,
the formula for computing the Mean (x̄) is:
• Mean = (x1 + x2 + x3 + ... + xn) / n
• Median is the middle value of the dataset when the dataset is arranged in ascending or
descending order.
• Consider a dataset with an odd number of observations arranged in descending order: 23,
21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, 2.
• Here 12 is the middle, or median, number: it has 6 values above it and 6 values below it.
• For a dataset with an even number of observations, the median is the mean of the two middle
values; e.g. if the two middle values are 27 and 29, the median is (27 + 29)/2 = 28.
• Consider the dataset 5, 4, 2, 3, 2, 1, 5, 4, 5. The mode is the most common value; the most
frequently repeated value in this dataset is 5.
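The three measures can be computed with Python's standard statistics module, using the datasets from the examples above:

```python
# Mean, median and mode with the standard library,
# using the datasets from the slides above.
import statistics

odd_data = [23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, 2]
mode_data = [5, 4, 2, 3, 2, 1, 5, 4, 5]

median_odd = statistics.median(odd_data)  # middle of 13 sorted values
mode_val = statistics.mode(mode_data)     # most frequent value
mean_val = statistics.mean(mode_data)     # arithmetic average

print(median_odd, mode_val, round(mean_val, 2))
```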
Summary
Based on the properties of the data, the measures of central tendency are selected.
• If you have a symmetrical distribution of continuous data, all the three measures of central
tendency hold good. But most of the times, the analyst uses the mean because it involves all the
values in the distribution or dataset.
• If you have skewed distribution, the best measure of finding the central tendency is the median.
• If you have ordinal data, the median or the mode is the best choice for measuring
the central tendency.
• If you have categorical data, the mode is the best choice to find the central tendency.
Measures of Variability
• The range of a data set is R = xmax − xmin,
where xmax is the largest measurement in the data set and xmin is the smallest.
• Data Set I: 40 38 42 40 39 39 43 40 39 40
• Data Set II: 46 37 40 33 42 36 40 47 34 45
• Solution:
• For Data Set I the maximum is 43 and the minimum is 38, so the range is R = 43 − 38 = 5. For Data Set II the maximum is 47
and the minimum is 33, so the range is R = 47 − 33 = 14.
• The range is a measure of variability because it indicates the size of the interval over which the data points are
distributed.
Variance and the Standard Deviation
• Standard deviation measures the spread of a group of numbers around the mean, while the variance measures the average
squared deviation of each point from the mean. Standard deviation is the square root of the variance and is expressed in the
same units as the data set. The standard deviation can be greater than the variance, since the square root of a number below
one is larger (not smaller) than the number itself; thus when the variance is less than one the standard deviation exceeds it,
and when the variance is more than one the standard deviation is smaller than the variance.
The sample variance of a set of n sample data is the number s² defined by the formula

s² = Σ(x − x̄)² / (n − 1)

or, equivalently, by the computational formula

s² = [Σx² − (1/n)(Σx)²] / (n − 1)

The sample standard deviation of a set of n sample data is the square root of the sample
variance, hence is the number s given by the formulas

s = √[ Σ(x − x̄)² / (n − 1) ] = √[ (Σx² − (1/n)(Σx)²) / (n − 1) ]
Properties of standard deviation
• Additivity - For independent random variables, variances add, so the standard deviation of
a sum of independent random variables is the square root of the sum of their variances. In
practice this means analysts using standard deviation are combining many data points
rather than drawing conclusions from single points of data, which leads to a higher degree
of accuracy.
• Scaling - The standard deviation scales with the data: multiplying every value by a constant
a multiplies the standard deviation by |a|, while adding a constant leaves it unchanged. To
compare the variability of datasets with different units of measurement (say inches versus
centimetres), the values must therefore be converted to common units, or a unit-free
measure such as the coefficient of variation (s/x̄) used instead.
• Symmetry and non-negativity - The standard deviation is always non-negative, and it
treats deviations above and below the mean symmetrically.
Question on S.D and Var
• Q.1 Find the sample variance and the sample standard deviation of
Data Set II: 46 37 40 33 42 36 40 47 34 45
Solution:
• To use the defining formula (the first formula in the definition) we first compute, for each observation x, its
deviation x − x̄ from the sample mean. Since the mean of the data is x̄ = 40, we obtain the ten deviations:
x − x̄: 6, −3, 0, −7, 2, −4, 0, 7, −6, 5
• The sum of their squares is 224, so
s² = 224/9 ≈ 24.89
s = √(224/9) ≈ 4.99
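The worked answer can be checked with Python's statistics module, which also uses the n − 1 divisor for the sample variance:

```python
# Check the worked answer for Data Set II with the standard library.
import math
import statistics

data = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]

xbar = statistics.mean(data)    # 40
s2 = statistics.variance(data)  # sample variance, n - 1 divisor
s = statistics.stdev(data)      # sample standard deviation

print(xbar, round(s2, 2), round(s, 2))
```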
Normal distribution Curve
• Normal distribution, also known as the Gaussian distribution, is a probability distribution that is
symmetric about the mean, showing that data near the mean are more frequent in occurrence
than data far from the mean. The normal distribution appears as a "bell curve" when graphed.
Its density is

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))

where f(x) is the probability density, σ = standard deviation, σ² = variance, μ = mean, and x = value of the
variable.
Normal distribution Curve
Properties of normal distribution
• The mean, median and mode are exactly the same.
• The distribution is symmetric about the mean—half the values fall below the mean and half
above the mean.
• The distribution can be described by two values: the mean and the standard deviation.
• In a standard normal distribution, the mean is zero and the standard deviation is 1. It has zero
skew and a kurtosis of 3.
What is the empirical rule formula?
• The empirical rule in statistics allows researchers to determine the proportion of values that fall within certain
distances from the mean. The empirical rule is often referred to as the three-sigma rule or the 68-95-99.7 rule.
• Around 68% of values are within 1 standard deviation from the mean.
• Around 95% of values are within 2 standard deviations from the mean.
• Around 99.7% of values are within 3 standard deviations from the mean.
• If the data values in a normal distribution are converted to standard score (z-score) in a standard normal distribution,
the empirical rule describes the percentage of the data that fall within specific numbers of standard deviations (σ) from
the mean (μ) for bell-shaped curves.
• 68% of data falls within the first standard deviation from the mean. This means there is a 68% probability of randomly
selecting a score between -1 and +1 standard deviations from the mean.
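The three percentages can be checked directly from the standard normal CDF, Φ(z) = ½(1 + erf(z/√2)), with nothing beyond the standard library:

```python
# Verify the 68-95-99.7 rule from the standard normal CDF.
import math

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability of falling within k standard deviations of the mean.
within = {k: phi(k) - phi(-k) for k in (1, 2, 3)}
print({k: round(v, 4) for k, v in within.items()})
```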
Normal distribution diagram for 68% data
Skewness
• Skewness measures the degree of symmetry of a distribution. The normal distribution is symmetric and has
a skewness of zero. If the distribution of a data set instead has a skewness less than zero, or negative skewness (left-
skewness), then the left tail of the distribution is longer than the right tail; positive skewness (right-skewness) implies
that the right tail of the distribution is longer than the left.
Kurtosis
• Kurtosis is a statistical measure that defines how heavily the tails of a distribution differ from the tails of a normal
distribution. In other words, kurtosis identifies whether the tails of a given distribution contain extreme values.
Skewness essentially measures the symmetry of the distribution, while kurtosis determines the heaviness of the
distribution tails. The kurtosis of a normal distribution equals 3.
• Mesokurtic has kurtosis of 3, Platykurtic has kurtosis of <3, Leptokurtic has kurtosis of >3.
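A minimal sketch of the moment-based definitions: skewness as m3/m2^1.5 and kurtosis as m4/m2² (the convention in which a normal distribution has kurtosis 3). The five-point sample below is invented; being symmetric it has zero skewness, and being flat it comes out platykurtic:

```python
# Moment-based skewness and kurtosis (not excess kurtosis),
# computed from population-moment formulas on a small sample.
def moments_skew_kurt(data):
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    return skewness, kurtosis

# A perfectly symmetric sample has zero skewness.
skew, kurt = moments_skew_kurt([1, 2, 3, 4, 5])
print(round(skew, 4), round(kurt, 4))
```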
Probability Question
• Q.1 The breakdown of the student body in a local high school according to race and ethnicity is 51% white, 27% black,
11% Hispanic, 6% Asian, and 5% for all others. A student is randomly selected from this high school. (To select
"randomly" means that every student has the same chance of being selected.) Find the probabilities of the following
events:
• Solution:
• The experiment is the action of randomly selecting a student from the student population of the high school. An
obvious sample space is S = {w, b, h, a, o}. Since 51% of the students are white and all students have the same chance of
being selected, P(w) = 0.51, and similarly for the other outcomes. This information is summarized in the following table:
Outcome: w | b | h | a | o
Probability: 0.51 | 0.27 | 0.11 | 0.06 | 0.05
Probability Question
a. Since N = {w, h, a, o},
P(N) = P(w) + P(h) + P(a) + P(o) = 0.51 + 0.11 + 0.06 + 0.05 = 0.73
b. Since M = {b, h, a, o},
P(M) = P(b) + P(h) + P(a) + P(o) = 0.27 + 0.11 + 0.06 + 0.05 = 0.49
Standard Normal Distribution
• The standard normal distribution, also called the z-distribution, is a special normal distribution where the mean is 0 and
the standard deviation is 1. All normal distributions, like the standard normal distribution, are unimodal and
symmetrically distributed with a bell-shaped curve. However, a normal distribution can take on any value as its mean
and standard deviation. In the standard normal distribution, the mean and standard deviation are always fixed. Any
normal distribution can be standardized by converting its values into z scores. Z scores tell you how many standard
deviations from the mean each value lies.
• Converting a normal distribution into the standard normal distribution allows you to:
• Compare scores on different distributions with different means and standard deviations.
• Find the probability of observations in a distribution falling above or below a given value.
• Find the probability that a sample mean significantly differs from a known population mean.
Z-score
• While data points are referred to as x in a normal distribution, they are
called z, or z-scores, in the z-distribution. A z-score is a standard score that tells you
how many standard deviations away from the mean an individual value (x) lies:

z = (x − μ) / σ

• A positive z-score means that your x value is greater than the mean.
• A negative z-score means that your x value is less than the mean.
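The definition translates directly into code; the test-score mean and standard deviation below are assumed values for illustration:

```python
# z-score: how many standard deviations x lies from the mean.
def z_score(x: float, mu: float, sigma: float) -> float:
    return (x - mu) / sigma

# Example with an assumed score distribution (mu = 70, sd = 10).
print(z_score(85, 70, 10))  # above the mean -> positive
print(z_score(60, 70, 10))  # below the mean -> negative
```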
b. P(8 < X < 14), where X is normally distributed with μ = 10 and σ = 2.5.
Solution 1:
P(X < 14) = P(Z < (14 − 10)/2.5)
= P(Z < 1.60)
= 0.9452
Question 1 Contd…
Q.2 The lifetimes of the tread of a certain automobile tire are normally
distributed with mean 37,500 miles and standard deviation 4,500 miles. Find
the probability that the tread life of a randomly selected tire will be between
30,000 and 40,000 miles.
Z-Score Answer 2
Solution 2:
• Let X denote the tread life of a randomly selected tire. To make the numbers easier to work
with we will choose thousands of miles as the units. Thus μ = 37.5, σ = 4.5, and the
problem is to compute P(30<X<40).
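Solution 2 can be completed numerically with the standard normal CDF, working in thousands of miles as the solution suggests:

```python
# P(30 < X < 40) for tread life X ~ N(mu = 37.5, sigma = 4.5),
# working in thousands of miles as in the solution above.
import math

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 37.5, 4.5
z_low = (30 - mu) / sigma   # about -1.67
z_high = (40 - mu) / sigma  # about 0.56
p = phi(z_high) - phi(z_low)
print(round(p, 4))
```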
Q.5 The length of courses, from beginning to end, approximates a normal
distribution with a mean of 266 days and a standard deviation of 16 days.
What proportion of all courses will last between 240 and 270 days?
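A sketch of the computation for Q.5, again via the standard normal CDF:

```python
# Proportion of courses lasting between 240 and 270 days,
# for X ~ N(mu = 266, sigma = 16).
import math

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 266.0, 16.0
p = phi((270 - mu) / sigma) - phi((240 - mu) / sigma)
print(round(p, 4))
```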
Non-probability sampling techniques
• It relies on the personal judgement of the researcher rather than on chance to select
sample elements.
• The researcher can arbitrarily or consciously decide what elements to include in the
sample.
• Non-probability samples may yield good estimates of the population characteristics, but
they do not allow for objective evaluation of the precision of the sample results.
• Because there is no way of determining the probability of selecting any particular
element for inclusion in the sample, the estimates obtained are not statistically
projectable to the population.
• Commonly used non-probability sampling techniques include convenience sampling,
judgemental sampling, quota sampling and snowball sampling.
Sampling techniques
Non-probability sampling
Convenience sampling
• Convenience sampling attempts to obtain a sample of convenient elements. The selection
of sampling units is left primarily to the interviewer. Often, respondents are selected
because they happen to be in the right place at the right time.
• Examples of convenience sampling include: (1) use of students, church groups and
members of social organisations, (2) street interviews without qualifying the respondents,
(3) some forms of email and Internet survey, (4) tear-out questionnaires included in a
newspaper or magazine, and (5) journalists interviewing ‘people on the street’.
• Convenience sampling is the least expensive and least time-consuming of all sampling
techniques. The sampling units are accessible, easy to measure and cooperative.
• Despite these advantages, this form of sampling has serious limitations. Many potential
sources of selection bias are present, including respondent self-selection.
Non-probability sampling techniques
Judgemental sampling
• Judgemental sampling is a form of convenience sampling in which the population
elements are selected based on the judgement of the researcher.
• The researcher, exercising judgement or expertise, chooses the elements to be
included in the sample because he or she believes that they are representative of
the population of interest or are otherwise appropriate.
• Common examples of judgemental sampling include: (1) test markets selected to
determine the potential of a new product, (2) purchase engineers selected in
industrial marketing research because they are considered to be representative
of the company, (3) product testing with individuals who may be particularly
fussy or who hold extremely high expectations, (4) expert witnesses used in
court, and (5) supermarkets selected to test a new merchandising display system.
• Judgemental sampling is subjective and its value depends entirely on the
researcher’s judgement, expertise and creativity.
Non-probability sampling techniques
Quota sampling
• Quota sampling may be viewed as two-stage restricted judgemental sampling that is used
extensively in street interviewing.
• The first stage consists of developing control characteristics, or quotas, of population
elements such as age or gender. To develop these quotas, the researcher lists relevant
control characteristics and determines the distribution of these characteristics in the
target population, such as Males 49%, Females 51% (resulting in 490 men and 510 women
being selected in a sample of 1,000 respondents).
• In the second stage, sample elements are selected based on convenience or judgement.
Non-probability sampling techniques
Snowball sampling
• In snowball sampling, an initial group of respondents is selected, sometimes on a
random basis, but more typically targeted at a few individuals who are known to
possess the desired characteristics of the target population.
• After being interviewed, these respondents are asked to identify others who also
belong to the target population of interest. Subsequent respondents are selected
based on the referrals.
• By obtaining referrals from referrals, this process may be carried out in waves,
thus leading to a snowballing effect.
• The main objective of snowball sampling is to estimate characteristics that are rare
in the wider population. Examples include users of particular government or social
services, such as food stamps, whose names cannot be revealed; special census
groups, such as widowed males under 35; and members of a scattered minority
ethnic group.
Diagram of Non-probability sampling techniques
Probability sampling techniques
Stratified sampling
• Stratified sampling is a two-step process in which the population is partitioned into sub-
populations, or strata. The strata should be mutually exclusive and collectively exhaustive, in that
every population element should be assigned to one and only one stratum and no population
element should be omitted.
• Next, elements are selected from each stratum by a random procedure, usually SRS.
Technically, only SRS should be employed in selecting the elements from each stratum. The
variables used to partition the population into strata are referred to as stratification variables.
• The criteria for the selection of these variables consist of homogeneity, heterogeneity,
relatedness and cost. The elements within a stratum should be as homogeneous as possible, but
the elements in different strata should be as heterogeneous as possible.
• Example - The company has 800 female employees and 200 male employees. You want to
ensure that the sample reflects the gender balance of the company, so you sort the population
into two strata based on gender. Then you use random sampling on each group, selecting 80
women and 20 men, which gives you a representative sample of 100 people
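The example above can be sketched as proportional stratified sampling with the standard library; the employee IDs and the random seed are made up for reproducibility:

```python
# Proportional stratified sampling sketch for the example above:
# 800 women and 200 men, sample of 100 reflecting the 80/20 split.
import random

random.seed(42)  # reproducible draw

# Hypothetical employee IDs, one list per stratum.
women = [f"W{i}" for i in range(800)]
men = [f"M{i}" for i in range(200)]

# Simple random sample (SRS) within each stratum.
sample = random.sample(women, 80) + random.sample(men, 20)
print(len(sample))
```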
Probability sampling techniques
Cluster sampling
• In cluster sampling, the target population is first divided into mutually exclusive
and collectively exhaustive sub-populations. These sub-populations or clusters are
assumed to contain the diversity of respondents held in the target population.
• A random sample of clusters is selected, based on a probability sampling
technique such as SRS.
• For each selected cluster, either all the elements are included in the sample or a
sample of elements is drawn probabilistically.
• If all the elements in each selected cluster are included in the sample, the
procedure is called one-stage cluster sampling. If a sample of elements is drawn
probabilistically from each selected cluster, the procedure is two-stage cluster
sampling. Cluster sampling can have two-stage or multi-stage sampling.
Example of cluster sampling
• The company has offices in 10 cities across the country (all with
roughly the same number of employees in similar roles). You
don’t have the capacity to travel to every office to collect your
data, so you use random sampling to select 3 offices – these are
your clusters.
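The office example above can be sketched as one-stage cluster sampling; the office names and the seed are invented:

```python
# One-stage cluster sampling sketch: randomly pick 3 of 10 offices
# (the clusters) and survey everyone in the chosen offices.
import random

random.seed(7)  # reproducible draw

offices = [f"office_{i}" for i in range(1, 11)]  # the 10 clusters
chosen = random.sample(offices, 3)               # SRS of clusters
print(sorted(chosen))
```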
Diagram of Sampling technique
Some statistical terms
• Mutually exclusive is a statistical term describing two or more events
that cannot happen simultaneously. Ex – flipping a coin: heads and tails
cannot occur on the same flip.
• Suppose random samples of size n are drawn from a population with mean μ and standard deviation σ. The
mean μX̄ and standard deviation σX̄ of the sample mean X̄ satisfy

μX̄ = μ and σX̄ = σ/√n

We will write X̄ when the sample mean is thought of as a random variable, and write x̄ for the values that it
takes.
Q.1 The mean and standard deviation of the tax value of all vehicles registered in a certain state are μ =
$13,525 and σ = $4,180. Suppose random samples of size 100 are drawn from the population of vehicles. What
are the mean μX and standard deviation σX of the sample mean X̅?
Solution
• Since n = 100, μX̄ = μ = $13,525 and σX̄ = σ/√n = 4,180/√100 = $418.
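The computation for Q.1 takes only a few lines:

```python
# Mean and standard deviation of the sample mean for Q.1:
# vehicle tax values with mu = $13,525, sigma = $4,180, n = 100.
import math

mu, sigma, n = 13525.0, 4180.0, 100

mu_xbar = mu                       # the sample mean is unbiased for mu
sigma_xbar = sigma / math.sqrt(n)  # standard error of the mean

print(mu_xbar, sigma_xbar)
```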