
What is data analytics?

To sum it up in a single sentence: data analytics is the process of analyzing raw data in order to draw out meaningful, actionable insights.
It is a form of business intelligence, enabling companies and organizations to make smart decisions based on what the data is telling them.

Data analytics encompasses the extraction (or collection) of raw data, the
preparation and subsequent analysis of that data, and storytelling—sharing
key insights from the data, using them to explain or predict certain scenarios
and outcomes, and to inform decisions, strategies, and next steps.
An example
Imagine you’re a data analyst working for a public transport network—think MTA in
New York City, or TFL in London. There’s a major sporting event coming up in the city,
and you know that people will be flying in from all over to attend.
In order to avoid absolute chaos, you need to adapt the usual public transport
schedule to accommodate this influx of people and the increase in travel throughout
the city. How do you plan ahead with accuracy?
You guessed it…data analytics! You analyze data from similar events that have
happened in the past and use it to predict the number, frequency, and types of
journeys that are likely to occur around this event. With these insights, you’re able
to ensure that public transportation continues to run smoothly.
As you can see, data analytics replaces guesswork with data-driven insights. It helps
you make sense of the past and predict future trends and behaviors, leaving you
much better equipped to make smart decisions.
What does a data analyst do?
As a data analyst, it’s your job to turn raw data into meaningful insights.
Any kind of data analysis usually starts with a specific problem you want to solve, or
a question you need to answer—
for example:
“Why did we lose so many customers in the last quarter?” or
“Why are patients dropping out of their therapy programs at the halfway mark?”
To find the insights and answers you need, you’ll generally go through the following
steps:
Data Analysis Process
1. Defining the question
2. Collecting the data
3. Cleaning the data
4. Analyzing the data
5. Sharing your results
6. Embracing failure
7. Summary
1. Defining the question
The first step in any data analysis process is to define your objective. In data
analytics jargon, this is sometimes called the ‘problem statement’.

Defining your objective means coming up with a hypothesis and figuring out how
to test it. Start by asking: What business problem am I trying to solve?
While this might sound straightforward, it can be trickier than it seems.

For instance, your organization’s senior management might pose an issue,
such as: “Why are we losing customers?” It’s possible, though, that this
doesn’t get to the core of the problem.

A data analyst’s job is to understand the business and its goals in enough
depth that they can frame the problem the right way.
An interesting example.
Let’s say you work for a fictional company called TopNotch Learning. TopNotch
creates custom training software for its clients. While it is excellent at securing
new clients, it has much lower repeat business. As such, your question might not
be, “Why are we losing customers?” but, “Which factors are negatively impacting
the customer experience?” or, better yet: “How can we boost customer retention
while minimizing costs?”
Now that you’ve defined a problem, you need to determine which sources of data will
best help you solve it. This is where your business acumen comes in again. For
instance, perhaps you’ve noticed that the sales process for new clients is very
slick, but that the production team is inefficient. Knowing this, you could
hypothesize that the sales process wins lots of new clients, but the subsequent
customer experience is lacking. Could this be why customers don’t come back?
Which sources of data will help you answer this question?
Tools to help define your objective
Defining your objective is mostly about soft skills, business knowledge, and
lateral thinking. But you’ll also need to keep track of business metrics and key
performance indicators (KPIs).

Monthly reports can allow you to track problem points in the business. Some
KPI dashboards come with a fee, like Databox and DashThis. However, you’ll
also find open-source software like Grafana, Freeboard, and Dashbuilder.

These are great for producing simple dashboards, both at the beginning and
the end of the data analysis process.
2. Collecting the data
Once you’ve established your objective, you’ll need to create a strategy for
collecting and aggregating the appropriate data. A key part of this is
determining which data you need.

This might be quantitative (numeric) data, e.g. sales figures, or qualitative
(descriptive) data, such as customer reviews.

All data fit into one of three categories:


🡪 first-party
🡪 second-party, and
🡪 third-party data
What is first-party data?
First-party data are data that you, or your company, have directly collected from customers.
It might come in the form of transactional tracking data or information from your
company’s customer relationship management (CRM) system. Whatever its source,
first-party data is usually structured and organized in a clear, defined way. Other sources of
first-party data might include customer satisfaction surveys, focus groups, interviews, or
direct observation.

What is second-party data?


To enrich your analysis, you might want to secure a secondary data source. Second-party
data is the first-party data of other organizations. It might be available directly from the
company or through a private marketplace. The main benefit of second-party data is that
it is usually structured, and although it will be less relevant than first-party data, it also
tends to be quite reliable. Examples of second-party data include website, app, or
social media activity, such as online purchase histories or shipping data.
What is third-party data?
Third-party data is data that has been collected and aggregated from numerous
sources by a third-party organization. Often (though not always) third-party data
contains a vast amount of unstructured data points (big data).
Many organizations collect big data to create industry reports or to conduct market
research.
The research and advisory firm Gartner is a good real-world example of an
organization that collects big data and sells it on to other companies. Open data
repositories and government portals are also sources of third-party data.
Example.
https://journals.asm.org/list-data-repositories
https://data.gov/ [USA gov.]
https://careerfoundry.com/en/blog/data-analytics/where-to-find-free-datasets/
- Google Dataset Search
- Kaggle
- Datahub.io
- UCI Machine Learning Repository
- Earth Data (by NASA)
Tools to help you collect data
Once you’ve devised a data strategy (i.e. you’ve identified which data you
need, and how best to go about collecting them) there are many tools you
can use to help you. One thing you’ll need, regardless of industry or area of
expertise, is a data management platform (DMP).

A DMP is a piece of software that allows you to identify and aggregate data
from numerous sources, before manipulating them, segmenting them, and
so on.

There are many DMPs available. Some well-known enterprise DMPs include
Salesforce DMP, SAS, and the data integration platform, Xplenty. If you want
to play around, you can also try some open-source platforms like Pimcore or
D:Swarm.
3. Cleaning the data
Once you’ve collected your data, the next step is to get it ready for analysis.
This means cleaning, or ‘scrubbing’ it, and is crucial in making sure that you’re
working with high-quality data. Key data cleaning tasks include:
Removing major errors, duplicates, and outliers—all of which are inevitable
problems when aggregating data from numerous sources.
Removing unwanted data points—extracting irrelevant observations that
have no bearing on your intended analysis.
Bringing structure to your data—general ‘housekeeping’, i.e. fixing typos or
layout issues, which will help you map and manipulate your data more easily.
Filling in major gaps—as you’re tidying up, you might notice that important
data are missing. Once you’ve identified gaps, you can go about filling them.
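As a rough sketch of how these tasks might look in practice (not part of the original slides), here is a minimal pandas example; the dataset and column names are invented purely for illustration:

```python
import pandas as pd

# Hypothetical customer dataset, assumed for illustration only.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "country":     ["UK", "uk", "uk", "US", None],
    "order_value": [120.0, 80.0, 80.0, 99999.0, 45.0],
})

# Remove exact duplicates introduced when aggregating sources.
df = df.drop_duplicates()

# Remove an obvious major error / outlier (an implausible order value).
df = df[df["order_value"] < 10000]

# Bring structure to the data: fix inconsistent capitalization.
df["country"] = df["country"].str.upper()

# Fill a major gap with an explicit placeholder rather than leaving it blank.
df["country"] = df["country"].fillna("MISSING")

print(df)
```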
Key benefits of data cleaning
Staying organized: Today’s businesses collect lots of information from clients,
customers, product users, and so on. These details include everything from
addresses and phone numbers to bank details and more. Cleaning this data
regularly means keeping it tidy. It can then be stored more effectively and
securely.
Avoiding mistakes: Dirty data doesn’t just cause problems for data analytics. It
also affects daily operations. For instance, marketing teams usually have a
customer database. If that database is in good order, they’ll have access to
helpful, accurate information. If it’s a mess, mistakes are bound to happen, such
as using the wrong name in personalized mail outs.
Improving productivity: Regularly cleaning and updating data means rogue
information is quickly purged. This saves teams from having to
wade through old databases or documents to find what they’re looking
for.

Avoiding unnecessary costs: Making business decisions with bad data can
lead to expensive mistakes. But bad data can incur costs in other ways too.
Simple things, like processing errors, can quickly snowball into bigger
problems. Regularly checking data allows you to detect blips sooner. This gives
you a chance to correct them before they require a more time-consuming
(and costly) fix.

Improved mapping: Increasingly, organizations are looking to improve their
internal data infrastructures. For this, they often hire data analysts to carry
out data modeling and to build new applications. Having clean data from the
start makes it far easier to collate and map, meaning that a solid data hygiene
plan is a sensible measure.
Data quality
Key to data cleaning is the concept of data quality. Data quality measures the objective
and subjective suitability of any dataset for its intended purpose.
There are a number of characteristics that affect the quality of data including accuracy,
completeness, consistency, timeliness, validity, and uniqueness.

Data validity
Validity is the degree to which a dataset conforms to a defined format or set of rules. These rules, or constraints,
are easy to enforce with modern data capture systems, e.g. online forms. Since forms are a common source of
data capture (one we’re all familiar with) let’s use them to highlight a few examples:
🡪 Data type: In an online form, values must match the data type, e.g. numbers > numerical, true/false > Boolean,
and so on.
🡪 Range: Data must fall within a particular range. Ever tried putting a false year of birth into a form (e.g. 1700)? It
will tell you this is invalid because it falls outside the accepted range.
🡪 Mandatory data: It’s happened to us all. You hit submit and the form comes back at you with an angry, red
warning to say you can’t leave cell ‘X’ empty. This is mandatory data. In online forms, it includes things like email
addresses and customer ID numbers.
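To make the idea concrete, here is a hedged, minimal Python sketch of such validity checks; the form fields and the accepted range are assumptions for illustration, not rules from the original:

```python
from datetime import date

# Hypothetical form submission, assumed for illustration.
submission = {"email": "jane@example.com", "year_of_birth": 1700, "subscribed": True}

errors = []

# Data type: 'subscribed' must be a Boolean.
if not isinstance(submission.get("subscribed"), bool):
    errors.append("subscribed must be true/false")

# Range: year of birth must fall inside an accepted window.
year = submission.get("year_of_birth")
if not isinstance(year, int) or not (1900 <= year <= date.today().year):
    errors.append("year_of_birth outside accepted range")

# Mandatory data: email cannot be left empty.
if not submission.get("email"):
    errors.append("email is required")

print(errors)  # ['year_of_birth outside accepted range']
```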
Data accuracy
Accuracy is a simple measure of whether your data are correct.
This could be anything from your date of birth to your bank balance, eye color, or geographical
location. Data accuracy is important for the obvious reason that if data are incorrect, they’ll hurt the
results of any analysis (and subsequent business decisions).
Unfortunately, it’s hard to measure accuracy since we can’t test it against existing ‘gold standard’
datasets.

Data completeness
Data completeness is how exhaustive a dataset is. In short, do you have all the necessary
information needed to complete your task?
Identifying an incomplete dataset isn’t always as easy as looking for empty cells. Let’s say you have
a database of customer contact details, missing half the surnames. If you wanted to list the
customers alphabetically, the dataset would be incomplete. But if your only aim was to analyze
customer dialing codes to determine geographical locators, surnames wouldn’t matter. Like data
accuracy, incomplete data are challenging to fix.
This is because it’s not always possible to infer missing data based on what you already have.
Data consistency
Data consistency refers to whether your data match information from other sources. This determines its
reliability.
For instance, if you work at a doctor’s surgery, you may find patients with two phone numbers or postal
addresses. The data here are inconsistent. It’s not always possible to return to the source, so determining
data consistency requires smart thinking. You may be able to infer which data are correct by looking at the
most recent entry, or by determining reliability in some other way.

Data uniformity
Data uniformity looks at units of measure, metrics, and so on.
For instance, imagine you’re combining two datasets on people’s weight. One dataset uses the metric
system, the other, imperial. For the data to be of any use during analysis, all the measurements must be
uniform, i.e. all in kilograms or all in pounds. This means converting it all to a single unit. Luckily, this aspect
of data quality is easier to manage. It doesn’t mean filling in gaps or determining accuracy…phew!

Data relevance
Data relevance is a more subjective measure of data quality. It looks at whether data is sufficiently complete,
uniform, consistent (and so on) to fulfill its given task.
Another aspect of data relevance, though, is timeliness. Is the data available when you need it? Is it accessible
to everyone who requires it? For instance, if you’re reporting to the Board with quarterly profits and losses,
you need the most up-to-date information. With only the previous quarter’s figures, you’ll have
lower-quality data and can therefore only offer lower quality insights.
How to clean
1: Get rid of unwanted observations
The first stage in any data cleaning process is to remove the observations (or data points) you don’t
want. This includes irrelevant observations, i.e. those that don’t fit the problem you’re looking to
solve.
For instance, if we were running an analysis on vegetarian eating habits, we could remove any
meat-related observations from our data set.

2: Fix structural errors


Structural errors usually emerge as a result of poor data housekeeping. They include things like typos
and inconsistent capitalization, which often occur during manual data entry.
Let’s say you have a dataset covering the properties of different metals. ‘Iron’ (uppercase) and ‘iron’
(lowercase) may appear as separate classes (or categories). Ensuring that capitalization is consistent
makes that data much cleaner and easier to use. You should also check for mislabeled categories.
For instance, ‘Iron’ and ‘Fe’ (iron’s chemical symbol) might be labeled as separate classes, even though
they’re the same. Other things to look out for are the use of underscores, dashes, and other rogue
punctuation!
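A minimal pandas sketch of both fixes, using an invented metals column (the data and names are assumptions, not from the original):

```python
import pandas as pd

metals = pd.DataFrame({"metal": ["Iron", "iron", "Fe", "Copper", "copper "]})

# Consistent capitalization and no stray whitespace.
metals["metal"] = metals["metal"].str.strip().str.title()

# Merge mislabeled categories: 'Fe' is the same class as 'Iron'.
metals["metal"] = metals["metal"].replace({"Fe": "Iron"})

print(metals["metal"].value_counts())
# Iron 3, Copper 2 -- instead of five apparently different classes
```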
3: Standardize your data
Standardizing your data is closely related to fixing structural errors, but it takes it a step further.
Correcting typos is important, but you also need to ensure that every cell type follows the same rules.
For instance, you should decide whether values should be all lowercase or all uppercase, and keep this
consistent throughout your dataset. Standardizing also means ensuring that things like numerical data
use the same unit of measurement.
As an example, combining miles and kilometers in the same dataset will cause problems. Even dates
have different conventions, with the US putting the month before the day, and Europe putting the
day before the month. Keep your eyes peeled; you’ll be surprised what slips through.
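A small, illustrative pandas sketch of this kind of standardization, assuming invented column names, a miles-to-kilometres conversion, and a month-first date convention:

```python
import pandas as pd

trips = pd.DataFrame({
    "distance": [5.0, 8.0, 3.1],
    "unit":     ["miles", "km", "miles"],
    "date":     ["03/04/2023", "04/05/2023", "12/01/2023"],  # assumed MM/DD/YYYY
})

# Standardize all distances to kilometres.
trips["distance_km"] = trips.apply(
    lambda r: r["distance"] * 1.60934 if r["unit"] == "miles" else r["distance"],
    axis=1,
)

# Parse dates with one explicit, agreed convention (here: month first).
trips["date"] = pd.to_datetime(trips["date"], format="%m/%d/%Y")

print(trips[["distance_km", "date"]])
```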
4: Remove unwanted outliers
Outliers are data points that dramatically differ from others in the set. They can cause problems with
certain types of data models and analysis.
For instance, while decision tree algorithms are generally accepted to be quite robust to outliers,
outliers can easily skew a linear regression model. While outliers can affect the results of an analysis,
you should always approach removing them with caution.
Only remove an outlier if you can prove that it is erroneous, e.g. if it is obviously due to incorrect data
entry, or if it doesn’t match a comparison ‘gold standard’ dataset.
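One common way to flag candidate outliers, sketched below, is the 1.5 × IQR rule; this is an assumed method (the slides do not prescribe one), and, as stressed above, flagged points should still be inspected before removal:

```python
import pandas as pd

scores = pd.Series([55, 60, 62, 65, 66, 68, 70, 71, 73, 12])  # 12 looks suspicious

q1, q3 = scores.quantile(0.25), scores.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

candidates = scores[(scores < lower) | (scores > upper)]
print(candidates)                                    # flags 12 for manual review
cleaned = scores[(scores >= lower) & (scores <= upper)]
```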
What is an outlier?
An outlier is a single data point that lies far outside the average value of a
group of statistics. Outliers may be exceptions that stand outside individual
samples of populations as well. In a more general context, an outlier is an
individual that is markedly different from the norm in some respect.

An outlier, in statistics, can therefore be defined as a value that is distant from the
majority of the values in a data set. For example, in a data set of test results
(out of 100) for a group of ten students, a single score far below the rest would
stand out as an outlier.
5: Fix contradictory data errors
Contradictory (or cross-set) data errors are another common problem to look out for.
Contradictory errors are where you have a full record containing inconsistent or incompatible data.
An example could be a log of athlete racing times. If the column showing the total amount of
time spent running isn’t equal to the sum of each race time, you’ve got a cross-set error.
Another example might be a pupil’s grade score being associated with a field that only allows
options for ‘pass’ and ‘fail’, or an employee’s taxes being greater than their total salary.
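A minimal pandas sketch of a cross-set check on an invented athlete-times table (column names are assumptions):

```python
import pandas as pd

races = pd.DataFrame({
    "athlete":    ["A", "B", "C"],
    "leg_1":      [10.2, 11.0, 10.8],
    "leg_2":      [10.5, 11.3, 10.9],
    "total_time": [20.7, 22.3, 23.0],   # C's total contradicts the legs
})

# Flag rows where the recorded total does not equal the sum of the legs.
mismatch = (races["leg_1"] + races["leg_2"]).round(1) != races["total_time"]
print(races[mismatch])   # the row for athlete C needs investigation
```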
6: Type conversion and syntax errors
Once you’ve tackled other inconsistencies, the content of your spreadsheet or dataset might look
good to go.
However, you need to check that everything is in order behind the scenes, too. Type conversion
refers to the categories of data that you have in your dataset.
A simple example is that numbers are numerical data, whereas currency uses a currency value.
You should ensure that numbers are appropriately stored as numerical data, text as text input,
dates as objects, and so on.
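A short pandas sketch of these conversions, with invented columns storing numbers, currency, and dates as text:

```python
import pandas as pd

df = pd.DataFrame({
    "quantity": ["3", "5", "2"],                      # numbers stored as text
    "price":    ["$9.99", "$14.50", "$4.00"],         # currency stored as text
    "ordered":  ["2023-01-05", "2023-02-17", "2023-03-02"],
})

df["quantity"] = pd.to_numeric(df["quantity"])                              # text -> integer
df["price"]    = df["price"].str.replace("$", "", regex=False).astype(float)  # text -> float
df["ordered"]  = pd.to_datetime(df["ordered"])                              # text -> datetime

print(df.dtypes)
```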
7: Deal with missing data
When data is missing, what do you do? There are three common approaches to this problem.
The first is to remove the entries associated with the missing data.
The second is to impute (or guess) the missing data, based on other, similar data.
The third option (and often the best one) is to flag the data as missing. To do this, ensure that
empty fields have the same value, e.g. ‘missing’ or ‘0’ (if it’s a numerical field). Then, when you
carry out your analysis, you’ll at least be taking into account that data is missing, which in itself
can be informative.
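The three approaches could look roughly like this in pandas (illustrative data only; the column names are assumptions):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"customer": ["A", "B", "C"], "age": [34, np.nan, 52]})

# 1. Remove the entries associated with the missing data.
dropped = df.dropna(subset=["age"])

# 2. Impute the missing value from similar data (here: the column mean).
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())

# 3. Flag the value as missing so the analysis can account for it.
flagged = df.copy()
flagged["age_missing"] = flagged["age"].isna()
flagged["age"] = flagged["age"].fillna(0)

print(dropped, imputed, flagged, sep="\n\n")
```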
8: Validate your dataset
Once you’ve cleaned your dataset, the final step is to validate it. Validating data means checking
that the process of making corrections, deduping, standardizing (and so on) is complete.
This all sounds a bit technical, but all you really need to know at this stage is that validation means
checking the data is ready for analysis. If there are still errors (which there usually will be) you’ll
need to go back and fix them…there’s a reason why data analysts spend so much of their time
cleaning data!
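In practice, validation can be as simple as a handful of checks run after cleaning; a hedged sketch (the specific rules are assumptions for illustration):

```python
import pandas as pd

def validate(df):
    """Return a list of problems still present after cleaning."""
    problems = []
    if df.duplicated().any():
        problems.append("duplicate rows remain")
    if df.isna().any().any():
        problems.append("unflagged missing values remain")
    if "order_value" in df and (df["order_value"] < 0).any():
        problems.append("negative order values remain")
    return problems

df = pd.DataFrame({"order_value": [10.0, 25.0, -3.0]})
print(validate(df))   # ['negative order values remain']
```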
Tools to help you clean your data
Cleaning datasets manually—especially large ones—can be daunting. Luckily,
there are many tools available to streamline the process. Open-source tools,
such as OpenRefine, are excellent for basic data cleaning, as well as high-level
exploration.

However, free tools offer limited functionality for very large datasets. Python
libraries (e.g. Pandas) and some R packages are better suited for heavy data
scrubbing. You will, of course, need to be familiar with the languages.
Alternatively, enterprise tools are also available.

For example, Data Ladder is one of the highest-rated data-matching tools in
the industry. There are many more. Why not see which free data cleaning
tools you can find to play around with?
Carrying out an exploratory analysis
Another thing many data analysts do (alongside cleaning data) is to carry out
an exploratory analysis. This helps identify initial trends and characteristics,
and can even refine your hypothesis.
Let’s use the fictional learning company as an example again:
Carrying out an exploratory analysis, perhaps you notice a correlation
between how much TopNotch Learning’s clients pay and how quickly they
move on to new suppliers.
This might suggest that a low-quality customer experience (the assumption in
your initial hypothesis) is actually less of an issue than cost. You might,
therefore, take this into account.
4. Analyzing the data
Finally, you’ve cleaned your data. Now comes the fun bit—analyzing it! The
type of data analysis you carry out largely depends on what your goal is. But
there are many techniques available. Univariate or bivariate analysis,
time-series analysis, and regression analysis are just a few you might have
heard of.
More important than the different types, though, is how you apply them. This
depends on what insights you’re hoping to gain. Broadly speaking, all types of
data analysis fit into one of the following four categories.
Descriptive analysis
Descriptive analysis identifies what has already happened. It is a common first
step that companies carry out before proceeding with deeper explorations.
As an example, let’s refer back to our fictional learning provider once more.
TopNotch Learning might use descriptive analytics to analyze course
completion rates for their customers. Or they might identify how many users
access their products during a particular period.
The following kinds of data can all be summarized using descriptive analytics:
🡪 Financial statements
🡪 Surveys
🡪 Social media engagement
🡪 Website traffic
🡪 Scientific findings
🡪 Weather reports
🡪 Traffic data
Google Analytics is a good example of descriptive analytics in action; it provides a
simple overview of what’s been going on with your website, showing you how
many people visited in a given time period, for example, or where your visitors
came from.
Similarly, tools like HubSpot will show you how many people opened a particular
email or engaged with a certain campaign.
There are two main techniques used in
descriptive analytics: Data aggregation and
data mining.
Data aggregation
Data aggregation is the process of gathering
data and presenting it in a summarized format.
Data mining
Data mining is the analysis part. This is when
the analyst explores the data in order to
uncover any patterns or trends.
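A tiny pandas sketch of data aggregation in a descriptive analysis, using an invented course-completion table loosely inspired by the TopNotch example:

```python
import pandas as pd

completions = pd.DataFrame({
    "course":    ["Safety", "Safety", "Sales", "Sales", "Sales"],
    "completed": [True, False, True, True, False],
})

# Aggregate: completion rate per course -- a purely descriptive summary
# of what has already happened.
summary = completions.groupby("course")["completed"].mean()
print(summary)   # Safety 0.50, Sales ~0.67
```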
Diagnostic analysis
Diagnostic analytics focuses on understanding why something has happened.
It is literally the diagnosis of a problem, just as a doctor uses a patient’s
symptoms to diagnose a disease.
Remember TopNotch Learning’s business problem? ‘Which factors are
negatively impacting the customer experience?’ A diagnostic analysis would
help answer this.
For instance, it could help the company draw correlations between the issue
(struggling to gain repeat business) and factors that might be causing it (e.g.
project costs, speed of delivery, customer sector, etc.)
Let’s imagine that, using diagnostic analytics, TopNotch realizes its clients in
the retail sector are departing at a faster rate than other clients. This might
suggest that they’re losing customers because they lack expertise in this sector.
And that’s a useful insight!
Predictive analysis
Predictive analysis allows you to identify future trends based on historical data. In
business, predictive analysis is commonly used to forecast future growth.
Predictive analysis has grown increasingly sophisticated in recent
years. The speedy evolution of machine learning allows organizations to make
surprisingly accurate forecasts.
Take the insurance industry.
Insurance providers commonly use past data to predict which customer groups
are more likely to get into accidents. As a result, they’ll hike up customer
insurance premiums for those groups.
Likewise, the retail industry often uses transaction data to predict where future
trends lie, or to determine seasonal buying habits to inform their strategies.
These are just a few simple examples, but the untapped potential of predictive
analysis is pretty compelling.
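As a toy illustration only (real predictive models are far more sophisticated), a linear trend can be fitted to invented historical figures with NumPy and extrapolated one period ahead:

```python
import numpy as np

# Hypothetical quarterly sales figures, assumed for illustration.
quarters = np.arange(1, 9)                              # Q1..Q8
sales = np.array([100, 108, 115, 130, 138, 151, 160, 172])

# Fit a straight-line trend and forecast the next quarter.
slope, intercept = np.polyfit(quarters, sales, deg=1)
forecast_q9 = slope * 9 + intercept
print(round(forecast_q9, 1))
```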
Prescriptive analysis
Prescriptive analysis allows you to make recommendations for the future.
This is the final step in the analytics part of the process. It’s also the most
complex.
This is because it incorporates aspects of all the other analyses we’ve
described.
A great example of prescriptive analytics is the algorithms that guide
Google’s self-driving cars. Every second, these algorithms make countless
decisions based on past and present data, ensuring a smooth, safe ride.
Prescriptive analytics also helps companies decide on new products or areas
of business to invest in.
So: Prescriptive analytics looks at what has happened, why it happened, and
what might happen in order to determine the best course of action for the
future.
5. Sharing your results
You’ve finished carrying out your analyses. You have your insights. The
final step of the data analytics process is to share these insights with the
wider world (or at least with your organization’s stakeholders!)
This is more complex than simply sharing the raw results of your work—it
involves interpreting the outcomes, and presenting them in a manner that’s
digestible for all types of audiences.
Since you’ll often present information to decision-makers, it’s very important
that the insights you present are 100% clear and unambiguous. For this
reason, data analysts commonly use reports, dashboards, and interactive
visualizations to support their findings.
Tools for interpreting and sharing your findings
There are tons of data visualization tools available, suited to different
experience levels.

Popular tools requiring little or no coding skills include Google Charts,
Tableau, Datawrapper, and Infogram.

If you’re familiar with Python and R, there are also many data visualization
libraries and packages available. For instance, check out the Python
libraries Plotly, Seaborn, and Matplotlib. Whichever data visualization tools
you use, make sure you polish up your presentation skills, too. Remember:
Visualization is great, but communication is key!
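For instance, a minimal Matplotlib sketch of a simple chart you might share with stakeholders (the retention figures are invented):

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
retention = [0.72, 0.70, 0.75, 0.78]   # hypothetical retention rates

plt.plot(months, retention, marker="o")
plt.title("Customer retention by month")
plt.ylabel("Retention rate")
plt.ylim(0, 1)
plt.tight_layout()
plt.savefig("retention.png")   # or plt.show() in an interactive session
```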
Basic Statistics
Frequency Distribution.
A frequency distribution is a representation, either in a graphical or tabular
format, that displays the number of observations within a given interval.
The frequency is how often a value occurs in an interval, while the
distribution is the pattern of frequency of the variable.
Data presentation is the process of visually representing data sets to convey
information effectively to an audience. In an era where the amount of data
generated is vast, visually presenting data using methods such as diagrams,
graphs, and charts has become crucial.
There are two types of presentation: graphical and numerical (e.g. a frequency table).
Some key graphical tools:
- Bar diagram
- Pie chart
- Histogram (self study: the difference between a bar chart and a histogram)
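A brief Python sketch of a tabular frequency distribution, grouping invented test scores into intervals and counting the observations per interval:

```python
import pandas as pd

scores = [45, 52, 58, 61, 64, 66, 70, 73, 78, 85]

# Group the scores into intervals (bins) and count observations per interval.
bins = [40, 50, 60, 70, 80, 90]
freq = pd.Series(pd.cut(scores, bins=bins)).value_counts().sort_index()
print(freq)
# (40, 50]: 1, (50, 60]: 2, (60, 70]: 4, (70, 80]: 2, (80, 90]: 1
```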
Central Tendency
Central tendency is defined as “the statistical measure that identifies a
single value as representative of an entire distribution.”
It aims to provide an accurate description of the entire data. It is the
single value that is most typical/representative of the collected data.
Dispersion is the state of getting dispersed or spread. Statistical dispersion
means the extent to which numerical data is likely to vary about an average
value. In other words, dispersion helps to understand the distribution of
the data.
Measures of Dispersion
In statistics, the measures of dispersion help to interpret the variability
of data, i.e. how homogeneous or heterogeneous the data is. In simple
terms, they show how squeezed or scattered the variable is.
Types of Measures of Dispersion
There are two main types of dispersion methods in statistics which are:
🡪 Absolute Measure of Dispersion
🡪 Relative Measure of Dispersion
Absolute Measure of Dispersion
An absolute measure of dispersion is expressed in the same unit as the original data set. The absolute
dispersion method expresses the variation in terms of the average of deviations of observations,
like the standard or mean deviation. It includes the range, standard deviation, quartile deviation, etc.
The types of absolute measures of dispersion are:
1. Range: Simply the difference between the maximum value and the minimum value in a
data set. Example: 1, 3, 5, 6, 7 => Range = 7 − 1 = 6
2. Variance: Subtract the mean from each value in the set, square each difference, add the squares,
and finally divide by the total number of values in the data set: σ² = ∑(X − μ)²/N
3. Standard Deviation: The square root of the variance, i.e. S.D. (σ) = √(∑(X − μ)²/N) = √σ².
4. Quartiles and Quartile Deviation: The quartiles are values that divide a list of numbers into
quarters. The quartile deviation is half of the distance between the third and the first quartile.
5. Mean and Mean Deviation: The average of the numbers is known as the mean, and the arithmetic
mean of the absolute deviations of the observations from a measure of central tendency is
known as the mean deviation (also called the mean absolute deviation).
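These measures can be checked directly with Python's standard statistics module; a short sketch using a small invented data set (note that statistics.quantiles implements one of several common quartile conventions):

```python
import statistics

data = [1, 3, 5, 6, 7]

data_range = max(data) - min(data)                 # Range = 6
variance   = statistics.pvariance(data)            # population variance
std_dev    = statistics.pstdev(data)               # square root of the variance
mean       = statistics.mean(data)
mean_dev   = sum(abs(x - mean) for x in data) / len(data)   # mean absolute deviation

# Quartile deviation: half the distance between Q3 and Q1.
q1, _, q3 = statistics.quantiles(data, n=4)
quartile_dev = (q3 - q1) / 2

print(data_range, variance, std_dev, mean_dev, quartile_dev)
```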
Relative Measure of Dispersion
The relative measures of dispersion are used to compare the distribution of
two or more data sets. This measure compares values without units.
Common relative dispersion methods include:
1. Co-efficient of Range
2. Co-efficient of Variation
3. Co-efficient of Standard Deviation
4. Co-efficient of Quartile Deviation
5. Co-efficient of Mean Deviation
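For instance, the coefficient of range and the coefficient of variation can be computed as below; the data set is the one used in Q4 further down, so the coefficient of range should come out near 0.302:

```python
import statistics

data = [45, 55, 63, 76, 67, 84, 75, 48, 62, 65]

# Coefficient of range = (max - min) / (max + min)
coef_range = (max(data) - min(data)) / (max(data) + min(data))

# Coefficient of variation = standard deviation / mean (often quoted as a percentage)
coef_variation = statistics.pstdev(data) / statistics.mean(data)

print(round(coef_range, 3), round(coef_variation, 3))
```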
Arithmetic Mean Formula
Sum of all of the numbers of a group, when divided by the number of
items in that list is known as the Arithmetic Mean or Mean of the
group. For example, the mean of the numbers 5, 7, 9 is 7, since 5 + 7 +
9 = 21 and 21 divided by 3 (there are three numbers) is 7.

Q1. The marks obtained by 6 students in a class test are 20, 22, 24,
26, 28, 30. Find the mean.
Q2. If the arithmetic mean of 14 observations 26, 12, 14, 15, x, 17, 9, 11, 18,
16, 28, 20, 22, 8 is 17. Find the missing observation.
Given 14 observations are: 26, 12, 14, 15, x, 17, 9, 11, 18, 16, 28, 20, 22, 8
Arithmetic mean = 17
We know that,
Arithmetic mean = Sum of observations/Total number of observations
Hence,
17 = (216 + x)/14
17 x 14 = 216 + x
216 + x = 238
x = 238 – 216
x = 22
Therefore, the missing observation is 22.
Q3. Find the Variance and Standard deviation of the following numbers: 1, 3, 5, 5,
6, 7, 9, 10.
The mean = (1+ 3+ 5+ 5+ 6+ 7+ 9+ 10)/8 = 46/ 8 = 5.75
Step 1: Subtract the mean value from individual value
(1 – 5.75), (3 – 5.75), (5 – 5.75), (5 – 5.75), (6 – 5.75), (7 – 5.75), (9 – 5.75), (10 –
5.75)
= -4.75, -2.75, -0.75, -0.75, 0.25, 1.25, 3.25, 4.25
Step 2: Squaring the above values,
we get: 22.5625, 7.5625, 0.5625, 0.5625, 0.0625, 1.5625, 10.5625, 18.0625
Step 3: 22.5625 + 7.5625 + 0.5625 + 0.5625 + 0.0625 + 1.5625 + 10.5625 + 18.0625
= 61.5
Step 4: n = 8, therefore variance (σ²) = 61.5/8 = 7.6875 ≈ 7.69
Now, Standard deviation (σ) = √7.6875 ≈ 2.77
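These hand calculations can be double-checked with Python's statistics module, using the population variance (division by n, matching the working above):

```python
import statistics

data = [1, 3, 5, 5, 6, 7, 9, 10]

print(statistics.mean(data))        # 5.75
print(statistics.pvariance(data))   # 7.6875 (≈ 7.69)
print(statistics.pstdev(data))      # ≈ 2.77
```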
Q4. Calculate the range and coefficient of range for the following data values:
45, 55, 63, 76, 67, 84, 75, 48, 62, 65
Let Xi values be: 45, 55, 63, 76, 67, 84, 75, 48, 62, 65
Here, Maximum value (Xmax) = 84 and Minimum or Least value (Xmin) = 45
Range = Maximum value - Minimum value = 84 – 45 = 39
Coefficient of range = (Xmax – Xmin)/(Xmax + Xmin)
= (84 – 45)/(84 + 45)
= 39/129
= 0.302 (approx)
Q5. Find the median, lower quartile, upper quartile and inter-quartile range of the
following data set of scores: 19, 21, 23, 20, 23, 27, 25, 24, 31 ?
First, let’s arrange the values in ascending order:
19, 20, 21, 23, 23, 24, 25, 27, 31
Now let’s calculate the median:
Median = middle (5th) value = 23
Lower Quartile:
Average of 2nd and 3rd terms
= (20 + 21)/2 = 20.5 = Lower Quartile
Upper Quartile:
Average of 7th and 8th terms
= (25 + 27)/2 = 26 = Upper Quartile
IQR = Upper quartile – Lower quartile
= 26 – 20.5
= 5.5
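For this particular data set, Python's statistics.quantiles (with its default 'exclusive' method) reproduces the same quartile values, so the answer can be verified directly:

```python
import statistics

scores = [19, 21, 23, 20, 23, 27, 25, 24, 31]

q1, median, q3 = statistics.quantiles(scores, n=4)   # default method='exclusive'
print(median)      # 23
print(q1, q3)      # 20.5, 26.0
print(q3 - q1)     # IQR = 5.5
```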
