Descriptive Statistics

Descriptive statistics are used to summarize and describe data. Common descriptive statistics include measures of central tendency like the mean, median, and mode as well as measures of spread like standard deviation, interquartile range, and range. Descriptive statistics help characterize data through tables, graphs, and numerical summaries.

2.

Descriptive statistics
Descriptive statistics summarize a data set with measures such as the average or the standard deviation, which help in judging the data at a glance.

They give a convenient overview of the different columns or levels of the parameters and provide an idea of the differences or similarities within the gathered data.

Descriptive statistics can be used to characterize the collected data using tables, graphs, and numerical summaries.

Measure of Location

 Mean: Average of the given information.
 Median: Center point of the collected data.
 Mode: The most frequently occurring value(s).

Measure of Spread

 Standard deviation: Deviation of the collected information in an experiment.
 Interquartile Range: The difference between the 75th and 25th percentiles of the collected data.
 Range: The difference between the largest and smallest values.

 Frequency: The number of times (or proportion of times) each value of a variable occurs in the data.
 Outliers: Extreme data points that lie far from the rest of the values.

A quick way to compute these summaries is sketched below.
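The following minimal sketch, assuming Python's standard statistics module, computes the measures of location and spread listed above on a made-up data set.

```python
# Minimal sketch: common descriptive statistics with Python's statistics module.
# The data values below are made up for illustration.
import statistics as st

data = [12, 15, 15, 18, 21, 22, 22, 22, 30, 95]

mean = st.mean(data)                    # measure of location: average
median = st.median(data)                # measure of location: center point
mode = st.mode(data)                    # measure of location: most frequent value
stdev = st.stdev(data)                  # measure of spread: sample standard deviation
q1, _, q3 = st.quantiles(data, n=4)     # 25th and 75th percentiles
iqr = q3 - q1                           # measure of spread: interquartile range
data_range = max(data) - min(data)      # measure of spread: range

print(mean, median, mode, stdev, iqr, data_range)
```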

Once the data has been explored, one needs to decide which technique should be used to examine it in detail, visualize the analysis, and produce the necessary summaries of the collected data.

There are several statistical techniques for dealing with different kinds of experimental data and for evaluating the relationships within the data.

Inferential statistics can be used to draw conclusions about population parameters based on the sample values:

 Confidence Intervals: Combine the standard error and sample statistics to estimate the larger population parameters.
 Standard Error: The uncertainty of the sample average.
 Statistical Tests: Used to quantify the relationship between the groups being compared.
 The choice of statistical test depends on the number of comparisons, the variable type, and the underlying distribution of the given population.
 Tests can compare two or more paired or independent groups.
 The given population's distribution can be treated as non-parametric (no assumed distribution) or parametric (normally distributed).
 Types of statistical tests: z-test, chi-square, regression, t-test, F-test, ANOVA, correlation, and many more. A two-sample t-test along these lines is sketched below.
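A minimal sketch, assuming SciPy is available; the group values below are made up for illustration.

```python
# Minimal sketch: two-sample t-test between two independent groups (SciPy).
# The group values below are made up for illustration.
from scipy import stats

group_a = [5.1, 4.9, 6.2, 5.8, 5.5, 6.0]
group_b = [4.2, 4.8, 4.5, 5.0, 4.4, 4.6]

t_stat, p_value = stats.ttest_ind(group_a, group_b)  # assumes roughly normal data

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```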

Study Type

 Observational study
Observation of existing conditions and analysis of the resulting inferences.

 Case-control: Compares existing groups that differ in the outcome, such as patients with the disease versus patients without it.
 Cross-sectional: Studies the subjects at a single point in time.
 Cohort: Follows a group of similar people who differ on certain factors to check the effect of those factors on the outcome of interest.

 Experimental
The analysts randomly assign people to the treatment groups.

 Randomization: Methods of assigning samples to groups so that confounding variables are balanced across them and the real effect can be examined.
 Placebo: A treatment given to one group that has no therapeutic effect.
 Blinding: The treatment assignment is unknown to the doctor, the patients, or both.
 Hypothesis
A hypothesis is a testable prediction about the scientific question being studied:

 Null hypothesis: There is no relationship (or difference) between the groups.
 Alternative hypothesis: There is a relationship (or difference) between the groups.
 P-value: The probability of observing a difference at least as large as the one found in the comparison, assuming the null hypothesis is true.

Population – All objects, individuals, or measurements whose characteristics are being studied.

Variable – A feature of interest related to a specific object or person in a given population.

Sample – A subset of the studied population.

Data – A set of outcomes (possible observations) that can be separated into two groups: quantitative (a trait indicated by a number) or qualitative (a trait indicated by a label).

Parameter – A number used to describe a characteristic of the population, which usually cannot be determined easily.

Statistic – A numerical characteristic of a sample; a statistic estimates the corresponding population parameter.

Probability – A number between zero and one giving the chance that a specific event will occur.

Sample size justifications

Sample size calculations are used to make sure that an experiment is large enough to detect a statistical difference between groups when they are genuinely (e.g., biologically) different.

 Significance level (α): The threshold at which the null hypothesis can be rejected. Standard values of α are 0.05, 0.01, and 0.001.
 If the p-value is greater than α, the test fails to reject the null hypothesis.
 If the p-value is less than or equal to α, the null hypothesis can be rejected.
 Effect size: The magnitude of the difference between the compared values.
 Power: The ability of the test to detect a difference that truly exists.

A sample size calculation based on these quantities is sketched below.
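A minimal sketch for a two-sample t-test, assuming the statsmodels library; the effect size, α, and power values below are conventional but purely illustrative.

```python
# Minimal sketch: required sample size per group for a two-sample t-test (statsmodels).
# Effect size, alpha, and power below are illustrative conventions, not fixed rules.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # medium standardized effect (Cohen's d)
    alpha=0.05,       # significance level
    power=0.80,       # desired power to detect the effect
)
print(f"Required sample size per group: {n_per_group:.0f}")
```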

Nominal Scale: 1st Level of Measurement


Nominal Scale, also called the categorical variable scale, is defined as a scale
used for labeling variables into distinct classifications and doesn’t involve a
quantitative value or order. This scale is the simplest of the four variable
measurement scales. Calculations done on these variables will be futile as
there is no numerical value of the options.

There are cases where this scale is used for the purpose of classification –
the numbers associated with variables of this scale are only tags for
categorization or division. Calculations done on these numbers will be futile as
they have no quantitative significance.

For a question such as:


Where do you live?

 1- Suburbs
 2- City
 3- Town
Nominal scale is often used in research surveys and questionnaires where
only variable labels hold significance.

For instance, a customer survey asking “Which brand of smartphones do you prefer?” Options: “Apple” - 1, “Samsung” - 2, “OnePlus” - 3.

 In this survey question, only the names of the brands are significant for the
researcher conducting consumer research. There is no need for any specific
order for these brands. However, while capturing nominal data, researchers
conduct analysis based on the associated labels.
 In the above example, when a survey respondent selects Apple as their
preferred brand, the data entered and associated will be “1”. This helped in
quantifying and answering the final question – How many respondents
selected Apple, how many selected Samsung, and how many went for
OnePlus – and which one is the highest.
 This is fundamental to quantitative research, and the nominal scale is the most basic research scale.

Nominal Scale Data and Analysis


There are two primary ways in which nominal scale data can be collected:

1. By asking an open-ended question, the answers to which can be coded to a respective number or label decided by the researcher.
2. The other alternative is to include a multiple-choice question in which the answers will be labeled.

In both cases, the analysis of the gathered data will happen using percentages or the mode, i.e., the most common answer received for the question. It is possible for a single question to have more than one mode, as two equally common favorites can exist in a target population.
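As an illustration, a minimal sketch of tallying nominal responses with pandas; the brand labels and responses below are made up.

```python
# Minimal sketch: analyzing nominal data with counts, percentages, and mode (pandas).
# The responses below are made up for illustration.
import pandas as pd

responses = pd.Series(["Apple", "Samsung", "Apple", "OnePlus", "Apple", "Samsung"])

counts = responses.value_counts()                           # frequency of each label
percentages = responses.value_counts(normalize=True) * 100  # share of each label
mode = responses.mode()                                     # most common answer(s)

print(counts)
print(percentages.round(1))
print("Mode:", list(mode))
```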

Nominal Scale Examples


 Gender
 Political preferences
 Place of residence
What is your Gender?
 M- Male
 F- Female

What is your Political preference?
 1- Independent
 2- Democrat
 3- Republican

Where do you live?
 1- Suburbs
 2- City
 3- Town

Nominal Scale SPSS


In SPSS, you can specify the level of measurement as scale (numeric data on
an interval or ratio scale), ordinal, or nominal. Nominal and ordinal data can
be either string alphanumeric or numeric.

Upon importing the data for any variable into the SPSS input file, it takes it as
a scale variable by default since the data essentially contains numeric values.
It is important to change it to either nominal or ordinal or keep it as scale
depending on the variable the data represents.

Ordinal Scale: 2nd Level of Measurement


Ordinal Scale is defined as a variable measurement scale used to simply
depict the order of variables and not the difference between each of the
variables. These scales are generally used to depict non-mathematical ideas
such as frequency, satisfaction, happiness, a degree of pain, etc. It is quite
straightforward to remember the implementation of this scale as ‘Ordinal’
sounds similar to ‘Order’, which is exactly the purpose of this scale.

The ordinal scale maintains descriptive qualities along with an intrinsic order but lacks an origin of scale, and thus the distance between variables can't be calculated. Descriptive qualities indicate tagging properties similar to the nominal scale; in addition, the ordinal scale also shows the relative position of variables. The origin of this scale is absent, so there is no fixed start or “true zero”.

Ordinal Data and Analysis  


Ordinal scale data can be presented in tabular or graphical formats for a
researcher to conduct a convenient analysis of collected data. Also, methods
such as Mann-Whitney U test and Kruskal–Wallis H test can also be used to
analyze ordinal data. These methods are generally implemented to compare
two or more ordinal groups.

Using the Mann-Whitney U test, researchers can conclude whether the values in one group tend to be bigger or smaller than the values in another randomly selected group, while the Kruskal–Wallis H test lets researchers analyze whether two or more ordinal groups have the same median or not.
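A minimal sketch of both tests, assuming SciPy; the 1-5 satisfaction ratings below are made up.

```python
# Minimal sketch: comparing ordinal groups with Mann-Whitney U and Kruskal-Wallis (SciPy).
# The satisfaction ratings (1-5) below are made up for illustration.
from scipy import stats

group_a = [3, 4, 4, 5, 3, 4]
group_b = [2, 3, 2, 3, 4, 2]
group_c = [5, 4, 5, 4, 5, 3]

u_stat, p_u = stats.mannwhitneyu(group_a, group_b)        # compares two groups
h_stat, p_h = stats.kruskal(group_a, group_b, group_c)    # compares two or more groups

print(f"Mann-Whitney U p-value: {p_u:.3f}")
print(f"Kruskal-Wallis H p-value: {p_h:.3f}")
```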


Ordinal Scale Examples


Status at workplace, tournament team rankings, order of product quality, and
order of agreement or satisfaction are some of the most common examples of
the ordinal Scale. These scales are generally used in market research to
gather and evaluate relative feedback about product satisfaction, changing
perceptions with product upgrades, etc.

For example, a semantic differential scale question such as:

How satisfied are you with our services?

 Very Unsatisfied – 1
 Unsatisfied – 2
 Neutral – 3
 Satisfied – 4
 Very Satisfied – 5
1. Here, the order of variables is of prime importance and so is the labeling.
Very unsatisfied will always be worse than unsatisfied and satisfied will be
worse than very satisfied.
2. This is where ordinal scale is a step above nominal scale – the order is
relevant to the results and so is their naming.
3. Analyzing results based on the order along with the name becomes a
convenient process for the researcher.
4. If they intend to obtain more information than what they would collect using
a nominal scale, they can use the ordinal scale.
This scale not only assigns values to the variables but also measures the rank
or order of the variables, such as:

 Grades
 Satisfaction
 Happiness
How satisfied are you with our services?

 1- Very Unsatisfied
 2- Unsatisfied
 3- Neutral
 4- Satisfied
 5- Very Satisfied

Interval Scale: 3rd Level of Measurement


Interval Scale is defined as a numerical scale where the order of the variables
is known as well as the difference between these variables. Variables that
have familiar, constant, and computable differences are classified using the
Interval scale. It is easy to remember the primary role of this scale too,
‘Interval’ indicates ‘distance between two entities’, which is what Interval scale
helps in achieving.  

These scales are effective as they open doors for the statistical analysis of the provided data. Mean, median, or mode can be used to calculate the central tendency on this scale. The only drawback of this scale is that there is no pre-decided starting point or true zero value.

Interval scale contains all the properties of the ordinal scale, in addition to
which, it offers a calculation of the difference between variables. The main
characteristic of this scale is the equidistant difference between objects.  

For instance, consider a Celsius/Fahrenheit temperature scale –

 80 degrees is always higher than 50 degrees and the difference between


these two temperatures is the same as the difference between 70 degrees
and 40 degrees.
 Also, the value of 0 is arbitrary because negative values of temperature do
exist – which makes the Celsius/Fahrenheit temperature scale a classic
example of an interval scale.
 Interval scale is often chosen in research cases where the difference
between variables is a mandate – which can’t be achieved using a nominal
or ordinal scale. The Interval scale quantifies the difference between two
variables whereas the other two scales are solely capable of associating
qualitative values with variables.
 The mean and median values of an interval scale can be evaluated, unlike with the previous two scales.
 In statistics, the interval scale is frequently used because a numerical value can not only be assigned to variables, but calculations on the basis of those values can also be carried out.

Even though interval scales are useful, they lack a “true zero” value, which is why the next scale comes into the picture.

Interval Data and Analysis


All the techniques applicable to nominal and ordinal data analysis are applicable to interval data as well. Apart from those techniques, there are a few analysis methods, such as descriptive statistics and correlation/regression analysis, which are used extensively for analyzing interval data.

Descriptive statistics is the term given to the analysis of numerical data which
helps to describe, depict, or summarize data in a meaningful manner and it
helps in calculation of mean, median, and mode.
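For instance, a minimal sketch of correlation and simple linear regression on interval-style data, assuming SciPy; the paired temperature/sales values below are made up.

```python
# Minimal sketch: correlation and simple linear regression on interval data (SciPy).
# The paired temperature/sales values below are made up for illustration.
from scipy import stats

temperature_c = [10, 15, 20, 25, 30, 35]          # interval-scale predictor
ice_cream_sales = [200, 280, 400, 500, 620, 700]  # hypothetical daily sales

r, p_value = stats.pearsonr(temperature_c, ice_cream_sales)   # correlation
result = stats.linregress(temperature_c, ice_cream_sales)     # simple regression

print(f"Pearson r = {r:.2f}")
print(f"sales ≈ {result.slope:.1f} * temperature + {result.intercept:.1f}")
```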

Interval Scale Examples


 There are situations where attitude scales are considered to be interval
scales.
 Apart from the temperature scale, time is also a very common example of
an interval scale as the values are already established, constant, and
measurable.
 Calendar years and time also fall under this category of measurement
scales.
 Likert scale, Net Promoter Score, Semantic Differential Scale, Bipolar Matrix
Table, etc. are the most-used interval scale examples.
The following questions fall under the Interval Scale category:

 What is your family income?


 What is the temperature in your city?

Ratio Scale: 4th Level of Measurement


Ratio Scale is defined as a variable measurement scale that not only produces the order of variables but also makes the difference between variables known, along with information about a true zero. It assumes that the variables have a true zero point, that the differences between values are equal, and that there is a specific order between the options.

With a true zero, a variety of inferential and descriptive analysis techniques can be applied to the variables. In addition to doing everything that a nominal, ordinal, and interval scale can do, the ratio scale can also establish the value of absolute zero. The best examples of ratio scales are weight and height. In market research, a ratio scale is used to calculate market share, annual sales, the price of an upcoming product, the number of consumers, etc.

 Ratio scale provides the most detailed information as researchers and


statisticians can calculate the central tendency using statistical techniques
such as mean, median, mode, and methods such as geometric mean, the
coefficient of variation, or harmonic mean can also be used on this scale.
 Ratio scale accommodates the characteristic of three other variable
measurement scales, i.e. labeling the variables, the significance of the order
of variables, and a calculable difference between variables (which are
usually equidistant).
 Because of the existence of true zero value, the ratio scale doesn’t have
negative values.
 To decide when to use a ratio scale, the researcher must observe whether
the variables have all the characteristics of an interval scale along with the
presence of the absolute zero value.
 Mean, mode and median can be calculated using the ratio scale.

Ratio Data and Analysis


At a fundamental level, ratio scale data is quantitative in nature, which means that all quantitative analysis techniques such as SWOT, TURF, Cross-tabulation, Conjoint, etc. can be used on ratio data. Techniques such as SWOT and TURF analyze ratio data in such a manner that researchers can create roadmaps for improving products or services, while Cross-tabulation is useful for understanding whether new features will be helpful to the target market or not. The additional summary statistics valid on a ratio scale (geometric mean, harmonic mean, coefficient of variation) are sketched below.
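A minimal sketch of those extra summary statistics, assuming SciPy and the standard statistics module; the weights below are made up.

```python
# Minimal sketch: summary statistics that are valid for ratio data.
# The weights (kg) below are made up for illustration.
import statistics as st
from scipy.stats import gmean, hmean

weights_kg = [52.0, 61.5, 70.2, 68.4, 75.0, 80.3]

arithmetic_mean = st.mean(weights_kg)
geometric_mean = gmean(weights_kg)
harmonic_mean = hmean(weights_kg)
coeff_variation = st.stdev(weights_kg) / arithmetic_mean  # meaningful because of the true zero

print(arithmetic_mean, geometric_mean, harmonic_mean, round(coeff_variation, 3))
```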

Ratio Scale Examples


The following questions fall under the Ratio Scale category:

 What is your daughter’s current height?


 Less than 5 feet.
 5 feet 1 inch – 5 feet 5 inches
 5 feet 6 inches- 6 feet
 More than 6 feet
 What is your weight in kilograms?
 Less than 50 kilograms
 51- 70 kilograms
 71- 90 kilograms
 91-110 kilograms
 More than 110 kilograms

Summary – Levels of Measurement

The four data measurement scales – nominal, ordinal, interval, and ratio – are quite often discussed in academic teaching. The easy-to-remember chart below might help you in your statistics test.

Offers                                          Nominal   Ordinal   Interval   Ratio
The sequence of variables is established        –         Yes       Yes        Yes
Mode                                            Yes       Yes       Yes        Yes
Median                                          –         Yes       Yes        Yes
Mean                                            –         –         Yes        Yes
Difference between variables can be evaluated   –         –         Yes        Yes
Addition and subtraction of variables           –         –         Yes        Yes
Multiplication and division of variables        –         –         –          Yes
Absolute zero                                   –         –         –          Yes

 Quantitative data is numbers-based, countable, or measurable. Qualitative


data is interpretation-based, descriptive, and relating to language.
 Quantitative data tells us how many, how much, or how often in
calculations. Qualitative data can help us to understand why, how, or what
happened behind certain behaviors.
 Quantitative data is fixed and universal. Qualitative data is subjective and
unique.
 Quantitative research methods are measuring and counting. Qualitative
research methods are interviewing and observing.
 Quantitative data is analyzed using statistical analysis. Qualitative data is
analyzed by grouping the data into categories and themes.

What are the advantages and disadvantages of quantitative data?

Each type of data set has its own pros and cons.

Advantages of quantitative data

 It’s relatively quick and easy to collect and it’s easier to draw conclusions from. 

 When you collect quantitative data, the type of results will tell you which
statistical tests are appropriate to use. 

 As a result, interpreting your data and presenting those findings is straightforward


and less open to error and subjectivity.

Another advantage is that you can replicate it. Replicating a study is possible because
your data collection is measurable and tangible for further applications.

Disadvantages of quantitative data

 Quantitative data doesn’t always tell you the full story (no matter what the
perspective). 

 With choppy information, it can be inconclusive.

 Quantitative research can be limited, which can lead to overlooking broader


themes and relationships.
 By focusing solely on numbers, there is a risk of missing the larger context and other useful information.

What are the advantages and disadvantages of qualitative data?

Advantages of qualitative data

 Qualitative data offers rich, in-depth insights and allows you to explore context.

 It’s great for exploratory purposes.

 Qualitative research delivers a predictive element for continuous data.

Disadvantages of qualitative data

 It’s not a statistically representative form of data collection because it relies upon
the experience of the host (who can lose data).

 It can also require multiple data sessions, which can lead to misleading
conclusions.

The takeaway is that it’s tough to conduct a successful data analysis without both. They
both have their advantages and disadvantages and, in a way, they complement each
other.

What are the collection methods of both quantitative and qualitative data?

In order to analyze both types of data, you’ve got to collect the information first, of course.
Qualitative research methods are more flexible and utilize open-ended questions.
Quantitative data collection methods focus on highly controlled approaches and
numerical information.

Quantitative data collection methods

Surveys

A survey is one of the most common research methods with quantitative data that
involves questioning a large group of people. Questions are usually closed-ended and
are the same for all participants. An unclear questionnaire can lead to distorted
research outcomes.

Polls

Similar to surveys, polls yield quantitative data. That is, you poll a number of people and
apply a numeric value to how many people responded with each answer.

Experiments

An experiment is another common method that usually involves a control group and


an experimental group. The experiment is controlled and the conditions can be
manipulated accordingly. You can examine any type of records involved if they pertain
to the experiment, so the data is extensive. 

Or you can mix it up — use mixed methods of both to combine qualitative and
quantitative data. 

The best practices of each help to look at the information under a broader lens to get a
unique perspective. Using both methods is helpful because they collect rich and reliable
data, which can be further tested and replicated.

Controlled experiments, A/B tests, blind experiments, and many others fall under this
category.

Qualitative data collection methods


Interviews

An interview is the most common qualitative research method. This method involves
personal interaction (either in real life or virtually) with a participant. It’s mostly used for
exploring attitudes and opinions regarding certain issues.

Interviews are very popular methods for collecting data in product design.

Focus groups

Data analysis by focus group is another method where participants are guided by a host to collect data. Within a group (either in person or online), each member shares their opinion on the topic being discussed.

What’s an example of the difference between quantitative and qualitative data?

You’ve most likely run into quantitative and qualitative data today alone. For the visual learner, here are some examples of both quantitative and qualitative data:

Quantitative data example

 The customer has clicked on the button 13 times. 

 The engineer has resolved 34 support tickets today. 

 The team has completed 7 upgrades this month. 

 14 cartons of eggs were purchased this month.

Qualitative data example

 My manager has curly brown hair and blue eyes.


 My coworker is funny, loud, and a good listener. 

 The customer has a very friendly face and a contagious laugh.

 The eggs were delicious.

The fundamental difference is that one type of data answers questions about quantity, while the other answers questions descriptively.

What does this mean for data quality and analysis? If you just analyzed quantitative
data, you’d be missing core reasons behind what makes a data collection meaningful.
You need both in order to truly learn from data—and truly learn from your customers.

Discrete data vs. continuous data

 Discrete data takes specific, countable values; continuous data takes any measured value within a specific range.
 Common examples of discrete data are the number of students, the number of children, and shoe size; common examples of continuous data are height, weight, length, time, temperature, and age.
 Ordinal and integer values represent discrete data; decimal numbers and fractions represent continuous data.
 Discrete data is easily counted on something as simple as a number line; continuous data requires more in-depth measurement tools and methods like curves and skews.
 Discrete data remains constant over a specific time interval; continuous data varies over time and can have separate values at any given point.

What is Geodata?
Geodata is location information stored in a Geographic Information
System (GIS).

By viewing data with a geographic component, we see it through a


different lens.

Geodata tackles the problem of location because geographic problems


require spatial thinking.

Let’s dive into the types, themes, and sources of geodata.


Types of Geographic Data

There are different types of geographic data and each of these has its own
unique value in how you use them.

Whether the data is from government, private sources, or open data, it’s
important to understand the type of data, where it comes from, how it
is collected, and what it can be used for.

From vector to raster, or web-based to multi-temporal, here are some of


the most common types of data along with their benefits and
drawbacks.

1. Vector Files

Vector data consists of vertices and paths. The three basic types of vector
data are points, lines, and polygons (areas). Each point, line and
polygon has a spatial reference frame such as latitude and longitude.

First, vector points are simply XY coordinates. Secondly, vector lines connect each point or vertex with paths in a particular order. Finally, polygons join a set of vertices and enclose the area by connecting the first and last vertices.
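As an illustration, a minimal sketch of the three basic vector types expressed as GeoJSON-style Python dictionaries; the coordinates are made-up longitude/latitude pairs.

```python
# Minimal sketch: point, line, and polygon vector geometries as GeoJSON-style dicts.
# Coordinates are made-up (longitude, latitude) pairs for illustration.
import json

point = {"type": "Point", "coordinates": [38.76, 9.01]}

line = {"type": "LineString",
        "coordinates": [[38.76, 9.01], [38.80, 9.05], [38.85, 9.02]]}

polygon = {"type": "Polygon",               # the first and last vertices are identical,
           "coordinates": [[[38.70, 9.00],  # which closes the ring into an area
                            [38.80, 9.00],
                            [38.80, 9.10],
                            [38.70, 9.10],
                            [38.70, 9.00]]]}

print(json.dumps({"type": "GeometryCollection",
                  "geometries": [point, line, polygon]}, indent=2))
```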

2. Raster Files

Raster data is made up of pixels or grid cells. Commonly, they are square
and regularly spaced. But rasters can be rectangular as well. Rasters
associate values to each pixel.

Continuous rasters have values that gradually change, such as elevation or temperature. But discrete rasters set each pixel to a specific class. For example, we map land cover classes to a set of pixel values.
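A minimal sketch of small rasters as grids of pixel values, assuming NumPy; the elevation values and class codes below are made up.

```python
# Minimal sketch: small rasters as regular grids of pixel values (NumPy).
# The elevation values (meters) and class codes below are made up for illustration.
import numpy as np

elevation = np.array([[120, 125, 130],   # continuous raster: values change gradually
                      [118, 128, 140],
                      [115, 135, 150]])

land_cover = np.array([[1, 1, 2],        # discrete raster: each pixel holds a class code,
                       [1, 2, 2],        # e.g. 1 = forest, 2 = water, 3 = urban (hypothetical)
                       [3, 2, 2]])

print("Mean elevation:", elevation.mean())
print("Water pixels:", int((land_cover == 2).sum()))
```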

3. Geographic Database
The purpose of geographic databases is to house vectors and rasters. Databases store geographic data as a structured set of data/information. For example, Esri geodatabases, GeoPackages, and SpatiaLite are the most common types of geographic databases. We use geographic databases because they are a way to put all data in a single container. Within this container, we can build networks, create mosaics, and do versioning.

4. Web Files

As the internet becomes the largest library in the world, geodata has
adapted with its own types of storage and access. For example,
GeoJSON, GeoRSS, and web mapping services (WMS) were built
specifically to serve and display geographic features over the internet.

Additionally, online platforms such as Esri’s ArcGIS Online allow


organizations to build data warehouses in the cloud.

5. Multi-temporal

Multi-temporal data attaches a time component to information. But multi-


temporal geodata not only has a time component but a geographic
component as well.

For example, weather and climate data tracks how temperature and
meteorological information changes in time in a geographical context.
Other examples of multi-temporal geodata are demographic trends,
land use patterns, and lightning strikes.


Geodata Themes
The truth is:

You can group geodata into as many themes as you want.

They can be as broad or as narrow as you like.


Here are examples of geographic themes:

Cultural:
 Administrative (boundaries, cities, and planning)
 Socioeconomic data (demographics, economy, and crime)
 Transportation (roads, railways, and airports)

Physical:
 Environmental data (agriculture, soils, and climate)
 Hydrography data (oceans, lakes, and rivers)
 Elevation data (terrain and relief)

Sources for Geodata
Are you trying to find open, authoritative geodata to use in your maps?

Before the concept of open data took off, organizations were protecting
data as if it was Fort Knox. Since then, we are in a much better
position.

Currently, there’s no single website that holds all the geodata in the
world. Instead, they branch out into what they are most specialized in.

For example, OpenStreetMap data is the largest crowd-sourced GIS


database in the world providing countless applications for the public.

From NASA and the USGS to the United Nations, these 10 free GIS data sources list the best vector and raster data to boost your geodata repertoire.

And finally, over 1000 satellites orbit the Earth collecting imagery of our
planet. These 15 free satellite imagery sources give you the most up-
to-date bird’s eye view of it all.

Data visualization is the practice of translating information into a visual


context, such as a map or graph, to make data easier for the human brain to
understand and pull insights from. The main goal of data visualization is to
make it easier to identify patterns, trends and outliers in large data sets. The
term is often used interchangeably with others, including information graphics,
information visualization and statistical graphics.
Data visualization is one of the steps of the data science process, which
states that after data has been collected, processed and modeled, it must be
visualized for conclusions to be made. Data visualization is also an element of
the broader data presentation architecture (DPA) discipline, which aims to
identify, locate, manipulate, format and deliver data in the most efficient way
possible.

Data visualization is important for almost every career. It can be used by


teachers to display student test results, by computer scientists exploring
advancements in artificial intelligence (AI) or by executives looking to share
information with stakeholders. It also plays an important role in big
data projects. As businesses accumulated massive collections of data during
the early years of the big data trend, they needed a way to quickly and easily
get an overview of their data. Visualization tools were a natural fit.

Visualization is central to advanced analytics for similar reasons. When a data


scientist is writing advanced predictive analytics or machine learning (ML)
algorithms, it becomes important to visualize the outputs to monitor results
and ensure that models are performing as intended. This is because
visualizations of complex algorithms are generally easier to interpret than
numerical outputs.

Why is data visualization important?


Data visualization provides a quick and effective way to communicate
information in a universal manner using visual information. The practice can
also help businesses identify which factors affect customer behavior; pinpoint
areas that need to be improved or need more attention; make data more
memorable for stakeholders; understand when and where to place specific
products; and predict sales volumes.

Other benefits of data visualization include the following:


 the ability to absorb information quickly, improve insights and make faster
decisions;

 an increased understanding of the next steps that must be taken to


improve the organization;

 an improved ability to maintain the audience's interest with information


they can understand;

 an easy distribution of information that increases the opportunity to share


insights with everyone involved;

 the elimination of the need for data scientists, since data is more accessible and understandable; and

 an increased ability to act on findings quickly and, therefore, achieve success with greater speed and fewer mistakes.
Data visualization and big data
The increased popularity of big data and data analysis projects have made
visualization more important than ever. Companies are increasingly
using machine learning to gather massive amounts of data that can be
difficult and slow to sort through, comprehend and explain.
Visualization offers a means to speed this up and present information
to business owners and stakeholders in ways they can understand.

Big data visualization often goes beyond the typical techniques used in
normal visualization, such as pie charts, histograms and corporate
graphs. It instead uses more complex representations, such as heat
maps and fever charts. Big data visualization requires powerful
computer systems to collect raw data, process it and turn it into
graphical representations that humans can use to quickly draw
insights.

While big data visualization can be beneficial, it can pose several


disadvantages to organizations. They are as follows:

To get the most out of big data visualization tools, a visualization


specialist must be hired. This specialist must be able to identify the
best data sets and visualization styles to guarantee organizations are
optimizing the use of their data.
Big data visualization projects often require involvement from IT, as well
as management, since the visualization of big data requires powerful
computer hardware, efficient storage systems and even a move to the
cloud.
The insights provided by big data visualization will only be as accurate as
the information being visualized. Therefore, it is essential to have
people and processes in place to govern and control the quality of
corporate data, metadata and data sources.

A Bar Graph (also called Bar Chart) is a graphical display of data using bars of
different heights.

Imagine you just did a survey of your friends to find which kind of movie they
liked best:

Table: Favorite Type of Movie


Comedy Action Romance Drama SciFi
4 5 6 1 4

We can show that on a bar graph like this:

It is a really good way to show relative sizes: we can see which types of movie
are most liked, and which are least liked, at a glance.

We can use bar graphs to show the relative sizes of many things, such as what
type of car people have, how many customers a shop has on different days and
so on.

Example: Nicest Fruit

A survey of 145 people asked them "Which is the nicest fruit?":

Fruit:   Apple   Orange   Banana   Kiwifruit   Blueberry   Grapes
People:  35      30       10       25          40          5
And here is the bar graph:

That group of people think Blueberries are the nicest.

Bar Graphs can also be Horizontal, like this:

Example: Student Grades

In a recent test, this many students got these grades:

Grade: A B C D
Students: 4 12 10 2

And here is the bar graph:

You can create graphs like that using our Data Graphs (Bar, Line, Dot, Pie,
Histogram) page.

Histograms vs Bar Graphs

Bar Graphs are good when your data is in categories (such as "Comedy",


"Drama", etc).

But when you have continuous data (such as a person's height) then use


a Histogram.

It is best to leave gaps between the bars of a Bar Graph, so it doesn't look like
a Histogram.
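A minimal sketch of drawing the movie-survey bar graph above, assuming matplotlib; the categories and counts come from the table at the top of this section.

```python
# Minimal sketch: a bar graph of the favorite-movie survey using matplotlib.
import matplotlib.pyplot as plt

categories = ["Comedy", "Action", "Romance", "Drama", "SciFi"]
votes = [4, 5, 6, 1, 4]

plt.bar(categories, votes)
plt.title("Favorite Type of Movie")
plt.xlabel("Movie type")
plt.ylabel("Number of friends")
plt.show()
```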

 
Pie Chart: a special chart that uses "pie slices" to show relative sizes of data.

Imagine you survey your friends to find the kind of movie they like best:

Table: Favorite Type of Movie


Comedy Action Romance Drama SciFi
4 5 6 1 4

You can show the data by this Pie Chart:

It is a really good way to show relative sizes: it is easy to see which movie
types are most liked, and which are least liked, at a glance.

You can create graphs like that using our Data Graphs (Bar, Line and Pie) page.

Or you can make them yourself ...

How to Make Them Yourself


First, put your data into a table (like above), then add up all the values to get a
total:

Table: Favorite Type of Movie


Comedy Action Romance Drama SciFi TOTAL
4 5 6 1 4 20

Next, divide each value by the total and multiply by 100 to get a percent:

Comedy       Action       Romance      Drama       SciFi        TOTAL
4            5            6            1           4            20
4/20 = 20%   5/20 = 25%   6/20 = 30%   1/20 = 5%   4/20 = 20%   100%

 
Now to figure out how many degrees for each "pie slice" (correctly called
a sector).

A Full Circle has 360 degrees, so we do this calculation:

Comedy              Action              Romance               Drama               SciFi               TOTAL
4                   5                   6                     1                   4                   20
20%                 25%                 30%                   5%                  20%                 100%
4/20 × 360° = 72°   5/20 × 360° = 90°   6/20 × 360° = 108°    1/20 × 360° = 18°   4/20 × 360° = 72°   360°

Now you are ready to start drawing!

Draw a circle.

Then use your protractor to measure the degrees of each sector.

Here I show the first sector ...

Finish up by coloring each sector and giving it a label like "Comedy: 4 (20%)",
etc.

(And don't forget a title!)
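The same percentage and angle calculation can be done in a few lines; here is a minimal sketch using the survey counts from the table above.

```python
# Minimal sketch: computing pie-slice percentages and angles from the survey counts.
counts = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}
total = sum(counts.values())

for movie, n in counts.items():
    percent = n / total * 100   # share of the whole pie
    degrees = n / total * 360   # angle of the sector
    print(f"{movie}: {n} ({percent:.0f}%, {degrees:.0f}°)")
```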


Another Example
You can use pie charts to show the relative sizes of many things, such as:

 what type of car people have,


 how many customers a shop has on different days and so on.
 how popular are different breeds of dogs

Example: Student Grades

Here is how many students got each grade in the recent test:

A B C D
4 12 10 2

And here is the pie chart:

Histogram: a graphical display of data using bars of different heights.


It is similar to a Bar Chart, but a histogram groups numbers into ranges.

The height of each bar shows how many fall into each range.

And you decide what ranges to use!

Example: Height of Orange Trees
You measure the height of every tree in the orchard in centimeters (cm)

The heights vary from 100 cm to 340 cm

You decide to put the results into groups of 50 cm:


The 100 to just below 150 cm range,
The 150 to just below 200 cm range,
etc...
So a tree that is 260 cm tall is added to the "250-300" range.

And here is the result:


You can see (for example) that there are 30 trees from 150 cm to just
below 200 cm tall

(PS: you can create graphs like that using Make your own Histogram)

Notice that the horizontal axis is continuous like a number line.

Example: How much is that puppy growing?
Each month you measure how much weight your pup has gained and get
these results:

0.5, 0.5, 0.3, −0.2, 1.6, 0, 0.1, 0.1, 0.6, 0.4

They vary from −0.2 (the pup lost weight that month) to 1.6

Put in order from lowest to highest weight gain:

−0.2, 0, 0.1, 0.1, 0.3, 0.4, 0.5, 0.5, 0.6, 1.6

You decide to put the results into groups of 0.5:

The −0.5 to just below 0 range,


The 0 to just below 0.5 range,
etc...
And here is the result:



(There are no values from 1 to just below 1.5, but we still show the
space.)

The range of each bar is also called the Class Interval

In the example above each class interval is 0.5
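A minimal sketch of drawing this histogram, assuming matplotlib; the weight changes and 0.5-wide class intervals are the ones listed above.

```python
# Minimal sketch: a histogram of the monthly weight changes using matplotlib.
import matplotlib.pyplot as plt

weight_changes = [0.5, 0.5, 0.3, -0.2, 1.6, 0, 0.1, 0.1, 0.6, 0.4]
bin_edges = [-0.5, 0, 0.5, 1.0, 1.5, 2.0]   # class intervals of 0.5

plt.hist(weight_changes, bins=bin_edges, edgecolor="black")
plt.title("Puppy Weight Gain per Month")
plt.xlabel("Weight change")
plt.ylabel("Number of months")
plt.show()
```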

Histograms are a great way to show results of continuous data, such as:

weight
height
how much time
etc.
But when the data is in categories (such as Country or Favorite Movie),
we should use a Bar Chart.


Frequency Histogram
A Frequency Histogram is a special graph that uses vertical columns to
show frequencies (how many times each score occurs):

Here I have added up how often 1 occurs (2 times),
how often 2 occurs (5 times), etc,
and shown them as a histogram.

Line Graph: a graph that shows information that is connected in some


way (such as change over time)

You are learning facts about dogs, and each day you do a short test to see
how good you are. These are the results:

Table: Facts I got Correct


Day 1 Day 2 Day 3 Day 4
3 4 12 15
And here is the same data as a Line Graph:


You seem to be improving!

Making Line Graphs


You can create graphs like that using the Data Graphs (Bar, Line and Pie)
page.

Or you can draw it yourself!

Example: Ice Cream Sales


Table: Ice Cream Sales
Mon Tue Wed Thu Fri Sat Sun
$410 $440 $550 $420 $610 $790 $770
Let's make the vertical scale go from $0 to $800, with tick marks every
$200

1. Draw a vertical scale with tick marks.
2. Label the tick marks, and give the scale a label.
3. Draw a horizontal scale with tick marks and labels.
4. Put a dot for each data value.
5. Connect the dots and give the graph a title.

Important! Make sure to have:

A Title
Vertical scale with tick marks and labels
Horizontal scale with tick marks and labels
Data points connected by lines
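A minimal sketch of drawing the same ice cream sales line graph, assuming matplotlib; the days and sales values come from the table above.

```python
# Minimal sketch: a line graph of the ice cream sales table using matplotlib.
import matplotlib.pyplot as plt

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
sales = [410, 440, 550, 420, 610, 790, 770]

plt.plot(days, sales, marker="o")       # dots for each data value, connected by lines
plt.title("Ice Cream Sales")            # a title
plt.xlabel("Day of the week")           # horizontal scale label
plt.ylabel("Sales ($)")                 # vertical scale label
plt.yticks(range(0, 801, 200))          # tick marks every $200, from $0 to $800
plt.show()
```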
