data visualization and EDA

Data visualization is the practice of converting raw information into graphic formats to reveal patterns and correlations, making data more accessible and actionable. Common techniques include pie charts, bar plots, scatter plots, and heat maps, with tools like Tableau, Microsoft Power BI, and Google Charts facilitating the process. Exploratory Data Analysis (EDA) complements visualization by using statistical techniques to summarize and inspect data, uncovering insights and relationships for further analysis.

Data visualization

Data visualization is the practice of converting raw
information (text, numbers, or symbols) into a graphic format.
The data is visualized with a clear purpose: to show logical
relationships between units and to reveal trends and patterns.
The format is chosen to suit the type of relationship and the
data itself. Analytical reports typically include visual
interpretations of data such as pie charts, comparison bars,
and demographic maps. Representing text or numerical data
visually makes the information it expresses easy to grasp.
While there are hundreds of ways to visualize data, some of
the most common data visualization techniques include:
• Pie charts

• Bar charts

• Histograms

• Heat maps

• Scatter plots

• Infographics

• Maps

Visually depicting data often makes it easier to understand
and draw insights from. As such, data visualization is an
effective means of making data more accessible across an
organization. This, in turn, can empower employees to back
their actions with concrete information instead of relying on
assumptions, resulting in more data-driven organizational
processes.

1) Histogram
A histogram plots the distribution of values in a numerical
column. It groups the values into bins, each covering a range
of values, and plots how many values fall into each bin, so we
can see how the values are distributed and where most of them
lie: on the negative side, on the positive side, or around the
center (mean). A histogram of an Age column, for example,
shows which age ranges are most common.
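As a minimal sketch of the binning step, the following Python
snippet groups a small list of ages into fixed-width bins and
prints a text histogram. The `ages` values, the bin width of 10,
and the helper name `bin_counts` are assumptions made for
illustration; a plotting library would draw the same counts as
bars.

```python
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall into each bin [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical Age column (illustrative values only).
ages = [22, 25, 29, 31, 34, 35, 38, 41, 47, 52, 58, 63]

histogram = bin_counts(ages, width=10)
for start, count in histogram.items():
    # One '#' per value in the bin, e.g. the 30-39 bin holds four ages.
    print(f"{start:>2}-{start + 9:<2} | {'#' * count}")
```

The tallest row marks the range where most values lie, which is
the same reading a graphical histogram gives.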
2) Pie Chart
A pie chart shows the same information as a count plot, with
the addition of each category's percentage share, that is, how
much weight each category carries in the data. This classic
chart type is effective when you want to illustrate the
proportion of each category in the dataset. However, avoid
pie charts for data with many categories, as too many slices
create confusion. The chart is suitable when you have a
limited number of categories, ideally fewer than six or seven.
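The percentage share behind each slice can be sketched in a few
lines of Python. The `payment_method` column and the helper name
`category_shares` are hypothetical and used only for
illustration.

```python
from collections import Counter


def category_shares(labels):
    """Return each category's percentage share of the total (1 decimal)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Hypothetical categorical column (illustrative values only).
payment_method = ["card", "cash", "card", "online", "card", "cash", "card", "online"]

shares = category_shares(payment_method)
for category, pct in shares.items():
    print(f"{category}: {pct}%")
```

Each percentage corresponds to the angle of one slice, so three
categories here give three slices.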

3) Bar Plot
A bar plot is a simple plot that places a categorical
variable on the x-axis and a numerical variable on the y-axis
to explore the relationship between the two.
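One common way to get the bar heights, sketched below, is to
average the numerical variable within each category. The
(department, salary) pairs and the helper name
`mean_by_category` are assumptions for illustration; other
aggregates such as sums or counts work the same way.

```python
from collections import defaultdict
from statistics import mean


def mean_by_category(rows):
    """Average the numerical value per category; one bar per category."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {category: mean(values) for category, values in groups.items()}


# Hypothetical (department, salary in thousands) pairs.
rows = [("sales", 40), ("sales", 50), ("it", 70), ("it", 80), ("hr", 45)]

bar_heights = mean_by_category(rows)
for category, height in bar_heights.items():
    print(f"{category:<6} {height}")
```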

4) Scatter plots
Scatter plots are types of visualization that show a collection
of data points ‘scattered’ around the graph. The data points
can be evenly or unevenly distributed. Scatter plots are ideal
for exploring relationships and patterns between two
continuous variables. They can help you identify trends,
correlations, or potential clusters in the data.
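The strength of the linear trend seen in a scatter plot is often
summarized with the Pearson correlation coefficient. The sketch
below computes it from first principles; the `hours` and `score`
values and the helper name `pearson_r` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical pairs of continuous variables (illustrative values only).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r = pearson_r(hours, score)
print(f"r = {r:.3f}")  # close to +1: points lie near a rising line
```

A value near +1 or -1 indicates points close to a straight line,
while a value near 0 indicates no linear relationship.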

5) Line charts
A line chart connects distinct data points with straight
lines. Its best use is to highlight trends, patterns, and
changes in a variable. Plotting several lines on the same
axes also helps compare how different groups relate to each
other. Line charts are effective for demonstrating
progression, which makes them suitable for scenarios like
project timelines, production cycles, or population growth.

6) Heatmap charts
Heatmap charts are a type of data visualization that uses
color coding to represent values. Each cell in the matrix is
assigned a color based on the value it holds. This type of
chart is commonly used to show the relationship between two
variables across a grid. The intensity of the colors reflects
the magnitude of the values, making it easy to identify
patterns and trends.
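The cell-to-color mapping can be sketched as min-max scaling
followed by a lookup in an ordered scale. In the snippet below,
shade characters stand in for colors, and the `visits` grid and
the helper name `to_intensity` are assumptions for illustration.

```python
def to_intensity(grid, levels=" .:*#"):
    """Map each cell to a shade character by min-max scaling its value."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero when all cells are equal
    top = len(levels) - 1
    return [
        "".join(levels[round((v - lo) / span * top)] for v in row)
        for row in grid
    ]


# Hypothetical grid: rows are days, columns are time slots, cells are counts.
visits = [
    [0, 5, 10],
    [5, 10, 15],
    [10, 15, 20],
]

shaded = to_intensity(visits)
for line in shaded:
    print(line)
```

Darker characters mark larger values, so the gradient toward the
bottom-right corner is visible without reading any numbers.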
Data Visualization Tools

1. Tableau
Tableau is one of the most popular data visualization tools on
the market for two main reasons: It’s relatively easy to use
and incredibly powerful. The software can integrate with
hundreds of sources to import data and output dozens of
visualization types—from charts to maps and more. Owned
by Salesforce, Tableau boasts millions of users and
community members, and it’s widely used at the enterprise
level.
Tableau offers several products, including desktop, server, and
web-hosted versions of its analytics platform, along with
customer relationship management (CRM) software.
A free option, called Tableau Public, is also available. It’s
important to note, however, that any visualizations created on
the free version are available for anyone to see. This makes it
a good option to learn the software's basics, but it’s not ideal
for any proprietary or sensitive data.
Tableau can be used by data analysts, data scientists,
statisticians, and others to visualize data and form a clear
view based on the analysis. It is well known for taking in
data and producing the required visualization in a very short
time. Tableau also emphasizes security and commits to
addressing security issues as soon as they arise or are
reported by users.
Tableau also allows its users to prepare, clean, and format
their data and then create data visualizations to obtain
actionable insights that can be shared with other users.
Tableau is available for individual data analysts or at scale for
business teams and organizations.
2. Microsoft Excel and Microsoft Power BI
In the strictest sense, Microsoft Excel is spreadsheet
software, not a data visualization tool. Even so, it has useful
data visualization capabilities. Given that Microsoft products
are widely used at the enterprise level, you may already have
access to it.
According to Microsoft’s documentation, you can use Excel
to design at least 20 types of charts using data in spreadsheets.
These range from common options, such as bar charts, pie
charts, and scatter plots, to more advanced ones like radar
charts, histograms, and treemaps.
There are limitations to what you can create in Excel. If your
organization is looking for a more powerful data visualization
tool but wants to stay within the Microsoft ecosystem, Power
BI is an excellent alternative. Built specifically as a data
analytics and visualization tool, Power BI can import data
from various sources and output visualizations in a range of
formats.
Microsoft Power BI is a data visualization platform focused
on building a data-driven business intelligence culture within
organizations. To this end, it offers self-service analytics
tools for analyzing, aggregating, and sharing data in a
meaningful way.
Power BI offers hundreds of data visualizations, along with
built-in artificial intelligence capabilities and Excel
integration.
3. Zoho Analytics
Zoho Analytics is business intelligence and data analytics
software that can help you create attractive data
visualizations from your data in a few minutes. You can pull
data from multiple sources and blend it to create
multidimensional visualizations that let you view your
business data across departments.
Zoho Analytics is a data visualization tool specifically
designed for professionals looking to visualize business
intelligence. As such, it’s most commonly used to visualize
information related to sales, marketing, profit, revenues, costs,
and pipelines with user-friendly dashboards. More than
500,000 businesses and two million users currently leverage
the software.
Zoho Analytics has several paid options, depending on your
needs. There’s also a free version that allows you to build a
limited number of reports, which can be helpful if you’re
testing the waters to determine which tool is best for your
business.
4. Domo
Domo is a business intelligence platform that brings multiple
data visualization tools together in one place. You can
perform data analysis there and then create interactive
visualizations that help other people understand your
conclusions. You can combine cards, text, and images in the
Domo dashboard to guide others through the data and tell a
data story along the way.
If you are unsure where to start, you can use Domo's pre-built
dashboards to obtain quick insights from the data.
5. Infogram
Infogram is a full-featured drag-and-drop visualization tool
that allows even non-designers to create effective data
visualizations for marketing reports, infographics, social
media posts, maps, dashboards, and more.
What sets Infogram apart from the other tools on this list is
that you can use it to create infographics (where its name
comes from), making it especially popular among creative
professionals. Its drag-and-drop editor can also be helpful
for beginners.
Finished visualizations can be exported in a number of
formats (.PNG, .JPG, .GIF, .PDF, and .HTML) to be embedded in
reports and documents or used online. Interactive
visualizations are also possible, which suits embedding in
websites or apps. Infogram also offers a WordPress plugin
that makes embedding visualizations even easier for
WordPress users. Like most of the other tools on this list,
Infogram has tiered pricing, ranging from a free version to an
enterprise-level one.
6. Google Charts
For professionals interested in creating interactive data
visualizations destined to live on the internet, Google
Charts is a popular free option.
The tool can pull data from various sources—including
Salesforce, SQL databases, and Google Sheets—and uses
HTML5/SVG technology to generate charts, which makes
them incredibly accessible. It offers 18 types of charts,
including bar charts, pie charts, histograms, geo charts, and
area charts.
Members of the Google community occasionally create new
charts and share them with other users in a gallery on
Google's website. These charts tend to be more advanced but
may not be HTML5-compliant.
7. RStudio
In R, we can create visually appealing data visualizations by
writing a few lines of code, drawing on the language's
diverse functionality. Data visualization is an efficient
technique for gaining insight into data through a visual
medium. With the help of visualization techniques, a person
can easily spot hidden patterns in data that might otherwise
be overlooked, and can work with large datasets to obtain
key insights about them efficiently.
R provides a range of packages for data visualization, such
as ggplot2, plotly, and tidyquant.

Exploratory Data Analysis (EDA)


Exploratory Data Analysis (EDA) is the process of describing
data by means of statistical and visualization techniques in
order to bring its important aspects into focus for further
analysis. It is an essential step in the data analysis process:
data scientists and data analysts inspect the dataset from many
angles, describing and summarizing it without making
assumptions about its contents, in order to understand its main
characteristics, uncover patterns, and identify relationships
between variables.
Steps involved in EDA
• Look at the Data: Gather information about the data, such
as the number of rows and columns, and the type of
information each column contains. This includes
understanding single variables and their distributions.
• Clean the Data: Fix issues like missing or incorrect
values. Preprocessing is essential to ensure the data is
ready for analysis and predictive modeling.
• Make Summaries: Summarize the data to get a general
idea of its contents, such as average values, common
values, or value distributions. Calculating quantiles and
checking for skewness can provide insights into the
data’s distribution.
• Visualize the Data: Use interactive charts and graphs to
spot trends, patterns, or anomalies. Bar plots, scatter
plots, and other visualizations help in understanding
relationships between variables. Python libraries like
pandas, NumPy, Matplotlib, Seaborn, and Plotly are
commonly used for this purpose.
• Ask Questions: Formulate questions based on your
observations, such as why certain data points differ or
whether there are relationships between different parts of
the data.
• Find Answers: Dig deeper into the data to answer these
questions, which may involve further analysis or building
models, such as linear regression models.
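The first three steps above can be sketched with nothing but the
Python standard library. The `rows` dataset, its column names,
and the choice to drop rows with a missing age are assumptions
made for illustration; in practice a library such as pandas
would usually handle this.

```python
from statistics import mean, median, quantiles

# Hypothetical dataset: one dict per row; None marks a missing value.
rows = [
    {"age": 23, "city": "Pune"},
    {"age": 31, "city": "Delhi"},
    {"age": None, "city": "Pune"},
    {"age": 45, "city": "Mumbai"},
    {"age": 38, "city": "Pune"},
]

# 1. Look at the data: number of rows, column names, and value types.
n_rows, columns = len(rows), list(rows[0])
col_types = {
    c: {type(r[c]).__name__ for r in rows if r[c] is not None} for c in columns
}

# 2. Clean the data: here, simply drop rows with a missing age.
clean = [r for r in rows if r["age"] is not None]

# 3. Make summaries: central tendency and quartiles of a numerical column.
ages = [r["age"] for r in clean]
summary = {
    "count": len(ages),
    "mean": mean(ages),
    "median": median(ages),
    "quartiles": quantiles(ages, n=4),
}

print(n_rows, columns, col_types)
print(summary)
```

Dropping incomplete rows is only one cleaning strategy; filling
missing values with the median is a common alternative when few
rows can be spared.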

Types of EDA
Here are key types of EDA techniques:
• Univariate Analysis: Univariate analysis is the simplest
form of analysis, in which we explore a single variable.
It examines individual variables to understand their
distributions and summary statistics. This includes
calculating measures such as mean, median, mode, and
standard deviation, and visualizing the data using
histograms, bar charts, box plots, and violin plots.
• Bivariate Analysis: Bivariate analysis explores the
relationship between two variables. It uncovers patterns
through techniques like scatter plots, pair plots, and
heatmaps. This helps to identify potential associations or
dependencies between variables.
• Multivariate Analysis: Multivariate analysis involves
examining more than two variables simultaneously to
understand their relationships and combined effects.
Techniques such as contour plots and principal
component analysis (PCA) are commonly used in
multivariate EDA.
• Visualization Techniques: EDA relies heavily on
visualization methods to depict data distributions, trends,
and associations. Various charts and graphs, such as bar
charts, line charts, scatter plots, and heatmaps, are used
to make data easier to understand and interpret.
• Outlier Detection: EDA involves identifying outliers
within the data, that is, anomalies that deviate
significantly from the rest of the data. Tools such as box
plots, z-score analysis, and scatter plots help in detecting
and analyzing outliers.
• Statistical Tests: EDA often includes performing
statistical tests to validate hypotheses or discern
significant differences between groups. Tests such as
t-tests, chi-square tests, and ANOVA add depth to the
analysis process by providing a statistical basis for the
observed patterns.
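Two of the outlier checks named in the list can be sketched as
follows: the interquartile-range (IQR) rule commonly used for
box-plot whiskers, and a z-score cutoff. The `data` values, the
helper names, and the thresholds are assumptions for
illustration, and quartile conventions differ slightly between
libraries.

```python
from statistics import mean, pstdev, quantiles


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if sigma and abs(v - mu) / sigma > threshold]


# Hypothetical measurements with one obviously extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 95]

print(iqr_outliers(data))
# With only ten points, the extreme value inflates the standard deviation,
# so its z-score stays just under 3; a lower cutoff is used here.
print(zscore_outliers(data, threshold=2.5))
```

The IQR rule is less affected by the outlier itself, which is one
reason it is often preferred for small samples.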
By using these EDA techniques, we can gain a comprehensive
understanding of the data, identify key patterns and
relationships, and ensure the data’s integrity before
proceeding with more complex analysis.
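As an illustration of the statistical tests mentioned among the
techniques, the sketch below computes Welch's t statistic for
the difference between two group means. The two groups and the
helper name `welch_t` are hypothetical, and the snippet stops at
the test statistic; obtaining a p-value needs the t distribution,
which statistical packages such as SciPy provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical measurements for two groups (illustrative values only).
group_a = [12, 14, 15, 13, 16]
group_b = [18, 20, 19, 21, 22]

t = welch_t(group_a, group_b)
print(f"t = {t:.2f}")  # large magnitude: the means are far apart relative to noise
```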
