DataScience&Analytics DataVisualiztn
Data visualization is the process of turning data into pictures or graphics, like charts, graphs, and maps, to help
people understand and interpret information more easily. Instead of looking at rows and rows of numbers, data
visualization allows you to see patterns, trends, and outliers in a more intuitive way, helping you make decisions
based on that data.
Importance:
Simplifies Complex Data: Large sets of data can be hard to understand. By using graphs, charts, or maps, you
can see the big picture at a glance. For example, a line graph showing sales over time can instantly show whether
the sales are increasing or decreasing.
Reveals Patterns and Trends: Visuals make it easier to spot trends or patterns that might be hard to see in a
table of numbers. For instance, you could quickly spot a spike in sales during a particular month just by looking at
a bar chart.
Helps with Decision Making: When data is visualized, it’s easier to analyze, compare, and make decisions
based on it. A manager might use a dashboard showing key performance metrics to make quick business
decisions.
Engages Viewers: People tend to remember and understand visual information better than raw data. Visuals
capture attention and make complex concepts easier to grasp.
Conventional Data Visualization Methods
Bar Chart
A bar chart is a type of data visualization used to represent categorical data. It displays rectangular bars, one
per category, where the height or length of each bar is proportional to the value of that category. Bar charts are
used to compare the frequency, count, or other measures (such as sums or averages) across different groups or
categories.
Advantages:
Easy to interpret and visualize comparisons across categories.
Can effectively display both nominal and ordinal data.
Provides a clear, intuitive way to represent and compare quantities.
Limitations:
Not suitable for displaying relationships or trends in continuous data.
May become cluttered or difficult to read when there are too many categories or data points.
Example:
Suppose a company sells three product types (A, B, and C) in four different regions (North,
South, East, West). A stacked bar chart can be used to show the distribution of total sales
across these regions, with the sales data broken down by product type.
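The stacked bar example above can be sketched in Python. The sales figures below are hypothetical (the text gives
none), and stacked_segments is an illustrative helper name, not part of any library. It computes, for each product,
where its segment starts and how tall it is within each region's bar.

```python
# Hypothetical sales figures (units sold); not taken from the text.
sales = {
    "A": {"North": 120, "South": 90, "East": 60, "West": 30},
    "B": {"North": 80, "South": 110, "East": 40, "West": 70},
    "C": {"North": 50, "South": 20, "East": 100, "West": 60},
}
regions = ["North", "South", "East", "West"]


def stacked_segments(sales, regions):
    """Return per-product (bottom, height) segments and the total per region."""
    running = {region: 0 for region in regions}  # current top of each bar
    segments = {}
    for product, by_region in sales.items():
        segments[product] = {}
        for region in regions:
            height = by_region[region]
            # The segment starts where the previous product's segment ended.
            segments[product][region] = (running[region], height)
            running[region] += height
    return segments, running


segments, totals = stacked_segments(sales, regions)
for region in regions:
    print(region, totals[region])
```

With matplotlib, each (bottom, height) pair would typically be passed to Axes.bar through its height and bottom
arguments, one call per product.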
Applications:
Pie Chart
A pie chart is made up of the following parts:
Circle: The entire circle represents the whole data set or 100%.
Slices: Each slice of the pie represents a category, with its size corresponding to the value or percentage that
category contributes to the total.
Percentages: Often, pie charts display the percentage or proportion each slice represents, either inside the slice
or in a legend.
Labels: Each slice can be labeled with either the category name, the numerical value, or the percentage it
represents.
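The relationship between a category's value, its percentage, and the size of its slice can be sketched as follows.
The values are hypothetical and pie_slices is an illustrative helper name; it assumes non-negative values with a
positive total.

```python
def pie_slices(values):
    """Return (label, percent, start_angle, end_angle) for each category.

    Angles are in degrees; the whole circle (360 degrees) is 100% of the data.
    """
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total must be positive")
    slices = []
    start = 0.0
    for label, value in values.items():
        share = value / total        # fraction of the whole
        sweep = share * 360.0        # slice angle is proportional to the share
        slices.append((label, round(share * 100, 1), start, start + sweep))
        start += sweep
    return slices


# Hypothetical category values.
for label, percent, start, end in pie_slices({"A": 50, "B": 30, "C": 20}):
    print(f"{label}: {percent}% from {start:.0f} to {end:.0f} degrees")
```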
Parallel Coordinates
Parallel coordinates are a data visualization technique for representing high-dimensional datasets.
A parallel coordinates plot (or parallel plot) compares the features of several individual observations (series)
across a set of numeric variables. Each vertical axis represents one variable and often has its own scale; the units
can even differ between axes. Each observation is then drawn as a line connecting its values across the axes.
This makes it possible to study several quantitative variables at once, even when they have completely different
ranges or units.
Key Characteristics:
1. Axes: Each axis corresponds to one feature of the dataset. These axes are typically placed parallel to each
other.
2. Lines: Each data point is represented as a polyline that connects its corresponding values along each axis. The
lines can represent individual data points or entire data subsets.
3. Interpretation: Patterns, correlations, and relationships between different dimensions can be seen through
the lines and their intersections.
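A minimal sketch of how observations become polylines, assuming each axis is scaled independently to the range 0 to 1
(one common choice, not the only one). The records and feature names below are hypothetical, and to_polylines is an
illustrative helper name.

```python
def to_polylines(rows, features):
    """Turn each row into a list of (axis_index, scaled_value) vertices.

    Each feature gets its own axis, min-max scaled so that variables with
    different ranges or units can share one plot.
    """
    lo = {f: min(r[f] for r in rows) for f in features}
    hi = {f: max(r[f] for r in rows) for f in features}
    polylines = []
    for r in rows:
        points = []
        for i, f in enumerate(features):
            span = hi[f] - lo[f]
            # A constant feature has no spread; place it mid-axis.
            y = (r[f] - lo[f]) / span if span else 0.5
            points.append((i, y))
        polylines.append(points)
    return polylines


# Hypothetical records with three numeric features in different units.
rows = [
    {"length": 2.0, "width": 10.0, "weight": 300.0},
    {"length": 4.0, "width": 30.0, "weight": 100.0},
    {"length": 3.0, "width": 20.0, "weight": 200.0},
]
polylines = to_polylines(rows, ["length", "width", "weight"])
for line in polylines:
    print(line)
```

In this sample the first two polylines cross between the width and weight axes, which is how an inverse
relationship between two neighboring variables tends to appear.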
IRIS dataset
Observations:
Flowers of the setosa species have large sepal widths but small sepal lengths, petal widths, and
petal lengths.
Flowers of the versicolor species have small sepal widths and medium sepal lengths, petal
widths, and petal lengths.
Flowers of the virginica species have small to medium sepal widths, medium to large sepal
lengths, and large petal widths and petal lengths.
Drawbacks:
Overcrowding: With many data points and dimensions, the chart can become cluttered,
making it hard to interpret.
Treemap
Key Characteristics:
1. Hierarchy: Treemaps are particularly suited for visualizing hierarchical data, such as file systems,
organizational structures, or any data that can be structured into parent-child relationships.
2. Rectangular Layout: The data is represented by nested rectangles, with the size of each rectangle typically
corresponding to a quantitative variable (e.g., sales, revenue, population).
3. Color Encoding: Different colors can be used to represent categorical data, or to highlight differences in a
certain metric (e.g., performance, growth rate).
4. Compact: Treemaps are efficient in terms of space utilization, as they can fit a large amount of hierarchical
data into a small area.
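The nested rectangular layout can be sketched with the simple "slice and dice" approach, which alternates between
splitting the width and the height at each level of the hierarchy. This is only one of several treemap layouts, and
the hierarchy, city names, and figures below are hypothetical. The sketch assumes leaf names are unique.

```python
def node_size(node):
    """A leaf's size is its number; a parent's size is the sum of its children."""
    if isinstance(node, dict):
        return sum(node_size(child) for child in node.values())
    return node


def slice_and_dice(node, x, y, w, h, vertical=True, name="root", out=None):
    """Give each leaf a rectangle (x, y, w, h) with area proportional to its size."""
    if out is None:
        out = {}
    if not isinstance(node, dict):
        out[name] = (x, y, w, h)
        return out
    total = node_size(node)
    if total == 0:
        return out  # nothing to draw for an empty branch
    offset = 0.0  # fraction of the parent rectangle already used
    for child_name, child in node.items():
        share = node_size(child) / total
        if vertical:  # split the width, left to right
            slice_and_dice(child, x + offset * w, y, w * share, h,
                           False, child_name, out)
        else:         # split the height, top to bottom
            slice_and_dice(child, x, y + offset * h, w, h * share,
                           True, child_name, out)
        offset += share
    return out


# Hypothetical sales grouped by region, then by city.
sales_by_city = {
    "East": {"Boston": 30, "New York": 50},
    "West": {"Seattle": 20},
}
rects = slice_and_dice(sales_by_city, 0, 0, 100, 100)
for city, rect in rects.items():
    print(city, rect)
```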
Example: a treemap whose rectangles are sized and colored by the sales in certain cities.
Visual Variables
1. Position: The placement of elements within a visual space, often used in graphs and charts (e.g., x and y axes on a
scatter plot). Positioning is a powerful way to show relationships between data points.
2. Size: The size of visual elements (such as bars, dots, or lines) can be used to represent magnitude or volume. Larger
elements often indicate higher values, while smaller elements indicate lower values.
3. Shape: Different shapes can be used to distinguish categories or represent different types of data. For example, circles,
squares, or triangles might represent different groups or variables.
4. Color: Color is often used to differentiate between categories, highlight trends, or represent values (e.g., using a
color gradient to show intensity or value). Color can help in visually grouping or distinguishing parts of the data.
5. Orientation: The angle or direction of elements can convey different types of information. For example, tilted
bars or lines can represent trends or directional relationships.
6. Texture: Texture refers to the surface detail of visual elements (like patterns or gradients). It can be used to
represent additional layers of data or simply to add aesthetic distinction.
7. Connection: The use of lines or arrows to connect elements in the visualization, which helps to show
relationships, flows, or networks between data points. This is particularly useful in network diagrams or flowcharts.
These variables, when thoughtfully combined, can help create more effective and interpretable visualizations,
enabling viewers to quickly grasp insights from complex data.
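As a small illustration of two of these variables, size and color, the sketch below maps a data value linearly to a
marker size and to a color on a white-to-blue gradient. The ranges, colors, and function names are assumptions chosen
for the example; values outside the given range are not clamped.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from the range [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return (out_lo + out_hi) / 2
    t = (value - lo) / (hi - lo)
    return out_lo + t * (out_hi - out_lo)


def to_hex_color(value, lo, hi, start=(255, 255, 255), end=(0, 0, 255)):
    """Map value to a hex color between start (low) and end (high) RGB colors."""
    rgb = [round(scale(value, lo, hi, s, e)) for s, e in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical values between 0 and 100.
for value in (0, 50, 100):
    size = scale(value, 0, 100, 10, 110)   # larger value, larger marker
    color = to_hex_color(value, 0, 100)    # larger value, deeper blue
    print(value, size, color)
```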
Mapping Variables to Encoding
Mapping variables to encoding involves assigning specific variables to certain types of encoding or transformations
in order to prepare data for processing, such as in machine learning or data analysis.
Example: If you have an age variable and decide to group ages into categories, you could bin it as:
0-18 -> "child",
19-35 -> "young adult",
36-65 -> "adult",
66+ -> "senior"
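The binning above can be sketched in Python using the standard bisect module. The function and constant names are
illustrative, and the boundaries follow the table above (each upper bound is inclusive).

```python
from bisect import bisect_left

# Inclusive upper bounds of the first three bins; anything above 65 is "senior".
UPPER_BOUNDS = [18, 35, 65]
LABELS = ["child", "young adult", "adult", "senior"]


def age_group(age):
    """Map a non-negative age to its category label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_left returns the index of the first bound that is >= age.
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


ages = [4, 18, 19, 35, 36, 65, 66, 90]
groups = [age_group(a) for a in ages]
print(list(zip(ages, groups)))
```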
Comparison: When the goal is to compare different sets of data, either across different items or over
time.
Distribution: When there is a need to understand how data is distributed over a range, charts in this
category are divided based on the number of variables analyzed.
Composition: When there is a need to understand how different components add up to form a whole,
which can be either over time or static.
Data Visualization Tools
Google Charts
Tableau
QlikView
Datawrapper
Oracle Visual Analyzer
FusionCharts
Highcharts
Microsoft Power BI
Plotly
Sisense
Q3. What is the importance of Big Data Visualization?
Q7. Use any free online tool to create a Word Cloud from any
pdf document of your choice.
Correlation Matrix