Unit V
Big Data Visualization
✔ Introduction to Data visualization
✔ Challenges to Big data visualization
✔ Conventional data visualization tools
✔ Techniques for visual data representations
✔ Types of data visualization
✔ Visualizing Big Data
✔ Tools used in data visualization, proprietary data visualization tools
✔ Open source data visualization tools
✔ Analytical techniques used in Big data visualization
✔ Data visualization with Tableau
✔ Introduction to:
   ✔ Pentaho
   ✔ Flare
   ✔ Jasper Reports
   ✔ Dygraphs
   ✔ Datameer Analytics Solution and Cloudera
   ✔ Platfora
   ✔ NodeBox
   ✔ Gephi
   ✔ Google Chart API
   ✔ Flot
   ✔ D3
   ✔ Visual.ly
Introduction to Data visualization
• “Data visualization is the technique used to communicate data by representing information with visual graphic objects such as points, lines or bars.”
❖ Objectives of data visualization
o To illuminate the data and see it in context.
o To solve problems or suggest solutions.
o To understand and explore data clearly, which helps in taking proper decisions.
o To illustrate or hide data.
o To find patterns or relationships among data.
o To make comparisons between statistical data.
Visualizing Big Data
• The amount of data that organizations generate through internet activity increases year after year; this data is called Big Data.
• The main problem is that only the useful part of the collected data should be kept. Big data visualization is the “front end” of big data.
• Data visualization represents data in different visual forms such as tables, diagrams and images.
Challenges to Big data visualization
• Problems that occur in big data visualization:
1. Visual noise (too many closely related data objects; the user cannot separate them)
2. Information loss (reducing the data set may lose information)
3. Large image perception (limited by aspect ratio and screen resolution)
4. High rate of image change (the user can only view the data, not change it)
5. High performance requirement (otherwise the visualization speed is lower)
• Solutions to the problems of big data visualization:
1. Speeding up the process (by using faster hardware and increasing memory)
2. Understanding the data (take the help of domain experts to understand it)
3. Addressing data quality (assure quality through an information management process)
4. Displaying meaningful results (effective visualization by clustering)
5. Dealing with outliers (removing outliers)
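As an illustration of "dealing with outliers", the sketch below removes extreme values before plotting. The slides do not name a method; this example assumes the common interquartile-range (Tukey) rule, and the function name `remove_outliers`, the factor `k=1.5` and the sample readings are all made up for the example.

```python
from statistics import quantiles

def remove_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed Tukey rule)."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles of the data
    iqr = q3 - q1                        # interquartile range
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical sensor readings with one obvious outlier (250)
readings = [12, 14, 13, 15, 14, 13, 250, 12, 15]
cleaned = remove_outliers(readings)
print(cleaned)
```

Removing the value 250 keeps the axis scale of a chart focused on the bulk of the data.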
Types of Data Visualization
1. Tables
2. Histogram
3. Scatter plot
4. Various charts
5. Timeline
6. Various diagrams
1. Tables
• A collection of rows and columns that represents data in a structured form.
• The smallest unit is the ‘cell’, addressed by its row and column, e.g. [4 (row), 2 (column)].
2. Histogram
• A vertical bar chart is used.
• Represents the distribution of a set of data over a continuous interval.
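A histogram is built by dividing the continuous interval into equal-width bins and counting the values in each bin. The sketch below shows only that counting step, printed as text; the function name `histogram`, the bin width and the sample ages are assumptions for the example.

```python
def histogram(values, bin_width, start=0):
    """Count the values falling in each interval [lower, lower + bin_width)."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)   # which bin the value belongs to
        lower = start + index * bin_width       # lower edge of that bin
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical ages grouped into 10-year intervals starting at 20
ages = [21, 25, 29, 32, 34, 35, 38, 41, 47, 52]
bins = histogram(ages, bin_width=10, start=20)
for lower, count in bins.items():
    print(f"{lower}-{lower + 10}: {'#' * count}")
```

Each printed row corresponds to one bar of the histogram; a plotting library would draw the same counts as vertical bars.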
3. Scatter plot
• Also known as X-Y plot, scatter graph, point graph or scattergram.
• Used to represent the relationship between two different variables, which may or may not be correlated.
Correlation
1. Positive
2. Negative
3. Null
4. Linear
5. Exponential
6. U-shape
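The first three correlation types can be checked numerically before drawing the scatter plot. The sketch below assumes the Pearson correlation coefficient as the measure (the slides do not name one); the threshold of 0.3 and all data values are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def describe(r, threshold=0.3):
    """Label the correlation; the threshold is an arbitrary illustrative cut-off."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "null"

hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 70]    # rises with hours
errors = [9, 8, 6, 5, 2]        # falls with hours
noise = [3, 1, 4, 1, 3]         # no clear trend

print(describe(pearson(hours, marks)))
print(describe(pearson(hours, errors)))
print(describe(pearson(hours, noise)))
```

Pearson's coefficient only measures linear association, so a U-shaped relationship can give a value near zero even though the variables are clearly related; the scatter plot itself reveals such shapes.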
4. Charts
Types of Chart
1. Line Chart
2. Bar Chart
3. Pie Chart
4. Area Chart
5. Flow Chart
6. Bubble Chart
5. Timeline
• A pictorial representation of events in chronological sequence, drawn along a straight line.
Types of timeline
1. Linear timeline
2. Comparative timeline
6. Various diagrams
1. Venn Diagram
2. Data Flow Diagram
3. Entity Relationship Diagram
Conventional Data Visualization Tools
The methods and ideas used by organizations for visualizing data.
1. Selection points on which interactive visualization takes place
   1. Size and volume of data (to make the right choice, the size and volume of the data should be considered)
   2. Cardinality (the cardinality of the data should be considered)
   3. Portion of data to be conveyed (visualize the point or portion of the data that the user wants to convey)
   4. Audience (to whom the user wants to convey it)
   5. Type of visual (which type of visualization the user should use)
2. Interactive visualization approaches
   1. Zoom in and zoom out, or zooming interface (allows the user to change the scale of the view as desired)
   2. Overview + Detail (multiple views are used simultaneously)
   3. Focus + Context, or fisheye (the focus area shows detail about part of the information)
3. Steps used to perform interactive visualization
   1. Interactive selection of data objects (the user selects entities, a subset, a part or the whole of the data for visualization)
   2. Linking data objects with each other (used for connecting multiple views)
   3. Filtering information (only valuable data is kept in focus and unrelated data is removed)
   4. Rearranging or remapping (the data is rearranged)
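The four steps above can be sketched on a small in-memory data set. This is only an illustration of the idea, not how any particular tool implements it; the records and the helper names `select`, `filter_records` and `remap` are hypothetical.

```python
# Hypothetical sales records used as the data objects
sales = [
    {"region": "North", "product": "A", "units": 120},
    {"region": "South", "product": "A", "units": 45},
    {"region": "North", "product": "B", "units": 80},
    {"region": "East", "product": "B", "units": 5},
]

def select(records, key, value):
    """Step 1: interactive selection of the data objects the user picked."""
    return [r for r in records if r[key] == value]

def filter_records(records, min_units):
    """Step 3: keep valuable records and drop unrelated ones."""
    return [r for r in records if r["units"] >= min_units]

def remap(records, key):
    """Step 4: rearrange the same records as totals grouped by another field."""
    totals = {}
    for r in records:
        totals[r[key]] = totals.get(r[key], 0) + r["units"]
    return totals

selected = select(sales, "region", "North")
# Step 2: linking, i.e. a second view is fed by the same selection
linked_view = remap(selected, "product")
remapped = remap(filter_records(sales, min_units=40), "region")
print(linked_view)
print(remapped)
```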
Techniques For Visual Data Representation
• Different authors classify data visualization techniques differently.
Visualization techniques/methods
1. Data visualization (represents quantitative data in diagrammatic form, with or without axes, e.g. table, line chart, pie chart)
2. Information visualization (provides interactivity with the data to increase cognition, e.g. tree map, clustering, Venn diagram)
3. Concept visualization (used to explain ideas, plans and concepts in detail and to analyse them easily, e.g. decision tree)
4. Strategic visualization (used to represent an organization's strategies for development, formulation and implementation, e.g. organizational chart, failure tree, strategy map)
5. Metaphor visualization (organizes and structures information graphically and expresses insight into the information, e.g. metro map, tree)
6. Compound visualization (allows merging different graphic formats in a single schema, e.g. cartoon)
Data Visualization Tools
• Various tools are used for visualizing data sets in 2D and 3D.
• The topic of visualization tools is divided into two parts:
1. Multidimensional visualization
2. Multidimensional visualization tools
1. Multidimensional Visualization
There are two categories of multidimensional visualization.
The first type examines category properties or category counts.
❑ Example
o Pie chart
o Bar chart
o Histogram
o Tree map
The second type examines the relationships among the variables.
❑ Example
o Scatter Plot
o Line chart
o Area chart
o Tabular comparison
2. Multidimensional Visualization Tool
❖ Google Charts
• This tool displays live data on a website.
• The Google Charts site offers an introduction, a quick start guide and a chart gallery for ideas.
❖ Many Eyes
• Many Eyes is a research project by IBM Research and the IBM Cognos software group.
• Developed using Java and Flash; open source.
• It is a public website that allows users to upload data and generates interactive visualizations for that data.
❖ Tableau Public
• A very popular tool, developed by the US company Tableau Software.
• According to their website, it “Brings Data to Life”.
❖ Weave
• Web-based Analysis and Visualization Environment.
• Can handle different data types because it has a large array of options for working with various data.
❖ Wordle
• Wordle takes text as input from the user and generates ‘word clouds’.
• The clouds give greater prominence to words that occur frequently in the source text.
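The idea behind a word cloud can be sketched as counting word frequencies and scaling the font size by the count. This is not Wordle's actual algorithm, only a minimal illustration; the function name `word_sizes`, the maximum size of 40 and the sample sentence are assumptions.

```python
import re
from collections import Counter

def word_sizes(text, max_size=40):
    """Map each word to a font size proportional to how often it occurs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())               # count of the most frequent word
    return {w: round(max_size * c / top) for w, c in counts.items()}

sizes = word_sizes("Big data needs big tools and big ideas")
print(sizes)
```

The most frequent word ("big") gets the largest size, and words that occur once get a proportionally smaller one.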
Open-Source Data Visualization Tools
1. Datawrapper
2. Chart.js
3. Raw
4. Charted
5. Timeline
6. Leaflet
1. Datawrapper
• Open source, produced in Europe by a journalism organization.
• Designed to create data visualizations for news organizations.
• A graph can be created in 4 steps:
- Click the “New Chart” link on the top menu bar.
- Paste your data into the text area.
- The tool then analyses the data and shows a preview.
- If everything is fine, publish the chart.
2. Chart.js
• Open source, clean charting library.
• Chart.js gives users control over the look and feel of their charts.
• Before creating a chart, the library must be included in the front-end code.
• Then add a chart and assign values to it.
3. Raw
• Open source, web-based tool built on the D3.js library.
• Simple, ready-to-use tool for non-programmers.
4. Charted
• Open source, created by the Product Science team at Medium.
• To visualize data, just paste a link to a Google spreadsheet or a .csv file as input.
• It checks at regular intervals (every 30 minutes) whether the data is up to date.
5. Timeline
• Displays a set of events in sequential order.
• Requires the data to be properly formatted in a Google spreadsheet.
6. Leaflet
• Lightweight, mobile-friendly JavaScript library used to create interactive maps.
• Takes advantage of HTML5 and CSS3.
• Well documented and easy to use, with a beautiful API and readable source code.
Analytical Techniques used in Big Data Visualization
Analytical Methods
1. Classification
2. Regression
3. Clustering
4. Association Rule
1. Classification
Supervised learning
• Supervised learning (SL) is where you have input variables (X) and an output variable (Y).
• An algorithm is used to learn the mapping function from input to output, Y = f(X).
• The goal is that when you have new input data (X), you can predict the output variable (Y).
• For instance, suppose you are given a basket filled with fruits. The first step is to train the machine with different fruits:
• If the object is round, has a depression at the top and is red in color, it is labelled as Apple.
• If the object is a long curving cylinder and is green-yellow in color, it is labelled as Banana.
i) Classification
• A classification problem is one where the output variable is a category, such as “red” or “blue”.
• A classification model attempts to draw conclusions from observed values.
• Given one or more inputs, a classification model tries to predict the value of one or more outcomes.
• For example, filtering emails as “spam” or “not spam”.
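The fruit example can be turned into a tiny classifier. The sketch below assumes a 1-nearest-neighbour rule, which is just one of many classification algorithms, and invents two numeric features (roundness and redness on a 0 to 1 scale); the training values and the function name `classify` are hypothetical.

```python
from math import dist

# Labelled training data: (roundness, redness) -> fruit label (made-up values)
training = [
    ((0.9, 0.9), "Apple"),
    ((0.8, 0.8), "Apple"),
    ((0.2, 0.1), "Banana"),
    ((0.1, 0.2), "Banana"),
]

def classify(features):
    """Predict the label of the closest training example (1-nearest neighbour)."""
    nearest = min(training, key=lambda pair: dist(pair[0], features))
    return nearest[1]

print(classify((0.85, 0.7)))   # a round, fairly red object
print(classify((0.15, 0.3)))   # a long, greenish object
```

The training pairs play the role of the labelled fruits in the basket, and `classify` is the learned mapping Y = f(X) applied to new input.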
ii) Regression
• A regression problem is one where the output variable is a real or continuous value, such as “salary” or “weight”.
• Difference between classification and regression: classification predicts whether something will happen, whereas regression predicts how much of it will happen.
• Regression analysis answers questions of the following types:
1) What is a person's expected income? (Linear regression)
2) What is the probability that an applicant will fail to repay a loan? (Logistic regression)
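A minimal sketch of the first question: fitting a straight line by ordinary least squares and using it to predict income. The data (years of experience against income in thousands) is made up, and `fit_line` is a hypothetical helper. The `logistic` function at the end only shows how logistic regression squeezes a score into a probability between 0 and 1; no logistic model is fitted here.

```python
from math import exp

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def logistic(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1 / (1 + exp(-z))

years = [1, 2, 3, 4, 5]           # made-up years of experience
income = [30, 35, 40, 45, 50]     # made-up income in thousands
slope, intercept = fit_line(years, income)
predicted = slope * 6 + intercept  # expected income after 6 years
print(slope, intercept, predicted)
```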
Unsupervised learning
• Hidden structure is discovered from unlabeled data.
• Unsupervised learning is the training of a machine using information that is neither classified nor labeled.
• Unlike supervised learning, no teacher is provided, which means the machine is given no labeled training examples.
• The task of the machine is to group unsorted information according to similarities, patterns and differences, without labeled training data.
For instance, suppose the machine is given an image containing both dogs and cats that it has never seen before.
The machine has no idea about the features of dogs and cats, so it cannot label them as dogs and cats. But it can group them according to their similarities, patterns and differences.
i) Clustering
• An unsupervised technique used for grouping similar objects.
• No prediction is made; it finds similarities between objects and groups them into clusters.
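One common clustering algorithm is k-means, which the slides do not name; the sketch below assumes it and works on one-dimensional values to keep the code short. The starting centers and data are made up, and `kmeans_1d` is a hypothetical helper.

```python
def kmeans_1d(values, centers, rounds=10):
    """Group values around the nearest center, then move each center to its group's mean."""
    clusters = [[] for _ in centers]
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for v in values:
            # index of the center closest to this value
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # an empty cluster keeps its old center
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

values = [1, 2, 3, 20, 21, 22]
centers, clusters = kmeans_1d(values, centers=[1, 22])
print(centers, clusters)
```

No labels are supplied at any point: the two groups emerge only from the distances between the values.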
ii) Association Rule
• An unsupervised technique.
• No prediction is made; instead it finds interesting relationships among items that are hidden in a large dataset.
• The discovered relationships are expressed as rules.
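A rule such as "bread → butter" is usually judged by its support and confidence. These two measures are standard in association rule mining but are not defined in the slides, so treat them as an addition; the baskets below are made up and the function names are hypothetical.

```python
# Hypothetical shopping baskets (transactions)
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
print(rule_support, rule_confidence)
```

Here bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets with bread also contain butter.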
Data Visualization with Tableau
• Tableau is a Business Intelligence software tool for visualizing data.
• It has its own in-memory data engine, which helps to speed up visualization.
• Tableau integrates with Hadoop and uses Hive to access the data.
Features
1. Quick and easy data acquisition
2. Publication of interactive graphics
3. Data are public
4. Has 3 main products:
i) Tableau Desktop
ii) Tableau Server
iii) Tableau Public
Introduction to:
Pentaho:
• Provides data analysis, design, monitoring, data mining and integration features.
Flare:
• An ActionScript library that runs on Adobe Flash Player.
Jasper Reports:
• Open source Java reporting tool; reports are defined in XML format.
Dygraphs:
• Fast, flexible, open source JavaScript charting library.
Datameer Analytics Solution and Cloudera:
• Allow the entire data to be stored in Hadoop.
Platfora:
• Built on Hadoop and Spark.
NodeBox:
• Node-based software used for creating 2D graphics.
Gephi:
• Written in Java and OpenGL.
• Open source tool for visualizing and analysing networks (graphs).
Google Chart API:
• Provides simple visualizations using an online tool.
Flot:
• jQuery library for line and bar charts.
D3.js:
• Data-Driven Documents (HTML + CSS).
Visual.ly:
• Provides templates; popular for infographics.