
Big Data

The document discusses the concepts of Big Data and data visualization, highlighting their significance in modern business analytics. It covers the evolution of Big Data, its characteristics, and the importance of effective data visualization techniques and principles. The document emphasizes the need for clear communication of data insights through visualizations while adhering to best practices for design and presentation.

Uploaded by Vince Tejada
Copyright © All Rights Reserved

Big Data and Data Visualization

Dr. Shantanu Ganguly


Fellow, Knowledge Resource Center
The Energy and Resources Institute (TERI)
Email: [email protected]
Lesson Topics
• Big Data

• Big Data and Analytics

• Powerful data visualizations

• Understand visualizations

• Understand design considerations that lead to powerful data visualizations

• Understand effective techniques to create data visualizations

• Understand best practice tips for presenting data visualizations


The Information Continuum

Cartoon by David Somerville, based on a two pane version by Hugh McLeod


Types of Data
Quantitative Data
• Measurable
• Collected through observation, measuring things that have a fixed reality
• Close ended
Qualitative Data
• Descriptive
• Collected through field work, focus groups, interviews, recording or filming conversations
• Open ended
Big Data
Data that is too large or too complex to be managed using traditional data processing, analysis, and storage techniques.
What is Big Data?
• Big Data is a collection of large datasets that cannot be adequately processed using traditional processing techniques. Big Data is not only data; it has become a complete subject, which involves various tools, techniques and frameworks.
• The term Big Data describes the large volume of data, both structured and unstructured, that a business deals with on a day-to-day basis. It's what organizations do with the data that matters.

• Big Data supports in-depth analysis that leads to better decisions and strategies for the development of the organization.
The Evolution of Big Data
Volume: scale of data

• 90% of today’s data has been created in just the last 2 years
• Every day we create 2.5 quintillion bytes of data or enough to fill 10 million Blu-ray discs
• 40 zettabytes (40 trillion gigabytes) of data will be created by 2020, an increase of 300 times from 2005, and the equivalent of 5,200 gigabytes of data for every man, woman and child on Earth
• Most companies in the US have over 100 terabytes (100,000 gigabytes) of data stored
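As a quick sanity check on the unit conversions in these figures, the sketch below uses decimal (SI) prefixes, where 1 GB = 10^9 bytes, 1 TB = 10^12 bytes and 1 ZB = 10^21 bytes. The world-population value is an assumption added for illustration; it is not stated on the slide.

```python
# Decimal (SI) storage units, in bytes.
GB = 10**9
TB = 10**12
ZB = 10**21

def to_gigabytes(n_bytes):
    """Convert a byte count to decimal gigabytes."""
    return n_bytes / GB

zettabytes_2020 = 40 * ZB   # "40 zettabytes ... by 2020"
company_store = 100 * TB    # "over 100 terabytes" per company

# Assumption: a world population of roughly 7.7 billion (not from the slide).
WORLD_POPULATION = 7.7e9

total_gb = to_gigabytes(zettabytes_2020)
per_person_gb = total_gb / WORLD_POPULATION

print(f"40 ZB  = {total_gb:,.0f} GB")
print(f"100 TB = {to_gigabytes(company_store):,.0f} GB")
print(f"Per person: about {per_person_gb:,.0f} GB")
```

With the assumed population, the per-person figure lands close to the 5,200 GB quoted on the slide.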
What is the importance of Big Data?
Who uses Big Data technology?
A brief explanation of how businesses are using Big Data
Big Data Technologies
Big Data and Analytics
Where does Big Data come from?
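(Figure: sources of Big Data, including enterprise "Dark Data" such as email, transactions, contracts and monitoring data, and partner, employee, customer and supplier data, along with public, commercial, sensor, credit, weather, industry, population, social media sentiment, economic and network data)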
Types of Data
Which Big Data characteristic is the biggest issue for your organization?
• Variety of data: 48%
• Volume of data: 35%
• Velocity of data: 16%

Source: Getting Value from Big Data, Gartner Webinar, May 2012
Biggest opportunity for Big Data in your organization?
• 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage.
• Business analytics needs will drive 70% of investments in the expansion and modernization of information infrastructure.
Source: Getting Value from Big Data, Gartner Webinar, May 2012
Business Analytics
Business Analytics/ Business Intelligence

• Business Analytics/Business Intelligence (BI) is a broad category of applications, technologies, and processes for:
• gathering,
• storing,
• accessing, and
• analyzing data
to help business users make better decisions.
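The four BI activities listed on this slide (gathering, storing, accessing and analyzing data) can be sketched end to end with Python's built-in sqlite3 module. The sales records, table name and columns are hypothetical, and a real BI stack would typically use a data warehouse and reporting tools rather than an in-memory database.

```python
import sqlite3

# Gathering: hypothetical sales records, e.g. exported from an order system.
records = [
    ("North", "2011-01", 1200.0),
    ("North", "2011-02", 1500.0),
    ("South", "2011-01", 800.0),
    ("South", "2011-02", 950.0),
]

# Storing: load the records into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)

# Accessing: query the stored data with SQL.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Analyzing: derive a simple indicator a business user can act on.
grand_total = sum(total for _, total in rows)
for region, total in rows:
    print(f"{region}: {total:.2f} ({total / grand_total:.0%} of sales)")

conn.close()
```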


Things Are Getting More Complex

• Many companies are performing new kinds of analytics (**sentiment analysis, etc.), to
better and more quickly understand and respond to what customers are saying about them
and their products.
• The cloud and appliances are being used as data stores
• Advanced analytics are growing in popularity and importance

**Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text
analysis and computational linguistics to identify and extract subjective information in source materials.
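Production sentiment analysis relies on the natural language processing and computational-linguistics techniques named in the footnote. The sketch below is only a much simpler lexicon-counting illustration of the idea: the word list is hypothetical, and the approach ignores negation, sarcasm and context.

```python
import re

# Hypothetical, deliberately tiny opinion lexicon: +1 positive, -1 negative.
LEXICON = {
    "great": 1, "love": 1, "excellent": 1, "good": 1,
    "bad": -1, "poor": -1, "hate": -1, "broken": -1,
}

def sentiment_score(text):
    """Sum the lexicon scores of the words in text (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def label(text):
    """Map a numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical customer comments about a product.
reviews = [
    "I love this phone, the camera is excellent",
    "Battery is poor and the screen arrived broken",
    "It arrived on Tuesday",
]
for review in reviews:
    print(label(review), "|", review)
```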
Analytics Models
(Figure: value increases with difficulty, from descriptive to prescriptive analytics)
• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: How can we make it happen?
Descriptive Analytics
• Descriptive analytics, such as reporting/OLAP, dashboards, and data visualization, have been widely used for some time.
• They are the core of traditional BI.

What has occurred?


Descriptive analytics, such as data visualization, is important in helping users interpret the output from predictive and prescriptive analytics.
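Descriptive analytics answers "What has occurred?" by summarizing historical data. Here is a minimal sketch using Python's statistics module on hypothetical monthly sales figures (not taken from the source).

```python
from statistics import mean, median

# Hypothetical monthly sales figures for one year.
monthly_sales = {
    "Jan": 120, "Feb": 135, "Mar": 128, "Apr": 150,
    "May": 162, "Jun": 158, "Jul": 170, "Aug": 165,
    "Sep": 149, "Oct": 155, "Nov": 180, "Dec": 210,
}

values = list(monthly_sales.values())

# Summaries that describe what happened, without predicting anything.
summary = {
    "total": sum(values),
    "mean": mean(values),
    "median": median(values),
    "best_month": max(monthly_sales, key=monthly_sales.get),
    "worst_month": min(monthly_sales, key=monthly_sales.get),
}
for name, value in summary.items():
    print(f"{name:>11}: {value}")
```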
Predictive Analytics

• Algorithms for predictive analytics, such as regression analysis, machine learning, and
neural networks, have also been around for some time.

What will occur?

• Marketing is the target for many predictive analytics applications.


• Descriptive analytics, such as data visualization, is important in helping users
interpret the output from predictive and prescriptive analytics.
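Regression analysis, the first technique listed, can be illustrated with an ordinary least-squares line fitted to hypothetical marketing data (ad spend versus units sold) and then used to forecast. This is a single-variable sketch, not a production forecasting model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: monthly ad spend (thousands) vs. units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 15, 19, 21, 25]

a, b = fit_line(ad_spend, units_sold)
forecast = a + b * 6  # predicted units sold for a spend of 6
print(f"units = {a:.2f} + {b:.2f} * spend; forecast at 6: {forecast:.1f}")
```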
Prescriptive Analytics
• Prescriptive analytics are often referred to as advanced analytics.
• Often used for the allocation of scarce resources
• Optimization

What should occur?

Prescriptive analytics can benefit healthcare strategic planning by using analytics to leverage operational and usage data combined with data on external factors, such as economic data, population demographic trends and population health trends, to more accurately plan for future capital investments such as new facilities and equipment utilization, as well as to understand the trade-offs between adding additional beds and expanding an existing facility versus building a new one.
A Brief History of Data Visualization
What makes a good chart?

Napoleon's 1812 March by Charles Joseph Minard
Reprinted in Tufte (2009), p. 41
Perhaps the most famous data presentation…
Florence Nightingale's 'Coxcombs', 1858
• Pioneer of hospital sanitation
• Meticulously gathered data
• Pioneer in applied statistics and visualization
• Nurse
Willard C. Brinton, 1914
First business book about visualization

• Rules for presenting data


• American consulting engineer
Mary Eleanor Spear 1952, 1969

• Common-sense advice
• Invented the box plot
• Worked for various US
government agencies
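A box plot draws the five-number summary of a dataset. Here is a minimal sketch computing it with Python's statistics module (Python 3.8+). Several quartile conventions exist; this one uses method="inclusive", so other tools may report slightly different Q1 and Q3. The sample values are hypothetical.

```python
from statistics import median, quantiles

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median(data), q3, max(data)

# Hypothetical sample values
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 21]
print(five_number_summary(sample))
```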
Jacques Bertin, 1967
• Principle of expressiveness:
• Say everything you want to say: no more, no less
• Don’t mislead
• Principle of effectiveness:
• Use the best method available for
showing your data
• Cartographer
Jacques Bertin
Seven Visual Variables
• Position
• Size
• Shape
• Color
• Brightness
• Orientation
• Texture
Edward Tufte
1983

• Disciplined design principles


• Minimalist approach
• Professor emeritus at Yale University
Jock Mackinlay
1986

• Automatically encode data with software


• Enable people to focus on ideas, concepts
• Added eighth variable to Bertin’s list: motion
• VP of Research and Design, Tableau Software
What is data visualization and why is it important?
Is data visualization a part of data science?
What is data discovery and visualization?
What are data visualization tools?
Is Excel a data visualization tool?
How do you create good data visualization?
What kind of visual communication do you want to create?

1. Is my information conceptual or data-driven?
• Conceptual information is qualitative
• Data-driven information is quantitative
2. Are my visuals meant to be declarative or exploratory?
• A declarative purpose is to make a statement
• An exploratory purpose is to look for new ideas
Four Types of Data Visualizations
(Figure: a 2×2 matrix with Conceptual vs. Data-Driven on the horizontal axis and Declarative vs. Exploratory on the vertical axis)
• Conceptual and declarative: Idea illustration
• Data-driven and declarative: Everyday dataviz
• Conceptual and exploratory: Idea generation
• Data-driven and exploratory: Visual discovery
Data Visualization
• Provides a clear understanding of patterns in data
• Detects hidden structures in data
• Condenses information
What makes a good chart?

Wikipedia: Patriotic War of 1812
Video: Napoleonic Wars in 8 Minutes; Another video
http://en.wikipedia.org/wiki/File:Patriotic_War_of_1812_ENG_map1.svg
Some basic principles (adapted from Tufte 2009)
1. The chart should tell a story
2. The chart should have graphical integrity
3. The chart should minimize graphical complexity

Tufte's fundamental principle: Above all else show the data
When a table is better than a chart
For a few data points, a table can do just as well…

(Figure: bar chart "Total Sales by Salesperson", $0.00 to $250,000.00)

Salesperson    Total Sales
Peacock        $225,763.68
Leverling      $201,196.27
Davolio        $182,500.09
Fuller         $162,503.78
Callahan       $123,032.67
King           $116,962.99
Dodsworth       $75,048.04
Suyama          $72,527.63
Buchanan        $68,792.25

The table carries more information in less space and is more precise.
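To make the table side of this comparison concrete, the sketch below renders the salesperson figures from this slide as a compact, right-aligned text table, sorted from highest to lowest. The function name render_table is a hypothetical helper, not part of any library.

```python
# Total sales per salesperson, copied from the table on this slide.
SALES = {
    "Peacock": 225763.68,
    "Leverling": 201196.27,
    "Davolio": 182500.09,
    "Fuller": 162503.78,
    "Callahan": 123032.67,
    "King": 116962.99,
    "Dodsworth": 75048.04,
    "Suyama": 72527.63,
    "Buchanan": 68792.25,
}

def render_table(sales):
    """Format sales as a two-column text table, largest total first."""
    rows = sorted(sales.items(), key=lambda item: item[1], reverse=True)
    width = max(len("Salesperson"), *(len(name) for name in sales))
    lines = [f"{'Salesperson':<{width}}  {'Total Sales':>12}"]
    for name, total in rows:
        amount = f"${total:,.2f}"  # currency with thousands separators
        lines.append(f"{name:<{width}}  {amount:>12}")
    return "\n".join(lines)

print(render_table(SALES))
```

Right-aligning the amounts keeps the decimal points lined up, which is what lets a reader compare exact values at a glance.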
The Ultimate Table: The Box Score

• Large amount of information in a very small space
• So why does this work?
• Depends on the reader's knowledge of the data
Being conscious of data ink
(Figure: three versions of a "Hypothetical City Crime" line chart plotting thefts per 100,000 citizens from 2003 to 2010, arranged from a lower data-ink ratio (worse) to a higher data-ink ratio (better); the higher-ratio version labels the data values directly on the chart)
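Tufte defines the data-ink ratio as the share of a graphic's ink that actually displays data; raising it means erasing ink that carries no data-information:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```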
What makes a good chart?
(Figure: two minimalist versions of a "2011 Total Sales" chart showing Sum of Extended Price, from 0 to 160,000, by Order Date)

Sometimes it's really a matter of preference.
These both minimize data ink.
Why isn't a table better here?
3-D Charts

Evaluate this from a data-ink perspective.
How does it affect the clarity of the chart?
One of the golden rules of data visualization is…
Never use 3D!
• 3D skews numbers, making them difficult to interpret or compare (Data Integrity/Lie Factor)
• Adding 3D to graphs introduces unnecessary chart elements like side and floor panels (Graphical Complexity)

Source: Knaflic (2015). Storytelling with Data: A Data Visualization Guide for Business Professionals. Chapter 2.
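Tufte quantifies graphical integrity with the Lie Factor: the size of the effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is the relative change (second value minus first value, divided by the first value). A value close to 1 indicates an honest graphic. The example numbers below, such as bar heights exaggerated by a 3D perspective, are hypothetical.

```python
def size_of_effect(first, second):
    """Relative change between two values: (second - first) / first."""
    return (second - first) / first

def lie_factor(data_first, data_second, shown_first, shown_second):
    """Tufte's Lie Factor: effect shown in the graphic / effect in the data.
    Values near 1 indicate graphical integrity."""
    shown = size_of_effect(shown_first, shown_second)
    actual = size_of_effect(data_first, data_second)
    return shown / actual

# Hypothetical example: the data grows from 100 to 120 (a 20% increase),
# but the 3D rendering draws the bars 2.0 cm and 3.0 cm tall (a 50% increase).
print(round(lie_factor(100, 120, 2.0, 3.0), 2))
```

Here the graphic overstates the change by a factor of about 2.5.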
Chartjunk: Data Ink "gone wild"
• Unnecessary visual clutter that doesn't provide additional insight
• Distraction from the story the chart is supposed to convey
• When the data-ink ratio is low, chartjunk is likely to be high
Review: Data principles (adapted from Tufte 2009)
1. The chart should tell a story
2. The chart should have graphical integrity
3. The chart should minimize graphical complexity

Tufte's fundamental principle: Above all else show the data
Infographics
• Information graphics
• Visualization of information, data or knowledge intended to present information quickly and clearly
• We will have an ICA to create infographics using Piktochart.

http://the-digital-reader.com/2015/04/13/infographic-ebooks-on-track-to-double-dutch-ebook-market-in-2014/
Summary
• Use data visualization principles to assess a visualization
• Tell a story
• Graphical integrity (lie factor)
• Minimize graphical complexity (data ink, chartjunk)
• Explain how a visualization can be improved based on those principles
• Types of visualization
THANK YOU
