Data Analytics Unit I
Unit-I: Introduction
and Life Cycle
By
Dr. Manjushree Nayak
Big Data Overview
• Industries that gather and exploit data
• Credit card companies monitor purchases
• Good at identifying fraudulent purchases
• Mobile phone companies analyze calling patterns –
e.g., even on rival networks
• Look for customers who might switch providers
• For social networks, data is the primary product
• Intrinsic value increases as data grows
Attributes Defining
Big Data Characteristics
• Huge volume of data
• Not just thousands/millions, but billions of items
• Complexity of data types and structures
• Variety of sources, formats, structures
• Speed of new data creation and growth
• High velocity, rapid ingestion, fast analysis
• Volume – datasets measured in petabytes
• Big Data observes and tracks what happens across various
sources, including business transactions, social media, and
machine-to-machine or sensor data. This
creates large volumes of data.
• Variety – dealing with different types of data (structured
and unstructured)
• Data comes in all formats: structured, numeric data in
traditional databases as well as unstructured text documents,
video, audio, email, and stock ticker data.
• Velocity – how fast data arrives and is processed
• Data streams in at high speed and must be dealt with in a timely manner.
Streamed data must also be analyzed quickly to
produce real-time or near-real-time results.
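A minimal sketch of the velocity idea, assuming readings arrive one at a time: each new value updates a rolling mean immediately instead of waiting for the full dataset. The function name `rolling_mean`, the window size, and the sample readings are hypothetical and chosen only for illustration.

```python
from collections import deque


def rolling_mean(stream, window=3):
    """Yield the mean of the most recent `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # old readings drop out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


# Hypothetical sensor readings, processed in arrival order.
readings = [10, 20, 30, 40]
means = list(rolling_mean(readings))
print(means)
```

Because the function is a generator, a result is available after every reading, which is the property near-real-time analysis depends on.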
Big Data Characteristics
• Variability – frequent change in data
• Veracity – maintaining quality and meaningful
datasets
• Visualization – displaying data on charts
• Value – using data to generate
revenue
Big Data Analytics Importance
• Cost Savings: helps identify more efficient ways of doing
business.
• Time Reductions: helps businesses analyze data
immediately and make quick decisions based on what they learn.
• New Product Development: by knowing the trends in
customer needs and satisfaction through analytics, you can
create products that match what customers want.
• Understand the market conditions: by analyzing big data, you
can get a better understanding of current market conditions.
• Control online reputation: big data tools can do
sentiment analysis, so you can get feedback about
who is saying what about your company.
Sources of Big Data Deluge
• Mobile sensors – GPS, accelerometer, etc.
• Social media – 700 Facebook updates/sec in 2012
• Video surveillance – street cameras, stores, etc.
• Video rendering – processing video for display
• Smart grids – gather and act on information
• Geophysical exploration – oil, gas, etc.
• Medical imaging – reveals internal body structures
• Gene sequencing – more prevalent and less expensive;
healthcare would like to predict personal illnesses
Data Structures:
Characteristics of Big Data
• Structured – defined data type, format, structure
• Transactional data, OLAP cubes, RDBMS, CSV files, spreadsheets
• Semi-structured
• Text data with discernible patterns – e.g., XML data
• Quasi-structured
• Text data with erratic data formats – e.g., clickstream data
• Unstructured
• Data with no inherent structure – text docs, PDFs, images, video
Example of Structured Data
Rno  Name  Address     Phone no
1    Amit  Nashik      9766543267
2    Neha  Pune        -
3    Jiya  Mumbai      -
4    Riya  Aurangabad  8990765432
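Because every row of the table above has the same named columns, it can be loaded and queried directly. The sketch below assumes the table is stored as CSV text and that "-" marks a missing phone number; the variable names are illustrative.

```python
import csv
import io

# The table from the slide written as CSV text; "-" is assumed to mean "no phone number".
raw = """Rno,Name,Address,Phone no
1,Amit,Nashik,9766543267
2,Neha,Pune,-
3,Jiya,Mumbai,-
4,Riya,Aurangabad,8990765432
"""

# Each row becomes a dict keyed by the column headers.
rows = list(csv.DictReader(io.StringIO(raw)))

cities = [row["Address"] for row in rows]
missing_phone = [row["Name"] for row in rows if row["Phone no"] == "-"]

print(cities)
print(missing_phone)
```

The fixed schema is what makes the "who has no phone number" question a one-line filter.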
Example of Semi-Structured Data
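As one possible illustration (not taken from the original slide), the sketch below parses a small XML document. The tags describe the data, but records need not all carry the same fields. The tag and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML; the second record has no <city> tag.
xml_text = """
<students>
  <student rno="1"><name>Amit</name><city>Nashik</city></student>
  <student rno="2"><name>Neha</name></student>
</students>
""".strip()

root = ET.fromstring(xml_text)

records = []
for student in root.findall("student"):
    records.append({
        "rno": student.get("rno"),
        "name": student.findtext("name"),
        "city": student.findtext("city"),  # None when the tag is absent
    })

print(records)
```

The self-describing tags give the parser a discernible pattern to follow, even though the schema is looser than a database table.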
Example of Quasi-Structured Data
visiting 3 websites adds 3 URLs to user’s log files
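A rough sketch of how such log entries might be given structure, assuming each entry is a plain URL. The URLs below are made up for illustration; real clickstream logs vary in format and usually need more cleaning.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical log: three visits add three URLs.
log = [
    "https://www.example.com/search?q=big+data",
    "https://shop.example.org/cart",
    "https://www.example.com/search?q=analytics&page=2",
]

visits = []
for url in log:
    parts = urlparse(url)
    # Keep the host, the path, and any query parameters found in the URL.
    visits.append((parts.netloc, parts.path, parse_qs(parts.query)))

for visit in visits:
    print(visit)
```

Not every URL has a query string, and the parameters differ between entries, which is why this kind of data is called quasi-structured.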
Example of Unstructured Data
Video about Antarctica Expedition
Types of Data Repositories
from an Analyst Perspective
State of the Practice in Analytics
BI (Business Intelligence)
Data Science
Business Intelligence (BI) vs Data Science
Current Analytical Architecture
Typical Analytic Architecture
Phase 1: Discovery
Phase 6: Operationalize
• The data analytic lifecycle is designed for Big Data problems and data
science projects
• With six phases, project work can occur in several phases
simultaneously
• The cycle is iterative, reflecting how real projects proceed
• Work can return to earlier phases as new information is uncovered
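The iterative movement between phases can be sketched as a small state model. Only Phase 1 (Discovery) and Phase 6 (Operationalize) are named in this section; the other four names below follow the commonly cited six-phase lifecycle and should be read as an assumption. The "step back one phase" rule is also a simplification, and the sketch does not model work happening in several phases at once.

```python
from enum import IntEnum


class Phase(IntEnum):
    DISCOVERY = 1
    DATA_PREPARATION = 2      # assumed name, not listed in this section
    MODEL_PLANNING = 3        # assumed name
    MODEL_BUILDING = 4        # assumed name
    COMMUNICATE_RESULTS = 5   # assumed name
    OPERATIONALIZE = 6


def next_phase(current, ready):
    """Move forward when the team is ready; otherwise return to the previous phase."""
    if ready:
        return Phase(min(current + 1, Phase.OPERATIONALIZE))
    return Phase(max(current - 1, Phase.DISCOVERY))


# Hypothetical project history: False means new information sent the team back.
phase = Phase.DISCOVERY
path = [phase]
for ready in [True, True, False, True, True]:
    phase = next_phase(phase, ready)
    path.append(phase)

print([p.name for p in path])
```

The printed path visits Data Preparation twice, which is the "return to earlier phases" behavior described in the bullets.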
Key Roles for a Successful Analytics
Project
• Business User – understands the domain area
• Project Sponsor – provides requirements
• Project Manager – ensures meeting objectives
• Business Intelligence Analyst – provides business domain
expertise based on deep understanding of the data
• Database Administrator (DBA) – creates DB environment
• Data Engineer – provides technical skills, assists data
management and extraction, supports analytic sandbox
• Data Scientist – provides analytic techniques and modeling
Background and Overview of Data
Analytics Lifecycle
• Data Analytics Lifecycle defines the analytics process and best
practices from discovery to project completion
• The Lifecycle employs aspects of
• Scientific method
• Cross Industry Standard Process for Data Mining (CRISP-DM)
• Process model for data mining
• Davenport’s DELTA framework
• Hubbard’s Applied Information Economics (AIE) approach
• MAD Skills: New Analysis Practices for Big Data by Cohen et al.
Overview of
Data Analytics Lifecycle
Phase 1: Discovery