Big Data Analytics

Uploaded by vagdevitanuku
Copyright © All Rights Reserved
Big Data Analytics


• To uncover hidden patterns, unknown correlations, market trends,
customer preferences and other useful business information.
• For more effective marketing, new revenue opportunities, better
customer service, improved operational efficiency and competitive
advantage over rival organizations.
• From web server logs and Internet clickstream data, social media
content and social network activity reports, text from customer emails
and survey responses, mobile-phone call detail records, and machine
data captured by sensors connected to the Internet of Things.
Big Data Analytics
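• Structured data is handled by mainstream BI tools, data warehouses and relational databases.
• Major weaknesses of relational databases: they cannot handle the variety of data, struggle with its
velocity, and are hard to scale and to replicate across data centers.
• ACID properties (atomicity, consistency, isolation, durability) are difficult to maintain in a distributed environment.
• CAP theorem: a distributed system cannot guarantee all three of the following properties at once;
when a network partition occurs it must choose between consistency and availability.
Consistency (coherence): all nodes of the system see exactly the same data at the same time.
Availability: the system stays up and keeps answering requests even if one of its nodes fails.
Partition tolerance: the system keeps working even when the network splits into sub-networks
that cannot reach each other.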
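The consistency/availability trade-off during a partition can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how any real database replicates data: the class `ReplicatedRegister`, its two-node layout (node A takes every write, node B serves every read) and the `partitioned` flag are all hypothetical.

```python
class ReplicatedRegister:
    """Toy model (hypothetical): one value replicated from node A to node B."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.node_a = None        # receives all writes
        self.node_b = None        # serves all reads
        self.partitioned = False  # True while A and B cannot communicate

    def write(self, value):
        self.node_a = value
        if not self.partitioned:
            self.node_b = value   # replication reaches B only without a partition

    def heal(self):
        """End the partition and resynchronise node B with node A."""
        self.partitioned = False
        self.node_b = self.node_a

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data
            raise RuntimeError("unavailable during partition")
        # AP: always answer, possibly with a stale value during a partition
        return self.node_b


ap = ReplicatedRegister("AP")
ap.write(1)
ap.partitioned = True
ap.write(2)
print(ap.read())  # 1 (available but stale)
```

A `"CP"` instance in the same situation raises `RuntimeError` from `read()` until `heal()` is called: it gives up availability in order to stay consistent.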
[Figure: CAP theorem]
Key-Value databases – volume and scalability
• Each entry pairs a unique key (a simple string) with an arbitrarily large
data field (the value), which makes these databases a straightforward
option for data storage.
• Examples: Redis, Riak, Voldemort
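The key-value model reduces to a few operations on a unique key. The sketch below assumes a single in-memory Python dict; the class and method names are hypothetical, though the operations correspond roughly to Redis's `SET`, `GET` and `DEL` commands. Real stores add persistence, replication and partitioning of keys across nodes.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Minimal in-memory key-value store (hypothetical illustration)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Writing an existing key overwrites it, so each key stays unique.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by key only: the store never inspects the value.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42", b'{"name": "Ana", "theme": "dark"}')
print(store.get("user:42"))
```

Because the value is opaque bytes, any structure inside it (here a JSON string) is the application's concern, which is what keeps reads and writes simple and fast.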
Key-Value Databases
• Handle large volumes of small, continuous reads and writes
• Suit applications with infrequent updates and simple queries
• Suit volatile data
Use cases for key-value databases:
1. Session management on a large scale
2. Using cache to accelerate application responses
3. Storing personal data on specific users
4. Product recommendations and personalized lists
5. Managing player sessions in massively multiplayer online games
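Session management and caching (use cases 1 and 2) both rely on entries that expire. The following is a minimal sketch assuming a single-process cache; the `TTLCache` class, its injectable `clock` and the 30-second TTL are hypothetical. Redis, for comparison, supports per-key expiry natively (for example via its `EXPIRE` command).

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Hypothetical key-value cache whose entries expire ttl seconds after being set."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data: Dict[Hashable, Tuple[Any, float]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # lazy eviction: expired entries are removed on read
            return None
        return value


fake_now = [0.0]
sessions = TTLCache(ttl=30, clock=lambda: fake_now[0])
sessions.set("session:abc123", {"user_id": 42})
print(sessions.get("session:abc123"))  # {'user_id': 42}
fake_now[0] = 31.0                     # 31 simulated seconds later
print(sessions.get("session:abc123"))  # None
```

Lazy eviction keeps the sketch short; a production cache would also purge expired keys in the background so that unread sessions do not accumulate.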
Column-based databases
• The column is the basic unit of data
• Rows are treated as records and identified by a unique key, as in the
key-value model
• Examples: Cassandra, HBase, Google Bigtable
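The row-key/column layout can be sketched with nested Python dicts. This is a simplification under stated assumptions: the names `WideColumnTable`, `family` and `column` are hypothetical, loosely modelled on the column families and column qualifiers of HBase and Bigtable, and a real system also persists and distributes the data across servers.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, Optional, Tuple


class WideColumnTable:
    """Hypothetical sparse layout: row key -> column family -> column -> value."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row_key: str, family: str, column: str, value: Any) -> None:
        self._rows[row_key][family][column] = value

    def get(self, row_key: str, family: str, column: str) -> Optional[Any]:
        # .get() on the defaultdicts avoids creating empty rows on reads
        return self._rows.get(row_key, {}).get(family, {}).get(column)

    def scan_column(self, family: str, column: str) -> Iterator[Tuple[str, Any]]:
        """Yield (row_key, value) in row-key order for rows that store the column."""
        for row_key in sorted(self._rows):
            value = self._rows[row_key].get(family, {}).get(column)
            if value is not None:
                yield row_key, value


users = WideColumnTable()
users.put("user1", "info", "name", "Ana")
users.put("user1", "info", "city", "Pune")
users.put("user2", "info", "name", "Raj")  # user2 has no "city" column at all
print(list(users.scan_column("info", "name")))  # [('user1', 'Ana'), ('user2', 'Raj')]
```

Rows only hold the columns they actually use, so sparse data costs nothing for absent columns.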
Document-based databases
• The value associated with a key can be a structured, complex object
(e.g. XML or JSON) rather than a simple type
• Schemaless
• Simple and flexible, suitable for content management systems
• Examples: MongoDB, Cosmos DB, DynamoDB
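A schemaless document collection can be sketched as JSON objects stored under a key field. The sketch assumes a hypothetical `DocumentCollection` class; the `_id` key field borrows MongoDB's naming convention, and the dotted path in the filter mirrors MongoDB's dot notation for nested fields, but nothing here is MongoDB's actual API.

```python
import json
from typing import Any, Dict, List


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'author.name' into nested objects."""
    value: Any = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


class DocumentCollection:
    """Hypothetical schemaless collection: each document has its own fields."""

    def __init__(self) -> None:
        self._docs: Dict[Any, Dict[str, Any]] = {}

    def insert(self, json_text: str) -> None:
        doc = json.loads(json_text)
        self._docs[doc["_id"]] = doc  # the "_id" field acts as the key

    def find(self, criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Return documents whose (possibly nested) fields equal every criterion."""
        return [
            doc for doc in self._docs.values()
            if all(get_path(doc, path) == want for path, want in criteria.items())
        ]


pages = DocumentCollection()
pages.insert('{"_id": 1, "title": "Home", "author": {"name": "Ana"}, "tags": ["intro"]}')
pages.insert('{"_id": 2, "title": "Pricing", "author": {"name": "Raj"}, "price_eur": 10}')
print([d["title"] for d in pages.find({"author.name": "Raj"})])  # ['Pricing']
```

The two documents share no fixed schema (only one has `tags`, only the other has `price_eur`). One simplification: a criterion whose value is `None` also matches documents that lack the field.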
Graph Databases
• Data is modelled as nodes and relationships, each carrying a set of properties
• Handle complex, highly connected data
• Examples: Neo4j, Amazon Neptune, Apache AGE, Apache TinkerPop
• Use cases: cartography, social networks and other network modelling
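A property graph and a typical traversal ("who is within two FRIEND hops of this person?") can be sketched in plain Python. The `PropertyGraph` class and the sample people are hypothetical; in Neo4j a similar question would be written in Cypher as a variable-length pattern, roughly `MATCH (:Person {name: 'ana'})-[:FRIEND*1..2]->(p) RETURN p`.

```python
from collections import deque
from typing import Any, Dict, List, Tuple


class PropertyGraph:
    """Hypothetical graph: nodes and typed, directed relationships with property dicts."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: Dict[str, List[Tuple[str, str, Dict[str, Any]]]] = {}

    def add_node(self, node_id: str, **props: Any) -> None:
        self.nodes[node_id] = props

    def add_edge(self, src: str, rel: str, dst: str, **props: Any) -> None:
        self.edges.setdefault(src, []).append((rel, dst, props))

    def within_hops(self, start: str, rel: str, max_hops: int) -> List[str]:
        """Breadth-first search over `rel` edges: nodes 1..max_hops away from start."""
        seen = {start}
        found: List[str] = []
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            for r, dst, _props in self.edges.get(node, []):
                if r == rel and dst not in seen:
                    seen.add(dst)
                    found.append(dst)
                    queue.append((dst, depth + 1))
        return found


g = PropertyGraph()
for person in ("ana", "raj", "li", "sam"):
    g.add_node(person, label="Person")
g.add_node("pune", label="City")
g.add_edge("ana", "FRIEND", "raj", since=2019)
g.add_edge("raj", "FRIEND", "li", since=2021)
g.add_edge("li", "FRIEND", "sam", since=2022)
g.add_edge("ana", "LIVES_IN", "pune")
print(g.within_hops("ana", "FRIEND", 2))  # ['raj', 'li']
```

Following relationships directly from node to node is what makes such multi-hop queries cheap compared with repeated joins in a relational schema.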
Types of Analytics
• Descriptive: what happened?
• Diagnostic: why did it happen?
• Predictive: what is likely to happen?
• Prescriptive: what should be done about it?
[Figure: Analytics applications]
Tools and techniques
• Tools: SAS, SPSS, R, etc.
• Common analytics techniques:
1. Data Preparation
2. Reporting, Dashboards & Visualization
3. Segmentation
4. Forecasting
5. Descriptive Modelling
6. Predictive Modelling
7. Optimization
• Steps in the analytics process: Access, Manage, Analyze, Report
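The four steps can be sketched as a chain of small functions. Everything here is hypothetical and chosen only for illustration: the CSV extract, the column names `dept` and `salary`, and the decision to drop incomplete records during the Manage step (imputing them is an alternative).

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical raw extract: one salary (in K) is missing.
RAW_CSV = """dept,salary
sales,50
sales,
it,65
it,100
"""


def access(text: str) -> List[Dict[str, str]]:
    """Access: read raw records from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def manage(rows: List[Dict[str, str]]) -> List[dict]:
    """Manage: drop incomplete records and convert types."""
    return [
        {"dept": r["dept"], "salary": float(r["salary"])}
        for r in rows
        if r["salary"].strip()
    ]


def analyze(rows: List[dict]) -> Dict[str, float]:
    """Analyze: average salary per department."""
    by_dept = defaultdict(list)
    for r in rows:
        by_dept[r["dept"]].append(r["salary"])
    return {dept: mean(values) for dept, values in by_dept.items()}


def report(summary: Dict[str, float]) -> List[str]:
    """Report: format the results for a dashboard or printout."""
    return [f"{dept}: {avg:.1f}" for dept, avg in sorted(summary.items())]


for line in report(analyze(manage(access(RAW_CSV)))):
    print(line)
```

Keeping each step a separate function makes it easy to swap the source (a database instead of CSV) or the analysis without touching the other stages.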
Application of Modeling in Business
• Statistical, machine learning (ML) and deep learning (DL) models
• Used for representation, analysis, synthesis, discovery, recovery, sensing,
acquisition, extraction, learning and security
• Examples: regression, clustering, time-series models
• Missing-value imputation: the mice package in R (Multivariate Imputation by
Chained Equations) supports PMM (predictive mean matching), which fills each
missing value with an observed value chosen with the help of a regression model.
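A simplified, single-pass version of PMM for one incomplete variable `y` and one complete predictor `x` is sketched below. It is not the mice algorithm: mice also draws the regression coefficients from their posterior distribution, cycles through every incomplete variable with chained equations, and produces several imputed datasets. The function names, the donor-pool size `k` and the toy data are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def ols_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = alpha + beta * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx
    return y_bar - beta * x_bar, beta


def pmm_impute(x: Sequence[float], y: Sequence[Optional[float]],
               k: int = 3, seed: int = 0) -> List[Optional[float]]:
    """One predictive-mean-matching pass: fill each missing y from an observed donor."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    alpha, beta = ols_fit([x[i] for i in observed], [y[i] for i in observed])
    predicted = [alpha + beta * xi for xi in x]
    filled = list(y)
    for i, v in enumerate(y):
        if v is None:
            # the k observed cases whose predicted values are closest to case i's
            donors = sorted(observed, key=lambda j: abs(predicted[j] - predicted[i]))[:k]
            filled[i] = y[rng.choice(donors)]  # impute a real observed value
    return filled


x = [1, 2, 3, 4, 5, 6]
y = [10, 12, None, 16, 18, None]
print(pmm_impute(x, y, k=1))  # [10, 12, 12, 16, 18, 18]
```

Because imputed values are always copied from observed cases, PMM never produces values outside the range actually seen in the data.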
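Databases – Types of Data and Variables
• A database (DB) is managed by a DBMS, which describes its structure in a data dictionary
• Types of data: categorical or numerical; numerical data is either continuous or discrete
• Categorical data is either nominal (unordered) or ordinal (ordered)
• By usage: quantitative or qualitative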
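One way to record these distinctions in code is through types. In the hypothetical record below (the `Order` class and its fields are invented for illustration), an `Enum` models a nominal variable, which supports only equality tests, while an `IntEnum` models an ordinal one so that values can be ranked. `IntEnum` also permits arithmetic, which is not meaningful for ordinal data; only the ordering should be used.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Colour(Enum):
    """Nominal: categories with no natural order."""
    RED = "red"
    BLUE = "blue"


class Size(IntEnum):
    """Ordinal: ordered categories (the integers encode only the order)."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


@dataclass
class Order:
    colour: Colour    # categorical, nominal
    size: Size        # categorical, ordinal
    items: int        # numerical, discrete (a count)
    weight_kg: float  # numerical, continuous (a measurement)


order = Order(Colour.RED, Size.MEDIUM, items=3, weight_kg=1.25)
print(order.size > Size.SMALL)  # True: ordinal values can be ranked
```

Comparing two `Colour` members with `<` raises `TypeError`, which is the desired behaviour for nominal data.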
Regression Models
• A dependent (response) variable is modelled as a function of one or more independent (predictor) variables
• Types of regression:
Univariate (one response variable): linear (simple or multiple) or nonlinear
Multivariate (several response variables): linear or nonlinear
Logistic: the response variable is qualitative
ANOVA: all predictors are qualitative
Analysis of covariance (ANCOVA): predictors are a mix of quantitative and qualitative
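Multiple linear regression (one response, several predictors) is usually fitted by least squares through the normal equations (XᵀX)b = Xᵀy. The sketch below solves them with plain Python Gaussian elimination so that it needs no libraries; the function names and the noise-free toy data are hypothetical. Forming XᵀX can be numerically ill-conditioned, so library routines such as `numpy.linalg.lstsq`, which use more stable factorizations, are preferable in practice.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y = b0 + b1*x1 + ... + bp*xp by solving (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept b0
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2
predictors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
response = [1, 3, 4, 6, 8]
print([round(b, 6) for b in multiple_ols(predictors, response)])  # [1.0, 2.0, 3.0]
```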
Linear Regression
Three main purposes:
• To describe the linear dependence of one variable on another.
• To predict values of one variable from values of the other, typically the
variable for which more data are available.
• To correct for the linear dependence of one variable on the other, so that
other features of its variability can be examined.
[Figure: Linear regression]
Linear Regression
• Find α (intercept) and β (slope) for the given dataset using ordinary least squares (OLS)
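For the simple model y = α + βx, OLS chooses α and β to minimise the sum of squared residuals S. Setting ∂S/∂α = 0 gives α = ȳ − βx̄; substituting this into ∂S/∂β = 0 gives the slope. Here n is the number of observations and x̄, ȳ are the sample means.

```latex
\begin{align*}
S(\alpha, \beta) &= \sum_{i=1}^{n} \left(y_i - \alpha - \beta x_i\right)^2 \\
\hat{\beta} &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
\hat{\alpha} &= \bar{y} - \hat{\beta}\,\bar{x}
\end{align*}
```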
Sample data

Service (months)   Salary (in K)
12                 50
20                 60
36                 65
60                 100
66                 120
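Applying the standard OLS estimates, β = Sxy / Sxx and α = ȳ − βx̄, to the sample data gives the fitted line. The sketch below takes service in months as x and salary in K as y; the function name `ols` is hypothetical.

```python
from typing import Sequence, Tuple


def ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (alpha, beta) for y = alpha + beta * x by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # co-deviation
    s_xx = sum((x - x_bar) ** 2 for x in xs)                       # deviation of x
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta


service_months = [12, 20, 36, 60, 66]
salary_k = [50, 60, 65, 100, 120]
alpha, beta = ols(service_months, salary_k)
print(f"alpha = {alpha:.2f}, beta = {beta:.3f}")  # alpha = 32.24, beta = 1.205
```

With x̄ = 38.8, ȳ = 79, Sxy = 2734 and Sxx = 2268.8, each extra month of service is associated with about 1.2K more salary in this sample, and the fitted value at 48 months is α + 48β ≈ 90.1K. The intercept (salary at 0 months) lies outside the observed range, so it is an extrapolation rather than a meaningful starting salary.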
