Big Data Analytics
• To: uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.
• For: more effective marketing, new revenue opportunities, better customer service, improved operational efficiency and competitive advantages over rival organizations.
• From: web server logs and Internet clickstream data, social media content and social network activity reports, text from customer emails and survey responses, mobile-phone call detail records, and machine data captured by sensors connected to the Internet of Things.

Big Data Analytics
• Structured data: mainstream BI tools, data warehouses, relational databases
• Major weaknesses of relational databases: they cannot handle the variety of data, cannot keep up with the velocity of data, and struggle with scalability and replication across data centers.
• ACID properties: poorly suited to distributed environments
• CAP theorem:
  Consistency: all nodes of the system see exactly the same data at the same time;
  Availability: the system stays up and running even if one of its nodes fails;
  Partition tolerance: each sub-network must be able to keep operating autonomously when the network is partitioned.

Key-Value databases: volume and scalability
• These databases store a simple string (the key) that is always unique and an arbitrarily large data field (the value), making them a straightforward option for data storage.
• E.g.: Redis, Riak, Voldemort

Key-Value Databases
• Handle large volumes of small, continuous reads and writes
• Applications with infrequent updates and simple queries
• Key-value databases for volatile data
Use cases for key-value databases:
1. Session management on a large scale
2. Using a cache to accelerate application responses
3. Storing personal data on specific users
4. Product recommendations and personalized lists
5. Managing player sessions in massively multiplayer online games

Column based databases
• The column is the basic entity
• Rows of the grid are treated as records and identified by a unique key, as in the key-value model
• Cassandra, HBase, Google Bigtable

Document based databases
• The value associated with the key can be a structured, complex object (XML or JSON) rather than a simple type
• Schemaless
• Simple and flexible; suitable for content management systems
• MongoDB, CosmosDB, DynamoDB

Graph Databases
• Nodes and relationships, each with a set of properties
• Handle complex, highly connected data
• Neo4j, Amazon Neptune, Apache AGE, Apache TinkerPop
• Cartography, social networks: network modelling

Type of Analytics
• Descriptive
• Diagnostic
• Predictive
• Prescriptive

Analytics Applications

Tools and techniques
• Tools: SAS, SPSS, R, etc.
• Various analytics techniques are:
1. Data Preparation
2. Reporting, Dashboards & Visualization
3. Segmentation
4. Forecasting
5. Descriptive Modelling
6. Predictive Modelling
7. Optimization
• Steps involved in analytics: Access, Manage, Analyze, Report

Application of Modeling in Business
• Statistical / ML / DL model
• For representation, analysis, synthesis, discovery, recovery, sensing, acquisition, extraction, learning, security
• E.g.: regression, clustering, time series
• Missing-value imputation: the mice package in R (multivariate imputation by chained equations), using PMM (predictive mean matching) based on a regression model.
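The idea behind predictive mean matching can be illustrated with a small Python sketch. This is a simplified, single-variable version written for illustration only, not the algorithm as implemented in the R package: the function names `fit_line` and `pmm_impute`, the donor count `k`, and the toy data are all assumptions. It fits a regression on the complete cases, predicts a value for each incomplete case, and borrows the observed value of a randomly chosen donor whose prediction is among the `k` closest.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def pmm_impute(x, y, k=3, seed=0):
    """Fill None entries of y with a simplified predictive mean matching."""
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    alpha, beta = fit_line([p[0] for p in observed], [p[1] for p in observed])
    # Each donor is (predicted value, actually observed value).
    donors = [(alpha + beta * xi, yi) for xi, yi in observed]
    filled = []
    for xi, yi in zip(x, y):
        if yi is not None:
            filled.append(yi)
            continue
        target = alpha + beta * xi
        # Keep the k donors whose predictions are closest to the target.
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        filled.append(rng.choice(nearest)[1])
    return filled


# Hypothetical toy data: two values of y are missing.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, None, 8.2, None, 12.1]
completed = pmm_impute(x, y)
print(completed)
```

Because every imputed value is copied from a real observation, the filled-in data stay within the range of values actually seen. The R package goes further than this sketch, for example by cycling over several incomplete variables and producing multiple imputed datasets.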
Databases: Types of Data and Variables
• DB, DBMS, data dictionary
• Types of data: categorical or numerical (continuous or discrete)
• Categorical data: nominal or ordinal
• Based on usage: quantitative or qualitative

Regression Models
• Independent variable, dependent variable
• Various types of regression:
  Univariate [Linear (Simple/Multiple), Nonlinear]
  Multivariate [Linear, Nonlinear]
  Logistic: response variable is qualitative
  ANOVA: all predictors are qualitative
  Analysis of covariance: predictors are both quantitative and qualitative

Linear Regression
Three main purposes:
• Describing the linear dependence of one variable on the other.
• Predicting values of one variable from the other, for which more data are available.
• Correcting for the linear dependence of one variable on the other.

Linear Regression
• Find alpha, beta for the given dataset using OLS

Sample data
Service (months)    Salary (in K)
12                  50
20                  60
36                  65
60                  100
66                  120
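The OLS exercise on the sample data can be worked through with a short Python sketch. It assumes the simple linear model Salary = alpha + beta × Service, which the slide does not spell out, and uses the closed-form estimates beta = Sxy / Sxx and alpha = mean(y) − beta × mean(x). The function name `ols_fit` is a placeholder chosen for this example.

```python
# Sample data from the slide: service in months, salary in thousands.
service = [12, 20, 36, 60, 66]
salary = [50, 60, 65, 100, 120]


def ols_fit(x, y):
    """Return (alpha, beta) minimising squared residuals of y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


alpha, beta = ols_fit(service, salary)
print(f"alpha = {alpha:.2f}, beta = {beta:.3f}")
```

Under this model the fit gives beta of about 1.205 and alpha of about 32.24, so each additional month of service is associated with roughly 1.2K more salary in this sample.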