
Unit 1

Big Data Analytics


Course Code: CDCSC11
Faculty: Dr. Vandana Bhatia
Part 1: Contents
❑Introduction to Big Data
❑Databases and their evolution
❑Convergence of key trends
❑Unstructured data
❑Industry examples of Big Data:
➢ Web analytics
➢ Big data and marketing
➢ Fraud and big data
➢ Risk and big data
➢ Credit risk management
➢ Big data and algorithmic trading
➢ Big data and healthcare
➢ Big data in medicine
➢ Advertising and big data
Part 2: Contents

❑Big data technologies:
➢ Introduction to Hadoop
➢ Open-source technologies
➢ Cloud and big data mobile business intelligence
➢ Crowdsourcing analytics
➢ Inter- and trans-firewall analytics


Understanding Big Data
Objectives:
• To understand what big data is
• To know various types of data
• To understand examples
• To explore various applications of Big Data
What is Big Data
• Simply: data of very big size

• Can't be processed with usual tools

• Distributed architecture needed

• Structured / unstructured

❑ According to Gartner: big data is huge-volume, fast-velocity, and different-variety information assets that demand an innovative platform for enhanced insights and decision making.

❑ The authors of Big Data: A Revolution explain it as a way to solve all the unsolved problems related to data management and handling that the industry earlier used to live with. With big data analytics, you can also unlock hidden patterns, get a 360-degree view of customers, and better understand their needs.
What is Big Data: Types and Examples

Characteristics of Big Data: Volume, Velocity, Variety
Big Data Categories
Big Data: Volume
This refers to data that is tremendously large. As you can see from the image, the volume of data is rising exponentially. In 2016, the data created was only 8 ZB, and it is expected that by 2020 the data would rise up to 40 ZB, which is extremely large.

Big Data: Variety
A reason for this rapid growth of data volume is that the data is coming from different sources in various formats. The data is categorized as follows:
Big Data: Velocity
The speed of data accumulation also plays a role in determining whether the data is categorized into big data or normal data.
Big Data: Value
Value deals with mechanisms to draw the correct meaning out of data. First of all, you need to mine the data, i.e., turn raw data into useful data. Then, an analysis is done on the data that you have cleaned or retrieved out of the raw data. Finally, you need to make sure whatever analysis you have done benefits your business, such as finding insights and results that were not possible earlier.
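To make the mine-clean-analyze chain concrete, here is a minimal, hedged Python sketch; the records, field names, and the final "insight" are made-up illustrations, not part of the slides:

```python
# Raw "mined" records: some are malformed or missing values (invented data).
raw = [{"order": 1, "amount": "250"}, {"order": 2, "amount": None},
       {"order": 3, "amount": "120"}, {"order": 4, "amount": "bad"}]

def clean(records):
    # Turn raw data into useful data: keep only records with parseable amounts.
    out = []
    for r in records:
        try:
            out.append({"order": r["order"], "amount": float(r["amount"])})
        except (TypeError, ValueError):
            continue  # drop unusable records
    return out

cleaned = clean(raw)
# Analysis step: a simple aggregate that could feed a business decision.
print(sum(r["amount"] for r in cleaned) / len(cleaned))  # average order value
```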
Big Data: Veracity
• Veracity is the trustworthiness and quality of data.
• It is necessary that the veracity of the data is maintained. For example, think about Facebook posts, with hashtags, abbreviations, images, videos, etc., which can make them unreliable and hamper the quality of their content.
• Collecting loads and loads of data is of no use if its quality and trustworthiness are not up to the mark.
Applications of Big Data: Finance
Banking:
o Since there is a massive amount of data that is gushing in from innumerable
sources, banks need to find uncommon and unconventional ways in order to
manage big data.
o It’s also essential to examine customer requirements, render services according
to their specifications, and reduce risks while sustaining regulatory compliance.
Stock Exchange:
o NYSE generates about one terabyte of new trade data every single day.
o So imagine: if one terabyte of data is generated every day, how much data there would be to process over a whole year.
Applications of Big Data:
Social Network
• Social media is currently considered the largest data generator.
• Statistics show that around 500+ terabytes of new data are generated in the databases of social media every day, particularly in the case of Facebook.
• The data generated mainly consists of videos, photos, message exchanges, etc. A single activity on any social media site generates a lot of data, which is again stored and processed whenever required.
• Since the data stored is in terabytes, it would take a lot of time to process with our legacy systems. Big Data is a solution to this problem.
Applications of Big Data:
Healthcare

• Nowadays, doctors rely mostly on patients' clinical records, which means that a lot of data needs to be gathered, and for many different patients.
• Obviously, it is not possible for old or traditional data storage methods to store this data.
• Since there is a large amount of data coming from different sources in various formats, the need to handle this large amount of data has increased.
Applications of Big Data:
E-Commerce
• Maintaining customer relationships is of the utmost importance in the e-commerce industry.
• E-commerce websites use different marketing ideas to retail their merchandise to customers, to manage transactions, and to implement better tactics, applying innovative ideas with Big Data to improve business.
• Flipkart:
▪ Flipkart is a huge e-commerce website dealing with lots of traffic on a daily basis.
▪ But when there is a pre-announced sale on Flipkart, traffic grows so sharply that it can actually crash the website.
▪ So, to handle this kind of traffic and data, Flipkart uses Big Data.
▪ Big Data can actually help in organizing and analyzing the data for further use.
Applications of Big Data:
Education

• The education sector holds a lot of information with regard to curriculum, students, and faculty.
• This information is analyzed to get insights that can enhance the operational adequacy of the educational organization.
• Collecting and analyzing information about a student, such as attendance, test scores, grades, and other issues, takes up a lot of data.
• So, big data offers a progressive framework wherein this data can be stored and analyzed, making it easier for institutes to work with.
Analyzing Limitations and Solutions of Existing Data Analytics
Objectives:
• Understanding Big Data Analytics
• Difference between data analytics and Big Data analytics
• Limitations
• Solutions
Big Data Challenges
Big Data Analytics
➢Big Data Analytics examines large and different types of data in order to uncover hidden patterns, insights, and correlations.
➢Big Data Analytics is helping large companies facilitate their growth and development.
➢It majorly includes applying various data mining algorithms on a certain dataset.
Big Data Analytics Use Cases
• Real-Time Intelligence
• Data Discovery
• Business Reporting
Big Data Analytics Reference Architectures
Why put Big Data and analytics together?
➢Big data provides gigantic statistical samples, which enhance analytic tool results.
➢Analytic tools and databases can now handle big data.
➢The economics of analytics is now more embraceable than ever.
➢There's a lot to learn from messy data, as long as it's big.
➢Big data is a special asset that merits leverage.
➢Analytics based on large data samples reveals and leverages business change.
Drivers and Enablers
Big Data sits at the convergence of three forces: Business Need, Technology Advances, and Analytical Platforms.
Technologies for Big Data (and Analytics)
• Data warehouses
• Appliances
• Analytical sandboxes
• In-memory analytics
• In-database analytics
• Columnar databases
Technologies for Big Data (and Analytics)
• Streaming and Complex Event Processing (CEP) engines
• Cloud-based services
• Non-relational databases
• Hadoop/MapReduce
Part 2: Contents

❑Big data technologies:
➢ Introduction to Hadoop
➢ Open-source technologies
➢ Cloud and big data mobile business intelligence
➢ Crowdsourcing analytics
➢ Inter- and trans-firewall analytics
Introduction to Hadoop
Hadoop/MapReduce
• Grew out of the efforts of Google, Yahoo, and others to handle massive volumes of data
• Handles multi-structured data
• Processes data across commodity parallel servers
• Open-source software from the Apache Software Foundation
Understanding Hadoop and its features
• Hadoop was created by Doug Cutting in order to build his search engine called Nutch. He was joined by Mike Cafarella.
• Hadoop was based on three papers published by Google: Google File System, Google MapReduce, and Google Bigtable.
• It is named after the toy elephant of Doug Cutting's son.
• Hadoop is under the Apache License, which means you can use it anywhere without having to worry about licensing.
• It is quite powerful, popular, and well supported.
• It is a framework to handle Big Data.
Hadoop Ecosystem
Understanding Hadoop and its features
• Started as a single project, Hadoop is now an umbrella of projects.
• All of the projects under the Apache Hadoop umbrella should follow three characteristics:
1. Distributed - they should be able to utilize multiple machines in order to solve a problem.
2. Scalable - if needed, it should be very easy to add more machines.
3. Reliable - if some of the machines fail, it should still work fine.
These are the three criteria for all projects or components to be under Apache Hadoop.
• Hadoop is written in Java so that it can run on all kinds of devices.
Hadoop Ecosystem
Apache Hadoop is a suite of components. Let us take a look at each of these components briefly. We will cover the details in depth during the full course.

HDFS
• HDFS, or Hadoop Distributed File System, is the most important component because the entire ecosystem depends upon it. It is based on the Google File System.
• It is basically a file system which runs on many computers to provide humongous storage. If you want to store your petabytes of data in the form of files, you can use HDFS.

YARN
• YARN, or Yet Another Resource Negotiator, keeps track of all the resources (CPU, memory) of the machines in the network and runs the applications. Any application which wants to run in a distributed fashion would interact with YARN.

HBase
• HBase provides humongous storage in the form of a database table. So, to manage humongous records, you would like to use HBase.
• HBase is a kind of NoSQL datastore.

MapReduce
• MapReduce is a framework for distributed computing. It utilizes YARN to execute programs and has a very good sorting engine.
• The programs are written in two parts: Map and Reduce. The Map part transforms the raw data into key-value pairs, and the Reduce part groups and combines data based on the key.
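To make the Map and Reduce parts concrete, here is a minimal word-count sketch in plain Python; no Hadoop cluster or Hadoop API is involved, and the function names are illustrative only:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: transform raw text into (key, value) pairs, here (word, 1).
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle/group: collect all values that share a key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Reduce: combine the grouped values per key.
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data needs big tools", "big data is big"]
print(reduce_phase(map_phase(lines)))  # {'big': 4, 'data': 2, ...}
```

On a real cluster, Hadoop runs many map and reduce tasks in parallel across machines, and the framework itself performs the grouping (shuffle) step between the two phases.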
Hive
• Writing code in MapReduce is very time-consuming. So, Apache Hive makes it possible to write your logic in SQL, which it internally converts into MapReduce. So, you can process humongous structured or semi-structured data with simple SQL using Hive (see the sketch after this list).

Sqoop
• Sqoop is used to transport data between Hadoop and SQL databases. Sqoop utilizes MapReduce to efficiently transport data using many machines in a network.

Oozie
• Since a project might involve many components, there is a need for a workflow engine to execute work in sequence.
• For example, a typical project might involve importing data from SQL Server, running some Hive queries, doing predictions with Mahout, and saving data back to an SQL Server.
• This kind of workflow can be easily accomplished with Oozie.

User Interaction
• A user can talk to the various components of Hadoop using the command-line interface, web interfaces, APIs, or Oozie. We will cover each of these components in detail later.
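As a rough illustration of the Hive idea above (writing aggregation logic in SQL instead of MapReduce), here is a runnable sketch using Python's built-in sqlite3 as a stand-in; the table and data are invented, and real Hive would run HiveQL over files in HDFS, compiling the query into MapReduce jobs:

```python
import sqlite3

# Stand-in for a Hive table; in Hive, this data would live as files in HDFS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [("u1", "/home"), ("u2", "/home"), ("u1", "/cart")])

# The same GROUP BY logic that would otherwise need a Map and a Reduce phase;
# Hive compiles a query like this into MapReduce jobs behind the scenes.
for url, views in conn.execute(
        "SELECT url, COUNT(*) FROM page_views GROUP BY url"):
    print(url, views)
```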
Pig (Latin)
• Pig Latin is a simplified SQL-like language to express your ETL needs in a stepwise fashion. Pig is the engine that translates Pig Latin into MapReduce and executes it on Hadoop.

Mahout
• Mahout is a library of machine learning algorithms that run in a distributed fashion. Since machine learning algorithms are complex and time-consuming, Mahout breaks down the work such that it gets executed on MapReduce running on many machines.

ZooKeeper
• Apache ZooKeeper is an independent component which is used by various distributed frameworks such as HDFS, HBase, Kafka, and YARN. It is used for coordination between various components. It provides a distributed configuration service, synchronization service, and naming registry for large distributed systems.

Flume
• Flume makes it possible to continuously pump unstructured data from many sources to a central store such as HDFS.
• If you have many machines continuously generating data such as web server logs, you can use Flume to aggregate the data at a central place such as HDFS, as sketched below.
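To illustrate the aggregation pattern Flume automates (this is not Flume's actual API), a tiny hedged Python sketch with invented hostnames and log lines:

```python
# Hypothetical per-machine web server logs; in reality these would be
# files or network streams that Flume agents watch continuously.
sources = {
    "web-01": ["GET /home 200", "GET /cart 500"],
    "web-02": ["GET /home 200"],
}

# The core idea: merge many distributed streams into one central sink
# (Flume's sink would typically be HDFS; here it is just a list).
central_sink = []
for host, lines in sources.items():
    for line in lines:
        # Tag each event with its source so the origin is not lost.
        central_sink.append(f"{host} {line}")

print(central_sink)
```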
Hadoop 2.x Core Components
• The Hadoop 2.0 feature HDFS Federation allows horizontal scaling of the Hadoop Distributed File System (HDFS). This is one of the features most sought after by enterprise-class Hadoop users such as Amazon and eBay. HDFS Federation supports multiple NameNodes and namespaces.
• Hadoop 2.x has the following three major components:
• HDFS
• YARN
• MapReduce
Hadoop 2.x Architecture
HDFS
Hadoop 3.x Core Components
Why Hadoop 3.x
• With Java 7 attaining end of life in 2015, there was a need to raise the minimum runtime version to Java 8 with a new Hadoop release, so that the new release is supported by Oracle with security fixes and so that Hadoop can upgrade its dependencies to modern versions.
• With Hadoop 2.0, the shell scripts were difficult to understand: developers had to read almost all of them to work out the correct environment variable for an option and how to set it, whether java.library.path, the Java classpath, or GC options.
• With support for only 2 NameNodes, Hadoop 2 did not provide the maximum level of fault tolerance, but the release of Hadoop 3.x brings additional fault tolerance, as it offers multiple NameNodes.
• Replication is a costly affair in Hadoop 2, as it follows a 3x replication scheme, leading to 200% additional storage space and resource overhead. Hadoop 3.0 incorporates erasure coding in place of replication, consuming comparatively less storage space whilst providing the same level of fault tolerance.
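The storage numbers above can be checked with quick arithmetic. A minimal Python sketch; the 6-data/3-parity Reed-Solomon split is a common example erasure-coding configuration, used here purely as an illustration:

```python
def storage_overhead(data_blocks, total_blocks):
    # Extra storage beyond the raw data, as a percentage.
    return (total_blocks - data_blocks) / data_blocks * 100

# 3x replication: one data block is stored as 3 full copies.
print(storage_overhead(1, 3))   # 200.0 (% extra), as stated above

# Erasure coding, e.g. Reed-Solomon with 6 data + 3 parity blocks,
# still tolerates the loss of any 3 blocks but costs far less.
print(storage_overhead(6, 9))   # 50.0 (% extra)
```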
Hadoop 3.x Architecture
Data Replication in 3.x
Difference between 1.x, 2.x and 3.x Hadoop components
Difference between Hadoop 1.x, 2.x
Difference between Hadoop 2.x, 3.x
Open Source
Technologies
Open-source technologies
• Open source is a term that originally referred to open-source software (OSS).
• Open-source software is code that is designed to be publicly accessible: anyone can see, modify, and distribute the code as they see fit.
• Open-source software is developed in a decentralized and collaborative way, relying on peer review and community production.
• Open-source software is often cheaper, more flexible, and has more longevity than its proprietary peers because it is developed by communities rather than a single author or company.
Open-Source Big Data Tools
• Hadoop
• Atlas.ti
• Apache Storm
• Qubole
• Cassandra
• CouchDB
• Stats iQ
• Flink
• Cloudera
• RapidMiner
• DataCleaner
https://www.guru99.com/big-data-tools.html
Business Analytics
• Data Mining
• Reporting
• Performance metrics and benchmarking
• Descriptive Analysis
• Querying
• Statistical Analysis
• Data Visualization
• Data Preparation
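As a tiny, hedged illustration of two items from this list (descriptive analysis and querying), with invented daily sales figures:

```python
import statistics

# Hypothetical daily sales figures for a quick descriptive analysis.
sales = [120, 95, 140, 180, 130, 95, 160]

# Descriptive statistics: summarize what happened.
print("mean:", statistics.mean(sales))
print("median:", statistics.median(sales))
print("stdev:", round(statistics.stdev(sales), 1))

# Querying: filter the records that meet a business condition.
print("days above 130:", [s for s in sales if s > 130])
```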
Cloud and Big Data Mobile Business Intelligence
• Mobile business intelligence is software that extends desktop business intelligence (BI) applications so they can be used on a mobile device.
• Business intelligence is a composition of software systems that helps generate meaningful and useful information, enabling the user to gain keen insight into the company and to know about trends, patterns, technologies, and reports.
• Big data is often assumed to be just huge, unstructured data, whereas big data is not only huge: it is also about the composition of the data, the operations performed on it, and the value added in developing it.
Cloud and Big Data Mobile Business Intelligence
• The cloud can help you process and analyze your big data faster, leading to insights that can improve your products and business.
Crowdsourcing Analytics
• Crowdsourcing, a combination of "crowd" and "outsourcing", was coined by Jeff Howe in Wired magazine in 2006.
• It is a sourcing model that uses the depth of experience and ideas of an open group instead of an organization's own employees.
• Crowdsourcing taps into the global world of ideas, helping companies work through a rapid design process.
• You outsource to large crowds in an effort to make sure your products or services are right.
• The claimed upsides of crowdsourcing include improved cost, speed, quality, flexibility, scalability, and diversity.
• It has been used by startups, large corporations, and non-profit organizations, and to create common goods.
• Crowdsourcing is a case of ICT-enabled collaboration, aggregation, cooperation, consensus, and creativity.
• It is a new way of doing work where, if the conditions are right, the crowd can outperform individual specialists.
• Geographically scattered individuals connected by the web can cooperate to deliver strategies and results that are acceptable to most.
Advantages
• Save costs
• Save time
• Evolving innovation
• Reduce risk
• Increased efficiency
Key elements of crowdsourcing
• An organization that has a task it needs performed,
• A community (crowd) that is willing to perform the task voluntarily,
• An ICT environment that enables the work to occur and the community to interact with the organization,
• Shared benefit for the organization and the community.
Crowdsourcing Big Data
• Crowdsourcing is an innovative methodology in the era of big data, as it improves distributed processing and big data analysis.
• Crowdsourcing big data enables organizations to save their internal resources. Why hire over-qualified staff for big data processes that a crowdsourced workforce can handle more efficiently, quickly, and cost-effectively?
• Crowdsourcing big data enables organizations to benefit from the human element. Content moderation and sentiment analysis of feedback from clients, social updates, reviews, or comments by a crowdsourced workforce results in highly accurate, relevant, and meaningful insights compared with machines.
• The distributed nature of crowdsourcing ensures that big data is processed at a speed which would not be possible to achieve in-house.
• Organizations can build applications based on real-time analysis, as a crowdsourced workforce produces big data analysis in real time. Enterprises do not have to worry about being unfashionably late to the big data party.
Crowdsourcing in Big Data Analytics
• Generally, a data scientist spends 78% of their time preparing data for big data analysis. Therefore, a smart and cost-effective strategy for big data organizations is to hand unstructured data sets over to a well-managed crowdsourcing platform, so the crowd can tell more about the information contained within the collected data points. For instance, before the analysis the crowd can tell whether a data point is a Tweet or an update from Facebook, and whether it carries a negative, positive, or neutral meaning.
• The crowd gives structure (document editing, audio transcription, image annotation) to big data, thereby helping analysts improve their predictive models by 25%.
• Crowdsourcing alongside big data analysis can help uncover hidden insights from scattered but connected information quickly.
• Big data problems can be solved with more accuracy with crowdsourcing as a reliable medium.
• The results from the crowd can be used by data scientists to improve the efficiency of machine learning algorithms, as sketched below.
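As one concrete way crowd results can feed machine learning, here is a minimal Python sketch of majority-vote label aggregation; the posts, labels, and three-worker setup are invented for illustration:

```python
from collections import Counter

# Hypothetical crowd labels: three workers each tag the sentiment of a post.
crowd_labels = {
    "post_1": ["positive", "positive", "neutral"],
    "post_2": ["negative", "negative", "negative"],
}

def majority_vote(labels):
    # Aggregate noisy crowd answers into one training label per item.
    return Counter(labels).most_common(1)[0][0]

# The aggregated labels can then serve as training data for a classifier.
training_set = {post: majority_vote(votes) for post, votes in crowd_labels.items()}
print(training_set)  # {'post_1': 'positive', 'post_2': 'negative'}
```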
Crowdsourcing Context
• Crowd — an individual or groups working on an activity and completing it with zero visibility to other individuals or groups.
• Community — individuals or groups working on an activity with some level of visibility to other individuals and groups.
• Competition — individuals or groups working on and completing an activity independently (only a single winner).
• Collaboration — individuals or groups working on parts of an activity and contributing to its completion (everyone wins).
Inter and Trans Firewall Analytics
• Over the last 100 years, supply chains have evolved to connect multiple companies and enable them to collaborate to create enormous value for the end-consumer, via concepts like CPFR, VMI, etc.
• Decision sciences is witnessing a similar trend, as enterprises are beginning to collaborate on insights across the value chain.
• We call this trend the move from intra- to inter- and trans-firewall analytics.