BDA UNIT-I
–Analytic Accelerators
–Application and industry accelerators
–Visualization
Big Data Platform - Data Warehousing:
Workload optimized systems
–Deep analytics appliance
–Configurable operational analytics appliance
–Data warehousing software
Capabilities
•Massive parallel processing engine
•High performance OLAP
•Mixed operational and analytic workloads
Big Data Platform - Information Integration and Governance
Integrate any type of data into the big data platform
–Structured
–Unstructured
–Streaming
Governance and trust for big data
–Secure sensitive data
–Lineage and metadata of new big data sources
–Lifecycle management to control data growth
–Master data to establish single version of the truth
Leverage purpose-built connectors for multiple data sources:
Developers
•Similarity in tooling and languages
•Mature open source tools with enterprise capabilities
•Integration among environments
Administrators
•Consoles to aid in systems management
Big Data Platform - Accelerators:
Analytic accelerators
–Analytics, operators, rule sets
Industry and Horizontal Application Accelerators
–Analytics
–Models
–Visualization / user interfaces
–Adapters
Big Data Platform - Analytic Applications:
Big Data Platform is designed for analytic application development and integration.
BI/Reporting – Cognos BI, Attivio
Predictive Analytics – SPSS, G2, SAS
Exploration/Visualization – BigSheets, Datameer
Instrumentation Analytics – Brocade, IBM GBS
Content Analytics – IBM Content Analytics
Functional Applications – Algorithmics, Cognos Consumer Insights, Clickfox, i2, IBM GBS
Industry Applications – TerraEchos, Cisco, IBM GBS
• Disk latency (speed of reads and writes) – not much improvement in the last 7–10 years,
currently around 70–80 MB/sec
How long will it take to read 1 TB of data?
• 1 TB (at 80 MB/sec):
– 1 disk: 3.4 hours
– 10 disks: 20 min
– 100 disks: 2 min
– 1000 disks: 12 sec
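These figures follow directly from dividing the data volume by the aggregate disk bandwidth. The short sketch below reproduces the arithmetic; the 80 MB/sec per-disk rate is the same assumption used above.

```python
# Back-of-the-envelope read times for 1 TB at 80 MB/sec per disk.
TB = 10**12          # bytes
RATE = 80 * 10**6    # bytes per second per disk

for disks in (1, 10, 100, 1000):
    seconds = TB / (RATE * disks)   # reads happen in parallel across disks
    print(f"{disks:>4} disk(s): {seconds/3600:6.2f} h = {seconds/60:8.1f} min = {seconds:9.0f} s")
```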
What do we care about when we process data?
• Handle partial hardware failures without going down:
– If a machine fails, we should be able to switch over to a standby machine
– If a disk fails – use RAID or a mirrored disk
• Ability to recover from major failures:
– Regular backups
– Logging
– Mirror database at a different site
• Scalability:
– Increase capacity without restarting the whole system
– More computing power should translate into faster processing
• Result consistency:
– The answer should be consistent (independent of whether something has failed) and returned
in a reasonable amount of time
4. Nature of Data
Big data is a term thrown around in a lot of articles; for those who understand what
big data means, that is fine, but for those struggling to understand exactly what it is, it
can get frustrating. There are several definitions of big data, as it is frequently used as an all-
encompassing term for everything from actual data sets to big data technology and big data
analytics. However, this article will focus on the actual types of data that are contributing to
the ever-growing collection of data referred to as big data. Specifically, we focus on the data
created outside of an organization, which can be grouped into two broad categories: structured
and unstructured.
Structured Data
1. Created
Created data is just that: data businesses purposely create, generally for market
research. This may consist of customer surveys or focus groups. It also includes more modern
methods of research, such as creating a loyalty program that collects consumer information or
asking users to create an account and log in while they are shopping online.
2. Provoked
A Forbes article defined provoked data as "giving people the opportunity to express
their views." Every time a customer rates a restaurant, an employee, a purchasing experience
or a product, they are creating provoked data. Rating sites, such as Yelp, also generate this type
of data.
3. Transacted
Transactional data is also fairly self-explanatory. Businesses collect data on every
transaction completed, whether the purchase is completed through an online shopping cart or
in-store at the cash register. Businesses also collect data on the steps that lead to a purchase
online. For example, a customer may click on a banner ad that leads them to the product pages
which then spurs a purchase. As explained by the Forbes article, “Transacted data is a powerful
way to understand exactly what was bought, where it was bought, and when. Matching this
type of data with other information, such as weather, can yield even more insights."
4. Compiled
Compiled data consists of giant databases of data collected on every U.S. household. Companies
like Acxiom collect information on things like credit scores, location, demographics, purchases
and registered cars, which marketing companies can then access for supplemental consumer data.
5. Experimental
Experimental data is created when businesses experiment with different marketing
pieces and messages to see which are most effective with consumers. You can also look at
experimental data as a combination of created and transactional data.
Unstructured Data
People in the business world are generally very familiar with the types of structured
data mentioned above. However, unstructured data is a little less familiar, not because there is less
of it, but because, before technologies like NoSQL and Hadoop came along, harnessing unstructured
data wasn't possible. In fact, most data being created today is unstructured. Unstructured data,
as the name suggests, lacks structure. It can't be gathered based on clicks, purchases or a
barcode, so what is it exactly?
6. Captured
Captured data is created passively as a result of a person's behaviour. Every time someone
enters a search term on Google, that is data that can be captured for future benefit. The GPS
info on our smartphones is another example of passive data that can be captured with big data
technologies.
7. User-generated
User-generated data consists of all of the data individuals are putting on the Internet
every day. From tweets, to Facebook posts, to comments on news stories, to videos put up on
YouTube, individuals are creating a huge amount of data that businesses can use to better target
consumers and get feedback on products.
Big data is made up of many different types of data. The seven listed above comprise
types of external data included in the big data spectrum. There are, of course, many types of
internal data that contribute to big data as well, but hopefully breaking down the types of data
helps you to better see why combining all of this data into big data is so powerful for business.
Sources of Big Data:
Medical records
Data produced by businesses
Commercial transactions
Banking/stock records
E-commerce
Credit cards
3. Internet of Things (machine-generated data): derived from the phenomenal growth in the
number of sensors and machines used to measure and record events and situations in the
physical world. The output of these sensors is machine-generated data, and from simple sensor
records to complex computer logs, it is well structured. As sensors proliferate and data volumes
grow, it is becoming an increasingly important component of the information stored and
processed by many businesses. Its well-structured nature is suitable for computer processing,
but its size and speed are beyond traditional approaches.
Data from sensors
Fixed sensors
Home automation
Weather/pollution sensors
Traffic sensors/webcam
Scientific sensors
Security/surveillance videos/images
Mobile sensors (tracking)
Mobile phone location
Cars
Satellite images
Data from computer systems
Logs
Web logs
5. Analytic Processes and Tools
Open Source Big Data Tools
Based on popularity and usability, we have listed the following ten open source tools
as the best open source big data tools.
1. Hadoop
Apache Hadoop is the most prominent and widely used tool in the big data industry, with its
enormous capability for large-scale data processing. It is a 100% open source framework and
runs on commodity hardware in an existing data center. Furthermore, it can run on a cloud
infrastructure. Hadoop consists of four parts:
Hadoop Distributed File System: Commonly known as HDFS, it is a distributed file
system that provides very high aggregate bandwidth across the cluster.
MapReduce: A programming model for processing big data (see the sketch below).
YARN: A platform for managing and scheduling Hadoop's resources across the cluster.
Libraries (Hadoop Common): Utilities that help the other Hadoop modules work together.
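To make the MapReduce programming model concrete, here is a minimal word-count sketch written for Hadoop Streaming, which lets the mapper and reducer be supplied as ordinary scripts. The file name, jar path and input/output directories shown in the comments are illustrative assumptions, not fixed Hadoop names.

```python
#!/usr/bin/env python3
# wordcount.py - minimal word count for Hadoop Streaming (illustrative sketch).
# A typical (assumed) invocation looks like:
#   hadoop jar hadoop-streaming.jar -files wordcount.py \
#     -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce" \
#     -input /data/in -output /data/out
import sys

def mapper():
    # Emit "word<TAB>1" for every word read from standard input.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so all counts for a word arrive together.
    current, total = None, 0
    for line in sys.stdin:
        line = line.rstrip("\n")
        if not line:
            continue
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```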
2. Apache Spark
Apache Spark is the next big thing in the industry among big data tools. The key point
of this open source big data tool is that it fills the gaps of Apache Hadoop concerning data
processing. Interestingly, Spark can handle both batch data and real-time data. As Spark does
in-memory data processing, it processes data much faster than traditional disk-based processing.
This is indeed a plus point for data analysts handling certain types of data to achieve a faster
outcome.
Apache Spark is flexible enough to work with HDFS as well as with other data stores, for
example OpenStack Swift or Apache Cassandra. It is also quite easy to run Spark on a
single local system to make development and testing easier. Spark Core is the heart of the
project, and it facilitates many things like
distributed task transmission
scheduling
I/O functionality
Spark is an alternative to Hadoop’s MapReduce. Spark can run jobs 100 times faster
than Hadoop’s MapReduce.
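For comparison with the Hadoop Streaming sketch above, the same word count can be expressed much more compactly with the PySpark API. The input path here is an assumption; any local or HDFS text file would do.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session; on a cluster the master is set by the launcher.
spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()

# Read text, split it into words, and count - these transformations are lazy
# and only execute when an action such as take() is called.
lines = spark.sparkContext.textFile("hdfs:///data/in")   # assumed input path
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

for word, count in counts.take(10):   # action: triggers the computation
    print(word, count)

spark.stop()
```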
3. Apache Storm
Apache Storm is a distributed real-time framework for reliably processing
unbounded data streams. The framework supports any programming language. The unique
features of Apache Storm are:
Massive scalability
Fault tolerance
A "fail fast, auto restart" approach
Guaranteed processing of every tuple
Written in Clojure
Runs on the JVM
Supports directed acyclic graph (DAG) topologies
Supports multiple languages
Supports protocols like JSON
Storm topologies can be considered similar to a MapReduce job. However, in the case of
Storm, it performs real-time stream processing instead of batch processing. Based on the
topology configuration, the Storm scheduler distributes the workloads to nodes. Storm can
interoperate with Hadoop's HDFS through adapters if needed, which is another point that makes
it useful as an open source big data tool.
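Storm itself runs on the JVM, but the spout-and-bolt topology idea can be illustrated with a tiny, library-free Python simulation. The names below are purely illustrative and are not the Storm API; in a real topology the spout and bolts run in parallel on different worker processes.

```python
# Library-free illustration of Storm's spout -> bolt pipeline (not the Storm API).
import random
import time

def sentence_spout():
    """Spout: an unbounded source that keeps emitting tuples."""
    sentences = ["the cow jumped over the moon", "the man went to the store"]
    while True:
        yield random.choice(sentences)
        time.sleep(0.1)

def split_bolt(sentence):
    """Bolt: splits each sentence tuple into word tuples."""
    return sentence.split()

def count_bolt(counts, word):
    """Bolt: keeps a running count per word (rolling state)."""
    counts[word] = counts.get(word, 0) + 1
    return word, counts[word]

counts = {}
for i, sentence in enumerate(sentence_spout()):
    for word in split_bolt(sentence):
        print(count_bolt(counts, word))
    if i >= 4:          # stop the demo; a real topology runs until it is killed
        break
```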
4. Cassandra
Apache Cassandra is a distributed database designed to manage large data sets across many
servers. This is one of the best big data tools and mainly processes structured data sets. It provides
a highly available service with no single point of failure. Additionally, it has certain capabilities
which no other relational or NoSQL database can provide. These capabilities
are:
Continuous availability as a data source
Linear scalable performance
Simple operations
Easy distribution of data across data centers
Cloud availability points
Scalability
Performance
The Apache Cassandra architecture does not follow a master-slave model; all nodes
play the same role. It can handle numerous concurrent users across data centers. Hence, adding
a new node to an existing cluster is straightforward, even while the cluster is running.
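A short sketch using the DataStax cassandra-driver Python package shows the basic interaction. The contact point, keyspace and table below are illustrative assumptions.

```python
# Minimal sketch using the cassandra-driver package (pip install cassandra-driver).
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # contact point(s) of the cluster
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("demo")
session.execute("CREATE TABLE IF NOT EXISTS users (id int PRIMARY KEY, name text)")

# Writes and reads go to whichever node owns the partition - there is no master.
session.execute("INSERT INTO users (id, name) VALUES (%s, %s)", (1, "Asha"))
for row in session.execute("SELECT id, name FROM users"):
    print(row.id, row.name)

cluster.shutdown()
```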
5. RapidMiner
RapidMiner is a software platform for data science activities and provides an integrated
environment for:
Preparing data
Machine learning
Text mining
Predictive analytics
Deep learning
Application development
Prototyping
This is one of the useful big data tools that supports different steps of machine learning, such
as:
Data preparation
Visualization
Predictive analytics
Model validation
Optimization
Statistical modelling
Evaluation
Deployment
RapidMiner follows a client/server model where the server can be located on premises
or in a cloud infrastructure. It is written in Java and provides a GUI to design and execute
workflows. It can provide 99% of an advanced analytical solution.
6. MongoDB
MongoDB is an open source, cross-platform NoSQL database with many built-in features.
It is ideal for businesses that need fast, real-time data for instant
decisions, and for users who want data-driven experiences. It works with the MEAN software
stack, .NET applications and the Java platform.
Some notable features of MongoDB are:
It can store any type of data like integer, string, array, object, Boolean, date etc.
It provides flexibility in cloud-based infrastructure.
It is flexible and easily partitions data across the servers in a cloud structure.
MongoDB uses dynamic schemas. Hence, you can prepare data on the fly and
quickly. This is another way of cost saving.
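A minimal sketch with the pymongo driver illustrates the dynamic-schema point: documents in the same collection can carry different fields. The connection string, database and collection names are illustrative assumptions.

```python
# Minimal sketch with the pymongo driver (pip install pymongo).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]          # database "shop", collection "orders"

# Dynamic schema: documents in the same collection can have different fields.
orders.insert_one({"customer": "Asha", "total": 42.5, "items": ["book", "pen"]})
orders.insert_one({"customer": "Ravi", "total": 10.0, "coupon": "WELCOME10"})

# Query and iterate over matching documents.
for doc in orders.find({"total": {"$gt": 20}}):
    print(doc["customer"], doc["total"])

client.close()
```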
7. R Programming Tool
This is one of the most widely used open source big data tools in the big data industry for
statistical analysis of data. The most positive part of this big data tool is that, although used for
statistical analysis, as a user you don't have to be a statistics expert. R has its own public
library, CRAN (the Comprehensive R Archive Network), which consists of more than 9000
modules and algorithms for statistical analysis of data.
R can run on Windows and Linux servers as well as inside SQL Server. It also supports
Hadoop and Spark. Using the R tool, one can work on discrete data and try out a new analytical
algorithm for analysis. It is a portable language. Hence, an R model built and tested on a local
data source can easily be implemented on other servers or even against a Hadoop data lake.
8. Neo4j
Hadoop may not be a wise choice for all big data related problems. For example, when
you need to deal with a large volume of network data or a graph-related problem like social networking
or demographic patterns, a graph database may be a perfect choice. Neo4j is one of the big data
tools that is widely used as a graph database in the big data industry. It follows the fundamental
structure of a graph database, which is interconnected node-relationship data. It maintains a
key-value pattern in data storage.
Notable features of Neo4j are:
It supports ACID transactions
High availability
Scalable and reliable
Flexible, as it does not need a schema or data type to store data
It can integrate with other databases
Supports a query language for graphs, commonly known as Cypher
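The official neo4j Python driver exposes the graph through Cypher queries; the sketch below creates two nodes and a relationship and then traverses it. The bolt URL, credentials and labels are illustrative assumptions.

```python
# Minimal sketch using the official neo4j Python driver (pip install neo4j).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Nodes and relationships are created with the Cypher query language.
    session.run(
        "MERGE (a:Person {name: $a}) "
        "MERGE (b:Person {name: $b}) "
        "MERGE (a)-[:FRIENDS_WITH]->(b)",
        a="Asha", b="Ravi",
    )
    # Traverse the graph: who are Asha's friends?
    result = session.run(
        "MATCH (:Person {name: $a})-[:FRIENDS_WITH]->(f) RETURN f.name AS friend",
        a="Asha",
    )
    for record in result:
        print(record["friend"])

driver.close()
```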
9. Apache SAMOA
Apache SAMOA is among the well-known big data tools used for distributed streaming
algorithms for big data mining. It is used not only for data mining but also for other machine learning
tasks such as:
Classification
Clustering
Regression
Programming abstractions for new algorithms
It runs on top of distributed stream processing engines (DSPEs). Apache SAMOA has a
pluggable architecture that allows it to run on multiple DSPEs, which include:
Apache Storm
Apache S4
Apache Samza
Apache Flink
SAMOA has gained immense importance as an open source big data tool in
the industry for the following reasons:
You can program once and run it everywhere
Its existing infrastructure is reusable, so you can avoid deployment cycles
No system downtime
No need for complex backup or update process
10. HPCC
High-Performance Computing Cluster (HPCC) is another of the best big data tools. It
is a competitor of Hadoop in the big data market. It is one of the open source big data tools under
the Apache 2.0 license. Some of the core features of HPCC are:
Helps in parallel data processing
Open Source distributed data computing platform
Follows shared nothing architecture
Runs on commodity hardware
Comes with binary packages supported for Linux distributions
Supports end-to-end big data workflow management
The platform includes:
Thor: for batch-oriented data manipulation, linking, and analytics
Roxie: for real-time data delivery and analytics
Implicitly a parallel engine
Maintains code and data encapsulation
Extensible
Highly optimized
Helps to build graphical execution plans
It compiles into C++ and native machine code
6. Analysis Vs Reporting
Reporting
Reporting is the first step of working with data when it comes to marketing. Reporting
is really about the collection and organization of data points to start the storytelling process
(more on storytelling later). Yet, to plant a seed, storytelling is really the core of reporting
when it's done well. The data should come together into an organized visual format, allowing
you to see changes over time or against other relevant variables to show what has happened.
Good reporting should be organized with clear time parameters and have a clear visual
presentation, so you can start to gain understanding of where things are as they pertain to your
marketing efforts.
Analysis
Analysis is the step that should happen after the reports have been created. Analysis is
the process of searching the reports and data to start to tell a more complex story. Analysis
would look for the interactions between various data points to see how they influence each
other. This search for correlation, or for the cause-and-effect relationships that exist inside of
the data, is the basis of good analysis. To find, test, and confirm a true cause-and-effect
relationship within the data would mark a successful analysis of the data.
Sometimes there’s not enough data to truly do analysis in your existing data set. This
would mean that to do true analysis you would have to gather data from outside of your data
set. For example, if you were doing some analysis on your web data, you might have to gather
reports on your social media channels or referral channels to see a bigger picture of the data
and get an idea of how it’s influenced by outside sources.
Data capabilities make them important, and one can analyze and visualize data better than with any
other data visualization software in the market.
3. Python
Python is an object-oriented scripting language which is easy to read, write and maintain,
and is a free open source tool. It was developed by Guido van Rossum in the late 1980s and
supports both functional and structured programming methods. Python is easy to learn as it is
very similar to JavaScript, Ruby, and PHP. Also, Python has very good machine learning
libraries, viz. scikit-learn, Theano, TensorFlow and Keras. Another important feature of Python
is that it can work with data from almost any platform, such as a SQL server, a MongoDB database or JSON.
Python can also handle text data very well.
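As a small illustration of Python's machine learning libraries, here is a self-contained scikit-learn sketch; it uses the bundled Iris data set, so no external files or servers are assumed.

```python
# A small, self-contained scikit-learn example (pip install scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Train a classifier and evaluate it on held-out data.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```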
4. SAS:
SAS is a programming environment and language for data manipulation and a leader in
analytics, developed by the SAS Institute in 1966 and further developed in the 1980s and 1990s.
SAS is easily accessible and manageable and can analyze data from any source. SAS introduced
a large set of products in 2011 for customer intelligence, and numerous SAS modules for web,
social media and marketing analytics are widely used for profiling customers and prospects.
It can also predict their behaviours, and manage and optimize communications.
5. Apache Spark
The University of California, Berkeley's AMPLab developed Apache Spark in 2009. Apache
Spark is a fast, large-scale data processing engine that executes applications in Hadoop clusters
100 times faster in memory and 10 times faster on disk. Spark is built with data science in mind,
and its concept makes data science effortless. Spark is also popular for data pipelines and machine
learning model development.
Spark also includes a library, MLlib, that provides a progressive set of machine learning
algorithms for repetitive data science techniques like classification, regression, collaborative
filtering, clustering, etc.
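To show what MLlib usage looks like, here is a minimal classification sketch with the pyspark.ml API; the tiny inline data set and column names are illustrative assumptions.

```python
# A minimal sketch of Spark MLlib (the pyspark.ml API) for classification.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Toy data: label = 1 if the purchase is "large", 0 otherwise.
df = spark.createDataFrame(
    [(0.5, 1.0, 0), (1.5, 0.2, 0), (3.0, 2.5, 1), (4.0, 3.5, 1)],
    ["amount", "visits", "label"],
)
features = VectorAssembler(inputCols=["amount", "visits"], outputCol="features")
train = features.transform(df)

model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
model.transform(train).select("label", "prediction").show()

spark.stop()
```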
6. Excel
Excel is a basic, popular and widely used analytical tool in almost all industries.
Whether you are an expert in SAS, R or Tableau, you will still need to use Excel. Excel becomes
important when there is a requirement for analytics on a client's internal data. It handles the
complex task of summarizing data with a preview of pivot tables, which helps in filtering the
data as per client requirements. Excel has an advanced business analytics option which helps with
modelling capabilities through prebuilt options like automatic relationship detection, creation
of DAX (Data Analysis Expressions) measures and time grouping.
7. RapidMiner:
RapidMiner is a powerful integrated data science platform, developed by the company of the
same name, that performs predictive analysis and other advanced analytics like data mining, text
analytics, machine learning and visual analytics without any programming. RapidMiner can
incorporate any data source type, including Access, Excel, Microsoft SQL, Teradata,
Oracle, Sybase, IBM DB2, Ingres, MySQL, IBM SPSS, dBase, etc. The tool is very powerful
and can generate analytics based on real-life data transformation settings, i.e. you can control
the formats and data sets for predictive analysis.
8. KNIME
KNIME was developed in January 2004 by a team of software engineers at the University of
Konstanz. KNIME is a leading open source reporting and integrated analytics tool that allows
you to analyze and model data through visual programming; it integrates various
components for data mining and machine learning via its modular data-pipelining concept.
9. QlikView
QlikView has many unique features, like patented technology and in-memory data
processing, which delivers results to end users very quickly and stores the data in the report
itself. Data associations in QlikView are maintained automatically, and data can be compressed to
almost 10% of its original size. Data relationships are visualized using colours: a specific
colour is given to related data and another colour to non-related data.
10. Splunk:
Splunk is a tool that analyzes and searches machine-generated data. Splunk pulls in all
text-based log data and provides a simple way to search through it; a user can pull in all kinds
of data, perform all sorts of interesting statistical analyses on it, and present it in different
formats.