BDA Unit 1
Syllabus:
UNIT I:
Introduction: Introduction to big data: Introduction to Big Data Platform,
Challenges of Conventional Systems, Intelligent data analysis, Nature of Data,
Analytic Processes and Tools, Analysis vs Reporting.
1. INTRODUCTION TO BIG DATA PLATFORM
For example: when you visit a website, it might store your IP address; that is data. In return, it might add a cookie in your browser, marking that you visited the website; that too is data. Your name is data; your age is data.
• What is Data?
– The quantities, characters, or symbols on which operations are performed by a
computer,
– which may be stored and transmitted in the form of electrical signals and
– recorded on magnetic, optical, or mechanical recording media.
• 3 Actions on Data
– Capture
– Transform
– Store
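As a toy illustration of these three actions, here is a minimal Python sketch (the log format and field names are invented for the example, not taken from the notes):

import json

def capture():
    # Capture: raw data as it might arrive from a web server log (hypothetical format).
    return ["203.0.113.7,2024-01-01,/home", "198.51.100.2,2024-01-01,/about"]

def transform(raw_lines):
    # Transform: parse each comma-separated line into a structured record.
    records = []
    for line in raw_lines:
        ip, date, page = line.split(",")
        records.append({"ip": ip, "date": date, "page": page})
    return records

def store(records, path="visits.json"):
    # Store: persist the transformed records to disk.
    with open(path, "w") as f:
        json.dump(records, f, indent=2)

store(transform(capture()))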
Big Data
• Big Data may well be the Next Big Thing in the IT world.
• Big data burst upon the scene in the first decade of the 21st century.
• The first organizations to embrace it were online and startup firms.
• Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the
beginning.
• Like many new information technologies, big data can bring about:
– dramatic cost reductions,
– substantial improvements in the time required to perform a computing task, or
– new product and service offerings.
Examples of Big Data
Types Of Big Data
Structured Data
• Any data that can be stored, accessed and processed in the form of a fixed format is termed 'structured' data.
• Over time, techniques have been developed for working with this kind of data (where the format is well known in advance) and for deriving value out of it.
• Foreseeing issues of today:
– the size of such data is growing to a huge extent, with typical sizes being in the range of multiple zettabytes.
• Do you know?
– 10^21 bytes, i.e., one billion terabytes, equal one zettabyte.
– That is why the name Big Data is given; imagine the challenges involved in its storage and processing.
• Do you know?
– Data stored in a relational database management system is one example of 'structured' data.
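As a small, hedged illustration of structured data, the sketch below stores records with a fixed, known-in-advance schema in a relational table (using Python's built-in sqlite3; the table name and columns are invented for the example):

import sqlite3

# Structured data: a fixed relational schema, known in advance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person (name, age) VALUES (?, ?)",
                 [("Satish Mane", 29), ("Subrato Roy", 26), ("Jeremiah J.", 35)])

# Because the format is fixed, querying and deriving value is straightforward.
for row in conn.execute("SELECT name, age FROM person WHERE age > 27"):
    print(row)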
Unstructured Data
• Any data with unknown form or structure is classified as unstructured data.
• In addition to the size being huge,
– un-structured data poses multiple challenges in terms of its processing for
deriving value out of it.
– A typical example of unstructured data is
• a heterogeneous data source containing a combination of simple text files,
images, videos etc.
• Nowadays organizations have a wealth of data available with them but, unfortunately,
– they don't know how to derive value out of it, since this data is in its raw form or unstructured format.
Semi-structured Data
• Semi-structured data contains elements of both forms: it is not stored in the rigid, fixed format of a relational table, but it does carry tags or markers that separate and label the fields.
• A typical example is personal data stored in an XML file:

<rec>
<name>Satish Mane</name>
<sex>Male</sex>
<age>29</age>
</rec>
<rec>
<name>Subrato Roy</name>
<sex>Male</sex>
<age>26</age>
</rec>
<rec>
<name>Jeremiah J.</name>
<sex>Male</sex>
<age>35</age>
</rec>
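A short sketch of how such semi-structured records might be processed in Python with the standard-library XML parser (the wrapping <records> root element is an assumption added here only so the fragment parses as one document):

import xml.etree.ElementTree as ET

# The <records> root is added to make the fragment well-formed XML.
xml_fragment = """
<records>
  <rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
  <rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
  <rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
</records>
"""

root = ET.fromstring(xml_fragment)
for rec in root.findall("rec"):
    # Tags label each field, but no rigid table schema is enforced.
    print(rec.findtext("name"), rec.findtext("sex"), rec.findtext("age"))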
Storing Big Data
Processing the stored big data typically involves:
– Creating the components of Hadoop MapReduce jobs
– Distributing data processing across server farms
– Executing Hadoop MapReduce jobs
– Monitoring the progress of job flows
Sources of big data:
• Users
• Applications
• Systems
• Sensors
Considerations when adopting big data:
• Organizations can easily be overwhelmed:
– they need the right people solving the right problems.
• Costs escalate too fast:
– it isn't necessary to capture 100% of the data.
• Many sources of big data raise privacy concerns, addressed through:
– self-regulation
– legal regulation
1.1.2 Basics of Big Data Platform
A Big Data platform is an IT solution that combines the features and capabilities of several big data applications and utilities into a single packaged solution for managing and analyzing Big Data.
It is an enterprise-class IT platform that enables an organization to develop, deploy, operate and manage a big data infrastructure/environment.
Features of a Big Data platform:
a) A Big Data platform should be able to accommodate new platforms and tools based on business requirements, because business needs can change due to new technologies or changes in business processes.
b) It should support linear scale-out.
c) It should have the capability for rapid deployment.
d) It should support a variety of data formats.
e) The platform should provide data analysis and reporting tools.
f) It should provide real-time data analysis software.
g) It should have tools for searching through large data sets.
Big data is a term for data sets that are so large or complex that traditional data processing
applications are inadequate.
Challenges include:
– Analysis
– Capture
– Data curation
– Search
– Sharing
– Storage
– Transfer
– Visualization
– Querying
– Updating
– Information privacy
The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.
Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.
Big data usually includes data sets with sizes beyond the ability of commonly used
software tools to capture, curate, manage, and process data within a tolerable elapsed
time. Big data "size" is a constantly moving target.
Big data requires a set of techniques and technologies with new forms of integration to
reveal insights from datasets that are diverse, complex, and of a massive scale.
1.1.2.3 List of Big Data Platforms
a) Hadoop
b) Cloudera
c) Amazon Web Services
d) Hortonworks
e) MapR
f) IBM Open Platform
g) Microsoft HDInsight
h) Intel Distribution for Apache Hadoop
i) DataStax Enterprise Analytics
j) Teradata Enterprise Access for Hadoop
k) Pivotal HD
a) Hadoop
What is Hadoop?
Hadoop is an open-source, Java-based programming framework and server software used to store and analyze data with the help of hundreds or even thousands of commodity servers in a clustered environment.
Hadoop is designed to store and process large datasets extremely fast and in a fault-tolerant way.
Hadoop uses HDFS (Hadoop Distributed File System) for storing data on a cluster of commodity computers. If any server goes down, it knows how to replicate the data, so there is no loss of data even in the event of hardware failure.
Hadoop is an Apache-sponsored project, and it consists of many software packages that run on top of the Apache Hadoop system.
Top Hadoop-based Commercial Big Data Analytics Platforms
Hadoop provides the set of tools and software that forms the backbone of a Big Data analytics system.
The Hadoop ecosystem provides the necessary tools and software for handling and analyzing Big Data.
On top of the Hadoop system, many applications can be developed and plugged in to provide an ideal solution for Big Data needs.
b) Cloudera
Cloudera is one of the first commercial Hadoop-based Big Data Analytics Platforms offering a Big Data solution.
Its product range includes Cloudera Analytic DB, Cloudera Operational DB, Cloudera Data Science & Engineering and Cloudera Essentials.
All these products are based on Apache Hadoop and provide real-time processing and analytics of massive data sets.
Website: https://www.cloudera.com
d) Hortonworks
Hortonworks uses 100% open-source software without any proprietary software, and it was the first to integrate support for Apache HCatalog.
Hortonworks is a Big Data company based in California. The company develops and supports applications for Apache Hadoop.
The Hortonworks Hadoop distribution is 100% open source and enterprise-ready.
e) MapR
MapR is another Big Data platform, which uses the Unix file system for handling data.
It does not use HDFS, and the system is easy to learn for anyone familiar with the Unix system.
This solution integrates Hadoop, Spark, and Apache Drill with a real-time data processing feature.
Website: https://mapr.com
f) IBM Open Platform
Website: https://www.ibm.com/analytics/us/en/technology/hadoop/
g) Microsoft HDInsight
Microsoft HDInsight is also based on the Hadoop distribution, and it is a commercial Big Data platform from Microsoft.
Microsoft is a software giant engaged in the development of the Windows operating system for desktop and server users.
HDInsight is a big Hadoop distribution offering which runs in the Windows and Azure environments.
It offers customized, optimized, open-source Hadoop-based analytics clusters using Spark, Hive, MapReduce, HBase, Storm, Kafka and R Server, running on the Hadoop system in the Windows/Azure environment.
Website: https://azure.microsoft.com/en-in/services/hdinsight/
h) Intel Distribution for Apache Hadoop
Website: http://www.intel.com/content/www/us/en/software/intel-distribution-for-apache-hadoop-software-solutions.html
i) DataStax Enterprise Analytics
It provides powerful indexing, search, analytics and graph functionality in the Big Data system.
It supports advanced indexing and searching features.
It comes with a powerful integrated analytics system.
It provides multi-model support in the platform: it supports key-value, tabular, JSON/document and graph data formats. Powerful search features enable users to get the required data in real time.
Website: http://www.datastax.com/
j) Teradata Enterprise Access for Hadoop
Teradata offers Teradata Aster and Hadoop as part of its package solution.
Website: http://www.teradata.com
k) Pivotal HD
Pivotal HD is another Hadoop distribution, which includes the database tool Greenplum and the analytics platform Gemfire.
Website: https://pivotal.io/
There are various open-source Big Data platforms which can be used for Big Data handling and data analytics in real-time environments.
Both small and big enterprises can use these tools to manage their enterprise data and get the best value from it.
i) Apache Hadoop
Apache Hadoop is a Big Data platform and software package which is an Apache-sponsored project.
Under the Apache Hadoop project, various other software packages are being developed which run on top of the Hadoop system to provide enterprise-grade data management and analytics solutions.
Apache Hadoop is an open-source, distributed file system which provides a data processing and analysis engine for analyzing large sets of data.
Hadoop can run on Windows, Linux and OS X operating systems, but it is mostly used on Ubuntu and other Linux variants.
ii) MapReduce
The MapReduce engine was originally written by Google; it is the system which enables developers to write programs which can run in parallel on hundreds or even thousands of computer nodes to process vast data sets.
After processing the job on the different nodes, it combines the results and returns them to the program which executed the MapReduce job.
This software is platform-independent and runs on top of the Hadoop ecosystem. It can process tremendous amounts of data at very high speed in a Big Data environment.
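The MapReduce idea can be sketched in a few lines of Python (a local, single-process word-count simulation; on a real Hadoop cluster the map and reduce functions would run in parallel across many nodes, and the shuffle step would be done by the framework):

from collections import defaultdict

def map_phase(document):
    # Map: emit (word, 1) pairs for each word in an input split.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Reduce: combine all counts emitted for the same key.
    return (word, sum(counts))

documents = ["big data needs big tools", "data tools process data"]

# Shuffle: group intermediate pairs by key (the framework does this on a cluster).
grouped = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        grouped[word].append(count)

results = [reduce_phase(word, counts) for word, counts in grouped.items()]
print(sorted(results))  # e.g. [('big', 2), ('data', 3), ...]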
iii) GridGain
GridGain is another software system for parallel processing of data, just like MapReduce.
GridGain is an alternative to Apache MapReduce.
GridGain is used for the processing of in-memory data, and it is based on the Apache Ignite framework.
GridGain is compatible with Hadoop HDFS and runs on top of the Hadoop ecosystem.
The enterprise version of GridGain can be purchased from the official GridGain website, while the free version can be downloaded from its GitHub repository.
Website: https://www.gridgain.com/
v) Apache Storm
Apache Storm is software for real-time computing and distributed processing.
It is free and open-source software developed at the Apache Software Foundation. It is a real-time, parallel processing engine.
Apache Storm is highly scalable and fault-tolerant, and it supports almost all programming languages. Its use cases include:
– Realtime analytics
– Online machine learning
– Continuous computation
– Distributed RPC
– ETL
– and all other places where real-time processing is required.
Apache Storm is used by Yahoo, Twitter, Spotify, Yelp, Flipboard and many other data giants.
Website: http://storm.apache.org/
viii) SAMOA
SAMOA stands for Scalable Advanced Massive Online Analysis; it is a system for mining Big Data streams.
SAMOA is open-source software distributed on GitHub, and it can also be used as a distributed machine learning framework.
Website: https://github.com/yahoo/samoa
Thus, the Big Data industry has been growing very fast (as of 2017), and companies are quickly moving their data to Big Data platforms. There is a huge requirement for Big Data skills in the job market, and many companies provide training and certifications in Big Data technologies.
*********************
Big Data Architecture :
Big data architecture is designed to handle the ingestion, processing, and analysis of data that is too
large or complex for traditional database systems.
1. Ingestion :
The ingestion layer is the very first step of pulling in raw data.
It comes from internal sources, relational databases, non-relational databases,
social media, emails, phone calls etc.
There are two kinds of ingestion (a small sketch of the two styles follows below):
Batch, in which large groups of data are gathered and delivered together.
Streaming, which is a continuous flow of data; this is necessary for real-time data analytics.
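The difference between the two ingestion styles can be sketched in Python (the record source below is simulated; a real system would read from files, queues, or a streaming platform):

import time

def batch_ingest(record_source):
    # Batch: gather a large group of records and deliver them together.
    return list(record_source)

def stream_ingest(record_source, handler):
    # Streaming: hand each record to the handler as soon as it arrives.
    for record in record_source:
        handler(record)

def simulated_source():
    # Stand-in for a live feed such as sensor readings or click events.
    for i in range(3):
        time.sleep(0.1)  # records trickle in over time
        yield {"event_id": i}

print(batch_ingest(simulated_source()))   # delivered together, after collection
stream_ingest(simulated_source(), print)  # processed one record at a time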
2. Storage :
Storage is where the converted data is stored in a data lake or warehouse and
eventually processed.
The data lake/warehouse is the most essential component of a big data
ecosystem.
It needs to contain only thorough, relevant data to make insights as valuable as
possible.
It must be efficient with as little redundancy as possible to allow for quicker
processing.
3. Analysis :
In the analysis layer, data gets passed through several tools, shaping it into actionable insights.
There are four types of analytics on big data: descriptive, diagnostic, predictive, and prescriptive.
Characteristics of Big Data:
Veracity
Veracity means how reliable the data is. There are many ways to filter or translate the data. Veracity is the process of being able to handle and manage data efficiently. Big Data is also essential in business development.
For example, Facebook posts with hashtags.
Value
Value is an essential characteristic of big data. It is not just any data that we process or store; it is valuable and reliable data that we store, process, and analyze.
Velocity
Velocity plays an important role compared to the other characteristics. Velocity refers to the speed at which data is created in real time. It encompasses the speed of incoming data sets, the rate of change, and activity bursts. A primary aspect of Big Data is to provide demanded data rapidly.
Big data velocity deals with the speed at which data flows in from sources like application logs, business processes, networks, social media sites, sensors, mobile devices, etc.
1.2. CHALLENGES OF CONVENTIONAL SYSTEMS
Big data is a huge amount of data which is beyond the processing capacity of conventional database systems to manage and analyze within a specific time interval.
Conventional computing functions logically with a set of rules and calculations, while neural computing can function via images, pictures, and concepts.
Conventional computing is often unable to manage the variability of data obtained in the real world.
Neural computing, on the other hand, like our own brains, is well suited to situations that have no clear algorithmic solution and is able to manage noisy, imprecise data. This allows it to excel in those areas that conventional computing often finds difficult.
Big Data vs. Conventional Data:
– Big Data: the aggregated or sampled or filtered data. Conventional: raw transactional data.
– Big Data: used for reporting, basic analysis, and text mining; advanced analytics is only at a starting stage in big data. Conventional: used for reporting, advanced analysis, and predictive modeling.
– Big Data: analysis needs both programming skills (such as Java) and analytical skills. Conventional: analytical skills are sufficient; advanced analysis tools don't require expert programming skills.
The following challenges dominate in the case of conventional systems in real-time scenarios:
1) Because big data is continuously expanding, new companies and technologies are being developed every day. A big challenge for companies is to find out which technology works best for them without introducing new risks and problems.
2) The talent gap that exists in the industry: while Big Data is a growing field, there are very few experts available, because big data is a complex field and people who understand its complexity and intricate nature are few and far between.
3) Getting data into the big data platform: data is increasing every single day. This means that companies have to tackle a limitless amount of data on a regular basis. The scale and variety of data available today can overwhelm any data practitioner, which is why it is important to make data accessibility simple and convenient for brand managers and owners.
These challenges can be grouped into three categories:
1. Data
2. Process
3. Management
1. Data Challenges
Volume
• Social media plays a key role: Twitter generates 7+ terabytes (TB) of data every day; Facebook, 10 TB.
• Mobile devices play a key role as well: there were an estimated 6 billion mobile phones in 2011.
• The challenge is how to deal with the size of Big Data.
Variety
• A lot of this data is unstructured, or has a complex structure that's hard to represent in rows and columns.
2. Processing
More than 80% of today's information is unstructured, and it is typically too big to manage effectively.
Today, companies are looking to leverage a lot more data from a wider variety of
sources both inside and outside the organization.
Things like documents, contracts, machine data, sensor data, social media, health
records, emails, etc. The list is endless really.
3. Management
• A lot of this data is unstructured, or has a complex structure that's hard to represent in rows and columns.
b) Visualization helps organizations perform analyses and make decisions much more rapidly, but the challenge is going through the sheer volumes of data and accessing the level of detail needed, all at high speed.
c) The challenge only grows as the degree of granularity increases. One possible solution is hardware: some vendors are using increased memory and powerful parallel processing to crunch large volumes of data extremely quickly.
d) Understanding the data:
It takes a lot of understanding to get data into the right shape so that you can use visualization as part of data analysis.
Visual analytics enables organizations to take raw data and present it in a meaningful way that generates the most value. However, when used with big data, visualization is bound to lead to some challenges.
**************
1.3. INTELLIGENT DATA ANALYSIS
Intelligent Data Analysis (IDA) is one of the hot topics in the fields of artificial intelligence and information science.
IDA is:
– used for extracting useful information from large quantities of online data, and for extracting desirable knowledge or interesting patterns from existing databases;
– the distillation of information that has been collected, classified, organized, integrated, abstracted and value-added;
– at a level of abstraction higher than the data and information on which it is based, so that it can be used to deduce new information and new knowledge.
Goal:
The goal of intelligent data analysis is to extract useful knowledge; the process demands a combination of extraction, analysis, conversion, classification, organization, reasoning, and so on.
1.3.2 Uses / Benefits of IDA
Intelligent Data Analysis provides a forum for the examination of issues related to the research
and applications of Artificial Intelligence techniques in data analysis across a variety of
disciplines and the techniques include (but are not limited to):
Data Visualization
Data pre-processing (fusion, editing, transformation, filtering, sampling)
Data Engineering
Database mining techniques, tools and applications
Use of domain knowledge in data analysis
Big Data applications
Evolutionary algorithms
Machine Learning (ML)
Neural nets
Fuzzy logic
Statistical pattern recognition
Knowledge Filtering and
Post-processing
Why IDA?
The multidimensionality of problems calls for methods of adequate and deep data processing and analysis.
Example of IDA:
A sample of examinees who died from cardiovascular diseases during the study period, with class 2 meaning they were ill (drug treatment, positive clinical and laboratory findings).
Knowledge Acquisition:
The process of eliciting, analyzing, transforming, classifying, organizing and integrating knowledge, and representing that knowledge in a form that can be used in a computer system.
Illustration of IDA using the See5 tool:
application.names – lists the classes to which cases may belong and the attributes used to describe each case.
Attributes are of two types: discrete attributes have a value drawn from a set of possibilities, and continuous attributes have numeric values.
application.data – provides information on the training cases from which See5 will extract patterns.
The entry for each case consists of one or more lines that give the values for all attributes.
application.names – example:
gender: M,F
activity: 1,2,3
age: continuous
smoking: No,Yes
…
application.data – example:
M,1,59,Yes,0,0,0,0,119,73,103,86,247,87,15979,?,?,?,1,73,2.5
M,1,66,Yes,0,0,0,0,132,81,183,239,?,783,14403,27221,19153,23187,1,73,2.6
M,1,61,No,0,0,0,0,130,79,148,86,209,115,21719,12324,10593,11458,1,74,2.5
… …
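A hedged sketch of reading such a .data record in Python; in See5's format a '?' marks an unknown attribute value, and the first four attribute names below follow the application.names example above (the remaining fields are not named in the notes):

def parse_data_line(line):
    # Split a See5 .data record; '?' denotes a missing value.
    values = []
    for field in line.strip().split(","):
        if field == "?":
            values.append(None)              # unknown value
        else:
            try:
                values.append(float(field))  # continuous attribute
            except ValueError:
                values.append(field)         # discrete attribute (e.g. M, Yes)
    return values

line = "M,1,59,Yes,0,0,0,0,119,73,103,86,247,87,15979,?,?,?,1,73,2.5"
record = parse_data_line(line)
print(record[:4])  # ['M', 1.0, 59.0, 'Yes'] -> gender, activity, age, smoking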
Results – example:
-> class 2 [0.750]
Sensitivity = 0.97, Specificity = 0.81
Sensitivity = 0.98, Specificity = 0.90
1.4 NATURE OF DATA
1.4.1 INTRODUCTION
Data
Properties of Data
For examining the properties of data, reference may be made to the various definitions of data. These definitions reveal that the following are the properties of data:
a) Amenability of use
b) Clarity
c) Accuracy
d) Essence
e) Aggregation
f) Compression
g) Refinement
a) Amenability of use: From the dictionary meaning of data it is learnt that data are facts
used in deciding something. In short, data are meant to be used as a base for arriving at
definitive conclusions.
b) Clarity: Data are a crystallized presentation. Without clarity, the meaning desired to be
communicated will remain hidden.
c) Accuracy: Data should be real, complete and accurate. Accuracy is thus, an essential
property of data.
d) Essence: Large quantities of data are collected and they have to be compressed and refined. Data so refined can present the essence, or derived qualitative value, of the matter.
e) Aggregation: Aggregation is cumulating or adding up.
f) Compression: Large amounts of data are always compressed to make them more meaningful, i.e., compressed to a manageable size. Graphs and charts are some examples of compressed data.
g) Refinement: Data require processing or refinement. When refined, they are capable of
leading to conclusions or even generalizations. Conclusions can be drawn only when
data are processed or refined.
In order to understand the nature of data it is necessary to categorize them into various
types.
Different categorizations of data are possible.
The first such categorization may be on the basis of disciplines, e.g., Sciences, Social
Sciences, etc. in which they are generated.
Within each of these fields, there may be several ways in which data can be categorized into
types.
One common categorization, based on the scale of measurement, distinguishes four types of data:
Nominal
Ordinal
Interval
Ratio
Each offers a unique set of characteristics, which impacts the type of analysis that can be
performed.
The distinction between the four types of scales centers on three different characteristics:
– whether the order of the values matters;
– whether the distance (interval) between values is meaningful; and
– whether the scale has a true zero.
True Zero: There is no true or real zero; an item, observation, or category cannot take a meaningful value of zero.
Appropriate statistics for ordinal scales: count, frequencies, mode
Displays: histograms or bar charts
True Zero: There is a true zero.
Income is a classic example of a ratio scale:
We can convert or transform our data from ratio to interval to ordinal to nominal.
However, we cannot convert or transform our data from nominal to ordinal to interval
to ratio.
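The one-way nature of this conversion can be illustrated with the income example (a small Python sketch; the bin boundaries are arbitrary choices for illustration):

# Ratio data (income) can be downgraded to ordinal categories,
# but ordinal labels can never be turned back into exact incomes.
incomes = [18000, 42000, 95000, 150000]

def to_ordinal(income):
    # Illustrative, arbitrary bin boundaries.
    if income < 25000:
        return "low"
    elif income < 100000:
        return "middle"
    else:
        return "high"

print([to_ordinal(x) for x in incomes])  # ['low', 'middle', 'middle', 'high']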
Ordinal or ranked data provide comparative amounts.
Example:
1st Place 2nd Place 3rd Place
Example (measured amounts on interval/ratio scales): 60 degrees, 12.5 feet, 80 miles per hour
In this case, 93% of all hospitals have lower patient satisfaction scores than Eastridge Hospital, and 31% have lower satisfaction scores than Westridge Hospital.
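A percentile rank of this kind can be computed with a short sketch (the satisfaction scores below are made-up stand-ins, not the actual hospital data):

def percentile_rank(score, all_scores):
    # Percentage of scores strictly below the given score.
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)

# Hypothetical satisfaction scores for ten hospitals.
scores = [55, 60, 62, 64, 67, 70, 73, 75, 80, 88]
print(percentile_rank(88, scores))  # 90.0 -> 90% of hospitals score lower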
Thus, the nature of data and its value have a great influence on the insights that can be derived from it.
***********************
1.5 ANALYTIC PROCESSES AND TOOLS
The steps in the big data analytics process are:
1. Deployment
2. Business Understanding
3. Data Exploration
4. Data Preparation
5. Data Modeling
6. Data Evaluation
Step 1: Deployment
Step 2: Business Understanding
– To proceed further, we need to gather initial data, describe and explore the data, and verify data quality to ensure it contains the data we require.
Step 3: Data Exploration
– Data collected from the various sources is described in terms of its application and the need for the project in this phase.
Step 4: Data Preparation
– We need to select data as per the need, clean it, and construct it to get useful information.
– Data is selected, cleaned, and integrated into the format finalized for the analysis in this phase.
Step 5: Data Modeling
– We need to select a modeling technique, generate a test design, build a model, and assess the model built (see the sketch below).
– The data model is built to analyze relationships between various selected objects in the data; test cases are built for assessing the model, and the model is tested and implemented on the data in this phase.
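The modeling step can be sketched with scikit-learn (a minimal example on synthetic data; the library choice and the decision-tree technique are assumptions for illustration, not prescribed by the notes):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the data prepared in the previous phase.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Generate the test design: hold out part of the data for assessment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Select a modeling technique and build the model.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Assess the model on the held-out test cases.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))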
• How is data stored and indexed?
– In high-performance, schema-free databases (e.g. MongoDB); a short sketch follows below.
• What operations are performed on data?
– Analytic / semantic processing
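As a hedged illustration of schema-free storage, here is a short pymongo sketch (it assumes a MongoDB server running locally and uses invented database, collection, and field names):

from pymongo import MongoClient

# Connect to a locally running MongoDB instance (an assumption of this sketch).
client = MongoClient("mongodb://localhost:27017")
events = client["bda_demo"]["events"]

# Schema-free: documents in the same collection need not share the same fields.
events.insert_one({"user": "anita", "action": "click", "page": "/home"})
events.insert_one({"user": "ravi", "action": "purchase", "amount": 499.0})

# Simple analytic operations: count and fetch matching documents.
print(events.count_documents({"user": "anita"}))
for doc in events.find({"action": "click"}):
    print(doc)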
Thus, BDA tools are used throughout the development of BDA applications.
******************
What is Analysis?
• The process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.
• What is Reporting?
• Reporting is "the process of organizing data into informational summaries in order to monitor how different areas of a business are performing."
1.6.3 CONTRAST BETWEEN ANALYSIS AND REPORTING
Analysis                | Reporting
Provides what is needed | Provides what is asked for
Is typically customized | Is typically standardized
Involves a person       | Does not involve a person
Is extremely flexible   | Is fairly inflexible
Reports are like robots: they monitor and alert you. Analysis is like a parent: it can figure out what is going on (hungry, dirty diaper, no pacifier, teething, tired, ear infection, etc.).
Reporting and analysis can go hand in hand:
Reporting provides little or no context about what is happening in the data; context is critical to good analysis.
Reporting translates raw data into information.
Reporting usually raises the question: What is happening?
Analysis transforms data into insights: Why is it happening? What can you do about it?
Thus, analysis and reporting complement each other, each being used according to the need and the context.
*****************