Unit 1 Topic 0 Introduction to Big Data
Xplenty, Atlas.ti, Talend, R-Programming
Xplenty
A cloud-based ETL solution providing simple visualized data
pipelines for automated data flows across a wide range of
sources and destinations. Xplenty’s powerful on-platform
transformation tools allow you to clean, normalize, and
transform data while also adhering to compliance best practices.
Features:
Powerful, code-free, on-platform data transformation offering
REST API connector – pull in data from any source that has a REST
API
Destination flexibility – send data to databases, data
warehouses, and Salesforce
Security focused – field-level data encryption and masking to
meet compliance requirements
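
The clean, normalize, transform, and mask steps described above can be sketched in a few lines of Python. This is only an illustration of the general ETL idea, not Xplenty's API: the functions `extract`, `mask`, `transform`, and `load`, the sample records, and the in-memory `warehouse` list are all hypothetical. A truncated hash stands in for field-level masking; it is not encryption.

```python
import hashlib


def extract():
    """Stand-in for pulling rows from a source such as a REST API."""
    return [
        {"name": "  Alice ", "email": "ALICE@EXAMPLE.COM", "amount": "10.50"},
        {"name": "Bob", "email": "bob@example.com", "amount": "3"},
    ]


def mask(value):
    """Replace a sensitive field with a short one-way hash (illustrative only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def transform(row):
    """Clean and normalize one record, masking the email field."""
    return {
        "name": row["name"].strip(),
        "email_masked": mask(row["email"].strip().lower()),
        "amount": float(row["amount"]),
    }


def load(rows, destination):
    """Stand-in for writing to a database or data warehouse."""
    destination.extend(rows)


warehouse = []
load([transform(r) for r in extract()], warehouse)
print(warehouse[0]["name"], warehouse[0]["amount"])  # Alice 10.5
```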
Atlas.ti
All-in-one research software. This big data analytics tool
gives you all-in-one access to the entire range of
platforms. It is used for qualitative data analysis and mixed
methods research in academic, market, and user
experience research.
Features:
Can export information on each source of data.
It offers an integrated way of working with your data.
Allows you to rename a Code in the Margin Area
Helps you to handle projects that contain thousands of
documents and coded data segments.
Analytics
A tool that provides visual analysis and dashboarding.
It lets you connect multiple data sources, including
business applications, databases, cloud drives, and
more.
Features:
Offers visual analysis and dashboarding.
It helps to analyze data in depth.
Provides collaborative review and analysis.
Can embed reports in websites, applications, blogs,
and more.
Azure HDInsight
Spark and Hadoop service in the cloud. It provides big data
cloud offerings in two categories, Standard and Premium. It
provides an enterprise-scale cluster for organizations to run
their big data workloads.
Features:
Reliable analytics with an industry-leading SLA
It offers enterprise-grade security and monitoring
Protect data assets and extend on-premises security and
governance controls to the cloud
High-productivity platform for developers and scientists
Integration with leading productivity applications
Deploy Hadoop in the cloud without purchasing new hardware
Talend
Big data analytics software that simplifies and automates big
data integration. Its graphical wizard generates native code.
It also supports big data integration, master data management,
and data quality checks.
Features:
Accelerate time to value for big data projects
Simplify ETL & ELT for big data
Talend Big Data Platform simplifies using MapReduce and
Spark by generating native code
Smarter data quality with machine learning and natural
language processing
Agile DevOps to speed up big data projects
R-Programming
A language for statistical computing and graphics. It
is also used for big data analysis. It provides a wide
variety of statistical tests.
Features:
Effective data handling and storage facility
It provides a suite of operators for calculations on
arrays, in particular matrices
It provides a coherent, integrated collection of big data
tools for data analysis
It provides graphical facilities for data analysis, which
display either on-screen or on hardcopy
Others
Apache Hadoop: A framework that allows you to store big
data in a distributed environment for parallel processing.
Apache Pig: A platform that is used for analyzing large
datasets by representing them as data flows. Pig is
designed to provide an abstraction over MapReduce which
reduces the complexities of writing a MapReduce program.
Apache HBase: A multidimensional, distributed, open-source
NoSQL database written in Java. It runs on top
of HDFS, providing Bigtable-like capabilities for Hadoop.
Apache Spark: An open-source, general-purpose cluster-computing
framework. It provides an interface for
programming entire clusters with implicit data parallelism and
fault tolerance.
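
To make the MapReduce model mentioned above more concrete, here is a minimal single-machine sketch of a word count in plain Python. It does not use Hadoop, Pig, or Spark; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and only mirror the map, group-by-key, and reduce stages that a real framework would run in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data tools", "big data case studies"])
print(counts["big"], counts["data"], counts["tools"])  # 2 2 1
```

Writing even this small job requires three separate functions, which is the kind of boilerplate a higher-level data-flow abstraction such as Pig is meant to hide.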
Big Data Case studies
Walmart leverages Big Data and Data Mining to
create personalized product recommendations for
its customers.
Hadoop Block
Example Practice
Question: A file of size 612 MB is stored using the
default block configuration (128 MB). Compute
how many blocks will be created and the size of the last block.
Answer: 5 blocks. The first four blocks are 128 MB each and
the last block is 100 MB (128*4 + 100 = 612).
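
The same arithmetic can be checked with a small helper. This is a sketch under the assumptions used in the example: sizes are whole megabytes, the default block size is 128 MB, and the last block holds only the leftover data. The function name `hdfs_blocks` is hypothetical and is not part of any Hadoop API.

```python
def hdfs_blocks(file_size_mb, block_size_mb=128):
    """Return (number_of_blocks, last_block_size_mb) for a file."""
    if file_size_mb <= 0:
        return 0, 0
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    if remainder == 0:
        # The file fills its final block exactly.
        return full_blocks, block_size_mb
    # A partially filled last block holds only the remaining data.
    return full_blocks + 1, remainder


print(hdfs_blocks(612))  # (5, 100)
```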
Answer:
a. No. of block = 5, last block size = 172
b. No. of block = 6, last block size = 44
c. No. of block = 7, last block size = 0
d. No. of block = 6, last block size = 128
Practice
Question: How many blocks will be created for a file
of size 400 MB using the default block
configuration? Also compute the size of the last
block.
Answer: ?
Practice
Question: How many blocks will be created for a file
of size 2 GB 500 MB using the default block
configuration? Also compute the size of the last
block.
Answer: ?
THANK
YOU