Big Data Processing Concepts

S.Kavitha
Head & Assistant Professor
Department of Computer Science
Sri Sarada Niketan College of Science for Women, Karur.
Parallel Data Processing
• Parallel data processing involves the simultaneous execution of multiple sub-tasks that collectively comprise a larger task. The goal is to reduce the execution time by dividing a single larger task into multiple smaller tasks that run concurrently.
• Although parallel data processing can be achieved through multiple networked machines, it is more typically achieved within the confines of a single machine with multiple processors.
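A minimal sketch of this idea in Python, using the standard-library multiprocessing.Pool to divide one larger task (summing a list of numbers) into sub-tasks that run concurrently on multiple processor cores. The worker count and chunking scheme are illustrative choices, not part of the slides:

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker process computes the sum of its own chunk.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))      # the single larger task: sum all numbers
    n_workers = 4
    chunk_size = len(data) // n_workers
    # Divide the larger task into multiple smaller sub-tasks.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with Pool(n_workers) as pool:
        # Execute the sub-tasks concurrently on multiple processors.
        partials = pool.map(partial_sum, chunks)
    total = sum(partials)              # combine the partial results
    print(total)                       # same result as sum(data)
```

Combining the partial results at the end is what makes the sub-tasks "collectively comprise" the larger task.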
Distributed Data Processing
• Distributed data processing is closely related to parallel data processing in that the same principle of “divide-and-conquer” is applied. However, distributed data processing is always achieved through physically separate machines that are networked together as a cluster.
Hadoop
• Hadoop is an open-source framework for large-scale data storage and data processing that is compatible with commodity hardware. The Hadoop framework has established itself as a de facto industry platform for contemporary Big Data solutions.
• It can be used as an ETL engine or as an analytics engine for processing large amounts of structured, semi-structured and unstructured data. From an analysis perspective, Hadoop implements the MapReduce processing framework.
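The MapReduce model itself can be illustrated without Hadoop: a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. The sketch below is a plain-Python word count, the canonical MapReduce example; it mimics the model only and is not Hadoop's actual API:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    return [(word, 1) for word in line.lower().split()]

def reduce_phase(word, counts):
    # Reduce: aggregate all counts emitted for one word.
    return (word, sum(counts))

def mapreduce_wordcount(lines):
    # Map over every input split.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Shuffle/sort: group the intermediate pairs by key.
    pairs.sort(key=itemgetter(0))
    # Reduce each group to a final (word, total) result.
    return [reduce_phase(word, [c for _, c in group])
            for word, group in groupby(pairs, key=itemgetter(0))]

print(mapreduce_wordcount(["big data big ideas", "data processing"]))
# → [('big', 2), ('data', 2), ('ideas', 1), ('processing', 1)]
```

In Hadoop the map and reduce phases run on different cluster nodes and the shuffle moves data between them; here all three phases run in one process purely to show the data flow.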
Processing Workloads
• A processing workload in Big Data is defined as the amount and nature of data that is processed within a certain amount of time. Workloads are usually divided into two types:
• batch
• transactional
Batch
• Batch processing, also known as offline processing, involves processing data in batches and usually imposes delays, which in turn results in high-latency responses.
• Batch workloads typically involve large quantities of data with sequential reads/writes and comprise groups of read or write queries.
Transactional
• Transactional processing is also known as online processing. Transactional workload processing follows an approach whereby data is processed interactively without delay, resulting in low-latency responses. Transactional workloads involve small amounts of data with random reads and writes.
• OLTP and operational systems, which are generally write-intensive, fall within this category. Although these workloads contain a mix of read/write queries, they are generally more write-intensive than read-intensive.
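As a hedged illustration of a transactional workload, the sketch below uses Python's built-in sqlite3 module: each operation touches a small amount of data (single rows) with random-access reads and writes, and each transaction commits immediately for a low-latency response. SQLite stands in here for a full OLTP system; the table and transfer function are invented for the example:

```python
import sqlite3

# In-memory database standing in for an OLTP store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")

def transfer(conn, src, dst, amount):
    # One transaction: two small random writes, committed atomically.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))

transfer(conn, 1, 2, 25.0)
# Random read of a single row — a low-latency point query.
print(conn.execute("SELECT balance FROM accounts WHERE id = 2").fetchone()[0])
# → 75.0
```

Note the write-heavy mix: each transfer performs two writes for every subsequent read, matching the write-intensive character of OLTP systems described above.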
Cluster
• In the same manner that clusters provide the necessary support to create horizontally scalable storage solutions, clusters also provide the mechanism to enable distributed data processing with linear scalability.
• Since clusters are highly scalable, they provide an ideal environment for Big Data processing, as large datasets can be divided into smaller datasets and then processed in parallel in a distributed manner.
Processing in Batch Mode
• In batch mode, data is processed offline in batches, and the response time can vary from minutes to hours. In addition, data must be persisted to disk before it can be processed.
• Batch mode generally involves processing a range of large datasets, either on their own or joined together, essentially addressing the volume and variety characteristics of Big Data datasets.
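A minimal batch-mode sketch in Python: the data is first persisted to disk, then processed offline in fixed-size batches with sequential reads, and the result is only available once the whole run completes. The file name and batch size are illustrative choices:

```python
import csv
import os
import tempfile

# Persist the dataset to disk first — a precondition of batch processing.
path = os.path.join(tempfile.gettempdir(), "readings.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([[i, i * 0.5] for i in range(10)])  # (id, value) rows

def process_in_batches(path, batch_size=4):
    # Read the persisted file sequentially and process it batch by batch.
    total = 0.0
    batch = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            batch.append(float(row[1]))
            if len(batch) == batch_size:
                total += sum(batch)   # process one full batch
                batch = []
        total += sum(batch)           # process the final partial batch
    return total

print(process_in_batches(path))  # → 22.5
```

Because every batch must be read and aggregated before the total exists, the caller experiences the high-latency, offline behaviour described above, in contrast to the row-at-a-time transactional example.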
