Concurrent reads are possible because each client is attached to a different server, and all
clients can read from their servers simultaneously. However, because the master is not
involved in reads, this leads to eventual consistency: a client may see an outdated view,
which is updated after a short delay.
UNIT-5
Chapter-2
Apache Flume
• Introduction,
• Architecture,
• DataFlow,
• Features and Limitations,
• Applications.
Introduction to Apache Flume
• A Flume event is the basic unit of the data transported inside Flume. It has a byte-array payload.
• The payload is transferred from the source to the destination, accompanied by optional headers.
• The figure below depicts the structure of a Flume event.
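To make this concrete, below is a minimal, self-contained Java sketch of what a Flume event carries: a map of optional string headers plus a byte-array payload (the body). This is an illustration of the structure only, not the actual Flume API, and the header keys and values are hypothetical.

    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;

    // Simplified stand-in for a Flume event: optional headers plus a byte-array body.
    public class SimpleFlumeEvent {
        private final Map<String, String> headers = new HashMap<>();
        private final byte[] body;

        public SimpleFlumeEvent(byte[] body) {
            this.body = body;
        }

        public Map<String, String> getHeaders() { return headers; }
        public byte[] getBody() { return body; }

        public static void main(String[] args) {
            // One web-server log line becomes the payload of one event.
            SimpleFlumeEvent event = new SimpleFlumeEvent(
                    "GET /index.html 200".getBytes(StandardCharsets.UTF_8));
            // Headers are optional key/value metadata used for routing, not payload.
            event.getHeaders().put("host", "webserver-01");
            event.getHeaders().put("timestamp", "1700000000");
            System.out.println(new String(event.getBody(), StandardCharsets.UTF_8));
        }
    }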
Apache Flume Architecture:
1. Data Generators
Data generators generate real-time streaming data.
The data generated by the data generators is collected by the individual Flume agents running
on them.
Common data generators are Facebook, Twitter, etc.
2. Flume Agent
The agent is a JVM process in Flume.
It receives events from clients or other agents and transfers them to the destination or to other
agents.
An agent consists of three components, a source, a channel, and a sink, through which data
flows in Flume; a sample agent configuration is sketched after the sink examples below.
a. Source
• A Flume source is the component of a Flume agent that consumes data (events) from data generators such as a
web server and delivers them to one or more channels.
• The data generator sends data (events) to Flume in a format recognized by the target Flume source.
• Flume supports different types of sources. Each source receives events (data) from a specific data generator.
• Examples of Flume sources: Avro source, Exec source, Thrift source, NetCat source, HTTP source, Scribe
source, Twitter 1% firehose source, etc.
b. Channel
• When a Flume source receives an event from a data generator, it stores it in one or more channels. A Flume
channel is a passive store that receives events from the Flume source and holds them until Flume sinks
consume them.
• A channel acts as a bridge between Flume sources and Flume sinks.
• Flume channels are fully transactional and can work with any number of Flume sources and sinks.
• Examples of Flume channels: Memory channel, File channel, JDBC channel, Custom channel, etc.
c. Sink
• The Flume sink retrieves events from the Flume channel and pushes them to a centralized store such as
HDFS or HBase, or passes them on to the next agent.
• Examples of Flume sinks: HDFS sink, Avro sink, HBase sink, Elasticsearch sink, etc.
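As a concrete illustration of how a source, a channel, and a sink are wired inside one agent, below is a minimal single-node Flume configuration in the standard properties format. The agent name (a1), the port number, and the capacity values are assumptions chosen for the example; it uses a NetCat source, a memory channel, and a logger sink.

    # Name the three components of agent a1
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # Source: listens for lines of text on a TCP port (example port)
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Channel: buffers events in memory until the sink consumes them
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # Sink: writes events to the Flume log (useful for testing)
    a1.sinks.k1.type = logger

    # Bind the source and the sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

Such a file is typically started with the flume-ng agent command, giving the agent name (a1) and the path to the configuration file.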
3. Data collector
• The data collector collects the data from individual agents and aggregates it.
• It pushes the collected data to a centralized store.
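One common way to realize a data collector is a multi-agent flow: each leaf agent forwards its events through an Avro sink to the Avro source of a collector agent. The fragment below is a sketch under assumed host names, port numbers, and channel names.

    # On each leaf agent: forward events to the collector over Avro
    agent1.sinks.forward.type = avro
    agent1.sinks.forward.hostname = collector-host
    agent1.sinks.forward.port = 4545
    agent1.sinks.forward.channel = c1

    # On the collector agent: receive events from the leaf agents
    collector.sources.avro-in.type = avro
    collector.sources.avro-in.bind = 0.0.0.0
    collector.sources.avro-in.port = 4545
    collector.sources.avro-in.channels = c1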
4. Centralized store
• Centralized stores are Hadoop HDFS, HBase, etc.
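To push the aggregated data into such a centralized store, the collector's sink can be an HDFS sink. The path, roll interval, and channel name below are example values, not defaults.

    # Sink on the collector agent that writes events into HDFS (example path)
    collector.sinks.to-hdfs.type = hdfs
    collector.sinks.to-hdfs.hdfs.path = hdfs://namenode:8020/flume/events
    collector.sinks.to-hdfs.hdfs.fileType = DataStream
    collector.sinks.to-hdfs.hdfs.rollInterval = 30
    collector.sinks.to-hdfs.channel = c1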
Applications of Apache Flume (Use Cases)