
Understand Azure Event Hubs
A glance at the top-level architecture of Azure Event Hubs.

About The Author
Saravana Kumar (Microsoft Integration MVP)

Saravana Kumar is the founder and CTO of BizTalk360, a product focused on enhancing the administration, support, monitoring and operations side of Microsoft BizTalk Server. He has a strong passion for BizTalk administration and management, and helps customers using BizTalk Server to run their environments smoothly. He has over 18 years of experience in IT, with over 12 years purely in integration using BizTalk Server.

Saravana is a frequent speaker at some of the top BizTalk groups across Europe. He is also the key person behind the London Connected Systems user group, organising events and connecting people. He regularly writes articles about integration topics on his blog at http://blogs.biztalk360.com/author/saravana/, which attracts over 11,000 daily visitors.
Understand Azure Event Hubs
Saravana Kumar (Microsoft Integration MVP)

For the past few months I was absorbed in BizTalk360 and isolated myself from any deep technology learning. Now that BizTalk360 version 8.0 is out of the way and the team has settled down, I thought it was time to look deeply into some of the technology areas I had been intending to learn for some time. As part of that process, I decided to put my foot first into Azure Event Hubs.

Azure Event Hubs is not entirely new to me; I've been doing small bits and pieces here and there as hobby/prototype projects for a while. Now it's time to understand in a bit more depth how the various pieces fit together. I'll try to cover my learnings as much as possible in these blog posts, so they might help people get started.

What is Azure Event Hubs?


Before going into the technology side of things, let's try to understand in layman's terms why we need this technology. Cloud consumption is growing in many organisations, and a new breed of Software as a Service (SaaS) solutions pops up every day. One of the common patterns you'll see in cloud adoption is moving data (either from on-premise systems or from devices) into the cloud. Let's take an example: you have some kind of monitoring service that samples your server's CPU utilisation every 5 seconds, and you want to push this information to the cloud and store it in a persistent store like SQL Azure, MongoDB, blob storage and so on. The traditional way of implementing this solution would be to write some kind of web service (e.g. an ASP.NET Web API), deploy it to the cloud (as a web role or VMs) and constantly fire messages at that endpoint. It may initially sound easy to implement such a solution, but once you start to scale (e.g. you want to monitor CPU utilisation on thousands of servers), building the robustness and reliability of that Web API endpoint becomes complex. In addition, you can imagine the cost implications of setting up such infrastructure.

This is the exact problem Azure Event Hubs solves for us. You simply provision an event hub using the portal, which gives you the endpoints, and you can start firing messages at those endpoints via either HTTPS or AMQP. The collected (ingested) data gets stored in the event hub, and you can later read it using readers at your own pace. The data is retained for a period of time automatically.

Azure Event Hubs is designed for scale; it can process millions of messages in both directions, inbound and outbound. Some real-world use cases include collecting telemetry data from cars, games, applications and so on, IoT scenarios where millions of devices push data to the cloud, and gaming scenarios where user activity is pushed to the cloud at scale. There are tons of use cases for a technology like Azure Event Hubs.

Understand the building blocks


Let's take a quick look at the top-level architecture of Azure Event Hubs and try to understand all the building blocks that make it powerful. These are the important terms to learn when it comes to Azure Event Hubs:

EventData (message)
Publishers (or producers)
Partitions
Partition Keys / Partition Id
Receivers (or consumer)
Consumer Groups
Event Processor Host
Transport Protocol (HTTP, AMQP)
Throughput Units

There is also a security aspect, which we will cover later.

Event Data
In the context of Event Hubs, messages are referred to as event data. Event data contains the body of the event, which is a binary stream (you can place any binary content, such as serialised JSON, XML, etc.), a user-defined property bag (name-value pairs) and various system metadata about the event, such as its offset in the partition and its number in the stream sequence.

The EventData class is included in the .NET Azure Service Bus client library. At the protocol level it follows the same sender model (BrokeredMessage) used in Service Bus queues and topics.
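
As a minimal sketch (using the legacy Microsoft.ServiceBus.Messaging library; the payload and property names below are purely illustrative), constructing an event looks like this:

```csharp
using System.Text;
using Microsoft.ServiceBus.Messaging;

// The body is an opaque binary stream; here we serialise a small JSON payload.
var payload = "{ \"server\": \"web01\", \"cpuPercent\": 72.5 }";
var eventData = new EventData(Encoding.UTF8.GetBytes(payload));

// User-defined property bag (name-value pairs) that travels with the event.
eventData.Properties["Type"] = "CpuTelemetry";
eventData.Properties["Region"] = "WestEurope";
```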

Publishers (or producers)

Any entity that sends events (messages) to an event hub is a publisher. Event publishers can publish events using either the HTTPS or AMQP protocol. For simple scenarios where the volume of published events is low, you can choose HTTPS; if you are dealing with high-volume publishing, AMQP will give you better performance, latency and throughput. Event publishers use a Shared Access Signature (SAS) token to identify themselves to an event hub; they can have a unique identity or use a common SAS token, depending on the requirements of the scenario (we will cover security separately).

You can publish events individually or in batches. A single publication, whether an individual event or a batch, has a maximum size limit of 256 KB. Publishing events larger than this will result in a quota-exceeded exception.

Publishing events with a .NET client using the Azure Service Bus client library takes just three lines of code:
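
A minimal sketch with the legacy Microsoft.ServiceBus.Messaging client (the connection string and event hub name are placeholders):

```csharp
// Create a client bound to the event hub, wrap the payload, and send.
var client = EventHubClient.CreateFromConnectionString(connectionString, "myeventhub");
var eventData = new EventData(Encoding.UTF8.GetBytes("{ \"cpuPercent\": 72.5 }"));
client.Send(eventData);
```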

Publishing a batch of events in one go:
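
A sketch of the batched variant, reusing the client from the previous snippet (requires System.Collections.Generic):

```csharp
// The whole batch counts as a single publication, so the 256 KB limit applies to it.
var batch = new List<EventData>
{
    new EventData(Encoding.UTF8.GetBytes("{ \"cpuPercent\": 71.0 }")),
    new EventData(Encoding.UTF8.GetBytes("{ \"cpuPercent\": 69.5 }"))
};
client.SendBatch(batch);
```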

Partitions
As you can see from the image, there are a lot of partitions inside an event hub. Partitions are one of the key differentiators in the way data is stored and retrieved compared to other Service Bus technologies like queues and topics. Traditional queues and topics are designed around the "competing consumer" pattern, in which each consumer attempts to read from the same queue (imagine a single-lane road where vehicles go one after the other, with no option to overtake), whereas Event Hubs is designed around the "partitioned consumer" pattern (think of it like the parallel lanes of a motorway, where traffic flows through multiple lanes, and if one lane is busy vehicles choose another lane to move faster). The single-lane queue model ultimately results in scale limits, hence Event Hubs uses the partitioned consumer pattern to achieve the massive scale required.

The other important difference between a normal queue and event hub partitions is that in a queue, once a message is read it's taken off the queue and is no longer available (if there are errors, it moves to the dead-letter queue), whereas in an event hub partition the data stays there even after it's been read by a consumer. This allows consumers to come back and read the data again if required (for example, after a lost connection). Events stored in partitions get deleted only after the retention period has expired (for example, 1 day). You cannot manually delete data from partitions.

Events get stored in a partition in sequence; as newer events arrive they are added to the end of the sequence (more or less like how events get added to queues).

Event hubs support different sender usage patterns:

A publisher can target event data at a specific partition using a partition id
A group of publishers can use a common hash (PartitionKey) for automatic hash-based distribution
Events are distributed automatically on a round-robin basis if the publisher does not specify a PartitionKey or partition id

Due to the varying distribution patterns, partitions will grow to different sizes from one another. It's also important to know that a given event lives in only one partition; it does not get duplicated. Underneath, partitions are backed by private blob storage managed by Azure.

The number of partitions is specified at event hub creation time and cannot be modified later (so care should be taken). Using the standard Azure portal you can create between 2 and 32 partitions (the default is 4); however, if required, you can create up to 1024 partitions by contacting Azure support.
Partition Keys / Partition Ids
In the main image, you can see the concept of partition keys (represented as red and orange dots). The partition key is a value you pass in the event data for the purpose of event organisation (grouping events and making sure they live in the same place). For example, you may want the data from all the servers in a particular data centre to be stored in a specific partition. If you look carefully, the event publishers have no knowledge of the actual partition; they simply specify a partition key, and the event hub makes sure events with the same partition key are stored in the same partition. This decoupling of key and partition insulates the sender from needing to know too much about the downstream processing and storage of events.
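
A minimal sketch of this with the same legacy client (the key value is illustrative):

```csharp
var eventData = new EventData(Encoding.UTF8.GetBytes(payload))
{
    // All events carrying the same PartitionKey end up in the same partition;
    // the publisher never needs to know which partition that is.
    PartitionKey = "datacentre-dublin"
};
client.Send(eventData);
```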

If you do not specify a partition key, the event hub will simply store incoming events in different partitions on a round-robin basis.

You can also send event data directly to specific partitions if required. This is generally not a good practice; you should either leave the distribution of events to the event hub in the round-robin model, or take advantage of the PartitionKey concept explained above, which is a bit more abstract, and the event hub will take care of grouping events together in the relevant partitions.

However, if you have some special need and want to write events to particular partitions, you can do so easily by using the partition id, as shown in the code snippet below.
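A minimal sketch with the legacy client (the partition id "3" is arbitrary):

```csharp
// Create a sender bound to one specific partition; prefer PartitionKey in general.
EventHubSender sender = client.CreatePartitionedSender("3");
sender.Send(new EventData(Encoding.UTF8.GetBytes(payload)));
```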
Receivers (or consumers)
Any entity (application) that reads event data from an event hub is a receiver (or consumer). As you can see from the main diagram above, there can be multiple receivers for the same event hub, and they can read the same partition data at their own pace. There is also an important thing to note: event consumers connect only via AMQP (whereas on the send side both HTTPS and AMQP are available). This is because events are pushed to the consumer from the event hub over the AMQP channel; the client does not need to poll for data availability. This model is important both for scalability and to save each consumer from writing its own logic for checking new data availability, which would put unnecessary load on the platform.

Consumer Groups
The data stored in your event hub (in all partitions) is not going to change; once it's stored in a partition it lives there until the retention period has elapsed (at which point the event data is deleted). The data can be consumed (read) by different consumers (applications) based on their own requirements: some consumers want to read it exactly once, some consumers may go back and read historical data again and again, and so on.

As you can see, there are varying requirements for each consumer; in order to support this, Event Hubs uses consumer groups. A consumer group is simply a view of the data in the entire event hub. Data in the event hub can only be accessed via a consumer group; you cannot access the partitions directly to fetch the data. When you create an event hub, a default consumer group is also created. Azure also uses consumer groups as a differentiating factor between pricing tiers (on the Basic tier you cannot have more than 1 consumer group, whereas on the Standard tier you can have up to 20 consumer groups).
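
As a small sketch with the legacy client (the group name "analytics" is a placeholder):

```csharp
// Every event hub gets a default consumer group ("$Default") at creation time.
EventHubConsumerGroup defaultGroup = client.GetDefaultConsumerGroup();

// Additional named groups (Standard tier) give other applications their own view.
EventHubConsumerGroup analyticsGroup = client.GetConsumerGroup("analytics");
```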

Event Processor Host / Direct Receivers (Event Hub Receiver)
Event Hubs primarily provides two different ways to consume events: either using direct receivers, or via a higher-level abstracted, intelligent host/agent called the "Event Processor Host".

When you are building your receivers (consumers) to access event data from an event hub via consumer groups, there are a bunch of things you need to take care of as a consumer responsibility. These include:

Acquiring and renewing leases
Maintaining offsets, checkpoints and leader election handling (epoch)
Thread safety

If you go down the route of building your own direct receivers, you need to take care of all the above points manually. Your code will look something like this:
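A minimal sketch with the legacy Microsoft.ServiceBus.Messaging client, deliberately omitting the lease, checkpoint and threading concerns listed above, which a real direct receiver must handle:

```csharp
// Connect to one partition of the event hub through the default consumer group.
var client = EventHubClient.CreateFromConnectionString(connectionString, "myeventhub");
EventHubConsumerGroup group = client.GetDefaultConsumerGroup();
EventHubReceiver receiver = group.CreateReceiver("0"); // read partition "0"

while (true)
{
    // Block for up to 10 seconds waiting for the next event.
    EventData eventData = receiver.Receive(TimeSpan.FromSeconds(10));
    if (eventData == null) continue; // no event arrived within the wait time

    string body = Encoding.UTF8.GetString(eventData.GetBytes());
    Console.WriteLine("Offset {0}: {1}", eventData.Offset, body);
}
```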

This is going to be a repeated task for every customer, and it requires a level of knowledge to write the receivers in an efficient way. To address this challenge, Azure Event Hubs comes with a higher-level abstracted, intelligent host/agent called the "Event Processor Host": you simply implement the IEventProcessor interface and its 3 core methods, OpenAsync, ProcessEventsAsync and CloseAsync. It takes an Azure blob storage parameter to manage offsets/checkpoints across the various partitions. This is the simplest way to consume events from event hubs. The new code will look like this:
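A minimal sketch (host name, hub name and connection strings are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

class SimpleEventProcessor : IEventProcessor
{
    public Task OpenAsync(PartitionContext context)
    {
        Console.WriteLine("Opened partition {0}", context.Lease.PartitionId);
        return Task.FromResult<object>(null);
    }

    public async Task ProcessEventsAsync(PartitionContext context,
                                         IEnumerable<EventData> messages)
    {
        foreach (EventData eventData in messages)
            Console.WriteLine(Encoding.UTF8.GetString(eventData.GetBytes()));

        // Persist the current offset to the blob container so a restart
        // resumes from here instead of re-reading the whole partition.
        await context.CheckpointAsync();
    }

    public async Task CloseAsync(PartitionContext context, CloseReason reason)
    {
        if (reason == CloseReason.Shutdown)
            await context.CheckpointAsync();
    }
}

class Program
{
    static async Task RunAsync(string eventHubConnectionString,
                               string storageConnectionString)
    {
        // The host uses the blob storage account to manage leases and checkpoints.
        var host = new EventProcessorHost(
            "myhost",                               // unique name for this host instance
            "myeventhub",                           // event hub path
            EventHubConsumerGroup.DefaultGroupName, // "$Default"
            eventHubConnectionString,
            storageConnectionString);

        await host.RegisterEventProcessorAsync<SimpleEventProcessor>();
        Console.ReadLine(); // keep processing until a key is pressed
        await host.UnregisterEventProcessorAsync();
    }
}
```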
Transport Protocols (HTTPS/AMQP)
From the main picture at the beginning, you can see there are 2 different types of transport protocol used for communication with Event Hubs. On the publisher (sender) side, you can either use HTTPS if you are going to push a low volume of events to the event hub, or AMQP for high-throughput, better-performance scenarios. On the consuming side, since Event Hubs uses a push-based model to deliver events to listeners/receivers, AMQP is the only option.

Throughput Units
Throughput units are the basis of how you scale the traffic coming into and going out of Event Hubs, and they are one of the key pricing parameters. Throughput units are purchased at the event hub namespace level and apply to all the event hubs in a given namespace. The other important thing to keep in mind is that a single partition can scale to at most one throughput unit, so it only makes sense to have no more throughput units than the total number of partitions you have. Example:

Namespace: IOTRECEIVER
Event Hub #1 : 4 partitions
Event Hub #2 : 2 partitions

In the above case, there is no point in having more than 6 throughput units for the namespace IOTRECEIVER. You probably need only 1 or 2 throughput units, unless you have a huge volume of traffic.

A throughput unit handles up to 1 MB per second (or 1000 events per second) on the publishing side and 2 MB per second on the consuming side. Think of throughput units like pipes: if you need more water to flow through, you get more pipes. The circumference of a pipe is fixed, so it can only take so much water; if you need to fill the tank faster or draw more water from it, the only way is to add more pipes. And every pipe you add costs money.
Summary
We have now covered all the building blocks we saw in the original picture at the beginning of the article. Hopefully this article has given you all the bits and pieces required to understand Azure Event Hubs. There are still a few topics I would like to cover, including the security mechanism, metrics data and how consumer groups work; I'll try to cover them in future articles.
