The Origin of The Name "Zookeeper"

Apache ZooKeeper is a coordination service that enables synchronization across distributed applications such as Hadoop. It helps manage configuration, implement reliable messaging, run redundant services, and synchronize process execution across the nodes in a cluster. It was developed at Yahoo! and named ZooKeeper because distributed systems, like a zoo, are difficult to manage.

Uploaded by

akurathikotaiah


Apache ZooKeeper is a coordination service for distributed applications that enables synchronization across a cluster. In the case of Hadoop, ZooKeeper helps coordinate the Hadoop nodes.

For example, it makes it easier to:

• Manage configuration across nodes. If you have dozens or hundreds of nodes, it becomes hard to keep configuration in sync across them and to make changes quickly. ZooKeeper helps you push configuration changes quickly.
• Implement reliable messaging. With ZooKeeper, you can implement a producer/consumer queue that guarantees delivery, even if some consumers or one of the ZooKeeper servers fails.
• Implement redundant services. With ZooKeeper, a group of identical nodes (e.g. database servers) can elect a leader/master and let ZooKeeper refer all clients to that master server. If the master fails, a new leader is elected through ZooKeeper and all clients are notified.
• Synchronize process execution. With ZooKeeper, multiple nodes can coordinate the start and end of a process or calculation. This ensures that any follow-up processing is done only after all nodes have finished their calculations.
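
The configuration case can be illustrated with a small sketch. The code below is a toy, in-memory stand-in for a coordination service, not the ZooKeeper client API: the names ToyConfigStore, Node, and the key "/app/log_level" are made up for illustration. It also simplifies one detail: classic ZooKeeper watches fire once and must be re-registered by the client, whereas the callbacks here stay registered.

```python
from typing import Callable, Dict, List


class ToyConfigStore:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[str, str], None]]] = {}

    def watch(self, key: str, callback: Callable[[str, str], None]) -> None:
        """Register a callback that fires on every later change to `key`."""
        self._watchers.setdefault(key, []).append(callback)

    def set(self, key: str, value: str) -> None:
        """Store the value, then push it to every registered watcher."""
        self._data[key] = value
        for callback in self._watchers.get(key, []):
            callback(key, value)

    def get(self, key: str, default: str = "") -> str:
        return self._data.get(key, default)


class Node:
    """A cluster node that keeps a local copy of the setting it watches."""

    def __init__(self, name: str, store: ToyConfigStore, key: str) -> None:
        self.name = name
        self.local = store.get(key)       # pick up the current value on join
        store.watch(key, self._on_change)

    def _on_change(self, key: str, value: str) -> None:
        self.local = value


store = ToyConfigStore()
store.set("/app/log_level", "INFO")
nodes = [Node(f"node-{i}", store, "/app/log_level") for i in range(3)]
store.set("/app/log_level", "DEBUG")      # one write reaches every node
print([n.local for n in nodes])           # ['DEBUG', 'DEBUG', 'DEBUG']
```

The point of the design is that nodes never poll: a single write to the central store is enough for every interested node to see the new value, and a node that joins later reads the current value when it starts.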

The Origin of the Name “ZooKeeper”

ZooKeeper was developed at Yahoo! Research. Yahoo! had been working on ZooKeeper for a while and pitching it to other groups. At the time, the ZooKeeper group had been working with the Hadoop team and had started a variety of projects with the names of animals, Apache Pig being the best known. As the group started talking about different possible names, one of the group members mentioned that they should avoid another animal name because it started to sound like a zoo. That is when it clicked: distributed systems are a zoo. They are chaotic and hard to manage, and ZooKeeper is meant to keep them under control.

Projects that use ZooKeeper

• Apache BookKeeper: BookKeeper (ZooKeeper subproject) is a replicated service to reliably log streams of records.
• Apache Hadoop MapReduce: The next generation of Hadoop MapReduce (called "YARN") uses ZooKeeper.
• Apache HBase: HBase is the Hadoop database. It is an open-source, distributed, column-oriented store modeled after Google's Bigtable paper. HBase uses ZooKeeper for master election, server lease management, bootstrapping, and coordination between servers.
• Apache Kafka: Kafka is a distributed publish/subscribe messaging system. Kafka queue consumers use ZooKeeper to store information on what has been consumed from the queue.
• Apache Storm: Storm uses ZooKeeper to store all state so that it can recover from an outage in any of its (distributed) component services.

World without ZooKeeper

When designing a distributed system, you typically need to design and develop some coordination services:

• Name service: A naming service maps a name to some information associated with that name. A telephone directory is a name service that maps the name of a person to his/her telephone number. In the same way, a DNS service is a name service that maps a domain name to an IP address. In your distributed system, you may want to keep track of which servers or services are up and running and look up their status by name.
• Locking: To allow for serialized access to a shared resource in your distributed system, you may need to implement distributed mutexes.
• Synchronization: Hand in hand with distributed mutexes is the need for synchronizing access to shared resources, whether you are implementing a producer-consumer queue or a barrier.
• Configuration management: The configuration of your distributed system must be centrally stored and managed. This means that any new nodes joining should pick up the up-to-date centralized configuration as soon as they join the system.
• Leader election: Your distributed system may have to deal with the problem of nodes going down, and you may want to implement an automatic fail-over strategy. You can do this by leader election.
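
To give a feel for what hand-building even one of these services involves, here is a minimal sketch of a barrier. It is an illustration under strong assumptions: threads in one process stand in for nodes, the class name ToyBarrier is invented, and nothing here handles crashes or lost messages, which is exactly what makes the distributed version hard. (Python already ships threading.Barrier; the class is written out only to show the mechanics.)

```python
import threading
from typing import List


class ToyBarrier:
    """Blocks callers of wait() until `parties` participants have arrived."""

    def __init__(self, parties: int) -> None:
        self._parties = parties
        self._arrived = 0
        self._cond = threading.Condition()

    def wait(self) -> None:
        with self._cond:
            self._arrived += 1
            if self._arrived >= self._parties:
                self._cond.notify_all()   # the last arrival releases everyone
            else:
                self._cond.wait_for(lambda: self._arrived >= self._parties)


results: List[int] = []
totals: List[int] = []
results_lock = threading.Lock()
barrier = ToyBarrier(parties=4)


def worker(n: int) -> None:
    with results_lock:
        results.append(n * n)             # the "calculation" phase
    barrier.wait()                        # do not proceed until all are done
    with results_lock:
        totals.append(sum(results))       # follow-up step sees every result


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(totals)                             # [14, 14, 14, 14]
```

Every worker computes the same total because no worker starts its follow-up step before all four calculations are recorded. If one worker crashed before reaching the barrier, the others would wait forever, which is the kind of failure a real coordination service has to address.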

Earlier distributed systems have implemented components such as distributed lock managers or have used distributed databases for coordination. While it is possible to design and implement all of these services from scratch, doing so is extra work, and the resulting problems, race conditions, and deadlocks are difficult to debug. Just as you don't write your own hashing function in your code, people shouldn't have to write their own name services or leader election services from scratch every time they need one. Moreover, you could hack together a very simple group membership service relatively easily, but it would take much more work to make it reliable, replicated, and scalable. This led to the development and open sourcing of Apache ZooKeeper, an out-of-the-box reliable, scalable, and high-performance coordination service for distributed systems.

ZooKeeper, in fact, borrows a number of concepts from these prior systems. It does not
expose a lock interface or a general purpose interface for storing data, however. The design
of ZooKeeper is specialized and very focused on coordination tasks. It is certainly possible
to build distributed systems without using ZooKeeper. ZooKeeper, however, offers
developers the possibility of focusing more on application logic rather than on arcane
distributed systems concepts. Programming distributed systems without ZooKeeper is
possible, but more difficult.

Why is Distributed Systems Coordination Hard?

When an application starts up, all of the different processes need to find the application configuration. Over time this configuration may change. We could shut everything down, redistribute configuration files, and restart, but that may incur extended periods of application downtime during reconfiguration. Also, as the load changes, we want to be able to add new machines and processes or remove existing ones.

The problems described above are functional problems: you can design solutions for them and test those solutions before deployment. But the truly difficult problems you encounter in distributed applications have to do with faults, specifically crashes and communication faults. These failures can crop up at any point, and it may be impossible to enumerate all the different cases that need to be handled.

One of the differences between single-machine and distributed applications is this: when a single machine crashes, all the processes running on that machine fail. If there are multiple processes running on the machine and one process fails, the other processes can find out about the failure from the operating system. The operating system can also provide strong messaging guarantees between processes. All of this changes in a distributed environment: if a machine or process fails, other machines will keep running and may need to take over for the faulty processes. To handle faulty processes, the processes that are still running must be able to detect the failure, even though messages may be lost and clocks may drift.
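
A common way to detect failures in this setting is a heartbeat with a timeout. The sketch below is a simplified illustration, not how any particular system implements it: the class name HeartbeatMonitor and the worker names are invented, and time is passed in explicitly as a number so the example stays deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Suspects a process once no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, process: str, now: float) -> None:
        """Record that `process` was heard from at time `now`."""
        self._last_seen[process] = now

    def suspected(self, now: float) -> List[str]:
        """Return the processes whose last heartbeat is older than the timeout."""
        return sorted(p for p, seen in self._last_seen.items()
                      if now - seen > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("worker-a", now=0.0)
monitor.heartbeat("worker-b", now=0.0)
monitor.heartbeat("worker-a", now=2.0)    # worker-b's heartbeat at t=2 is lost
print(monitor.suspected(now=4.0))         # ['worker-b']
monitor.heartbeat("worker-b", now=4.5)    # worker-b was only slow, not dead
print(monitor.suspected(now=5.0))         # []
```

The second half of the example shows the core difficulty: from the monitor's point of view, a crashed process, a slow process, and a lost message all look the same until another heartbeat arrives. A timeout can only suspect a failure; it cannot prove one.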

Okay, so we cannot have an ideal fault-tolerant, distributed, real-world system that transparently takes care of all problems that might ever occur. We can strive for a slightly less ambitious goal, though.

Having pointed out that the perfect solution is impossible, we can repeat that ZooKeeper is
not going to solve all the problems that the distributed application developer has to face. It
does give the developer a nice framework to deal with these problems, though.

What does ZooKeeper do?

ZooKeeper is itself a distributed application that provides services for building distributed applications. It coordinates a group of nodes within the cluster and maintains shared data with robust synchronization techniques. Some of the services provided by ZooKeeper are:

• ZooKeeper exposes a simple interface for a naming service that identifies the nodes in a cluster by name, similar to DNS.
• ZooKeeper provides an easy way for you to implement distributed mutexes to allow for serialized access to a shared resource in your distributed system.
• You can use ZooKeeper to centrally store and manage the configuration of your distributed system. This means that any new nodes joining will pick up the up-to-date centralized configuration from ZooKeeper as soon as they join the system. This also allows you to centrally change the state of your distributed system by changing the centralized configuration through one of the ZooKeeper clients.
• ZooKeeper provides off-the-shelf support for leader election, which deals with the problem of nodes going down.
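
As a rough illustration of leader election, one common recipe has each candidate register an entry that receives an increasing sequence number and vanishes when the candidate's session ends; the live candidate with the lowest number is the leader. The sketch below models only that idea in memory. It is not the ZooKeeper client API, and the names ToyElection, join, fail, and the db-* node names are hypothetical.

```python
from typing import Dict, Optional


class ToyElection:
    """In-memory model of a lowest-sequence-number election (NOT the ZooKeeper API)."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._candidates: Dict[int, str] = {}   # sequence number -> node name

    def join(self, name: str) -> int:
        """Register a candidate and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = name
        return seq

    def fail(self, seq: int) -> None:
        """Model a crash: the candidate's entry disappears."""
        self._candidates.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The live candidate with the lowest sequence number leads."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


election = ToyElection()
seq_a = election.join("db-a")
seq_b = election.join("db-b")
seq_c = election.join("db-c")
print(election.leader())                  # db-a
election.fail(seq_a)                      # the current leader goes down
print(election.leader())                  # db-b
```

Because sequence numbers are never reused, a node that crashes and rejoins goes to the back of the line, so leadership changes only when the current leader actually disappears.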
