Design
The system is built around a single Main Server, which receives every task. It accepts tasks
from all the users/clients and assigns them to its workers, or subservers. There are three main
entities:
Main Server
Subservers/Chunk servers/Workers
Clients/users
A client connects to the main server and issues a request, such as getting a word count for a
document or finding the instances of a word in it. The main server checks which of its subservers
are free and assigns the task to one of them. This can be implemented with a round-robin queue:
all the free subservers sit in a queue, the main server pops the first subserver from the queue
and assigns one task to it, and when that subserver becomes free again it is inserted at the end
of the queue. The number of subservers is fixed, but any number of users can make requests; a
request is serviced only when a free worker is available.
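The queue discipline described above can be sketched in a few lines of Python. This is only a
minimal illustration, not the final implementation: the class name Dispatcher and its methods
assign and release are hypothetical, and subservers are represented by plain strings.

```python
from collections import deque


class Dispatcher:
    """Hypothetical helper: keeps the free subservers in a FIFO queue."""

    def __init__(self, subservers):
        self.free = deque(subservers)  # idle subservers, next one to use at the left
        self.busy = {}                 # subserver -> task it is currently running

    def assign(self, task):
        """Give the task to the first free subserver; return None if all are busy."""
        if not self.free:
            return None
        worker = self.free.popleft()
        self.busy[worker] = task
        return worker

    def release(self, worker):
        """Called when a subserver finishes: put it back at the end of the queue."""
        if worker in self.busy:
            del self.busy[worker]
            self.free.append(worker)
```

With two subservers, a third request gets None until one of them is released. What the main
server does with such a request (reject it or hold it until a worker is free) is left open here.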
The implementation will use Python. There will be one main server bound to an address. Clients
send their requests to that address, and subservers receive tasks from it and return their
outputs to it. In this design the same data and files are available to the main server and all
the subservers, although that is usually not the case in practice. In the Google File System,
specific data is assigned to each subserver, and the main server assigns a subserver only the
tasks related to its data.
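As a rough illustration of a main server bound to an address, the sketch below uses Python's
standard socket module. It rests on several assumptions that are not part of the design: the
function names are hypothetical, the message format (the word as plain text, the count as plain
text) is invented for the example, and the subserver step is collapsed into a local call to
count_word so that the example stays self-contained.

```python
import socket
import threading


def count_word(text, word):
    """Count whitespace-separated occurrences of word in text, ignoring case."""
    return text.lower().split().count(word.lower())


def open_main_server(host="127.0.0.1", port=0):
    """Bind the main server to an address; port 0 lets the OS pick a free port."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind((host, port))
    server_sock.listen()
    return server_sock


def serve_one_request(server_sock, document):
    """Accept one client, read the requested word, and reply with its count."""
    conn, _ = server_sock.accept()
    with conn:
        word = conn.recv(1024).decode().strip()
        conn.sendall(str(count_word(document, word)).encode())


def serve_in_background(server_sock, document):
    """Run serve_one_request in a thread so a client can connect from the same process."""
    thread = threading.Thread(target=serve_one_request, args=(server_sock, document))
    thread.start()
    return thread


def request_count(address, word):
    """Client side: connect to the main server's address and ask for a word count."""
    with socket.create_connection(address) as sock:
        sock.sendall(word.encode())
        return int(sock.recv(1024).decode())
```

A client would call request_count(server_sock.getsockname(), "the") after the server has been
started with serve_in_background. In the full design the main server would forward the request to
a free subserver instead of computing the count itself.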
Subservers stay in constant communication with the main server through heartbeat signals.
There is no direct communication between clients and subservers. If a subserver does not reply
to a heartbeat signal, the main server declares it dead, reassigns its task to another subserver,
and removes it from the queue until it comes back online. There is only one main server, so it is
a single point of failure: if it goes down for any reason, the whole system goes down.
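The heartbeat check can be sketched as follows. This is a minimal illustration under stated
assumptions: the names HeartbeatMonitor and handle_failures are hypothetical, the 3-second
timeout is an arbitrary example value, and a subserver counts as dead once no heartbeat has been
recorded from it for longer than the timeout.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: remembers when each subserver was last heard from."""

    def __init__(self, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout  # seconds of silence before a subserver counts as dead
        self.clock = clock      # injectable so the logic can be tested without sleeping
        self.last_seen = {}     # subserver -> time of its last heartbeat

    def beat(self, worker):
        """Record a heartbeat reply from a subserver."""
        self.last_seen[worker] = self.clock()

    def dead_workers(self):
        """Return the subservers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return [w for w, seen in self.last_seen.items() if now - seen > self.timeout]


def handle_failures(monitor, free, busy):
    """Drop dead subservers from the free queue and return their unfinished tasks."""
    orphaned = []
    for worker in monitor.dead_workers():
        if worker in free:
            free.remove(worker)
        if worker in busy:
            orphaned.append(busy.pop(worker))
    return orphaned
```

The tasks returned by handle_failures are the ones the main server would hand to other
subservers. Putting a subserver back in the queue once its heartbeats resume is not shown.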