Version Control Systems
Version control systems are a category of software tools that help record changes
made to files by keeping track of every modification made to the code.
Why Is a Version Control System So Important?
A software product is usually developed collaboratively by a group of developers who
may be located in different places, each contributing a specific kind of
functionality or feature. To contribute to the product, they modify the source code
(by adding or removing code). A version control system is software that helps the
development team communicate efficiently and manage (track) all the changes made
to the source code, along with information such as who made each change and what
was changed. A separate branch can be created for each contributor's changes, and
those changes are not merged into the original source code until they have been
reviewed; once the changes are approved, they are merged into the main source code.
This not only keeps the source code organized but also improves productivity by
making the development process smoother.
Benefits of a version control system:
a) Speeds up project development by enabling efficient collaboration,
b) Boosts employee productivity and skills, and speeds up product delivery, through
better communication and assistance,
c) Reduces the possibility of errors and conflicts during project development by
making every small change traceable,
d) Lets employees or contributors work on the project from anywhere, regardless of
their geographical location,
e) Maintains a separate working copy for each contributor, which is not merged into
the main files until it has been validated. Popular examples of such systems are Git,
Helix Core, and Microsoft TFS,
f) Helps with recovery in case of a disaster or other contingency,
g) Records who made each change, what changed, when, and why.
Key Concepts of a Version Control System:
A repository: It can be thought of as a database of changes. It contains all
the edits and historical versions (snapshots) of the project.
Working copy (sometimes called a checkout): It is your personal copy of
all the files in a project. You can edit this copy without affecting the work
of others, and when you are done making your changes you commit them to the
repository.
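As a rough illustration of these two terms, here is a minimal Python sketch built on a toy in-memory model (an assumption, not how any real VCS stores data). The hypothetical `Repository` class keeps every commit as a full snapshot along with who made it, when, and why, and the hypothetical `WorkingCopy` class holds a private copy of the files that reaches the repository only when `commit()` is called.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Commit:
    author: str           # who made the change
    message: str          # why it was made
    timestamp: datetime   # when it was made
    snapshot: dict        # what the project looked like: {path: contents}


@dataclass
class Repository:
    """Hypothetical toy repository: a database of every historical snapshot."""
    history: list = field(default_factory=list)

    def latest(self):
        return copy.deepcopy(self.history[-1].snapshot) if self.history else {}

    def add_commit(self, author, message, files):
        snapshot = copy.deepcopy(files)  # freeze the files as they are right now
        self.history.append(Commit(author, message, datetime.now(timezone.utc), snapshot))


class WorkingCopy:
    """Hypothetical personal copy of the files; edits stay private until commit()."""

    def __init__(self, repo, author):
        self.repo = repo
        self.author = author
        self.files = repo.latest()  # the "checkout"

    def edit(self, path, contents):
        self.files[path] = contents  # does not touch the repository or anyone else

    def commit(self, message):
        self.repo.add_commit(self.author, message, self.files)


repo = Repository()
alice = WorkingCopy(repo, "alice")
alice.edit("README", "hello")
alice.commit("Add README")

bob = WorkingCopy(repo, "bob")
bob.edit("README", "hello, world")
print(repo.latest())  # {'README': 'hello'}  (Bob has not committed yet)
bob.commit("Expand greeting")
for c in repo.history:
    print(c.author, c.message, c.snapshot)
```

Real tools avoid storing a full copy of every file in every snapshot; the point here is only the separation between the shared history and each person's private working copy.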
Types of Version Control Systems:
Local Version Control Systems
Centralized Version Control Systems
Distributed Version Control Systems
Local Version Control Systems: This is one of the simplest forms. It has a
database that keeps all the changes to files under revision control. RCS is one of
the most common local VCS tools. It keeps patch sets (differences between files) in
a special format on disk. By applying the patches in sequence, it can re-create what
any file looked like at any point in time.
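The patch-based idea can be sketched in Python with the standard library's `difflib.SequenceMatcher`. The `make_patch`, `apply_patch`, and `LocalHistory` names are hypothetical, and the storage scheme is a simplification, not RCS's format: this sketch stores the first version in full plus forward patches, whereas RCS itself keeps the newest trunk revision in full and uses reverse deltas to reach older ones.

```python
from difflib import SequenceMatcher


def make_patch(old, new):
    """Record only the edits needed to turn the list of lines `old` into `new`."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])  # replace old[i1:i2] with these new lines
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_patch(old, patch):
    """Rebuild the newer version from the older one plus a patch."""
    result, pos = [], 0
    for i1, i2, replacement in patch:
        result.extend(old[pos:i1])  # copy unchanged lines
        result.extend(replacement)  # insert new or replaced lines
        pos = i2                    # skip deleted or replaced lines
    result.extend(old[pos:])
    return result


class LocalHistory:
    """Hypothetical single-user store: one full base version plus a list of patches."""

    def __init__(self, initial_lines):
        self.base = list(initial_lines)
        self.patches = []
        self._latest = list(initial_lines)

    def commit(self, new_lines):
        new_lines = list(new_lines)
        self.patches.append(make_patch(self._latest, new_lines))
        self._latest = new_lines

    def checkout(self, revision):
        """Re-create revision N by applying the first N patches to the base."""
        lines = list(self.base)
        for patch in self.patches[:revision]:
            lines = apply_patch(lines, patch)
        return lines


history = LocalHistory(["a", "b", "c"])
history.commit(["a", "B", "c", "d"])
history.commit(["B", "c", "d"])
print(history.checkout(1))  # ['a', 'B', 'c', 'd']
```

Only the differences are stored after the base version, which is what keeps this kind of history compact on disk.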
Centralized Version Control Systems: Centralized version control systems
contain just one repository, and each user gets their own working copy. You
need to commit for your changes to be reflected in the repository, and others
can see your changes by updating.
Two things are required to make your changes visible to others:
You commit
They update
In centralized source control, there is a server and a client. The server is the master
repository that contains all the versions of the code. To work on any project, the user
(client) first needs to get the code from the master repository on the server. The client
communicates with the server and pulls the current version of the code from the server
to its local machine. In other words, you take an update from the master repository and
get a local copy of the code on your system. Once you have the latest version of the code,
you make your own changes and then simply commit those changes directly to the master
repository. Committing a change means merging your own code into the master repository,
which creates a new version of the source code. So everything is centralized in this model.
There is just one repository, and it contains the entire history (all versions) of the code
and all its branches. So the basic workflow in centralized source control is: get the latest
version of the code from the central repository, which also contains other people’s code;
make your own changes to the code; and then commit (merge) those changes into the
central repository.
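The commit/update cycle can be sketched with a toy in-process model in Python. `CentralServer` and `Client` are hypothetical names, and the rule that a commit built on an old revision is rejected until the client updates is a simplifying assumption; real systems apply finer-grained checks and merge on update.

```python
class CentralServer:
    """Hypothetical master repository: the only place where the history lives."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is an empty project

    def latest_revision(self):
        return len(self.revisions) - 1

    def get(self, revision):
        return dict(self.revisions[revision])

    def commit(self, base_revision, files):
        # Simplifying assumption: reject a commit built on an old revision,
        # so the client has to update first.
        if base_revision != self.latest_revision():
            raise RuntimeError("working copy is out of date; update first")
        self.revisions.append(dict(files))
        return self.latest_revision()


class Client:
    """Hypothetical client holding a working copy of the latest revision."""

    def __init__(self, server):
        self.server = server
        self.update()

    def update(self):
        # Take an update: fetch the latest version from the server.
        # (This toy version overwrites local edits; a real client merges them.)
        self.revision = self.server.latest_revision()
        self.files = self.server.get(self.revision)

    def commit(self):
        # Send local changes straight to the master repository.
        self.revision = self.server.commit(self.revision, self.files)


server = CentralServer()
alice, bob = Client(server), Client(server)
alice.files["main.py"] = "print('hi')"
alice.commit()                 # 1. Alice commits
print("main.py" in bob.files)  # False: Bob has not updated yet
bob.update()                   # 2. Bob updates
print(bob.files["main.py"])    # print('hi')
```

Every operation goes through the single server object, which is exactly what makes this model easy to reason about and also dependent on that one server.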
In distributed version control, most of the mechanism or model is the same as in
centralized version control. The major difference is that instead of a single
repository on a server, every developer or client has their own repository, with a copy
of the entire history (all versions) of the code and all of its branches on their local
machine. Every client or user can therefore work locally, even while disconnected, which
is more convenient than centralized source control, and that’s why it is called distributed.
You don’t need to rely on the central server; you can clone the entire history of the
code to your hard drive. When you start working on a project, you clone the code from the
master repository to your own hard drive, make changes in your own local repository, and
commit those changes locally. At this point your local repository contains new “change
sets”, but it is still disconnected from the master repository (which will have different
sets of changes from each developer’s repository). To share your work, you push the
change sets from your local repository to the master repository. Getting new changes
from a repository is called “pulling”, and sending your local repository’s change sets to
another repository is called “pushing”.
Unlike centralized version control, changes are not committed straight to the master
repository as soon as you make them. You first commit all the changes to your own local
repository, and then that set of changes is merged into the master repository.
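The clone, local commit, push, and pull steps can be sketched in Python under a toy model in which a repository is just an ordered list of change sets, each identified by a hypothetical ID string. The `Repo` class and its methods are illustrative assumptions. The push rejection loosely mirrors the way Git refuses a push when the remote has commits you have not pulled yet, and real tools record a proper merge rather than simply appending change sets.

```python
class Repo:
    """Hypothetical repository: an ordered list of (change_id, description) change sets."""

    def __init__(self, changesets=None):
        self.changesets = list(changesets or [])

    def ids(self):
        return {change_id for change_id, _ in self.changesets}

    def clone(self):
        # A clone carries the entire history, not just the latest version.
        return Repo(self.changesets)

    def commit(self, change_id, description):
        # Recorded in the local repository only: no server or network involved.
        self.changesets.append((change_id, description))

    def pull(self, remote):
        # Fetch change sets we do not have yet (a real tool would also merge them).
        known = self.ids()
        self.changesets += [cs for cs in remote.changesets if cs[0] not in known]

    def push(self, remote):
        # Refuse if the remote has change sets we have not pulled yet.
        if not remote.ids() <= self.ids():
            raise RuntimeError("remote has changes you do not have; pull first")
        known = remote.ids()
        remote.changesets += [cs for cs in self.changesets if cs[0] not in known]


master = Repo()
alice, bob = master.clone(), master.clone()
alice.commit("a1", "add login page")  # works offline
bob.commit("b1", "fix typo")
alice.push(master)                    # master now has a1
try:
    bob.push(master)                  # rejected: Bob has not seen a1
except RuntimeError as err:
    print(err)
bob.pull(master)
bob.push(master)
print([cid for cid, _ in master.changesets])  # ['a1', 'b1']
```

Because each clone holds the whole list, any copy could be used to rebuild the master repository if it were lost.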
The diagram below illustrates the difference between these two models:
Basic Differences, with Pros and Cons
Centralized version control is easier to learn than distributed version control. As a
beginner, you have to remember the commands for all the operations in a DVCS, and
working with a DVCS can be confusing at first. A CVCS is easy to learn and easy to set
up.
The biggest advantage of a DVCS is that it allows you to work offline and gives you
flexibility. You have the entire history of the code on your own hard drive, so all the
changes you make go into your own repository, which doesn’t require an internet
connection. This is not the case with a CVCS.
A DVCS is faster than a CVCS because you don’t need to communicate with the remote
server for every command. You do everything locally, which lets you work faster than
with a CVCS.
Working on branches is easy in a DVCS. Every developer has the entire history of the
code, so developers can share their changes with each other before merging their sets
of changes into the remote server. In a CVCS, working on branches is difficult and
time-consuming because it requires communicating with the server directly.
If the project has a long history or contains large binary files, downloading the entire
project in a DVCS can take more time and disk space than usual, whereas in a CVCS you
only need to download the current version of the code. You don’t need to store the entire
history of the project on your own machine, so no additional space is required.
If the main server goes down or crashes in a DVCS, you can still recover the backup or
entire history of the code from a local repository, where the full revision history is
already saved. This is not the case with a CVCS, where a single remote server holds the
entire code history.
There are fewer merge conflicts with other developers’ code in a DVCS because every
developer works on their own piece of code. Merge conflicts are more frequent in a CVCS
than in a DVCS.
In a DVCS, developers sometimes take advantage of having the entire history of the code
and work in isolation for too long, which is not a good thing. This is not the case with a
CVCS.
1. CENTRALIZED SYSTEMS:
We start with centralized systems because they are the most intuitive and the easiest to
understand and define.
Centralized systems are systems that use a client/server architecture in which one or more
client nodes are directly connected to a central server. This is the most commonly used
type of system in many organizations: a client sends a request to a company server
and receives the response.
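A minimal Python sketch of this request/response pattern, assuming in-process queues stand in for the network: a hypothetical `run_server` thread is the single place that holds the data and answers requests, and the hypothetical `client_request` function sends a request and waits for the reply. A real deployment would use sockets or HTTP instead.

```python
import queue
import threading


def run_server(request_queue, inventory):
    """Hypothetical central server: the single place that holds data and answers requests."""
    while True:
        item, reply_queue = request_queue.get()
        if item is None:  # shutdown signal
            break
        reply_queue.put(inventory.get(item, 0))


def client_request(request_queue, item):
    """A client sends a request to the server and waits for the response."""
    reply_queue = queue.Queue()
    request_queue.put((item, reply_queue))
    return reply_queue.get(timeout=5)


request_queue = queue.Queue()
server = threading.Thread(target=run_server, args=(request_queue, {"laptops": 12}))
server.start()
print(client_request(request_queue, "laptops"))  # 12
print(client_request(request_queue, "tablets"))  # 0
request_queue.put((None, None))  # stop the server
server.join()
```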
2. DECENTRALIZED SYSTEMS:
Decentralized systems are another type of system that has been gaining a lot of popularity,
primarily because of the massive hype around Bitcoin. Many organizations are now trying to
find applications for such systems.
In decentralized systems, every node makes its own decision. The final behavior of the
system is the aggregate of the decisions of the individual nodes. Note that there is no
single entity that receives and responds to the request.
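To illustrate the idea that the system's behaviour is the aggregate of independent decisions, here is a minimal Python sketch. The `Node` class, the balance check, and the majority rule are all hypothetical assumptions chosen for simplicity, not the rules of Bitcoin or any real protocol; `aggregate_behaviour` only computes what an outside observer would see, and no node decides on behalf of the others.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    balances: dict  # this node's own, possibly stale, copy of account balances

    def accepts(self, sender, amount):
        # Each node decides independently, using only its local state.
        return self.balances.get(sender, 0) >= amount


def aggregate_behaviour(nodes, sender, amount):
    # Observer's view of the whole system: the transaction "goes through"
    # if most nodes independently accept it (an assumed rule for illustration).
    accepted = sum(node.accepts(sender, amount) for node in nodes)
    return accepted > len(nodes) / 2


nodes = [
    Node("n1", {"alice": 10}),
    Node("n2", {"alice": 10}),
    Node("n3", {"alice": 3}),   # stale view: disagrees, but is outvoted
]
print([n.accepts("alice", 5) for n in nodes])   # [True, True, False]
print(aggregate_behaviour(nodes, "alice", 5))   # True
print(aggregate_behaviour(nodes, "alice", 20))  # False
```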
3. DISTRIBUTED SYSTEMS:
This is the last type of system that we are going to discuss. Let’s head right into it!
In distributed systems, processing is spread across multiple nodes that communicate over a
network and work together on a common task, so that to the user they appear to be a single
system.
Figure – Distributed system visualization
Example –
Google search system. Each request is worked on by hundreds of computers, which
crawl the web and return the relevant results. To the user, Google appears to be one
system, but it is actually multiple computers working together to accomplish a single
task (returning the results for the search query).
Characteristics of Distributed System –
Concurrency of components: Nodes apply consensus protocols to agree on the same
values/transactions/commands/logs.
Lack of a global clock: All nodes maintain their own clock.
Independent failure of components: In a distributed system, nodes fail
independently without having a significant effect on the entire system. If one node
fails, the entire system sans the failed node continues to work.
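The Google example and the independent-failure property can be sketched together in Python with `concurrent.futures.ThreadPoolExecutor`. A query is fanned out to several hypothetical index shards in parallel and the partial results are merged into one answer; a shard that is down (modelled here, as an assumption, by `None` raising `ConnectionError`) only removes its own share of the results. Threads on one machine stand in for separate computers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index shards: each node holds only its own slice of the index.
# None stands for a node that has crashed.
SHARDS = [
    {"python": ["python.org"], "git": ["git-scm.com"]},
    None,
    {"python": ["pypi.org"], "rust": ["rust-lang.org"]},
]


def search_shard(shard, term):
    """Work done by a single node on its own part of the data."""
    if shard is None:
        raise ConnectionError("node is down")
    return shard.get(term, [])


def search(term, shards):
    """Fan the query out to all nodes in parallel and merge whatever comes back."""
    results = []
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(search_shard, shard, term) for shard in shards]
        for future in futures:
            try:
                results.extend(future.result())
            except ConnectionError:
                continue  # a failed node only removes its own share of the results
    return sorted(results)


print(search("python", SHARDS))  # ['pypi.org', 'python.org']
```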
Scaling –
Horizontal and vertical scaling are both possible.
Components of Distributed System –
The components of a distributed system are:
Node (Computer, Mobile, etc.)
Communication link (Cables, Wi-Fi, etc.)
Architecture of Distributed System –
peer-to-peer – all nodes are peers of each other and work towards a common goal
client-server – some nodes act as servers in roles such as coordinator or arbiter
n-tier architecture – different parts of an application are distributed across different
nodes of the system, and these nodes work together to function as one application for
the user/client
Limitations of Distributed System –
Difficult to design and debug algorithms for the system. These algorithms are difficult
because there is no common clock, so commands/logs cannot be reliably ordered in time.
Nodes can have different latencies, which must be kept in mind while designing such
algorithms. The complexity increases as the number of nodes grows.
No common clock causes difficulty in the temporal ordering of events/transactions
Difficult for a node to get a global view of the system and hence to make informed
decisions based on the state of other nodes in the system
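The clock problem can be shown with a tiny Python sketch using made-up numbers: node B's clock is assumed to run 2 seconds behind node A's, so ordering events by each node's local timestamp reverses the real order of two writes to the same variable.

```python
from dataclasses import dataclass


@dataclass
class Event:
    node: str
    local_timestamp: float  # read from the node's own clock
    value: int              # value the node wrote to the shared variable x


# Made-up scenario: A's clock is correct, B's clock runs 2 s behind.
events = [
    Event("A", 100.0, 1),  # real time 100.0: A writes x = 1
    Event("B", 99.5, 2),   # real time 101.5: B writes x = 2, but its clock says 99.5
]

# Replay the log in timestamp order, as if the clocks were globally consistent.
replayed = sorted(events, key=lambda e: e.local_timestamp)
final_x = replayed[-1].value
print([e.node for e in replayed], final_x)  # ['B', 'A'] 1
```

In real time A wrote first and B second, so the correct final value is 2; the timestamp-based replay gives 1.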
Advantages of Distributed System –
Lower latency than a centralized system – Distributed systems can have lower latency
because their nodes are spread out geographically, so requests can be served closer to
the user, reducing the time needed to get a response
Disadvantages of Distributed System –
Difficult to achieve consensus
The conventional way of logging events by the absolute time at which they occur is not
possible here
Applications of Distributed System –
Cluster computing – a technique in which many computers are coupled together to
work towards common goals. The computer cluster acts as if it were a single
computer
Grid computing – all the resources are pooled together and shared, essentially turning
the systems into one powerful supercomputer
Use Cases –
SOA-based systems
Multiplayer online games
Organizations Using –
Apple, Google, Facebook.