Web Thrones : what happens when we enter google.com in the browser
Introduction
What happens when we enter google.com in the browser? This may be a famous interview question, but without a doubt, we, as tech enthusiasts, have all wondered how it works. Is it magic? Maybe, but in the realm of the internet and technologies, there's a rule that can help you discover the magic of things and make them crystal clear. The rule is 'divide and rule.' So, in this article, I am going to try to divide the process of entering google.com, from typing that desired URL to retrieving the content of the requested link (webpage).
So, much like the TV show "Game of Thrones", where houses utilize their power to control the realm and vie for the throne, the web stack has its own set of houses. However, in contrast, they employ their power to serve us, the clients, and fulfill the requested demands.
We can categorize these web stack houses (components) as follows:
House of DNS
House of TCP/IP
House of Firewalls
House of Load Balancers
House of Web servers
House of App Servers
Now, we are going to delve into what happens under the hood by explaining each house and its role in this series of processes.
📌 If you don't want to go through all these paragraphs, you can jump to the last diagram and the "Conclusion".
Typing the URL (https://www.google.com)
When we open a web browser, such as Chrome, Firefox, Brave, etc., we enter a URL, which is the address of the webpage we are looking for. The term "URL" stands for Uniform Resource Locator, and it is composed of several components, including the protocol used, the domain name, the path, and, if applicable, the file.
In our example, the protocol in use is HTTPS, which stands for Hypertext Transfer Protocol Secure and provides a secure communication channel over the Internet. The host name is www.google.com, where google.com is the registered domain and www is a subdomain of it; together they identify the specific website we want to access. The path is the forward slash '/', which in this case points to the root of the site.
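To make these components concrete, here is a minimal sketch that splits the example URL with Python's standard urllib.parse module. The fallback table of default ports is my own addition for illustration; the URL itself carries no explicit port.

```python
from urllib.parse import urlsplit

# Split the article's example URL into its components.
parts = urlsplit("https://www.google.com/")

scheme = parts.scheme      # protocol: "https"
host = parts.hostname      # host name: "www.google.com"
path = parts.path          # path: "/" (the root)
# No explicit port is given, so fall back to the scheme's well-known default.
port = parts.port or {"http": 80, "https": 443}[scheme]

print(scheme, host, port, path)
```

The browser does essentially the same split before it does anything else, because the host name is what gets resolved next and the scheme decides which port and protocol to use.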
Understanding the URL is just the first step; the next task is to turn that name into something the network can actually use. Machines on the internet do not locate each other by names but by numerical addresses, so the name must be converted into a sequence of numbers. This is where the Domain Name System (DNS) comes into action, playing a crucial role in this translation process.
The DNS : resolving the domain name into an IP address.
The DNS is responsible for resolving a domain name into an IP address. In the context of DNS, "resolving" means finding the IP address that corresponds to a given domain name. It's akin to a phone book, where each domain name is associated with a specific IP address, much like a name is linked to a phone number. For example, resolving "google.com" means retrieving the IP address registered for that name. Computers and devices on the internet communicate with each other using these IP addresses, so DNS is the system that makes connecting by name possible.
This process happens as follows:
Hosts File and Browser Cache: Before any DNS query is sent, local resources are checked for the IP address. The browser looks in its own DNS cache, and the operating system consults the hosts file: /etc/hosts on Unix-like systems (Linux and macOS), and typically C:\Windows\System32\drivers\etc\hosts on Windows. The hosts file contains mappings of domain names to IP addresses, and the browser cache stores recent DNS resolutions. If the necessary information is found in either of these local resources, the DNS query is skipped, resulting in a quicker page load.
DNS Query Initiation: If the IP address is not found in the local resources, a DNS query is sent to resolve the domain name www.google.com to an IP address. In practice the browser hands this job to a recursive resolver (usually run by the ISP or a public DNS provider), which carries out the following steps on its behalf.
Root Name Server: The lookup starts at a root name server, which is the initial point of contact in the DNS hierarchy. The root name server provides information about the Top-Level Domain (TLD) name server responsible for the specific TLD of the domain in question (in this case, ".com").
TLD Name Server: The resolver then contacts the TLD name server for the ".com" TLD. This server holds information about the authoritative name server for the next level, which is "google.com."
Authoritative Name Server for www.google.com: The resolver now communicates with the authoritative name server for "google.com." This server holds the crucial piece of information: the IP address associated with www.google.com.
IP Address Retrieval: The authoritative name server responds with the IP address linked to www.google.com, and the resolver passes it back to the browser.
Note: The browser stores the obtained IP address in its cache for future use. This speeds up subsequent visits to www.google.com by avoiding a complete DNS resolution for as long as the cached record is still valid (each DNS record carries a time-to-live that says how long it may be cached).
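The steps above can be sketched as a toy model. This is only an illustration of the order of the lookups: the dictionaries, server labels, and the IP address are all made up (192.0.2.0/24 is a range reserved for documentation), and no real DNS traffic is sent.

```python
# Toy model of the DNS hierarchy. Every name and address below is hypothetical.
ROOT = {"com": "tld-com"}                                        # root: TLD -> TLD server
TLD_SERVERS = {"tld-com": {"google.com": "ns-google"}}           # TLD server: domain -> authoritative server
AUTHORITATIVE = {"ns-google": {"www.google.com": "192.0.2.10"}}  # authoritative server: host -> IP

HOSTS_FILE = {"localhost": "127.0.0.1"}  # stands in for /etc/hosts
cache = {}                               # stands in for the browser cache


def resolve(name):
    """Return (ip, source) for name, checking local resources first."""
    if name in cache:
        return cache[name], "cache"
    if name in HOSTS_FILE:
        return HOSTS_FILE[name], "hosts file"

    labels = name.split(".")
    tld = labels[-1]                  # "com"
    domain = ".".join(labels[-2:])    # "google.com"

    tld_server = ROOT[tld]                         # 1. ask a root server
    auth_server = TLD_SERVERS[tld_server][domain]  # 2. ask the TLD server
    ip = AUTHORITATIVE[auth_server][name]          # 3. ask the authoritative server

    cache[name] = ip                  # remember the answer for next time
    return ip, "full lookup"


first = resolve("www.google.com")
second = resolve("www.google.com")
print(first, second)
```

The first call walks the whole hierarchy, while the second one is answered straight from the cache, which is exactly why repeat visits feel faster.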
<<Now that we have obtained the IP address of google.com, the next step is establishing a connection with it. This process is akin to calling a friend on the phone before making a request; similarly, in the digital realm, the TCP/IP protocol takes on a majestic role.>>
TCP/IP CONNECTION:
TCP/IP stands for Transmission Control Protocol/Internet Protocol. It is a suite of protocols that governs how data is transmitted over the internet, organized in four layers: link (network access), internet, transport, and application.
TCP lets us establish a connection with the server so that we can send it our requests. Setting up this connection is referred to as the three-way handshake, and it ensures that both the sender and receiver are ready for data transmission:
SYN (Synchronize): The process begins with the initiating device, often referred to as the client, sending a TCP segment with the SYN (synchronize) flag set to the receiving device, known as the server. This segment contains a sequence number, which helps in organizing the data.
SYN-ACK (Synchronize-Acknowledge): Upon receiving the SYN segment, the server responds with a TCP segment that has both the SYN and ACK (acknowledge) flags set. The acknowledgment number in this segment is set to the client's sequence number incremented by one, confirming the receipt of the client's SYN.
ACK (Acknowledge): The final step involves the client sending another TCP segment, now with only the ACK flag set. The acknowledgment number in this segment is set to the server's sequence number incremented by one, confirming the reception of the server's SYN-ACK.
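To see how the sequence and acknowledgment numbers relate, here is a small simulation of the three segments. It is a sketch only: real handshakes are performed by the operating system, and the function name and the dictionary layout are my own assumptions for illustration.

```python
import random


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three segments of a TCP handshake as plain dictionaries."""
    # Each side picks its own initial sequence number (ISN).
    if client_isn is None:
        client_isn = random.randrange(2**32)
    if server_isn is None:
        server_isn = random.randrange(2**32)

    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % 2**32,      # acknowledges the client's SYN
    }
    ack = {
        "flags": {"ACK"},
        "seq": (client_isn + 1) % 2**32,
        "ack": (syn_ack["seq"] + 1) % 2**32,  # acknowledges the server's SYN
    }
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```

With the fixed starting numbers 1000 and 5000, the server acknowledges 1001 and the client acknowledges 5001, matching the "incremented by one" rule described above.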
In practice, there's a security check that happens before the connection is established. In the world of TCP/IP connections, imagine a dynamic setting similar to a digital battleground. Like a protective front line fending off threats and adversaries, the internet relies on a dedicated defense mechanism. This defense front is managed by firewalls, vigilant guardians that oversee, assess, and regulate the flow of digital traffic.
Firewalls : filtering incoming and outgoing traffic.
A firewall is a network system that filters incoming and outgoing traffic; it can be hardware, software, or a combination of both. Positioned in front of the components we wish to protect, it scrutinizes each traffic flow heading towards them. In the previous example of a TCP/IP connection, the firewall checks whether the connection is permitted to occur.
Firewalls filter network traffic based on various criteria, including the source and destination IP addresses, specific port numbers (e.g., 80 for HTTP, 443 for HTTPS, 21 for FTP), and protocol types such as TCP, UDP, or ICMP. Administrators leverage these criteria to set up rules that govern the flow of incoming and outgoing data.
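A rule set of this kind can be sketched in a few lines. The rules, addresses, and the "first match wins, deny by default" policy below are assumptions chosen for illustration; real firewalls differ in syntax and in how they order rules.

```python
from ipaddress import ip_address, ip_network

# Hypothetical rule set: (action, source network, protocol, destination port).
# A port of None matches any port. Rules are checked top to bottom.
RULES = [
    ("deny",  ip_network("203.0.113.0/24"), "tcp", None),  # block one source range
    ("allow", ip_network("0.0.0.0/0"),      "tcp", 443),   # HTTPS from anywhere
    ("allow", ip_network("0.0.0.0/0"),      "tcp", 80),    # HTTP from anywhere
]


def check(source_ip, protocol, dest_port):
    """Return "allow" or "deny" for a packet; the first matching rule wins."""
    source = ip_address(source_ip)
    for action, network, rule_protocol, rule_port in RULES:
        if (source in network
                and protocol == rule_protocol
                and rule_port in (None, dest_port)):
            return action
    return "deny"  # default policy: drop anything no rule allows


print(check("198.51.100.7", "tcp", 443))  # allowed: HTTPS
print(check("198.51.100.7", "tcp", 21))   # denied: no rule allows FTP
print(check("203.0.113.9", "tcp", 443))   # denied: blocked source range
```

Putting the deny rule first matters here: because the first match wins, a blocked source stays blocked even though a later rule allows port 443 for everyone else.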
HTTP/HTTPS/SSL:
After the successful establishment of a connection through the three-way handshake, the client and server are poised to exchange data. In the realm of web communication, this typically involves the transmission of HTTP or HTTPS requests and the corresponding responses. HTTP (Hypertext Transfer Protocol) and its secure counterpart, HTTPS (Hypertext Transfer Protocol Secure), define how information is exchanged between the client and server. While HTTP sends data in plain text, HTTPS uses TLS (Transport Layer Security), the successor to SSL (Secure Sockets Layer) and still commonly referred to as SSL/TLS, to encrypt the transmitted data, enhancing security and privacy.
SSL/TLS acts like a secure tunnel between the client and server: everything sent during the exchange is encrypted, so even if it is intercepted, the data remains unreadable to unauthorized entities.
Think of SSL like a special dance in the digital world. It's a secret handshake and a private routine known only to your computer and the website. This dance makes sure your online conversations stay safe and secure.
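For the curious, this is roughly what that "dance" looks like from the client side using Python's standard ssl module. It is a minimal sketch: the function name tls_version and the 5-second timeout are my own choices, and the function is only defined here, not called, because it needs network access.

```python
import socket
import ssl

# A client-side TLS context with the library defaults: the server's
# certificate is verified and must match the host name we asked for.
context = ssl.create_default_context()


def tls_version(host, port=443):
    """Connect to host over TCP, perform the TLS handshake, return the version."""
    with socket.create_connection((host, port), timeout=5) as tcp_sock:
        # wrap_socket performs the TLS handshake on top of the TCP connection.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()
```

On a machine with internet access, calling tls_version("www.google.com") would return the negotiated protocol version as a string. Notice the order: the TCP connection is created first, and the secure layer is added on top of it.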
HTTPS operates on port 443, and when you enter 'https://www.google.com' into your browser, the browser, acting as the client, sends an HTTPS request to the server. This request could, for example, take the form of a simple 'GET' method, asking the server to return the webpage. Importantly, both the request from the browser and the subsequent response from the server pass through the firewall, which allows traffic on port 443 for HTTPS, ensuring secure and authorized communication.
An example of a request using the GET method:
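As a minimal sketch, the text of such a request can be built like this. It assumes the classic HTTP/1.1 text format (browsers talking to Google will usually negotiate a newer, binary version of HTTP), and the choice of headers is mine, kept to the bare minimum.

```python
def build_get_request(host, path="/"):
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, protocol version
        f"Host: {host}",          # mandatory in HTTP/1.1
        "Accept: text/html",      # we would like an HTML page back
        "Connection: close",      # ask the server to close the connection afterwards
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


request = build_get_request("www.google.com")
print(request)
```

With HTTPS, this text is never sent as-is over the wire: it travels inside the encrypted SSL/TLS tunnel described above.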
So far, we have identified the IP address of the server hosting google.com, seen how we reach it through a TCP/IP connection, and recognized the role of a firewall in filtering traffic. We also know that the browser, or client, uses the HTTPS protocol to send its requests. Yet, do we truly understand what a web server, an application server, or even a database does? Let's delve into each of these components:
The web server:
In the narrative of accessing google.com, the web server plays a pivotal role as the initial point of contact. Situated at the forefront, it receives incoming requests, such as the one initiated when we enter 'google.com' in our browser. The web server's responsibility is to handle these requests, retrieve the necessary resources, and send them back to the client. It manages static content like HTML, CSS, and images, efficiently delivering them to the user's browser.
Examples of web servers: Nginx, Apache HTTP Server, Microsoft Internet Information Services (IIS), etc.
The app server:
The application server comes into play for dynamic content. It processes more complex requests that may involve business logic, user authentication, or customized data generation. In the case of Google, it could be responsible for handling search queries.
The Database:
Behind the scenes, a database stores and manages the vast amounts of data needed for dynamic content generation. This could include search results, user preferences, and more.
The web server, application server, and database collaborate harmoniously in this intricate dance: the web server fetching static content, the application server handling dynamic aspects, and the database providing and storing the necessary data. Together, they create the seamless and dynamic user experience we encounter when navigating google.com.
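Here is a toy sketch of that division of labour. Everything in it is hypothetical: the file names, the "/search?q=" route, and the dictionaries standing in for files on disk and for a database are only there to show who does what.

```python
# Hypothetical data standing in for files on disk and rows in a database.
STATIC_FILES = {"/logo.png": b"<image bytes>", "/style.css": b"body { margin: 0; }"}
DATABASE = {"cats": ["cats.example/1", "cats.example/2"]}


def app_server(query):
    """Dynamic content: look the query up in the database and build a page."""
    results = DATABASE.get(query, [])
    items = "".join(f"<li>{url}</li>" for url in results)
    return f"<ul>{items}</ul>".encode()


def web_server(path):
    """Entry point: serve static files directly, hand the rest to the app server."""
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]
    if path.startswith("/search?q="):
        return 200, app_server(path[len("/search?q="):])
    return 404, b"Not Found"


print(web_server("/style.css"))
print(web_server("/search?q=cats"))
print(web_server("/missing"))
```

The web server answers the stylesheet request on its own, while the search request travels one level deeper to the application server, which in turn consults the database.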
We have now received the webpage from Google's servers. But Google is a big company, and a huge number of people visit google.com at the same time, far more requests than a single server could handle. This is where the load balancer comes in to help manage everything.
The Load Balancer:
A load balancer plays a crucial role in distributing incoming network traffic across multiple servers to ensure efficient utilization of resources, enhance performance, and achieve high availability.
A load balancer is like a traffic police officer (Ta7iya Gadarmia 👮) at a busy intersection, directing vehicles (network traffic) to different lanes, ensuring a smooth flow and preventing congestion on any one route.
To achieve this smoothness and ensure a flawless experience, companies deploy their web pages on multiple servers. The load balancer is then tasked with managing access to these servers. The google.com webpage is available on more than one server, so when your request reaches the load balancer, it is the one that decides which server to send you to, based on algorithms such as:
Round Robin: Requests are distributed sequentially to each server in the rotation. The load balancer cycles through the list of servers, sending each new request to the next server in line.
Least Connections: The load balancer directs traffic to the server with the fewest active connections. This ensures that incoming requests are distributed to servers that are currently less busy.
There are more algorithms than these: Least Response Time, Weighted Round Robin, IP Hash, etc.
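The first two algorithms can be sketched as follows. This is a simplified illustration, not how any particular load balancer is implemented: the class names and the server names are made up, and ties in Least Connections simply go to the first server in the list.

```python
from itertools import cycle


class RoundRobin:
    """Hand out servers one after another, wrapping around at the end."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


class LeastConnections:
    """Pick the server that currently has the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)  # ties: first server listed
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1   # call when a connection finishes


servers = ["web-1", "web-2", "web-3"]   # hypothetical server names

rr = RoundRobin(servers)
rr_order = [rr.pick() for _ in range(4)]

lc = LeastConnections(servers)
lc_order = [lc.pick() for _ in range(3)]   # one connection on each server
lc.release("web-2")                        # web-2 becomes the least busy
lc_next = lc.pick()

print(rr_order, lc_order, lc_next)
```

Round Robin ignores how busy each server is, which keeps it simple; Least Connections needs to track open connections, but it reacts when one server finishes its work sooner than the others.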
Now let's connect the dots and sum all of this up in a diagram:
Conclusion:
When we type 'google.com' in the browser, a query is sent to the DNS to resolve the domain name and obtain its IP address. Using the obtained IP address, a TCP connection is established with the servers hosting 'google.com' through the three-way handshake, and SSL/TLS is then negotiated on top of it to build trust and encrypt the data transfer. It's important to note that this connection passes through a firewall, configured with specific parameters such as ports and protocol types.
Subsequently, an HTTPS request is dispatched to the server, asking for the desired web page. The request first goes through a load balancer, a strategic player that decides which server handles it. The chosen server receives our request, processes it, and diligently delivers the HTML, CSS, and JavaScript that the browser renders into the complete web page. This marks the end of this web battle, until more requests follow in this dynamic web interaction.