Networking Tutorial - TCPIP Over Ethernet
Welcome to a new networking tutorial, based upon the most common technology - TCP/IP over
Ethernet. Many of the principles will apply to other technologies, but for now I'm aiming for the simple,
rather than totally complete, approach. Just what you need to know to get the job done.
This tutorial is OS-neutral; it uses Linux, Windows and Unix in its examples, but to be honest, those are
small details. How TCP/IP works over Ethernet is the same regardless of the OS.
This tutorial skips a lot of the detail which could be written; it is aimed at the new student of TCP/IP who
wants to understand how data actually gets from one machine to another over the network. That is not
to say that it is a trivial, high-level tutorial. The goal is that the reader should learn the following
information by the end of the tutorial:
I intend to keep adding to the tutorial over time; please let me know what you want. There is certainly a
lot more detail to be dealt with, but I do want to ensure that the text is kept clear. That is more important
than dealing with esoteric issues.
For example, I vow never to mention the OSI 7-layer model. Nobody uses it, yet every networking book
starts by explaining what it is.
In this tutorial, we will imagine a small configuration with a few servers (machines) on a few different
networks. We will start with a single network and by the end of the tutorial, will have built up to the
multiple networks shown here.
Don't be put off if the diagram looks small and therefore trivial; this small network diagram provides
plenty of detail to get our teeth into.
A Single Network
If A wants to talk to B, well, they're on the same network, so A addresses the packet directly to B:
Unfortunately, it's not as simple as that. The IP address identifies the machines at a software (logical)
level, but the physical (MAC) layer isn't the same as the logical (IP) layer.
• The IP layer needs to be able to route from Alaska to Zebediela. It works at a relatively high level.
• The MAC layer only needs to talk to machines on the local network (LAN). It works at a low
level.
In other words, on the wire a different type of addressing is used alongside the IP address, so each packet
also needs some additional information:
What's that about MAC addresses? Those are the hardware addresses of the network cards installed in
those machines. Any device receiving a packet will only process it if the destination matches its own hardware
address (or the special broadcast address, which we'll deal with in a minute). This address is assigned by
the hardware manufacturer, from the address pool allocated to them by the IEEE. If you have an Intel
network card, you'll have an Intel MAC address (maybe 00:02:B3:xx:xx:xx). A 3COM network card will
have a 3COM MAC address (maybe 00:04:0B:xx:xx:xx). This is also called the Ethernet address, or the
Physical address. As the Ethernet MAC address is a very large (48-bit) number (displayed in hexadecimal (base
16) for clarity - B's address above converts to the number 3,988,735,369,774), every card in the world can
be given (and normally is given) its own unique address.
It is possible to change your MAC address, but there is rarely a need to do so.
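To make the "very large number" concrete, here is a small Python sketch that converts a MAC address into its integer value and pulls out its first three bytes, the block the IEEE assigned to the manufacturer (known as the OUI). The address 00:02:B3:12:34:56 is hypothetical; only its 00:02:B3 prefix comes from the Intel example above. The function names are illustrative, not from any library.

```python
def mac_to_int(mac: str) -> int:
    """Convert '00:02:B3:12:34:56' (or Windows-style '00-02-B3-12-34-56') to an integer."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    value = 0
    for octet in octets:
        byte = int(octet, 16)
        if not 0 <= byte <= 0xFF:
            raise ValueError(f"octet out of range: {octet!r}")
        value = (value << 8) | byte  # shift earlier bytes left, append this one
    return value


def oui(mac: str) -> str:
    """Return the first three octets: the manufacturer's IEEE-assigned prefix."""
    return ":".join(mac.replace("-", ":").upper().split(":")[:3])


if __name__ == "__main__":
    mac = "00:02:B3:12:34:56"  # hypothetical Intel-style address
    print(oui(mac), mac_to_int(mac))
    print(f"{2 ** 48:,} possible 48-bit addresses")
```

With 48 bits there are 2**48 (about 281 trillion) possible addresses, which is what leaves room for every manufacturer to hand out unique ones.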
Broadcast
Hang on, how does "A" know what "B"'s MAC address is? If we look back up to the ifconfig output, we
can see the "Bcast", or Broadcast, address. This is configured to be the highest IP address available on the
network. As this network is 192.168.1.0 - 192.168.1.255, the broadcast address is 192.168.1.255. This is a
special address: a packet sent to it reaches every machine on the network. If A wants to know "B"'s MAC
address, it uses ARP (the Address Resolution Protocol) to broadcast a request asking who has 192.168.1.2.
ARP requests are sent to the Ethernet broadcast address, FF:FF:FF:FF:FF:FF, which every network card on the
LAN accepts. "B" (or potentially another device on the network) will reply with "B"'s MAC address. A packet
sent to the broadcast address looks like this:
Any machine on the network which knows the answer (usually "B" itself) will reply with a fully
populated packet, including its own MAC address; any outgoing packet always includes the sender's IP
and MAC address. This way, "A" can learn "B"'s MAC address when it needs to send it a packet, and keep it in
its ARP table for next time.
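The ARP request itself is tiny. As a sketch (not a complete ARP implementation), the following Python builds the bytes of the request "A" would broadcast, using the standard Ethernet/IPv4 ARP layout: a 14-byte Ethernet header followed by a 28-byte ARP message. A's MAC address here is hypothetical, and the frame is only built, not sent; actually transmitting it would need a raw socket and administrator privileges.

```python
import socket
import struct

ETHER_BROADCAST = b"\xff" * 6  # FF:FF:FF:FF:FF:FF
ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def mac_bytes(mac: str) -> bytes:
    return bytes(int(part, 16) for part in mac.split(":"))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Return an Ethernet frame asking 'who has target_ip? tell sender_ip'."""
    src = mac_bytes(sender_mac)
    # Ethernet header: destination MAC (broadcast), source MAC, EtherType.
    ethernet = ETHER_BROADCAST + src + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length (bytes)
        4,                            # protocol address length (bytes)
        ARP_REQUEST,                  # operation: 1 = request, 2 = reply
        src,                          # sender MAC
        socket.inet_aton(sender_ip),  # sender IP
        b"\x00" * 6,                  # target MAC: unknown, that is the question
        socket.inet_aton(target_ip),  # target IP: "who has this address?"
    )
    return ethernet + arp


frame = build_arp_request("00:02:B3:00:00:01", "192.168.1.1", "192.168.1.2")
print(len(frame), frame.hex())
```

The result is 42 bytes; before transmission it is padded up to Ethernet's minimum frame size. B's reply uses the same layout with operation 2 and the blanks filled in.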
The network card (NIC) listens for packets addressed to its own MAC address, and also for packets sent to
FF:FF:FF:FF:FF:FF. In a similar way, the IP stack will listen for packets addressed to the IP broadcast
address - 192.168.1.255 in this case (so long as the MAC address matches; otherwise the packet would already
have been discarded by the NIC).
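The two-stage filtering described above can be sketched in Python as follows. It is a simplification, assuming a single interface: it ignores multicast, promiscuous mode and the all-ones broadcast 255.255.255.255, and B's MAC address is hypothetical. Note how the subnet broadcast address falls out of the network definition: it is the address with all host bits set.

```python
import ipaddress

ETHER_BROADCAST = "ff:ff:ff:ff:ff:ff"


def nic_accepts(dst_mac: str, my_mac: str) -> bool:
    """Stage 1 (network card): keep frames for our own MAC or the Ethernet broadcast."""
    return dst_mac.lower() in (my_mac.lower(), ETHER_BROADCAST)


def ip_accepts(dst_ip: str, my_ip: str, my_network: str) -> bool:
    """Stage 2 (IP stack): keep packets for our own IP or the subnet broadcast."""
    broadcast = ipaddress.ip_network(my_network).broadcast_address
    return ipaddress.ip_address(dst_ip) in (ipaddress.ip_address(my_ip), broadcast)


def host_processes(dst_mac: str, dst_ip: str, my_mac: str, my_ip: str, my_network: str) -> bool:
    # The IP stack only ever sees what the NIC has already let through.
    return nic_accepts(dst_mac, my_mac) and ip_accepts(dst_ip, my_ip, my_network)


B = dict(my_mac="00:04:0B:00:00:02", my_ip="192.168.1.2", my_network="192.168.1.0/24")
print(ipaddress.ip_network("192.168.1.0/24").broadcast_address)  # 192.168.1.255
print(host_processes("FF:FF:FF:FF:FF:FF", "192.168.1.255", **B))  # True
print(host_processes("00:02:B3:00:00:01", "192.168.1.255", **B))  # False: frame for another NIC
```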
If a host is sending a packet to the local network (eg, "A" sending a packet to "B"), it will need B's MAC
address (via ARP, as discussed above). If it is sending to another network, it will not need the destination's
MAC address, just that of the router it is sending the packet to. So, for G to send a packet to F, the packet
sent by G would look like this:
However, the packet received by F (from the firewall) would look like this:
So G and F never know each other's MAC addresses; they don't need to. The firewall knows both, because it
talks directly to both hosts.
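What the firewall does to G's packet can be modelled in a few lines of Python. This is a sketch, not how a real router is built: the addresses are placeholder labels rather than real values, the starting TTL of 64 is arbitrary, and the only IP-header change modelled is the TTL (time-to-live) decrement that routers perform on every hop; a real router also recomputes the header checksum. The firewall gets two MAC labels because each of its network cards has its own MAC address.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_mac: str  # Ethernet header: rewritten on every hop
    dst_mac: str
    src_ip: str   # IP header: carried end to end
    dst_ip: str
    ttl: int
    data: str


def forward(packet: Packet, out_mac: str, next_hop_mac: str) -> Packet:
    """A router re-addresses the Ethernet header for the next hop; the IP addresses stay put."""
    if packet.ttl <= 1:
        # A real router drops the packet (usually sending an ICMP "time exceeded" message).
        raise ValueError("TTL expired")
    return replace(packet, src_mac=out_mac, dst_mac=next_hop_mac, ttl=packet.ttl - 1)


sent_by_g = Packet("MAC-G", "MAC-FIREWALL-1", "IP-G", "IP-F", 64, "Hello F!")
received_by_f = forward(sent_by_g, out_mac="MAC-FIREWALL-2", next_hop_mac="MAC-F")
print(sent_by_g)
print(received_by_f)
```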
In the same way, when your PC talks to www.google.com, it does not need to know anything about
Google's physical (MAC) address, only that of the router it sends its packets to (typically your home or
ISP router). At a higher level, you personally don't need to know Google's IP address (eg, 64.233.183.147),
only its hostname (www.google.com), which DNS translates into an IP address for you. This is how TCP/IP
blends together: IP deals with the "internet" side of things, getting packets from network to network,
whilst TCP and services such as DNS deal with the higher levels.
Netmask
The key to understanding IP routing is the netmask. The netmask tells us whether we can communicate
directly with another machine, or if we need to go via a router. If A wants to talk to B, well, they're on the
same network, so A addresses the packet directly to B. If A wants to talk to E, it will have to send the
packet to the (routing) firewall between those networks, as it cannot send directly to E.
But how does "A" know when to send a simple packet and when to do the harder work?
If we assume that box "A" is Linux, and box "B" is Windows, we will see the following:
A       192.168.1.1     11000000 10101000 00000001 00000001
Mask    255.255.255.0   11111111 11111111 11111111 00000000
B       192.168.1.2     11000000 10101000 00000001 00000010
Result                  Network  Network  Network  Host
We need to perform a logical AND of each IP address with the Netmask, and compare the results. We do this
by looking down the columns: a "1" in the Netmask marks a bit that belongs to the network, so both IP
addresses must have the same value in every such column to be on the same network; a "0" marks a bit that
can differ between hosts on the same network. Therefore, the bits under the 1's are referred to as the
network address, and the bits under the 0's as the host address. In this case, 192.168.1.0 is the (common)
network address, so .1 (for A) and .2 (for B) are the host addresses.
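The same AND can be done in a few lines of Python with the standard ipaddress module. This sketch uses A and B from the example, plus E's address (192.168.2.3), which comes up shortly; the function names are just illustrative.

```python
import ipaddress


def to_int(dotted: str) -> int:
    return int(ipaddress.IPv4Address(dotted))


def binary(dotted: str) -> str:
    """Show an address as four 8-bit groups, like the table above."""
    return " ".join(f"{octet:08b}" for octet in ipaddress.IPv4Address(dotted).packed)


def network_address(ip: str, mask: str) -> str:
    # AND keeps the bits where the mask has a 1 (network) and zeroes the rest (host).
    return str(ipaddress.IPv4Address(to_int(ip) & to_int(mask)))


def same_network(ip1: str, ip2: str, mask: str) -> bool:
    return network_address(ip1, mask) == network_address(ip2, mask)


MASK = "255.255.255.0"
for name, ip in [("A", "192.168.1.1"), ("B", "192.168.1.2"), ("E", "192.168.2.3")]:
    print(f"{name}  {ip:<13} {binary(ip)}  network {network_address(ip, MASK)}")
print(same_network("192.168.1.1", "192.168.1.2", MASK))  # A and B: True
print(same_network("192.168.1.1", "192.168.2.3", MASK))  # A and E: False
```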
Please see Bases for more information about Base 2 (Binary) and Base 16 (Hexadecimal). See /xx
notation for a full explanation of the /xx notation, but in a nutshell, the netmask above has 24 "1"s
in a row, so it is a /24 network.
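Counting the 1s is easy to automate. A minimal Python sketch, assuming IPv4 and dotted-decimal input: it converts a netmask to its /xx prefix length and back, rejecting masks whose 1s are not contiguous. The standard ipaddress module can also do this directly (for example, ipaddress.IPv4Network("192.168.1.0/255.255.255.0").prefixlen is 24); spelling it out just shows what is being counted.

```python
import ipaddress


def prefix_length(mask: str) -> int:
    """Count the leading 1s of a netmask: 255.255.255.0 -> 24."""
    bits = f"{int(ipaddress.IPv4Address(mask)):032b}"
    if "01" in bits:  # a 0 followed by a 1 means the 1s are not contiguous
        raise ValueError(f"{mask} is not a valid netmask")
    return bits.count("1")


def netmask(prefix: int) -> str:
    """Turn a /xx prefix back into dotted-decimal form."""
    return str(ipaddress.IPv4Network(f"0.0.0.0/{prefix}").netmask)


print(prefix_length("255.255.255.0"))  # 24
print(netmask(26))                     # 255.255.255.192
```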
This means that for A to communicate with B, it can create a simple packet, like this:
Source IP        192.168.1.1 (A)
Destination IP   192.168.1.2 (B)
Data             Hello B! This is the Data
Routing, then, works at the next level. What happens when A wants to talk to E? It could broadcast an
ARP request, but E would not see the request (routers do not pass broadcasts on), so it would not reply. On
this scale, that might seem to be a limitation, but should everyone really keep asking www.google.com for a
physical address? It makes sense for physical (MAC) addressing to stay within the local network. Beyond
that, IP (Internet Protocol) takes over, so the physical addresses of distant machines are not needed.
Instead, A finds the IP address for E, via whatever method it is configured to use - /etc/hosts, DNS, LDAP,
etc. It then applies its netmask to both addresses and compares the results:
A       192.168.1.1     11000000 10101000 00000001 00000001
Mask    255.255.255.0   11111111 11111111 11111111 00000000
E       192.168.2.3     11000000 10101000 00000010 00000011
Result                  Network  Network  Network  Host
All that "A" knows is that E's address differs from its own in at least one of the bits that the netmask
marks as "Network" (here, the third octet: 1 versus 2). E is therefore on a different network, so A will
have to find a router on its own network in order to communicate with E. There is often only one router,
configured as the default router. In this case, though, we have a few routers to choose from.
The netstat utility shows the routes on a *nix server (Solaris in this example) like this (in the example
diagram shown, this is for "G", because it covers more detail than an example for "A" would provide):
This server is configured as 192.168.1.4 and 192.168.2.65, so it is on two different networks, via NICs
hme0 and hme1 respectively. The first line tells it that to get to the 192.168.1.0 network, it can go direct
via 192.168.1.4 (itself) on the hme0 interface. For this, it will need the MAC address of the server it wants
to talk to (A, B or the firewall); if it's not in the ARP table, it will have to ask for it as discussed above.
The second line is the multicast address. You can safely ignore that for now :-)
The third line tells it that to get to the 192.168.2.64 network, it can go via (its own) 192.168.2.65 interface
on hme1.
The fourth line tells it that the default router is at 192.168.1.3. If it needs to get to 192.168.2.0/26 (or any
other network), it needs to go via that router. The packet may still not arrive, but none of the other routes
would get it there. The default router is the "last resort"; the other, explicit routes are for specific networks. The default router is
usually connected to lots of networks, either directly or indirectly. The useful thing about this is that G
does not need to be explicitly told about that network; if it needs to communicate with the network, it can
simply send a packet to its default router. If you type ping 192.168.3.29 then it will send a packet to the
default router, just in case there is a device at 192.168.3.29. "G" doesn't need to know if there is, or what
its netmask is. It just sends the packet to the router, which deals with the request. In this case, a packet for
192.168.2.0/26 would get passed on, whilst a packet for 192.168.3.29 would simply get no response. The
router, if it can access 192.168.3.x, can sort out the netmask issues on G's behalf.
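The decision G makes can be sketched as a longest-prefix-match lookup: of all the routes whose network contains the destination, pick the most specific one, with the default route (0.0.0.0/0) matching everything as the last resort. This Python version is a model of G's table, not output from a real system: the /24 and /26 prefix lengths and the loopback interface name lo0 are assumptions, the multicast line is left out, and directly connected routes are shown with no gateway rather than G's own address.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    network: ipaddress.IPv4Network
    gateway: Optional[ipaddress.IPv4Address]  # None: destination is directly reachable
    interface: str


def route(net: str, gateway: Optional[str], interface: str) -> Route:
    gw = ipaddress.IPv4Address(gateway) if gateway else None
    return Route(ipaddress.IPv4Network(net), gw, interface)


# A model of G's routing table (prefix lengths and lo0 are assumptions).
G_ROUTES = [
    route("192.168.1.0/24", None, "hme0"),
    route("192.168.2.64/26", None, "hme1"),
    route("0.0.0.0/0", "192.168.1.3", "hme0"),  # default router: the last resort
    route("127.0.0.1/32", None, "lo0"),
]


def next_hop(destination: str, routes: list = G_ROUTES) -> tuple:
    """Return (address to send the frame to, interface) for a destination IP."""
    dest = ipaddress.IPv4Address(destination)
    matches = [r for r in routes if dest in r.network]
    if not matches:
        raise LookupError(f"no route to {destination}")
    best = max(matches, key=lambda r: r.network.prefixlen)  # most specific wins
    # Direct route: ARP for the destination itself. Otherwise: ARP for the gateway.
    hop = best.gateway if best.gateway is not None else dest
    return str(hop), best.interface


for target in ["192.168.1.1", "192.168.2.70", "192.168.2.3", "192.168.3.29"]:
    print(target, "->", next_hop(target))
```

Because the default route matches every address, the LookupError branch is never reached here; it only matters on a host configured without a default router.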
The final line deals with "localhost", a special address (127.0.0.1) which on any machine will point back
to itself. This is useful for debugging, as well as for non-networked machines which need a network stack.
A cruel joke is to tell a newbie to try hacking 127.0.0.1, or to tell them that 127.0.0.1 is an FTP site with a
copy of their hard disk, etc. In fact, the entire 127.0.0.0/8 (that is, 127.x.x.x) is reserved for
loopback. It's just very rare to need more than one loopback address, so the popular one is 127.0.0.1.
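A quick way to confirm that the whole 127.x.x.x range is loopback is Python's standard ipaddress module, which has an is_loopback property. This sketch checks a few arbitrary example addresses against 127.0.0.0/8, both by hand and with that property.

```python
import ipaddress

LOOPBACK = ipaddress.IPv4Network("127.0.0.0/8")

for addr in ["127.0.0.1", "127.42.0.7", "128.0.0.1"]:
    ip = ipaddress.IPv4Address(addr)
    # The manual check and the module's built-in is_loopback property should agree.
    print(addr, ip in LOOPBACK, ip.is_loopback)

print(f"{LOOPBACK.num_addresses:,} loopback addresses")  # 16,777,216
```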
As for the other fields reported by netstat, Flag "U" means the route is Up, "UG" means "Up, and a route to
a Gateway (which may pass the packet on)", and "UH" means "Up, and a route to a single Host (which won't)".
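The three basic flags follow directly from the shape of each route, as this small Python sketch shows. It is a simplification assuming IPv4 routes: real netstat implementations print further flags (which vary between operating systems), and the function name is purely illustrative.

```python
def route_flags(prefixlen: int, has_gateway: bool, up: bool = True) -> str:
    """Derive the basic netstat flags for a route (real netstat shows more letters)."""
    flags = ""
    if up:
        flags += "U"  # route is usable
    if has_gateway:
        flags += "G"  # packets go to a gateway, which passes them on
    if prefixlen == 32:
        flags += "H"  # destination is a single host, not a network
    return flags


print(route_flags(24, False))  # U   e.g. 192.168.1.0 on hme0
print(route_flags(0, True))    # UG  the default route
print(route_flags(32, False))  # UH  localhost
```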
From this information, the Operating System can determine the most useful router to choose for a
particular destination. On Solaris, the /etc/netmasks file tells the OS about particular netmasks for
given networks; otherwise, the old, pre-CIDR standard is followed, whereby the IP address itself suggests
its netmask:
You can see that each class (A, B, C) contains a "Private" range (10.0.0.0/8, 172.16.0.0/12 and
192.168.0.0/16 respectively), which is not routed on the public internet. Other than that, their netmasks
are 255.0.0.0, 255.255.0.0 and 255.255.255.0 (ff000000, ffff0000, ffffff00 respectively, in Hex). That
turned out to be a little too simplistic as internet usage grew, so we now have Classless Inter-Domain
Routing (CIDR), which forgets about classes, and just says that a network can have any netmask (any prefix
length from /0 to /32). The closer you get to such a network, the more likely you are to need to know about
how it is configured (hence /etc/netmasks, and CIDR in DNS, etc.).
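The old classful rule is simple enough to express in a few lines. This Python sketch guesses a netmask from the first octet the pre-CIDR way and flags the private ranges (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). It only illustrates the historical defaults: under CIDR, the real netmask must come from configuration such as /etc/netmasks, and the function name is hypothetical.

```python
import ipaddress

# Pre-CIDR classes, by first octet: (lowest, highest, class, default netmask).
CLASSES = [
    (0, 127, "A", "255.0.0.0"),
    (128, 191, "B", "255.255.0.0"),
    (192, 223, "C", "255.255.255.0"),
]

# The private ranges, one inside each class.
PRIVATE = [ipaddress.IPv4Network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def classful_info(address: str):
    """Return (class, default netmask, is_private), or None for classes D and E."""
    ip = ipaddress.IPv4Address(address)
    first_octet = ip.packed[0]
    for low, high, cls, mask in CLASSES:
        if low <= first_octet <= high:
            return cls, mask, any(ip in net for net in PRIVATE)
    return None  # 224 and above: multicast (D) or reserved (E), no classful netmask


for addr in ["10.1.2.3", "172.20.0.1", "64.233.183.147", "192.168.2.3", "224.0.0.1"]:
    print(addr, classful_info(addr))
```

Note that the classful guess for 192.168.2.64 would be 255.255.255.0, which is exactly why a /26 network like G's hme1 segment needs its netmask configured explicitly.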