Unit I Network Troubleshooting Components and Os: Problem Solving
The first step in diagnosing a network problem is to collect information. This includes
collecting information from your users as to the nature of the problems they are having, and it
includes collecting data from your network.
Troubleshooting is a form of problem solving, often applied to repair failed products or
processes. It is a logical, systematic search for the source of a problem so that it can be
solved, and so the product or process can be made operational again. Troubleshooting is
needed to develop and maintain complex systems where the symptoms of a problem can have
many possible causes. Troubleshooting is used in many fields such as engineering, system
administration, electronics, automotive repair, and diagnostic medicine. Troubleshooting
requires identification of the malfunction(s) or symptoms within a system. Then, experience
is commonly used to generate possible causes of the symptoms. Determining the most likely
cause is a process of elimination - eliminating potential causes of a problem. Finally,
troubleshooting requires confirmation that the solution restores the product or process to its
working state.
Principles of Troubleshooting
Troubleshooting requires skill
These skills are acquired through experimentation and experience
You cannot learn the resolution to every problem that exists
You can, however, learn a methodology to find and diagnose nearly every problem in
a systematic and logical manner
The following are the most common network problems:
User error
Physical connections
System needs a reboot
If checking these doesn't help, then it's time to move on and try other troubleshooting
options
Research on problem solving and reasoning is fundamental to understanding
troubleshooting skills
You can choose from several different methodologies of troubleshooting
These give us guidelines for solving problems logically using a step-by-step process
The first step is to determine the scope of the problem by identifying the symptoms
The next step is to collect specific information about the problem at hand
Once you have the pertinent information, then the scope is determined
Begin to isolate the problem by testing each of the causes, starting with the most
obvious first
Attempt to re-create the problem
Make only one change at a time
Test each change
Don't be afraid to ask for help
Read the documentation that came with the hardware or software
Don't forget about the obvious
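The isolation steps above (test each cause, most obvious first, one at a time) can be sketched in a few lines of Python. This is only an illustration: the function name isolate_cause and the hard-coded checks are hypothetical, and in practice each check would be a real test such as reseating a cable or asking the user to repeat the action.

```python
from typing import Callable, List, Optional, Tuple


def isolate_cause(candidates: List[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Check candidate causes one at a time, in the order given.

    Each entry is (description, check); check() returns True when that
    cause is confirmed. Returns the first confirmed cause, or None if
    every candidate was eliminated.
    """
    for description, check in candidates:
        if check():
            return description
    return None


# Hypothetical checks, ordered from most obvious to least obvious.
# They are hard-coded so the example runs anywhere.
candidates = [
    ("user error", lambda: False),
    ("loose physical connection", lambda: True),
    ("system needs a reboot", lambda: False),
]

print(isolate_cause(candidates))
```

Because the loop stops at the first confirmed cause, the order of the list matters: putting the cheap, obvious checks first is what keeps the search short.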
Creating a Hardware Toolkit:
Crossover cable
Hardware loopback adapter
Tone generator
Cable tester or cable checker
Voltmeters
Time domain reflectometer (TDR)
Oscilloscope
Creating a Software Toolkit:
Ping
Netstat
Nbtstat
Traceroute
Network monitors
Protocol analyzer
Troubleshooting and Management
Documentation
The most important source of information is the local documentation created by you or your
predecessor. In a properly maintained network, there should be some kind of log about the
network, preferably with sections for each device. In many networks, this will be in an
abysmal state. Almost no one likes documenting or thinks he has the time required to do it. It
will be full of errors, out of date, and incomplete. Local documentation should always be read
with a healthy degree of skepticism. But even incomplete, erroneous documentation, if treated
as such, may be of value. There are probably no intentional errors, just careless mistakes and
errors of omission. Even flawed documentation can give you some sense of the history of the
system. Problems frequently occur due to multiple conflicting changes to a system. Software
that may have been only partially removed can have lingering effects. Homegrown
documentation may be the quickest way to discover what may have been on the system.
While the creation and maintenance of documentation may once have been someone else's
responsibility, it is now your responsibility. If you are not happy with the current state of your
documentation, it is up to you to update it and adopt policies so the next administrator will
not be muttering about you the way you are muttering about your predecessors.
Management Practices
Management practices will determine what you can do and how you do it. This is true both
for avoiding problems and for dealing with problems that can't be avoided.
Professionalism
Effectively administering a system requires a high degree of professionalism. This includes
personal honesty and ethical behavior. You should learn to evaluate yourself in an honest,
objective manner. It also requires that you conform to the organization's mission and
culture. Your network serves some higher purpose within your organization. It does not exist
strictly for your benefit. You should manage the network with this in mind. This means that
everything you do should be done from the perspective of a cost-benefit trade-off. It is too
easy to get caught in the trap of doing something "the right way" at a higher cost than the
benefits justify. Performance analysis is the key element.
Ego management
We would all like to think that we are irreplaceable, and that no one else could do our jobs as
well as we do. This is human nature. Unfortunately, some people take steps to make sure this
is true. The most obvious way an administrator may do this is to hide what he actually does and
how his system works. This can be done in many ways. Failing to document the system is one
approach; leaving comments out of code or configuration files is also common. The goal of such
an administrator is to make sure he is the only one who truly understands the system. He may
try to limit others' access to a system by restricting accounts or access to passwords. (This can
be done to hide other types of unprofessional activities as well.)
Legal and ethical considerations
A key consideration is the legality of collecting such information. Unfortunately, there is a
constantly changing legal morass with respect to privacy in particular and technology in
general. Collecting some data may be legitimate in some circumstances but illegal in others.
This depends on factors such as the nature of your operations, what published policies you
have, what assurances you have given your users, new and existing laws, and what
interpretations the courts give to these laws.
Economic considerations
Solutions to problems have economic consequences, so you must understand the economic
implications of what you do. Knowing how to balance the cost of the time used to repair a
system against the cost of replacing a system is an obvious example. Cost management is a
more general issue that has important implications when dealing with failures. One
particularly difficult task for many system administrators is to come to terms with the
economics of networking. As long as everything is running smoothly, the next biggest issue
to upper management will be how cost effectively you are doing your job. Unless you have
unlimited resources, when you overspend in one area, you take resources from another area.
Host Configurations
Utilities
Even if you plan to jump into the configuration files, you will probably want a quick
overview of the current state of the system before you begin. For this reason, we will examine
status and configuration utilities first. This approach has the advantage of being pretty much
the same from one version of Unix to the next. With configuration files, the differences
among the various flavours of Unix can be staggering. Even when the files have the same
functionality and syntax, they can go by different names or be in different directories.
Certainly, using these utilities is much simpler than looking at kernel configuration files.
Ps
The first thing any system administrator should do on a new system is run the ps command.
bsd4# ps -aux
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 6590 22.0 2.1 924 616 ?? R 11:14AM 0:09.80 inetd: chargen [2
root 1 0.0 0.6 496 168 ?? Ss Fri09AM 0:00.03 /sbin/init --
root 2 0.0 0.0 0 0 ?? DL Fri09AM 0:00.52 (pagedaemon)
root 3 0.0 0.0 0 0 ?? DL Fri09AM 0:00.00 (vmdaemon)
root 4 0.0 0.0 0 0 ?? DL Fri09AM 0:44.05 (syncer)
root 100 0.0 1.7 820 484 ?? Ss Fri09AM 0:02.14 syslogd
daemon 109 0.0 1.5 828 436 ?? Is Fri09AM 0:00.02 /usr/sbin/portmap
root 141 0.0 2.1 924 616 ?? Ss Fri09AM 0:00.51 inetd
root 144 0.0 1.7 980 500 ?? Is Fri09AM 0:03.14 cron
root 150 0.0 2.8 1304 804 ?? Is Fri09AM 0:02.59 sendmail: accepti
root 173 0.0 1.3 788 368 ?? Is Fri09AM 0:01.84 moused -p /dev/ps
root 213 0.0 1.8 824 508 v1 Is+ Fri09AM 0:00.02 /usr/libexec/gett
root 214 0.0 1.8 824 508 v2 Is+ Fri09AM 0:00.02 /usr/libexec/gett
root 457 0.0 1.8 824 516 v0 Is+ Fri10AM 0:00.02 /usr/libexec/gett
root 6167 0.0 2.4 1108 712 ?? Ss 4:10AM 0:00.48 telnetd
jsloan 6168 0.0 0.9 504 252 p0 Is 4:10AM 0:00.09 -sh (sh)
root 6171 0.0 1.1 464 320 p0 S 4:10AM 0:00.14 -su (csh)
root 0 0.0 0.0 0 0 ?? DLs Fri09AM 0:00.17 (swapper)
root 6597 0.0 0.8 388 232 p0 R+ 11:15AM 0:00.00 ps aux
In this example, the first and last columns are the most interesting since they give the owners
and the processes, along with their arguments. In this example, the lines, and consequently
the arguments, have been truncated, but this is easily avoided. Running processes of interest
include portmap, inetd, sendmail, telnetd, and chargen.
There are a number of options available to ps, although they vary from implementation to
implementation. In this example, run under FreeBSD, the parameters used were -aux. This
combination shows all users' processes (-a), including those without controlling terminals
(-x), in considerable detail (-u). The options -ax will provide fewer details but show more of the
command-line arguments. Alternatively, you can use the -w option to extend the displayed
information to 132 columns.
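When the listing is long, it can help to pull out just the owner and command columns with a short script. The following is a minimal sketch that parses a few lines copied from the sample output above. It assumes the BSD-style column layout shown there, in which the first ten columns contain no spaces; the names PS_LINES and parse_ps_line are made up for this example.

```python
# Sample lines copied from the ps listing above (already truncated there).
PS_LINES = [
    "root 6590 22.0 2.1 924 616 ?? R 11:14AM 0:09.80 inetd: chargen [2",
    "daemon 109 0.0 1.5 828 436 ?? Is Fri09AM 0:00.02 /usr/sbin/portmap",
    "root 150 0.0 2.8 1304 804 ?? Is Fri09AM 0:02.59 sendmail: accepti",
    "jsloan 6168 0.0 0.9 504 252 p0 Is 4:10AM 0:00.09 -sh (sh)",
]


def parse_ps_line(line):
    """Split one 'ps -aux' line into the fields used in this example."""
    # Split at most 10 times so the eleventh field (COMMAND) keeps its spaces.
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("unexpected ps line: %r" % line)
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "cpu": float(fields[2]),
        "command": fields[10],
    }


processes = [parse_ps_line(line) for line in PS_LINES]

# Show owner, PID, and command, busiest process first.
for proc in sorted(processes, key=lambda p: p["cpu"], reverse=True):
    print(proc["user"], proc["pid"], proc["command"])
```

Other ps implementations order or format the columns differently, so the field positions would need adjusting for them.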
Top
When running, top gives a periodically updated listing of processes ranked in order of CPU
usage.
Netstat
One of the most useful and diverse utilities is netstat. This program reports the contents of
kernel data structures related to networking. One use of netstat is to display the connections
and services available on a host. For example, this is the output for the system we just
looked at:
bsd4# netstat -a
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 bsd4.telnet 205.153.60.247.3473 TIME_WAIT
tcp 0 17458 bsd4.chargen sloan.1244 ESTABLISHED
tcp 0 0 *.chargen *.* LISTEN
tcp 0 0 *.discard *.* LISTEN
tcp 0 0 *.echo *.* LISTEN
tcp 0 0 *.time *.* LISTEN
tcp 0 0 *.daytime *.* LISTEN
tcp 0 0 *.finger *.* LISTEN
tcp 0 2 bsd4.telnet sloan.1082 ESTABLISHED
tcp 0 0 *.smtp *.* LISTEN
tcp 0 0 *.login *.* LISTEN
tcp 0 0 *.shell *.* LISTEN
tcp 0 0 *.telnet *.* LISTEN
tcp 0 0 *.ftp *.* LISTEN
tcp 0 0 *.sunrpc *.* LISTEN
udp 0 0 *.1075 *.*
udp 0 0 *.1074 *.*
udp 0 0 *.1073 *.*
udp 0 0 *.1072 *.*
udp 0 0 *.1071 *.*
udp 0 0 *.1070 *.*
udp 0 0 *.chargen *.*
udp 0 0 *.discard *.*
udp 0 0 *.echo *.*
udp 0 0 *.time *.*
udp 0 0 *.daytime *.*
udp 0 0 *.sunrpc *.*
udp 0 0 *.syslog *.*
Active UNIX domain sockets
Address Type Recv-Q Send-Q Inode Conn Refs Nextref Addr
c3378e80 dgram 0 0 0 c336efc0 0 c3378f80
c3378f80 dgram 0 0 0 c336efc0 0 c3378fc0
c3378fc0 dgram 0 0 0 c336efc0 0 0
c336efc0 dgram 0 0 c336db00 0 c3378e80 0 /var/run/log
The first column gives the protocol. The next two columns give the sizes of the receive and
send queues. These should be 0 or near 0. Otherwise, you may have a problem with that
particular service. The next two columns give the socket, or IP address and port number, for
each end of a connection. This socket pair uniquely identifies one connection. The socket is
presented in the form hostname.service. Finally, the state of the connection is given in the last
column for TCP services. This is blank for UDP since it is connectionless. The most common
states are ESTABLISHED for current connections, LISTEN for services awaiting a
connection, and TIME_WAIT for recently terminated connections.
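The column description above can be turned into a quick check. The sketch below parses a few lines copied from the sample netstat output, counts connections by state, and flags any entry whose queue is well above zero. The threshold QUEUE_LIMIT is an arbitrary value chosen for illustration, not a recommended setting, and the helper names are hypothetical.

```python
from collections import Counter

# Sample lines copied from the netstat listing above.
NETSTAT_LINES = [
    "tcp 0 0 bsd4.telnet 205.153.60.247.3473 TIME_WAIT",
    "tcp 0 17458 bsd4.chargen sloan.1244 ESTABLISHED",
    "tcp 0 0 *.chargen *.* LISTEN",
    "tcp 0 2 bsd4.telnet sloan.1082 ESTABLISHED",
    "udp 0 0 *.syslog *.*",
]

QUEUE_LIMIT = 100  # hypothetical threshold for "not near 0"


def parse_netstat_line(line):
    """Split one 'netstat -a' Internet-connection line into named fields."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("unexpected netstat line: %r" % line)
    proto, recv_q, send_q, local, foreign = fields[:5]
    # UDP lines have no state column because UDP is connectionless.
    state = fields[5] if len(fields) > 5 else ""
    # The socket is shown as hostname.service; split on the last dot.
    local_host, local_service = local.rsplit(".", 1)
    return {
        "proto": proto,
        "recv_q": int(recv_q),
        "send_q": int(send_q),
        "local_host": local_host,
        "local_service": local_service,
        "foreign": foreign,
        "state": state,
    }


entries = [parse_netstat_line(line) for line in NETSTAT_LINES]
state_counts = Counter(e["state"] for e in entries if e["state"])
backlogged = [
    e for e in entries
    if e["recv_q"] > QUEUE_LIMIT or e["send_q"] > QUEUE_LIMIT
]

print(dict(state_counts))
for e in backlogged:
    print("queue backlog on", e["local_service"], "to", e["foreign"])
```

In the sample data only the chargen connection exceeds the threshold, which fits a service that generates output continuously.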
lsof
lsof lists open files on a Unix system.
bsd2# lsof
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
swapper 0 root cwd VDIR 116,131072 512 2 /
swapper 0 root rtd VDIR 116,131072 512 2 /
init 1 root cwd VDIR 116,131072 512 2 /
init 1 root rtd VDIR 116,131072 512 2 /
init 1 root txt VREG 116,131072 255940 157 /sbin/init
ifconfig
ifconfig is used to display and alter the configuration of the network interfaces.
Connectivity Testing
This section describes simple tests for individual network links and for end-to-end connectivity between
networked devices.
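One simple end-to-end test is to try opening a TCP connection to a service on the remote device. The sketch below uses Python's standard socket module for this. It is a stand-in for tools such as ping, not a replacement: it only shows whether a given TCP port accepts connections. The function name can_connect and the two-second timeout are assumptions made for the example, and the demonstration uses a temporary listener on the local machine so it runs without a real server.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Demonstration against a temporary local listener (port chosen by the OS).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]

print("listener up:", can_connect("127.0.0.1", demo_port))
listener.close()
```

To test a real device, call can_connect with the server's name or address and the port of a service it is known to offer.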
Most network environments can be summarized as workstations connected to a server via a hub
or switch.
Each machine that connects to the network must have a network card installed in it. Once the
network card is installed a standard network cable is connected from the PC to a hub or switch.
To connect a network cable into a network card or hub simply clip it into place. The plug can
only be inserted one way. To remove the cable you simply need to unclip it from the device and
remove it.
From the above picture you can see that each end of the network cable has a clip connector, much
like a standard phone connector. To remove the connector simply depress the clip and remove the
connector.
If you are unable to connect to the file server for any reason, you need to determine the following:
1. Which workstations are affected?
2. Has anything been changed that may affect the server or cabling?
If all workstations on your network are unable to contact the server then you should start your
troubleshooting procedure at the server. If it is just a single workstation that is failing to connect
then you should start troubleshooting at that device.
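The rule above can be written as a small decision function. This is a sketch only: the function name starting_point and the workstation names are hypothetical, and the case where several but not all workstations fail is not covered by the rule, so the function simply reports that more information is needed.

```python
def starting_point(affected, all_workstations):
    """Suggest where to begin, given which workstations cannot reach the server."""
    affected = set(affected)
    all_workstations = set(all_workstations)
    if not affected:
        return "no workstation affected"
    if affected >= all_workstations:
        # Every workstation fails: begin at the server.
        return "server"
    if len(affected) == 1:
        # A single workstation fails: begin at that device.
        return "workstation " + next(iter(affected))
    # Assumption: the rule above does not cover a partial outage.
    return "collect more information"


workstations = ["ws1", "ws2", "ws3"]  # hypothetical names
print(starting_point(["ws1", "ws2", "ws3"], workstations))
print(starting_point(["ws2"], workstations))
```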
Troubleshooting Network issues from the Server
When all workstations cannot connect to the server machine, you should follow the steps below to
determine exactly what the cause of your problem is:
1. Check that your server is powered on and not hung. You should be able to type at the
keyboard and move the mouse on the screen. If the screen is locked and you don't have
permission to log on, simply toggle the CAPS LOCK key to see whether the keyboard is
responding. If the server is hung and failing to respond, it should be rebooted and network
connectivity checked again.
2. If the server appears to be functioning correctly, next check whether there is an active
connection between your server and the hub or switch. Start at the server and look at the rear
of the server machine, which should appear something like:
In the rear you will see a number of cords connecting the machine to things such as the power
point, the keyboard, the monitor, and the network.
If we know that the server is operational, then the fan next to the power supply should be
working. You can check this from the noise of the fan or the air flow through it. If this is all OK, you
will need to locate the position of the network card in the rear of the server. This can be found
at the end of a network cable and in our example machine appears like:
At the end of the network cable to the server you should find the network card as shown
above. Network cards should have at least one light to show whether they are in fact
connected to something. The above image shows that the network card has two lights and
we can therefore determine that it is in fact connected to the network. If there are no lights
here, then there is a good chance that your server is not connected to the network.
3. If there is no connectivity to your server, the next place you have to examine is the hub or
switch that the server is connected to. If you follow the other end of the network cable
from the server, you should end up at a hub or switch that should appear something like:
Hubs and switches allow networked computers to connect together and share information. There is
at least one switch or hub on every networked system.