Networking Professional’s Troubleshooting Toolkit
As a network administrator, you have a simple job description: keep the network running. And while it’s a simple job description, it’s not a simple job, since network problems can be due to anything from packet collisions to bad cables to gremlins in the system. But good admins know how to troubleshoot their networks with ease, not to mention precision—it’s a skill they’ve gained through years of battlefield training.
Of course, it would be nice to skip the battle and get right to the training, which is why we’ve assembled this toolkit for you. We’ll run through some common network problems and how to go about solving them, then look at the tools—software and hardware alike—that can help you keep your network in prime condition.
Have a System
If a printer is down or a server is slow, there could be 10,000 reasons for the problem, and the best way to find the one that’s causing you trouble is to follow a system. Most experts suggest you test for the problem from the bottom up; that is, by working through the layers of the OSI (Open Systems Interconnection) model in a methodical manner.
If you’re not familiar with OSI, or if you simply need a refresher, here are the seven layers:
- Layer 1: Physical. This layer includes all your physical equipment, such as your Ethernet cables.
- Layer 2: Data Link. This is the layer where packets are wrapped in frames for the local link and encoded into, and decoded from, bits. It is also the layer where MAC addresses are used.
- Layer 3: Network. This is where all the switching, routing, forwarding, addressing, sequencing and error handling occur, and where logical paths (known as circuits) are mapped out from node to node.
- Layer 4: Transport. This is the layer that handles flow control and end-to-end error recovery.
- Layer 5: Session. As the name suggests, this layer sets up and terminates sessions between applications.
- Layer 6: Presentation. This is the layer that translates data from a network format to an application format and back again, putting it into language that networks and applications understand. Hence, this layer formats and encrypts data, and ensures compatibility.
- Layer 7: Application. This is where Outlook, Word and all the other apps reside—the layer where end users interact with the system. It’s also the layer where user authentication takes place.
In a nutshell, traffic flows down the model from an application to the physical layer, then across the network. When it arrives at the receiver’s physical layer, it moves back up the model to the receiver’s application. Then, of course, the poles reverse, and the receiver becomes the sender, sending a response back down the layers and across the network, and so on.
How does this relate to finding and fixing problems? Simple. Just troubleshoot the problem from layer to layer, starting at the bottom. If there’s a problem with the physical layer, such as a problem with a NIC or an Ethernet cable, then data is blocked from the get-go, and it’s unlikely that you’ll have to look for problems on layers above this one. On the other hand, if the physical layer is intact, then move on to the data-link layer. (For instance, at this layer you might check for a duplicate MAC address on an Ethernet switch.) If there’s no problem there, keep moving up the layers until you find the problem, progressing methodically through the model. If you follow the same system every time, no matter what problem you face, you’re likely to find it, and thus fix it, faster.
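The bottom-up routine can be sketched in a few lines of Python. This is only an illustration: the layer names come from the list above, while `first_failing_layer` and the check functions are hypothetical placeholders that you would replace with real tests (a cable tester result, a switch query, a ping and so on).

```python
from typing import Callable, List, Optional, Tuple

# Each entry pairs an OSI layer name with a check that returns True when
# that layer looks healthy. The checks here are hypothetical stand-ins.
LayerCheck = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[LayerCheck]) -> Optional[str]:
    """Run the checks in order (lowest layer first); stop at the first failure."""
    for layer_name, check in checks:
        if not check():
            return layer_name
    return None  # every layer passed


if __name__ == "__main__":
    checks = [
        ("Physical", lambda: True),    # e.g. cable and NIC look fine
        ("Data Link", lambda: False),  # e.g. a duplicate MAC address was found
        ("Network", lambda: True),     # not reached in this run
    ]
    print(first_failing_layer(checks))
```

Because the function returns as soon as one layer fails, the higher layers are never tested, which mirrors the advice to fix the lowest broken layer before looking further up.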
There are certain problems that lie at the bottom of most network failures—problems so common that we’ve assembled them here, so you can make them a part of your checklist:
- Cable Problems: All too often a cable is cut, shorted or simply faulty. Keep a wire tester on hand so you can check any cable you suspect. Popular models include Fluke Networks’ MicroScanner Pro, ExTech’s CT100 and Psiber’s CT50 CableTool Multifunction Cable Meter. All of them let you verify that a cable is physically sound.
- Segment Issues: Keep the number of machines on each network segment as small as you can, say, 20 or fewer. Add more and you’re inviting problems by overloading the segment. (Remember that using a switch instead of a hub cuts the size of each segment, because every switch port becomes its own collision domain.)
- Connectivity Problems: If a PC or other device can’t see a server, check the hubs, routers and switches for configuration issues. Often a connectivity problem resides there. If you’re using a large number of hubs, consider replacing them with switches. They’re inexpensive and reduce network load.
- Network Collisions: When packets collide, problems occur. Sometimes it’s just the result of bad network planning. Sometimes it’s a faulty NIC, or one that won’t stop transmitting (known as a “jabbering card”).
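- DNS Issues: DNS (Domain Name System) problems lie at the heart of many network issues. A stale or incorrect record, for instance, can send traffic for a hostname to the wrong machine. A related addressing fault is having two machines with the same IP address.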
- Speed: If you use a backbone to connect segments, how fast is it? You may need to upgrade to 1 Gbps. And if you’re using an older network, upgrade your users to 100 Mbps instead of 10 Mbps. They’ll thank you for it.
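Several items on this checklist come down to spotting duplicates, such as two machines that claim the same IP address. The sketch below assumes you have already collected (IP address, MAC address) pairs from somewhere, for example a router’s ARP table; the function name `find_duplicate_ips` and the sample addresses are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_duplicate_ips(entries: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return each IP address that is claimed by more than one MAC address."""
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in entries:
        macs_by_ip[ip].add(mac.lower())  # normalize case before comparing
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical sample data; real entries would come from your own network.
    sample = [
        ("192.168.1.10", "00:11:22:33:44:55"),
        ("192.168.1.11", "00:11:22:33:44:66"),
        ("192.168.1.10", "00:11:22:33:44:77"),  # second machine, same IP
    ]
    for ip, macs in find_duplicate_ips(sample).items():
        print(ip, sorted(macs))
```

Any address that shows up in the result is worth a closer look, since two devices answering for one IP will cause intermittent and confusing connectivity failures.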
A bad server can bring your network to a halt, and you should know some of the common problems that cause it.
First, of course, is that old saw: too little memory. These days, memory is cheap, and you should splurge on 1 or even 2 GB for a server with moderate load. (Servers with heavy load can use as much memory as you can afford.)
Next up is the hard disk (or disks). For mission-critical apps, don’t bet the farm on IDE drives. They’re simply not meant for speed. Switch to SCSI and RAID arrays for best performance.
Don’t forget to check your server’s virtual memory. When a server runs short of RAM, it compensates by writing data to the hard drive, storing it in a paging file. You can set the size and location of that file as you like. Often, it works best at one to two times the size of your physical RAM. And it’s wise to put it as close to the front of the disk as possible, to reduce seek times.
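As a quick worked example of the one-to-two-times rule of thumb, the helper below turns an amount of physical RAM into a suggested paging-file range. The function name is hypothetical, and the rule itself is only the guideline given above, not a vendor recommendation.

```python
from typing import Tuple


def paging_file_range_mb(ram_mb: int) -> Tuple[int, int]:
    """Return (minimum, maximum) paging-file size in MB: 1x to 2x physical RAM."""
    if ram_mb <= 0:
        raise ValueError("ram_mb must be positive")
    return ram_mb, 2 * ram_mb


if __name__ == "__main__":
    low, high = paging_file_range_mb(2048)  # a server with 2 GB of RAM
    print(f"Suggested paging file: {low} MB to {high} MB")
```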
Last, defrag. Too few people do it, and it can increase disk performance by up to 10 percent (all the more on servers that read and write to disks all day long, and are thus more likely to buckle under the weight of fragmented data).
There’s one network tool that’s so useful, and so common, that we’ll give it special coverage here: ping. The name is often expanded as Packet InterNet Groper, although it was originally borrowed from the sound of a sonar pulse. It’s a standard feature of many operating systems (including Windows), and if for some reason you don’t have it, you can download a ping utility from a dozen spots on the Internet.
Ping lets you check network connectivity. It sends a network packet from one device—such as a desktop PC—to another, and waits for the reply. (In that way it’s a lot like sonar—whales and dolphins have been pinging each other for centuries.) If your ping gets a quick reply, then it’s a safe bet that your network is in good shape between the device you’re using and the device you pinged.
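If you want to script this check, one option is to call the operating system’s own ping utility from Python. The sketch below is a minimal example, not a full tool: `build_ping_command` and `is_reachable` are names invented here, and the only flags used are the packet-count options (`-n` on Windows, `-c` on most other systems).

```python
import platform
import subprocess
from typing import List, Optional


def build_ping_command(host: str, count: int = 1,
                       system: Optional[str] = None) -> List[str]:
    """Build the ping command line; the count flag differs by platform."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host: str, timeout_s: float = 5.0) -> bool:
    """Return True if the system ping exits successfully for this host.

    Exit codes are a rough signal only and can vary between platforms,
    so treat the result as a hint rather than proof.
    """
    try:
        result = subprocess.run(
            build_ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # ping hung, or the utility is not installed
    return result.returncode == 0


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address used here only as a placeholder.
    print(build_ping_command("192.0.2.1"))
```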
As a basic rule, start pinging devices near the one you’re using, then work your way out in ever-increasing circles. When a ping fails, you’ve begun to isolate the problem—or rather, to isolate its location, which is the first step in fixing the issue.
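The ever-increasing circles approach is easy to automate once you have a list of devices ordered from nearest to farthest. In the sketch below, `first_unreachable` is a hypothetical helper, the host list is made up, and `probe` stands for whatever reachability test you prefer (for example, a wrapper around the system ping utility).

```python
from typing import Callable, Iterable, Optional


def first_unreachable(hosts_near_to_far: Iterable[str],
                      probe: Callable[[str], bool]) -> Optional[str]:
    """Probe hosts in order and return the first one that does not answer."""
    for host in hosts_near_to_far:
        if not probe(host):
            return host  # the trouble lies at or just before this hop
    return None  # everything answered


if __name__ == "__main__":
    # Hypothetical path: local gateway, core router, then a remote server.
    path = ["192.0.2.1", "192.0.2.254", "198.51.100.10"]
    # Fake probe for demonstration: pretend only the remote server is down.
    fake_probe = lambda host: host != "198.51.100.10"
    print(first_unreachable(path, fake_probe))
```

Passing the probe in as a parameter keeps the search logic separate from the actual ping mechanism, so the same function works with a real ping wrapper or with a stub while you test.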
What’s more, the kind of message you get from a failed ping can give you a clue about the problem at hand. If, for instance, your ping software returns a message that reads, “No answer from [destination],” then your pinged device is not responding, but your system at least knows how to reach it. On the other hand, if you see a message that reads, “[Destination] is unreachable,” you may have a router issue that prevents your machine from even knowing where to send packets in the first plac