A guide to troubleshooting theory from a CompTIA A+ perspective
In addition to the Software Troubleshooting domain on the upcoming 220-902 certification exam (of which it constitutes 24 percent), topic 5.5 asks you to explain the troubleshooting theory. This requires you to know the six steps of the theory as given by CompTIA and to always follow them in order (taking into consideration corporate policies, procedures, and impacts):
1. Identify the problem: Question the user, identify user changes to the computer, and perform backups before making changes.
2. Establish a theory of probable cause (question the obvious): If necessary, conduct external or internal research based on symptoms
3. Test the theory to determine cause: Once the theory is confirmed, determine the next steps to resolve the problem. If the theory is not confirmed, establish a new theory or escalate.
4. Establish a plan: Make a plan of action to resolve the problem and implement the solution.
5. Determine system status: Verify full system functionality and if applicable implement preventive measures.
6. Make a record: Document findings, actions and outcomes.
Since this is a key topic that bleeds over from one domain to another (and truly, from one test to another), it is important to walk through some of the main subject matter you should know as you study.
Identifying the Problem
While this may seem obvious, it can’t be overlooked: If you can’t define the problem, you can’t begin to solve it. Sometimes problems are relatively straightforward, but other times they’re just a symptom of a bigger issue. For example, if a user isn’t able to connect to the Internet from their computer, it could indeed be an issue with their system. But if other users are having similar problems, then the first user’s difficulties might just be one example of the real problem. Problems in computer systems generally occur in one (or more) of four areas, each of which is in turn made up of many smaller pieces:
● A collection of hardware pieces integrated into a working system. As you know, the hardware can be quite complex, what with motherboards, hard drives, video cards, and so on.
● An operating system, which in turn is dependent on the hardware.
● An application or software program that is supposed to do something. Programs such as Microsoft Word and Excel are bundled with a great many features.
● A computer user, ready to take the computer system to its limits (and occasionally beyond). A technician can often forget that the user is a very complex and important part of the puzzle.
Many times you can define the problem by asking questions of the user. One of the keys to working with your users or customers is to ensure, much like a medical professional, that you have good bedside manner. Most people are not as technically competent as you, and when something goes wrong they become confused or even fearful that they’ll take the blame. Assure them that you’re just trying to fix the problem but that they can probably help because they know what went on before you got there. It’s important to instill trust with your customer. Believe what they are saying, but also believe that they might not tell you everything right away. It’s not that they’re necessarily lying; they just might not know what’s important to tell.
Help clarify things by having the customer show you what the problem is. One of the best methods can be to ask them to show you what “not working” looks like. That way, you see the conditions and methods under which the problem occurs. The problem may be a simple matter of an improper method. The user may be performing an operation incorrectly or performing the operation in the wrong order. During this step, you have the opportunity to observe how the problem occurs, so pay attention.
Here are a few questions to ask the user to aid in determining what the problem is:
1. Can you show me the problem?
2. How often does this happen?
3. Has any new hardware or software been installed recently?
4. Has the computer recently been moved?
5. Has someone who normally doesn’t use the computer recently used it?
6. Have any other changes been made to the computer recently?
Be careful of how you ask questions so you don’t appear accusatory. You can’t assume that the user did something to mess up the computer. Then again, you also can’t assume that they don’t know anything about why it’s not working. The key is to find out everything you can that might be related to the problem. Document exactly what works and what doesn’t and, if you can, why.
Establishing a Theory
Once you have determined what the problem is, you need to develop a theory as to why it is happening. No video? It could be something to do with the monitor or the video card. Can’t get to your favorite website? Is it that site? Is it your network card, the cable, your IP address, DNS server settings, or something else? Once you have defined the problem, establishing a theory about the cause of the problem—what is wrong—helps you develop possible solutions to the problem.
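The website example above can be sketched as a small decision routine. This is only an illustration under stated assumptions: the function names, the idea of comparing against a second "control" site, and the wording of the returned theories are all hypothetical and not part of the CompTIA objectives. It relies on the Python standard library calls socket.getaddrinfo (name lookup) and socket.create_connection (TCP connection attempt). The strings it returns are theories to test, not confirmed causes.

```python
import socket
from typing import Callable


def resolves(host: str) -> bool:
    """Return True if the local resolver can turn `host` into an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def connects(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def theory_for_unreachable_site(
    site: str,
    control_site: str,
    resolve_fn: Callable[[str], bool] = resolves,
    connect_fn: Callable[[str], bool] = connects,
) -> str:
    """Suggest a theory of probable cause (illustrative wording only).

    `control_site` is an assumption of this sketch: any second site
    that is normally reachable, used for comparison.
    """
    if resolve_fn(site) and connect_fn(site):
        return "network path is fine; look at the browser or application"
    if resolve_fn(control_site) and connect_fn(control_site):
        return "other sites work; suspect the site itself"
    if not resolve_fn(control_site):
        return "no names resolve; suspect DNS settings or the local connection"
    return "names resolve but nothing connects; suspect the cable, network card, or IP settings"
```

The probe functions are passed in as parameters so the decision logic can be exercised without a live network. On a real machine you would call theory_for_unreachable_site with two host names and then go test whatever theory comes back.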
Theories can either state what can be true or what can’t be true. However you choose to approach your theory generation, it’s usually helpful to take a mental inventory to see what is possible and what’s not. Start eliminating possibilities and eventually the only thing that can be wrong is what’s left. This type of approach works well when it’s an ambiguous problem; start broad and narrow your scope. For example, if the hard drive won’t read, there is likely one of three culprits: the drive itself, the cable it’s on, or the connector on the motherboard. Try plugging the drive into the other connector or using a different cable. Narrow down the options.
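The hard drive example can be written down as a process of elimination. The sketch below is hypothetical: the suspect labels and the function name are made up for illustration, and it assumes there is exactly one faulty part and that every replacement part is known to be good.

```python
from typing import Dict, Set

# Hypothetical labels for the three suspects named in the text.
SUSPECTS = frozenset({"drive", "cable", "motherboard connector"})


def remaining_suspects(swap_results: Dict[str, bool]) -> Set[str]:
    """Narrow the suspect list from the results of swap tests.

    `swap_results` maps the part that was swapped for a known-good one
    to whether the drive could be read afterwards.
    Assumes a single fault and known-good replacements.
    """
    suspects = set(SUSPECTS)
    for swapped_part, works_now in swap_results.items():
        if works_now:
            # Replacing this part fixed the problem, so it is the culprit.
            return {swapped_part}
        # Still broken with a known-good replacement: this part is cleared.
        suspects.discard(swapped_part)
    return suspects
```

For instance, remaining_suspects({"cable": False}) leaves the drive and the motherboard connector, which tells you which swap to try next.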
Once you have isolated the problem, slowly rebuild the system to see if the problem comes back (or goes away). This helps you identify what is really causing the problem and determine if there are other factors affecting the situation. For example, there are times when memory problems have been fixed by switching the slot that the memory chips are in.
Sometimes you can figure out what’s not working, but you have no idea why or what you can do to fix it. That’s okay. In situations like those, it can be best to turn to documentation. Service manuals provide troubleshooting instructions and service information. Virtually every computer and peripheral made today has service documentation on the company’s website, on a DVD, or even in a paper manual. Don’t be afraid to use it!
If you’re lucky enough to have experienced, knowledgeable, and friendly co-workers, be open to asking for help if you get stuck on a problem.
Test the Theory
You’ve eliminated possibilities and developed a theory as to what the problem is. Your theory may be pretty specific, such as “the power cable is fried,” or it may be a bit more general, like “the hard drive isn’t working” or “there’s a connectivity problem.” No matter your theory, now is the time to start testing solutions. Again, if you’re not sure where to begin to find a solution, the manufacturer’s website is a good place to start!
Checking the simple things first is something even experienced technicians overlook. Often, computer problems are the result of something simple. Technicians overlook these problems because they’re so simple that the technicians assume they couldn’t be the problem. Here are some examples of simple problems:
● Is it plugged in? And plugged in at both ends? Cables must be plugged in at both ends to function correctly. Cables can easily be tripped over and inadvertently pulled from their sockets.
● Is it turned on? This one seems the most obvious, but everyone has fallen victim to it at one point or another. Computers and their peripherals must be turned on to function. Most have power switches with LEDs that glow when the power is turned on.
● Is the system ready? Computers must be ready before they can be used. Ready means the system is ready to accept commands from the user. An indication that a computer is ready is when the operating system screens come up and the computer presents you with a menu or a command prompt. If that computer uses a graphical interface, the computer is ready when the mouse pointer appears. Printers are ready when the Online or Ready light on the front panel is lit.
● Do the chips and cables need to be reseated? You can solve some of the strangest problems (random hang-ups or errors) by opening the case and pressing down on each socketed chip (known as reseating). This remedies the chip-creep problem, which happens when computers heat up and cool down repeatedly as a result of being turned on and off, causing some components to begin to move out of their sockets. In addition, you should reseat any cables to make sure they’re making good contact.
● Is it user error? User error is common but preventable. If a user can’t perform some very common computer task, such as printing or saving a file, the problem is likely due to user error. As soon as you hear of a problem like this, you should begin asking questions to determine if the solution is as simple as teaching the user the correct procedure. A good question to ask is, “Were you ever able to perform that task?” If the answer is no, it means they are probably doing the procedure wrong. If they answer yes, you must ask additional questions to get at the root of the problem.
If you suspect user error, tread carefully in regard to your line of questioning to avoid making the user feel defensive. User errors provide an opportunity to teach the users the right way to do things. Again, what you say matters. Offer a “different” or “another” way of doing things instead of the “right” way.
It’s amazing how often a simple computer restart can solve a problem. Restarting the computer clears the memory and starts the computer with a clean slate. If restarting doesn’t work, try powering down the system completely and then powering it up again (a cold boot). More often than not, that will solve the problem.
Establish a Plan of Action
If your fix worked, then you’re brilliant! If not, then you need to reevaluate and look for the next option. After testing solutions, your plan of action may take one of three paths:
1. If the first fix didn’t work, try something else.
2. If needed, implement the fix on other computers.
3. If everything is working, document the solution.
When evaluating your results and looking for that golden “next step,” don’t forget other resources you might have available. Use the Internet to look at the manufacturer’s website. Read the manual. Talk to your friend who knows everything about obscure hardware (or arcane versions of Windows). When fixing problems, two heads can be better than one.
If the problem was isolated to one computer, the second path doesn’t apply. But some problems you deal with may affect an entire group of computers. For example, perhaps some configuration information was entered incorrectly into the DHCP server, giving everyone the wrong DNS server address. The DHCP server is now fixed, but all of the clients need to renew their IP addresses.
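On a Windows client, renewing the lease is done with ipconfig /renew, and ipconfig /all then shows the DNS server address the client received. The Python wrapper below is a minimal sketch that assumes Windows clients only; the function name and the dry-run default are illustrative choices, not an established tool.

```python
import platform
import subprocess
from typing import List

# Windows commands: request a fresh DHCP lease, then display the full
# configuration so the DNS server address can be checked by eye.
RENEW_STEPS: List[List[str]] = [
    ["ipconfig", "/renew"],
    ["ipconfig", "/all"],
]


def renew_lease(dry_run: bool = True) -> List[str]:
    """Return the planned commands; run them only when dry_run is False."""
    planned = [" ".join(step) for step in RENEW_STEPS]
    if not dry_run:
        if platform.system() != "Windows":
            raise RuntimeError("this sketch only covers Windows clients")
        for step in RENEW_STEPS:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(step, check=True)
    return planned
```

Defaulting to a dry run lets you review exactly what would be executed before touching a group of machines, in keeping with the advice to consider corporate policies and impacts.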
Once everything is working, you’ll need to document what happened and how you fixed it. If the problem looks to be long and complex, take copious notes as you’re trying to fix it. They will help you remember what you’ve already tried and what didn’t work. We’ll discuss documenting in more depth in the “Document the Work” step just a bit later.
Verify Functionality
After fixing the system, or all of the systems, affected by the problem, go back and verify full functionality. For example, if the users couldn’t get to any network resources, check to make sure they can get to the Internet as well as internal resources.
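A quick way to make that check repeatable is to keep a short list of resources and probe each one. The sketch below is an assumption-laden example: the host names are placeholders, the function names are invented for illustration, and a successful TCP connection only shows that the service is reachable, not that every application works.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Target = Tuple[str, int]  # (host name, TCP port)


def tcp_reachable(target: Target, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the target succeeds."""
    try:
        with socket.create_connection(target, timeout=timeout):
            return True
    except OSError:
        return False


def failed_targets(
    targets: Iterable[Target],
    probe: Callable[[Target], bool] = tcp_reachable,
) -> List[Target]:
    """Return the targets that could not be reached, in the order given."""
    return [t for t in targets if not probe(t)]


# Placeholder names: swap in resources your users actually rely on.
CHECKLIST: List[Target] = [
    ("www.example.com", 443),          # stands in for "the Internet"
    ("fileserver.example.test", 445),  # hypothetical internal file share
]
```

An empty list from failed_targets(CHECKLIST) means every listed resource accepted a connection. Anything that comes back in the list deserves a closer look before you call the job done.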
Some solutions may actually cause another problem on the system. For example, if you update software or drivers, you may inadvertently cause another application to have problems. There’s obviously no way you can or should test all applications on a computer after applying a fix, but know that these types of problems can occur. Just make sure that what you’ve fixed works and that there aren’t any obvious signs of something else not working all of a sudden.
Another important thing to do at this time is to implement preventive measures, if possible. If it was a user error, ensure that the user understands ways to accomplish the task that don’t cause the error. If a cable melted because it was too close to someone’s space heater under their desk, resolve the issue. If the computer overheated because there was an inch of dust clogging the fan…you get the idea.
Document the Work
Lots of people can fix problems. But can you remember what you did when you fixed a problem a month ago? Maybe. Can one of your co-workers remember something you did to fix the same problem on that machine a month ago? Unlikely. Always document your work so that you or someone else can learn from the experience. Good documentation of past troubleshooting can save hours of stress in the future. Documentation can take a few different forms, but the two most common are personal and system-based.
It is highly recommended that technicians always carry a personal notebook and take notes. The type of notebook doesn’t matter—use whatever works best for you. The notebook can be a lifesaver, especially when you’re new to a job. Write down the problem, what you tried, and the solution. The next time you run across the same or a similar problem, you’ll have a better idea of what to try. Eventually you’ll find yourself less and less reliant on it, but it’s incredibly handy to have!
System-based documentation is useful to both you and your co-workers. Many facilities have server logs of one type or another, conveniently located close to the machine. If someone makes a fix or a change, it gets noted in the log. If there’s a problem, it’s noted in the log. It’s critical to have a log for a few reasons. One, if you weren’t there the first time it was fixed, you might not have an idea of what to try and it could take you a long time using trial and error. Two, if you begin to see a repeated pattern of problems, you can make a permanent intervention before the system completely dies.
There are many different forms of system-based documentation. Again, the type of log doesn’t matter as long as you use it! Often it’s a notebook or a binder next to the system or on a nearby shelf. If you have a rack, you can mount something on the side to hold a binder or notebook. For client computers, one way is to tape an index card to the top or side of the power supply (don’t cover any vents!) so if a tech has to go inside the case, they can see if anyone else has been in there to fix something too. In larger environments, there is often an electronic knowledge base or incident repository available for use; it is just as important to contribute to these systems as it is to use them to help diagnose problems.
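If the log is kept electronically, spotting a repeated pattern of problems becomes a simple count. The following is a minimal sketch, not a description of any particular knowledge base product: the record fields, the function name, and the threshold of three occurrences are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class LogEntry:
    """One entry in a system log; the field names are illustrative."""
    when: date
    system: str
    problem: str
    fix: str


def repeated_problems(
    entries: Iterable[LogEntry], min_count: int = 3
) -> Dict[Tuple[str, str], int]:
    """Count (system, problem) pairs and keep those seen min_count times or more."""
    counts = Counter((e.system, e.problem) for e in entries)
    return {key: n for key, n in counts.items() if n >= min_count}
```

A pair that shows up here is a candidate for a permanent fix rather than another one-off repair. The count is only as good as the entries, so it depends on everyone logging problems with consistent wording.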