Psychometrics 101: Setting a Pass-Fail Score
Every once in a while, I get a little more technical than I should in this column. I get the feeling (perhaps mistakenly) that you want to know more about how tests work, that you hunger for that knowledge, even. It’s with that firm impression that I continue.
Certification tests are created for one purpose only, and that is to determine if the person has enough knowledge or skill to perform in a particular job role. They aren’t designed to diagnose your strengths or weaknesses, or to decide if you should take a course or even try to get certified. They are simply used to determine if you should be certified or not. When you take the test, the score is less important than whether you pass or fail. In fact, you might not even see your score, only the pass-fail decision.
There are basically two ways to set pass-fail scores, also called cut scores, and I’ll go into them in a minute. Let’s look at the logic and difficulty of the task first. The problem is a difficult one, because the certification program needs to find a single score that separates individuals who are competent from those who are not.
I’m not sure what your experience is in high-school athletics, but imagine a coach trying to select 30 players for a team from 100 who want to participate. The coach has to find a way to separate the 30 from the other 70. Usually, there is not much difference in skill between the 30th player chosen and the 31st, although there should be a big difference between the average skill of the 30 who are chosen and the average skill of the 70 who are not. An important note is that among the 30 chosen, there are probably a couple of athletes who will not perform as expected and shouldn’t be on the team. Likewise, there are those in the group of 70 who would have done well on the team and should have been selected.
The first of these is called a “false positive,” which refers to selecting someone incorrectly. The other is called a “false negative,” which means not selecting someone who should have been chosen. Obviously, the coach would like to make neither of these errors, but no procedure is perfect at the time it is used. The best a coach could hope for is that only a few of these errors are made.
It’s the same in certification programs. A program manager would love to certify only competent individuals and fail only those who are incompetent. Unfortunately, like the coach, this isn’t completely possible. Even a great test isn’t perfect. Any score set to separate those passing from those failing will make some of those mistakes. One of the goals of setting the pass-fail score is to minimize the false-positive and false-negative decisions.
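To make the two kinds of errors concrete, here is a minimal sketch in Python. The scores and competence labels are made up for illustration; in practice you would not know candidates’ true competence, which is exactly why the cut-score problem is hard.

```python
def classification_errors(scores, competent, cut_score):
    """Count the two error types a given cut score produces.

    A false positive passes the test but is not actually competent;
    a false negative fails the test but is actually competent.
    """
    false_positives = sum(1 for s, c in zip(scores, competent)
                          if s >= cut_score and not c)
    false_negatives = sum(1 for s, c in zip(scores, competent)
                          if s < cut_score and c)
    return false_positives, false_negatives

# Hypothetical candidates: test scores and whether each is truly competent.
scores    = [55, 62, 68, 71, 74, 80, 85, 90]
competent = [False, False, True, False, True, True, True, True]
print(classification_errors(scores, competent, 70))  # → (1, 1)
```

With a cut score of 70, one incompetent candidate passes and one competent candidate fails; moving the cut up or down trades one error type for the other, which is the tension every standard-setting method has to manage.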
While there are many specific ways to determine the pass-fail score, they fall into two classes. One requires human judgment of each question. The other is more empirical, looking at testing data from certification candidates. I’ll describe each in more detail.
In the first instance, a panel of subject-matter experts (people like you) is brought together to review the test questions. They read a description of a competent, certifiable person and then review each question one at a time, usually as a group. As they review a question, they estimate the chance that a competent person would answer it correctly. When they’ve done this for all of the questions, a formula is applied that spits out an overall test score that would separate those who are competent (and who should pass) from those who are not competent enough (and who should fail).
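One common formula of this kind (often called the Angoff method) simply averages the judges’ probability estimates for each question and sums those averages to get the cut score. The ratings below are hypothetical, but the arithmetic is the real method:

```python
def judgment_cut_score(ratings):
    """ratings[j][q] is judge j's estimate of the probability that a
    minimally competent person answers question q correctly.
    The cut score is the sum, over questions, of the average estimate."""
    num_judges = len(ratings)
    num_questions = len(ratings[0])
    cut = 0.0
    for q in range(num_questions):
        cut += sum(ratings[j][q] for j in range(num_judges)) / num_judges
    return cut

# Three hypothetical judges rating a five-question test:
ratings = [
    [0.9, 0.7, 0.6, 0.8, 0.5],
    [0.8, 0.6, 0.7, 0.9, 0.4],
    [0.9, 0.8, 0.5, 0.7, 0.6],
]
print(round(judgment_cut_score(ratings), 2))  # → 3.47
```

The result is the number of questions (out of five here) a just-barely-competent candidate would be expected to answer correctly, so anyone scoring at or above it passes.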
The second method does not use judgments like this, but instead gives a number of certification candidates a test requiring them to answer each question as if it were the real certification test. They also give the candidates a survey asking them to rate their knowledge and ability in several key areas covered by the test. The survey also may be sent to people who are familiar with the IT professionals’ skills, such as fellow employees or supervisors. From the surveys, it is possible to separate those who are competent from those who are not. By reviewing the scores of those two groups, it is then possible to find the score that best separates them.
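This second approach (sometimes called the contrasting-groups method) can be sketched as a simple search: once the surveys have sorted candidates into competent and not-competent groups, try each possible cut score and keep the one that misclassifies the fewest people. The scores below are invented for illustration:

```python
def best_cut_score(competent_scores, not_competent_scores):
    """Find the cut score that best separates the two survey-defined groups,
    i.e., the one producing the fewest total misclassifications."""
    candidates = sorted(set(competent_scores + not_competent_scores))
    best_cut, fewest_errors = None, None
    for cut in candidates:
        errors = (sum(1 for s in competent_scores if s < cut)        # false negatives
                  + sum(1 for s in not_competent_scores if s >= cut))  # false positives
        if fewest_errors is None or errors < fewest_errors:
            best_cut, fewest_errors = cut, errors
    return best_cut, fewest_errors

# Hypothetical test scores for the two survey-identified groups:
competent_group     = [72, 75, 78, 81, 85, 88]
not_competent_group = [55, 60, 63, 68, 70, 74]
print(best_cut_score(competent_group, not_competent_group))  # → (72, 1)
```

Notice that even the best cut still misclassifies one candidate here, because the two groups’ scores overlap, which is the point made above: no score eliminates both error types entirely.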
While I prefer the latter method, there are pros and cons to both ways of setting a pass-fail score. Using either is far preferable to employing more casual methods. When I first started in this business, pass-fail scores were set at 70 percent for every test because the program manager felt that “70 percent was a pretty good score.” We’ve come a long way in IT certification since then, applying a little psychometric science to a difficult problem.
David Foster, Ph.D., is president of Caveon (www.caveon.com) and is a member of the International Test Commission, as well as several measurement industry boards.