Show Me the Data
It seems that I have always been a skeptic. I don’t automatically believe what people say, whether those people are parents, siblings, friends, bosses, employees, whomever. I might nod my head and emit non-committal noises such as “mmm-mmm,” but down deep I need evidence. At some risk, let me be a bit more specific.
My wife believes in a lot of strange things: magnets that can heal or reduce pain, or her “feelings.” She might have a feeling, for example, that she left the stove on after we had left for vacation and were more than 100 miles away. There is, of course, no evidence that her “feelings” have ever been right, except for the occasional random coincidence. Fortunately she is good-natured about my skepticism and has learned to live with it.
But what does this have to do with certification tests? Just this: unlike my wife's "feelings," a good test must be supported by evidence in order to have value, real quantitative stuff. Studies, results, field tests or beta tests, standardized procedures, documentation—those sorts of things. It's not enough to believe that a test is a good one and that it correctly determines the competence or incompetence of a candidate.
Each question has to be subjected to a beta test or field test of some sort, under actual certification conditions. With many people taking the beta test, you get data about each question that supports its quality or lack thereof. Then you can make objective decisions about which questions should be on the test and which shouldn’t. No guesswork or “feelings” are needed.
The test itself should not be released or published without calculating reliability and validity coefficients—actual values produced by following specific procedures. The studies that produce these values are not hard to do, although they take time and cost a bit.
My guess is that you, as certification candidates and test-takers, have never seen much documentation on the quality of a test. You have seen reviews of tests, written by those who have taken them, but not much from the programs themselves. I’d like to see something like this written about each test when it is published or updated:
Certified Web Enthusiast Test
240 questions were field-tested by 84 program candidates. The statistical analysis of the field test identified 142 of the questions as very good to excellent. These were divided into two equivalent forms of 71 questions. The forms were balanced for overall difficulty and content coverage.
A reliability coefficient for the test of .93 was calculated by re-scoring each field-test participant on the two forms and correlating the scores. A coefficient of .93 (out of a possible 1.0) is very high and indicates a highly reliable test.
Validity of the test was established by correlating the test scores with an independent measure of the competence of the field-test participants, in this case a self-report survey on experience and competence. The correlation between the test score and the survey score was .79, indicating that people who score high on the test also rate themselves as more competent, and that people who score low on the test rate themselves as less competent.
Finally, an Angoff procedure using certification candidates who are experts in the job being evaluated produced a cut score of 76 percent for the test. This means that, to be considered minimally competent in the skills required, you must correctly answer at least 76 percent of the questions to pass. A more detailed description and results of the Angoff procedure can be found on the program Web site.
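For the statistically curious, the two key numbers in a report like the sample above are straightforward to compute. Here is a minimal Python sketch; the form scores and judge ratings below are hypothetical illustrations, not data from any actual program:

```python
# Sketch of the two statistics in the sample report: split-forms
# reliability and an Angoff cut score. All numbers are hypothetical.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Reliability: score each field-test participant on the two equivalent
# forms, then correlate the two sets of scores.
form_a = [52, 60, 45, 66, 58, 49, 63, 55]
form_b = [50, 62, 47, 64, 59, 48, 61, 57]
reliability = pearson_r(form_a, form_b)

# Angoff cut score: each expert judge estimates, per question, the
# probability that a minimally competent candidate answers it correctly;
# the cut score is the average of all those estimates.
judge_ratings = [               # rows = judges, columns = questions
    [0.80, 0.70, 0.75, 0.85],
    [0.75, 0.65, 0.80, 0.90],
    [0.70, 0.75, 0.70, 0.80],
]
cut_score = mean(mean(row) for row in judge_ratings)

print(f"reliability = {reliability:.2f}")
print(f"cut score   = {cut_score:.0%}")
```

A high correlation between the two forms means the test measures whatever it measures consistently; the Angoff average turns expert judgment into a defensible passing score rather than an arbitrary one.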
There are probably better ways to write such descriptions so that they are more complete and meaningful to an audience unfamiliar with psychometrics, but you get the idea. Documenting the quality of a test and communicating this to stakeholders, including the certification candidates, is a major standard that must be followed by all programs. But just following the rules is not the best reason to document the quality of the tests.
There seems to be doubt among test-takers about the quality of the tests produced, and sometimes there are concerns about the motives of the programs. That doubt can be used to justify inappropriate activities, such as sharing test questions and supporting brain dumps. I have a "feeling" that establishing the quality of each exam and routinely publishing that information will go a long way toward creating a stronger bond between programs and participants.
David Foster, Ph.D., is a member of the International Test Commission and sits on several measurement industry boards.