You may connect to the internet: The case for open-resource testing in certification
Certification helps IT professionals stay on the leading edge of technology, but many certification exams are still behind the times. Despite innovations in test delivery, the structure and content of exams haven’t evolved as rapidly as they could have, or should have.
Would open-resource testing improve IT certification? Exam developers Isaku Tateishi and Daniel Allen of IT training and certification provider TestOut Corporation tackle the question.
Advances in technology continue to reshape the way we learn and access information. Information that used to be stored in books or filing cabinets is now easily accessible from any computer with an internet connection. Professionals who once relied on bulky instruction manuals or resource guides to troubleshoot computer problems can now often find solutions through online documentation, message boards, blogs, and more. In fact, most of us are unlikely ever to venture beyond the reach of the internet during the workday.
Despite these changes in both education and professional practice, however, the rate of innovation in testing, including certification testing, seems to lag several steps behind the pace of advancing technology (Williams & Wong, 2009). Whether for historical, economic, or other reasons, tests (especially standardized tests) haven’t changed much in terms of composition. To be sure, there have been advances in methods of test delivery in recent years: computer adaptive testing (CAT) and computerized testing in general have seen widespread success.
Computers are used primarily (and unimaginatively), however, as a means to deliver the same kind of material students might have expected to find in a traditional pencil-and-paper test. Indeed, much of the technology being leveraged in testing seems aimed only at delivering traditional tests via computer in order to reach a broader audience, automate scoring and data analysis, and be environmentally conscious by saving paper. All of these are admirable goals, but none of them truly leverages new technology to create a new kind of testing experience.
Why would we want a new kind of testing experience? As the number of resources available to professionals increases, the importance of memorization and the ability to work in a silo may be decreasing. Speaking of professionals in the business world, Williams and Wong state, “Rarely, if ever, are [professionals] required to go into a room, not talk to anyone and solve a work problem with their computers turned off” (2009). In many cases, however, this is exactly the situation we encounter in testing. Given this apparent disconnect between principle and practice, it’s time to consider whether an online, open-resource performance-based exam (OPE) might provide a more valid test of whether an examinee can complete real-world tasks.
Broadly speaking, examinations can be divided into two categories: closed-resource exams and open-resource exams. Closed-resource exams are traditional, proctored exams that require students to close any books or notes and answer a series of questions without access to any external reference materials. Open-resource exams include any exam that allows the use of external materials. Familiar examples might include open-book exams where students are allowed to consult a textbook during the exam, or open-note exams where students are allowed to consult their own notes during the exam. University students are likely to be familiar with both categories of tests, as some college professors, bucking the closed-resource tradition, prefer open-resource exams (Gharib, Phillips, & Mathew, 2012).
We can also separate exams in terms of whether they are administered online or in a more traditional offline format. Closed-resource exams are prevalent in both formats. Online closed-resource exams are made possible by innovations in online testing such as online proctoring services (frequently involving a webcam or streaming audio) and locked-down computers and/or internet browsers. Open-resource exams, by contrast, are uncommon online.
Another useful distinction to make is the difference between performance-based and knowledge-based exams. Knowledge-based exams tend to measure memorization and recall of facts. A knowledge-based exam is most likely to be composed of a series of multiple-choice or other selected response questions. Performance-based exams are meant to measure the completion of a specific task or set of tasks. Performance-based exams are more likely to contain open-ended or constructed response questions where examinees are required to produce a paper, presentation, graphic, or other artifact. These two different exam types are illustrated by many state driving tests. Prospective drivers are administered a multiple-choice knowledge-based exam, and are also evaluated in a performance-based test where the examinee demonstrates the ability to drive a vehicle in a realistic setting.
Advantages of Online, Open-Resource Performance-Based Exams
Authenticity — As discussed above, computer and internet use is inseparable from the lived experience of daily work for many of us. If the purpose of a certification exam is to determine whether candidates can perform specific real-world tasks they are likely to encounter in the workplace, then an online open-resource exam is an authentic way of measuring performance on those tasks.
Even well before modern computers and networks were around, researchers were concerned about the inauthentic nature of testing. As early as 1934, Stalnaker & Stalnaker argued that open-book examinations in the university setting might be superior to their closed-book counterparts. They suggested that closed-book exams emphasize the memorization of facts, and that open-book examinations are most appropriate where the desired outcome is more than the recall of facts. Additionally, Stalnaker & Stalnaker note that open-book examinations offer “a more natural situation” and that they emphasize “the importance of knowing where to find information, how to evaluate it, and how to use it in problems” (1934). The Stalnakers conclude that there is a solid logical basis for implementing open-book examinations. These observations on the differences between open- and closed-book examinations in college reflect modern concerns in the testing and certification industry about the real-world relevance, or external validity, of an exam.
The key issue here is providing an examination with an authentic, real-world application: the “more natural situation” mentioned by the Stalnakers. The closed-resource examination presents a contrived experience with no analogue in the work or life experiences one is likely to have. Imagine, for instance, being told not to use any external resources while analyzing data, writing a report, or troubleshooting a problem at work. Daily work, by contrast, largely consists of solving specific problems with all the necessary and available resources. Other researchers have made this point more recently (e.g., Feller, 1994; Williams & Wong, 2009). As technology advances and people grow more proficient at finding and using online resources, the resources available to professionals keep increasing. As a result, closed-resource examinations look less and less like any real-world experience any of us are likely to have outside an examination site.
Reduced Test Anxiety — There is also evidence that test takers experience less anxiety under open-resource conditions than under closed-resource conditions (Williams & Wong, 2009). Test anxiety is associated with poor performance on exams (Morris, Davis, & Hutchings, 1981). Although open-book exams do not provide a large advantage in exam scores, they are generally preferred by most test takers (Phillips, 2006) and cause less anxiety than closed-book exams (Gharib, Phillips, & Mathew, 2012; Theophilides & Dionysiou, 1996; Theophilides & Koutselini, 2000; Zoller & Ben-Chaim, 1988).
Cheating — An OPE might seem to invite cheating because access to resources is less restricted than in a closed-resource exam. This is not necessarily the case, however, when an OPE is truly performance-based. To the extent that an open-resource exam is knowledge-based, it will be easier to look up answers and copy them verbatim (though we wouldn’t necessarily call this cheating). When examinees are expected to apply knowledge and skills to a unique, authentic task, however, it will be difficult to simply look up the answer with a search engine (Williams & Wong, 2009).
While authentic tasks may limit cheating by looking up answers, open-ended responses are often more susceptible to a different kind of cheating: plagiarism. Fortunately, plagiarism detection technology has advanced. Plagiarism detection services can check examinee work against web pages, journal articles, and other student work (including a student’s own previous work).
It is still possible, however, that an examinee could hire a proxy to take the exam for them. There are several ways to mitigate this risk. First, having an authentic and current task or tasks can reduce the pool of potential proxy examinees, and this effect is strengthened when the OPE has a shorter time limit. Second, because the artifacts produced are supposed to be unique each time, proxies cannot reuse their work across exams and may be deterred by the extra effort required to pass. Put simply, OPEs aren’t as likely to provide a lucrative source of income for proxy examinees. Third, examinee identity can be verified and monitored during the examination period, and OPEs can be proctored either remotely or in person.
In addition to creating appropriate tasks, setting an appropriate time limit makes cheating harder. Examinees are far less likely to find and exploit loopholes under a 1-hour time limit than under a 5-day one. Finally, OPEs should frequently rotate tasks and scenarios to keep content fresh.
Of course, cheating is not unique to any particular kind of exam. Online testing in general is a relatively new enterprise plagued by relatively old problems. Before online testing, proctors had to guard against examinees sharing answers or using cheat sheets or other unauthorized materials during the exam. Other risks included plagiarism in essay or short-answer responses, and third-party test takers.
To administer exams online effectively, test administrators must use proctors, locked-down browsers, identity verification techniques, and plagiarism detection services, among other things. While more technologically advanced, these solutions are all roughly analogous to the actions of an exam administrator or proctor in an offline, closed-book setting. In other words, most concerns about cheating in an online environment are the same old concerns we’ve had for years about cheating on tests generally (and, therefore, should not be seen as a compelling reason not to use OPEs).
Difficulty — One concern that often comes up about open-resource exams is whether they are too easy. Research on the topic, however, suggests little real difference between closed- and open-resource exams. Gharib, Phillips, & Mathew (2012) compared university students’ scores on open-book and closed-book versions of the same exam and found a modest increase in scores on the open-book version. Other research has found smaller gains in open-book exam scores (e.g., Brightwell, Daniel, & Stewart, 2004; Krarup, Naeraa, & Olsen, 1974) or no difference at all between open- and closed-book scores (Kalish, 1958).
One possible explanation for these differences is that open-book assessments may tap into different skill sets (Brightwell, Daniel, & Stewart, 2004). In the case of Gharib, Phillips, & Mathew (2012), the authors suggested in an interview with NPR that the slight increase in grades under the open-book condition is not a concern. To quote one of the authors: “So a good student did well regardless of what type of test they were given. A poor student did poorly regardless of what type of test they were given. And that, I think, was a really interesting finding, that in a way, the type of exam really makes very little difference.”
Many important questions, however, remain unanswered. For example, all of the research above deals with university course exams rather than certification exams. There may be fundamental differences between these two types of exams (or the people taking them) that would produce different results. Another concern is that existing research on score differences deals primarily with the same type of exam delivered under two different conditions.
An OPE, however, would necessarily differ from most knowledge-based exams not only in the resources available at the time of examination, but also in the content and process of taking the exam (as discussed under Authenticity above). So while there is some evidence that open-resource exams produce scores comparable to those of closed-resource exams, more research is needed, particularly on high-stakes, certification, and performance-based exams.
Some evidence suggests college students prepare less for open-book exams than for closed-book exams (Moore & Jensen, 2007). It is not clear, however, how this behavior would translate to certification and other high-stakes exams. Moreover, if exam publishers do their due diligence to ensure that only qualified examinees pass, then examinees’ study behavior is not a threat to exam utility. Of course, examinees should be informed about the nature of an examination before it is administered and offered direction on how best to prepare, but concerns about preparation do not necessarily translate into concerns about the validity of an exam.
At the same time, other research suggests that open-book exams can increase engagement and understanding (Eilertsen & Valdermo, 2000). Because open-book exams may tap into different skill sets (Brightwell, Daniel, & Stewart, 2004), they may also encourage different kinds of examinee engagement. This suggests that exam publishers should weigh open- versus closed-resource formats against the intended purpose of an exam during the design phase.
Validity — In any case, the real question exam publishers should be asking is not whether an exam is too difficult or too easy, but whether it accurately reflects examinee skills, knowledge, and/or abilities. This is the question of test validity, and it is a complicated one. For instance, a difficult closed-resource exam may appear more rigorous, and in one sense more valid, yet not closely align with the real-world application of knowledge, skills, and/or understanding to solving a problem. In this sense of resembling real-world application, which is arguably the most important aspect of exam validity, an open-resource exam may actually have the advantage over its closed-resource counterparts. The desire for greater exam validity, understood as alignment between an exam and a real-world skill or aptitude, is one of our primary motivations for considering the concept of an OPE.
Potential Concerns about Online, Open-Resource Performance-Based Exams
Despite the advantages listed above, OPEs are not without downsides and will not be the best kind of exam in some situations, for example when the purpose of an exam is to measure how much examinees know or can recall about a particular subject. Exam exposure and development cost are also concerns.
Knowledge and Recall — As mentioned above, an OPE is not ideal for measuring knowledge and recall of facts. If your objective is to test whether examinees know the maximum speed of USB 3.0, you’ll be better off using a different kind of exam. The OPE approach would be to develop a real-world problem that requires knowledge of USB versions to solve, but examinees would, of course, be able to consult resources if they didn’t have the various speeds memorized. Measuring knowledge and recall has real benefits, and it is a limitation of OPEs that they aren’t well suited to it.
OPEs are also less suited to covering a broad range of content in a single exam. Scenario-based problem solving can be time-consuming, making it harder to cover the same breadth of knowledge as a comparable selected-response test. A multiple-choice test, on the other hand, can cover a great deal of content in a shorter amount of time.
One way to compensate for these two limitations of the OPE approach would be to develop a hybrid certification exam which includes some real-world performance-based tasks and also some knowledge checks — roughly analogous to requiring both a written portion and a performance portion of a driving test.
Exposure — All tests must contend with another old problem: exam exposure. Exam exposure occurs when the content of a test is partially or entirely compromised by being disclosed to examinees beforehand. This is sometimes done for the purpose of cheating, and sometimes for the purpose of selling exam questions to potential examinees. While exposure online is not fundamentally different from exposure in an offline exam, new technology does present new challenges.
A tech-savvy and unscrupulous examinee might find a way to record the entirety of an exam, for example. Testing and certification associations need to be concerned about exam exposure for the simple reason that exams are expensive to produce and costly to maintain. Exam exposure can be mitigated through proctoring services, multiple scenarios or forms of an exam, software solutions such as locked browsers, and web crawling services which scour online sources for compromised exam content.
Development Cost — OPEs will likely be more costly to develop than many traditional selected-response exams, for several reasons. Developing many real-world scenarios can be expensive. New scenarios will have to be added regularly to keep the exam fresh, and scenarios must stay up to date to maintain exam validity: a scenario involving floppy disks once qualified as a common real-world scenario, but would likely be unsuitable today. Additionally, because OPEs are relatively new, establishing standard ways of creating and scoring them will involve a learning curve.
Conclusion
As is often the case with innovation, new technology changes how tasks are completed in the workplace. Despite the legitimate concerns outlined above, there are compelling reasons to consider an OPE type of examination. OPEs test examinees on authentic tasks in an environment that more closely matches a real-world situation. OPEs can also be harder to cheat on and demand more of test takers, while reducing test anxiety.
It is important to consider the purpose of an exam, however, before deciding whether to use an OPE or a different type of exam. In situations where real-world skills are the main objective, an OPE may be an excellent option. If you are more interested in asking examinees to demonstrate memorization or knowledge of facts, then you’ll be better off with a selected-response exam. We are often interested in both knowledge and application, of course, in which case it might make sense to try a hybrid approach that combines both types of exam.
In any case, OPEs present one way to address the mismatch between much of the testing world and the lived experience of doing real work. As we have argued above, there is little reason to consider an OPE inherently less capable than a selected-response exam. We hope that future research in this area will examine the effects of OPEs in high-stakes or certification exams rather than only in (comparatively) lower-stakes university course exams.
References
Brightwell, R., Daniel, J. H., & Stewart, A. (2004). Evaluation: Is an open book examination easier? Bioscience Education E-Journal, 3. Retrieved from http://journals.heacademy.ac.uk/doi/full/10.3108/beej.2004.03000004
Eilertsen, T. V., & Valdermo, O. (2000). Open-book assessment: A contribution to improved learning? Studies in Educational Evaluation, 26, 91-103.
Feller, M. (1994). Open-book testing and education for the future. Studies in Educational Evaluation, 20, 235-238.
Gharib, A., Phillips, W., & Mathew, N. (2012). Cheat sheet or open-book? A comparison of the effects of exam types on performance, retention, and anxiety. Psychology Research, 2(8), 469-478.
Kalish, R. A. (1958). An experimental evaluation of the open-book examination. Journal of Educational Psychology, 49, 200-204.
Krarup, N., Naeraa, N., & Olsen, C. (1974). Open-book tests in a university course. Higher Education, 3, 157-164.
Moore, R., & Jensen, P. A. (2007). Do open-book exams impede long-term learning in introductory biology courses? Journal of College Science Teaching, 36(7), 46-49.
Morris, L. W., Davis, M. A., & Hutchings, C. H. (1981). Cognitive and emotional components of anxiety: Literature review and a revised worry-emotionality scale. Journal of Educational Psychology, 73, 541-555.
Phillips, G. (2006). Using open-book tests to strengthen the study skills of community college biology students. Journal of Adolescent and Adult Literacy, 49, 574-582.
Stalnaker, J. M., & Stalnaker, R. C. (1934). Open-book examinations. The Journal of Higher Education, 5, 117-120.
Theophilides, C., & Dionysiou, O. (1996). The major functions of the open-book examination at the university level: A factor analytic study. Studies in Educational Evaluation, 22, 157-170.
Theophilides, C., & Koutselini, M. (2000). Study behavior in the closed-book and the open-book examination: A comparative analysis. Educational Research and Evaluation, 6, 379-393.
Williams, J. B., & Wong, A. (2009). The efficacy of final examinations: A comparative study of closed-book, invigilated exams and open-book open-Web exams. British Journal of Educational Technology, 40, 227-236.
Zoller, U., & Ben-Chaim, D. (1988). Interaction between examination type, anxiety state, and academic achievement in college science: An action-oriented research. Journal of Research in Science Teaching, 26, 65-77.