Building a Certification Program, Step 9: Five key questions about assessment design and development (Part 1)
NOTE: This is an ongoing series.
In this installment we will look at the processes involved in creating targeted certification assessments. Earlier installments shared a proven method for preparing a learning and assessment blueprint, which provides a solid foundation for a certification program. Our next step is to assimilate all the data from our earlier analyses, along with the task analysis and blueprint, and begin designing and developing a learning assessment that follows our blueprint and leads to certification.
How well the assessment developers communicate with the course developers and follow the assessment blueprint will determine whether the participants rate the certification program as favorable and fair or unfavorable and quite unfair. Five of the issues tackled during this step (presented here with important sub-questions) include:
1. What type of assessment are you designing: a formative assessment or a summative assessment?
A. Will it be a quiz, a test, or an exam?
B. Will it be a standardized exam, a non-certification-based exam, or a true certification exam?
C. Will it be a criterion referenced test or a norm referenced test?
D. What type of assessment are you creating: a high stakes assessment, or a low stakes assessment?
2. How many tests are needed to certify a role?
A. What is the number of forms per exam?
B. What is the number of items per exam?
3. If it is a true certification exam, then what is the process for designing and developing this instrument?
A. How will you determine validity and the reliability of the test(s)?
4. What will you communicate to your test authors regarding what a good assessment item should look like?
5. How will we score the exam?
A. What is a rubric and how do you create a rubric?
B. Who will design the scoring rubric for the test(s)?
C. Will you incorporate oral-exam format, role-playing format, or hands-on performance-based questions? And if you do, then how will you score these non-traditional formats?
D. How will you determine the final cut score (passing score) for cognitive exams?
Those who are called instructional designers (IDs) or instructional system designers (ISDs) are the professionals who can make a certification program successful. Most of what we have discussed in previous installments of this series falls under the purview of the ISD, including the occupational analysis, the task analysis, and most importantly the blueprinting process.
If you are fortunate enough to have access to ISDs who have been trained in assessment development, then you are a step ahead of many. If not, then the thoughts I will share with you will help you train your ISDs. If you do not have a team of ISDs, then what I will share will help you train your SMEs (subject matter experts), or help you orient your test/exam developers to a model that I follow for certification exam development.
From here, we’ll individually address our key questions (and sub-questions) regarding assessment design.
1) What type of evaluation instrument are you designing? Is it formative or summative?
When it comes to designing an evaluation instrument (an exam), one of the first things a developer needs to know is whether the instrument they are being tasked with developing will serve as a formative or a summative instrument.
A formative evaluation instrument is one that is administered prior to or during an instructional unit in order to determine how learners are progressing. It is also used to determine whether the instructional intervention is meeting its goals and objectives.
A summative evaluation instrument is typically administered at the end of an instructional intervention in order to evaluate a learner’s overall success in adding to their knowledge, skills, or attitudes as a result of the learning.
Many professional educators believe that good instructional evaluation contains both formative and summative instruments. When it comes to certification, formative instruments are often used during instruction to help learners evaluate their knowledge gaps, while a summative instrument is generally considered the “comprehensive” certification assessment.
Hence, if you are being asked to design a certification assessment, nine times out of 10 you are being asked to design a “comprehensive” summative evaluation instrument.
1-A) Will it be a quiz, a test, or an exam?
A quiz is typically a short evaluation instrument, and it usually does not have as much impact on one’s grade as a test has. Instructors use quizzes to check up on how well a learner is understanding a limited amount of content. While a test might have 50 or 100 questions on it, a quiz might instead have just 5 or 10. Quizzes are often used in certification classes to prepare a learner for their vendor-based certification exercise.
Tests are the standard evaluation technique used to determine a learner’s final academic standing in both high school and college classes, as well as for certification. Unlike with quizzes, test scores are almost certain to be used in determining one’s grade in a class. In fact, some instructors use test grades exclusively to determine a learner’s grade.
Whereas a quiz usually gauges one’s understanding of short sections of a unit, a test normally covers a longer chunk of the course: a whole unit or several chapters. In the case of certification, a test might cover several courses covering a lot of content.
For this reason, tests are commonly longer than quizzes. Again, the makeup of the test will vary according to the instructor and/or the vendor, but it’s common for a test to consist of several different types of questions.
Many instructors use “test” and “exam” interchangeably, but for students, the term exam typically refers to either a midterm or final exam. It’s the granddaddy of tests in both secondary and post-secondary education. You can expect that an exam will be long — long enough that most instructors allow hours rather than minutes for it to be taken.
Exams have to be long. Most exams will cover all the material that a given course has covered up to the date of the exam. When used in certification, when a credential is at stake, an exam may cover multiple courses plus a host of secondary content. Although some instructors don’t use exams at all, those who do frequently make them a large part of students’ final grade for the term.
For example, some instructors consider the exam grade as one-third to one-half of your total score for the class. Most students gripe about the weighting of academic exams, but when it comes to a certification exam, passing can be the difference between the exam candidates getting or not getting a job.
In 1995, that was my experience when I was interviewing for a network admin job at a local college. I was passed over in favor of someone else who had a credential, but had far less practical experience than I did. It was at that point that I learned the value of certification. It was also at that point that I vowed to earn every possible certification I could that would keep me in the field.
Right now, that number stands at well over 50 certs earned. I’ve also taken too many vendor-authorized exams to count. Be that as it may, the value of the certs for me is that they have kept me active in a growing field. Despite being past my prime, I have more than 20 years’ experience in IT certification.
Based on these definitions, as well as on my experience, a good test or exam in the academic world will accurately reflect the published course objectives. The general rule of thumb for certification exams and tests is that the good ones accurately reflect the knowledge and skills required to succeed in a given job role (Shrock and Coscarelli, 2007).
1-B) Will it be a standardized exam, a non-certification-based exam, or a true certification exam?
Since we already know that we are aiming to design a good certification exam, we have a sense of what should be part of that exam. Namely, we should be testing on what a person who wants to be certified in the role must know and/or do to succeed in that role or occupation.
This can be as a nurse, a palliative care chaplain, a physician’s assistant, a fire marshal, a project manager or a program manager, a network engineer or a network admin, a cybersecurity professional, a teacher or instructor — the list is endless.
We must ensure, however, that our exams are asking relevant questions of those who are seeking our “certified” credential. If our exams are just asking general knowledge questions, then how would we know if a person was competent to act in the given role?
This is where our earlier work comes into play. Steps like the DACUM analysis and the task analysis and, most importantly, the blueprint will guide us in the exam design. Completion of those prior steps will help us to target the occupational competencies that we want to test candidates on.
If the competencies involved in a role or occupation were not our focus, then we could put together whatever we deemed important. Many test takers have experienced such exam design in their earlier education. Such exams, as we can all attest to, are not certification exams. Rather they are what some have termed mongrel exams (Shrock and Coscarelli, 2007).
So-called mongrel exams (or tests) are typically put together by a teacher who has never been trained in the art and science of designing a test, or of how to ask relevant questions.
They are also not standardized tests/exams like the SAT, GRE, MCAT, or GMAT. Standardized tests are not task-focused, but rather are designed to assess one’s level of aptitude within a given population of test takers.
1-C) Will it be a criterion referenced test or a norm referenced test?
A norm referenced test is a test like the SAT, GRE, MCAT, or GMAT that assesses the performance of test takers in relation to each other. Scores are expected to spread across a distribution, so that each test taker’s score shows where they stand within the group.
A criterion referenced test is one that assesses a test taker’s performance based on a set of well-defined and published competencies or objectives.
A learner, therefore, who is either attempting to get into a profession, or is trying to maintain their status in a profession, will do so by taking the appropriate certification exam, which will be graded based on how well the learner knows the published material and performs the tasks requested. This is the key to a criterion referenced test (CRT) evaluation.
Unlike in a CRT-driven scenario, the norm referenced test (NRT) experience will be graded not based on the published objectives, but on how well you fare compared to other learners who are taking the same assessment.
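The difference between the two grading approaches can be shown with a small Python sketch. The scores, the cut score of 70, and the function names are hypothetical and chosen only for illustration. The percentile rank used here (share of the cohort scoring strictly below the candidate) is one simple way to express norm referenced standing, not the scoring method of any particular exam.

```python
from bisect import bisect_left

def criterion_referenced_result(score, cut_score):
    """CRT: the result depends only on the candidate's score versus a fixed cut score."""
    return "pass" if score >= cut_score else "fail"

def percentile_rank(score, cohort_scores):
    """NRT: percentage of the cohort that scored strictly below this candidate."""
    ordered = sorted(cohort_scores)
    below = bisect_left(ordered, score)
    return 100.0 * below / len(ordered)

# Hypothetical percent-correct scores for two different cohorts.
typical_cohort = [52, 61, 68, 70, 74, 79, 83, 88, 91, 95]
strong_cohort = [80, 85, 90, 92, 95]

candidate_score = 74
crt_result = criterion_referenced_result(candidate_score, cut_score=70)
rank_in_typical = percentile_rank(candidate_score, typical_cohort)
rank_in_strong = percentile_rank(candidate_score, strong_cohort)

print(crt_result, rank_in_typical, rank_in_strong)
```

The same score of 74 passes the criterion referenced check regardless of who else sat the exam, while its norm referenced standing changes with the cohort.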
This leaves us with one final type-of-exam issue that we must decide on:
1-D) What type of assessment are you creating: a high stakes assessment, or a low stakes assessment?
A high stakes test is one that has a lot riding on it for the learner. Medical school boards and law school boards, as well as Novell’s old instructor performance evaluation and Cisco’s CCSI exams, are examples of high stakes exams that come to mind. These exams, as well as exams for recertification, potentially have career implications.
A low stakes assessment, for example a quiz or survey, will not impact a learner’s employability. It may, however, provide the student and their instructor with information on what the student needs to study in order to pass the final certification exam.
To this point, we have decided that, for certification, we need to design a certification exam that will be a summative, high stakes, CRT assessment, meaning that it will be a comprehensive exam that will determine employability based on learners demonstrating their knowledge and skills with regard to the published set of competencies and objectives.
Now that we know what type of assessment we need to create, we must determine the number of assessments and the number of items per assessment.
2) How many tests are needed to certify a role?
The major question we must answer next is how many exams we need to create to certify a candidate in an occupational role. The correct response is that the total number of exams depends on the role’s occupational analysis, as well as the follow-up task analysis.
For example, a network administrator — whether working with Novell, Microsoft, or Cisco tech — can be certified with a single form exam consisting of between 40 and 50 questions. By comparison, a network engineer will often take between four and seven exams to become certified in the role.
About 20 years ago, that was the case for Microsoft Certified Solutions Expert (MCSE) exam candidates, whether they were new engineers or were simply upgrading their credentials to the latest operating system. If I wanted to work on the latest OS from Microsoft in 2001, I had to re-sit the seven courses for Windows 2000, despite having passed similar exams for NT 3.51 and NT 4.0. (This was also my experience with Cisco and Novell.)
The healthcare IT folks that I certified had multiple exams to complete because of the complexity of their jobs and/or roles. In fact, of the 1,100 individuals I certified in the first year of my program, none complained to me about the number of courses and tests we required them to complete and pass in order to be certified in their role.
This was because, despite the number of tests and courses we required, the capstone for each role was some kind of final comprehensive exam. Our leadership and management wanted people in the role to pass such an exam in order to give potential customers a positive impression about our organization. They wanted it known that their techs were “certified.”
So, to be crystal clear, the response to the question of how many exams are needed to effectively certify a candidate in an occupational role is … it depends. It might be one knowledge-based exam, or multiple knowledge-based exams depending on the complexity of the role.
It might also be the case that you need to certify folks based on their performance of specific tasks. In such a case, the final exam might be a performance-based exam, perhaps involving a role play, or a presentation to a panel of experts, or a series of oral responses, or a hands-on demonstration of skills. These decisions must be made in collaboration with your leadership.
2-A) What is the number of forms per exam?
Whether you settle on one exam or multiple exams, the next question that you need to consider is whether to create multiple forms of the exam. And if multiple forms (or exam versions) are needed, then how many forms of the exam do you need to create?
This question is focused on the security of your exam. On most certification exams, it is considered a best practice to have two or more forms of any required exam(s) available. Each form will measure the same objectives using an equivalent number of items, and should yield equivalent mean scores.
For this to be a truly accurate representation, the items must come from separate item banks containing an equivalent number of items per the published objectives. This is often done for cognitive and performance-based exams, the better to maintain the integrity of the certification process.
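As a rough illustration of the idea, the Python sketch below checks that two item banks hold the same number of items per objective and then draws one form from each bank. The item structure, objective labels, bank sizes, and function names are all hypothetical, and random sampling is only one possible way to assemble a form.

```python
import random
from collections import Counter

def items_per_objective(items):
    """Count how many items address each objective."""
    return Counter(item["objective"] for item in items)

def build_form(bank, per_objective, seed):
    """Draw the same number of items for every objective from a single bank."""
    rng = random.Random(seed)
    by_objective = {}
    for item in bank:
        by_objective.setdefault(item["objective"], []).append(item)
    form = []
    for objective in sorted(by_objective):
        form.extend(rng.sample(by_objective[objective], per_objective))
    return form

def make_bank(prefix, objectives, size):
    """Build a hypothetical bank with `size` placeholder items per objective."""
    return [{"id": f"{prefix}-{obj}-{n}", "objective": obj}
            for obj in objectives for n in range(1, size + 1)]

objectives = ["OBJ1", "OBJ2", "OBJ3"]
bank_a = make_bank("A", objectives, size=4)
bank_b = make_bank("B", objectives, size=4)

# The banks must be balanced per objective before forms are drawn from them.
assert items_per_objective(bank_a) == items_per_objective(bank_b)

form_a = build_form(bank_a, per_objective=2, seed=1)
form_b = build_form(bank_b, per_objective=2, seed=2)
print([item["id"] for item in form_a])
print([item["id"] for item in form_b])
```

A check like this covers only the structural side of equivalence. Whether the forms yield equivalent mean scores can only be confirmed with data from candidates who have actually taken them.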
2-B) What is the number of items per exam?
Having determined the need for multiple forms for each of the exams we are going to leverage, we are next faced with the key resource question. To wit, how many items do we need to create for a role-based certification effort?
Here, it all depends on the number of competencies and/or objectives you are asking candidates to certify on. The other key issue that determines the number of items is the number of forms. At a minimum, you should have one item bank per exam form.
For nearly two decades, my rule of thumb has been to create between three and five questions per competency (or objective), per item bank. If you have 35 competencies that you are evaluating, then you need to create between 105 and 175 items per test bank.
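The arithmetic behind this rule of thumb can be written as a short Python sketch. The function names are hypothetical, and the total across forms assumes the minimum of one item bank per exam form.

```python
def items_per_bank(competencies, per_competency_min=3, per_competency_max=5):
    """Return the (low, high) number of items needed in one item bank."""
    return competencies * per_competency_min, competencies * per_competency_max

def total_items(competencies, forms):
    """Return the (low, high) authoring workload, assuming one bank per form."""
    low, high = items_per_bank(competencies)
    return low * forms, high * forms

bank_low, bank_high = items_per_bank(35)
all_low, all_high = total_items(35, forms=2)

print(f"Per bank: {bank_low} to {bank_high} items")
print(f"Two forms: {all_low} to {all_high} items")
```

For 35 competencies this gives 105 to 175 items per bank, and 210 to 350 items in total if two forms each draw on their own bank.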
A common question is why “between three and five questions per competency” is considered ideal. My reasoning is simple: For each objective or competency, an effective exam should include one or two questions based on the first two levels of Bloom’s Taxonomy, followed by between two and four items that address Bloom’s levels 3-6.
This will ensure that every test taker will be asked at least one low-level question, as well as at least two items that address Bloom’s higher-order processes.
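A test author could check a draft set of items against this mix with a sketch like the following Python function. It is an illustration only: the function name and messages are hypothetical. Because one or two lower-level items plus two to four higher-level items could add up to six, the sketch also checks the three-to-five total separately, which is an assumption about how the two rules combine.

```python
LOWER_LEVELS = {1, 2}          # Bloom's levels 1-2
HIGHER_LEVELS = {3, 4, 5, 6}   # Bloom's levels 3-6

def check_bloom_mix(levels):
    """Return a list of problems for one competency's items.

    `levels` holds one Bloom level (1-6) per item written for the competency.
    An empty list means the mix fits the rule of thumb.
    """
    problems = []
    lower = sum(1 for level in levels if level in LOWER_LEVELS)
    higher = sum(1 for level in levels if level in HIGHER_LEVELS)
    if not 1 <= lower <= 2:
        problems.append(f"expected 1-2 items at levels 1-2, found {lower}")
    if not 2 <= higher <= 4:
        problems.append(f"expected 2-4 items at levels 3-6, found {higher}")
    if not 3 <= len(levels) <= 5:
        problems.append(f"expected 3-5 items in total, found {len(levels)}")
    return problems

print(check_bloom_mix([1, 3, 4]))     # one lower-level and two higher-level items
print(check_bloom_mix([1, 2, 2, 3]))  # too many lower-level, too few higher-level
```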
Now that we have discussed some of the parameters for crafting a certification exam in broad strokes, it is time to examine the Level 1 process maps. To accomplish this, we will examine our third key question (and sub-question).
3) If it is a true certification exam, then what is the process for designing and developing this instrument?
Shrock and Coscarelli (2007) define 13 steps in their Level 1 process plan. These are as follows:
1. Document the process.
2. Analyze the content of an occupational role.
3. Establish the content validity of the objectives.
4. Create an item bank with test items.
5. Create a performance checklist and a key.
6. Establish the content validity of the items.
7. Beta test the exam.
8. Analyze the results of the beta test with item analysis.
9. Create parallel forms.
10. Set up your initial pass score.
11. Establish exam reliability.
12. Train your evaluators.
13. Report the scores.
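To give a sense of what the item analysis in step 8 can involve, here is a minimal Python sketch of two classical statistics: item difficulty (the proportion of candidates answering correctly) and an upper-lower discrimination index. The response data are made up, the function names are hypothetical, and the 27 percent group size is a conventional choice rather than something prescribed by the steps above.

```python
def item_difficulty(responses, item):
    """Proportion of candidates who answered the item correctly (0.0 to 1.0)."""
    return sum(row[item] for row in responses) / len(responses)

def item_discrimination(responses, item, group_fraction=0.27):
    """Difference in proportion correct between top and bottom scoring groups."""
    ranked = sorted(responses, key=sum, reverse=True)  # highest total score first
    group_size = max(1, round(len(ranked) * group_fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]
    p_upper = sum(row[item] for row in upper) / group_size
    p_lower = sum(row[item] for row in lower) / group_size
    return p_upper - p_lower

# Hypothetical beta-test results: one row per candidate, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

for item in range(3):
    difficulty = item_difficulty(responses, item)
    discrimination = item_discrimination(responses, item)
    print(f"item {item}: difficulty={difficulty:.2f}, discrimination={discrimination:.2f}")
```

Items that almost everyone gets right or wrong, or that stronger candidates miss more often than weaker ones, are the usual candidates for review or revision.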
What I did for my programs was leverage the work of Shrock and Coscarelli (2007) and add in some steps to work with my test engine and to be able to present to my leadership how we would address the performance-based requirements that were very important to them. My process, as shown in Figure 1 below, includes the following steps:
1. Identify the role.
2. If a comprehensive project plan is not available, then analyze the content of an occupational role using DACUM.
3. Analyze all content with task analysis.
4. Develop a learning/assessment blueprint.
5. Establish the content validity of the objectives.
6. Decide on the type of exam: knowledge-based (KB) or performance-based (PB).
7. Create an item bank with test items (KB) OR create a performance standard and instrument(s) (PB).
8. Establish the content validity of the items.
9. Load the assessments onto your LMS.
10. Conduct a technical edit and copy edit of the assessments.
11. Conduct pilot tests of the exam. This includes an alpha test followed by a beta test.
12. Analyze the results of the pilot tests with item analysis; make required edits based on the item analysis.
13. Create parallel forms.
14. Set up your initial pass score using the Angoff method.
15. Establish exam reliability.
16. Document the process.
17. Report the scores.
18. Monitor and revise the process and the exams as needed.
Figure 1: Level 1 Process Map for Assessment Design and Development
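Step 14 names the Angoff method for setting the initial pass score. In its basic form, each judge estimates, item by item, the probability that a minimally competent candidate would answer correctly. Each judge's estimates are summed, and the sums are averaged across judges. The Python sketch below shows that calculation with made-up ratings. The function name and data are hypothetical, and real standard-setting sessions typically add discussion rounds and other refinements not shown here.

```python
def angoff_cut_score(ratings):
    """Average, across judges, of each judge's summed item probabilities.

    `ratings` holds one list per judge, with one probability (0.0 to 1.0) per item.
    The result is a raw cut score on the same scale as the number of items.
    """
    judge_totals = [sum(judge) for judge in ratings]
    return sum(judge_totals) / len(judge_totals)

# Hypothetical ratings from three judges on a five-item exam.
ratings = [
    [0.9, 0.7, 0.6, 0.8, 0.5],
    [0.8, 0.6, 0.7, 0.9, 0.4],
    [0.9, 0.8, 0.5, 0.7, 0.6],
]

raw_cut = angoff_cut_score(ratings)
percent_cut = 100 * raw_cut / len(ratings[0])
print(f"Raw cut score: {raw_cut:.2f} of {len(ratings[0])} items ({percent_cut:.1f}%)")
```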
The next issue that comes up is which steps to follow for each Level 1 process. As shown in Figure 2, this Level 2 process map was drawn up for use with the Questionmark test engine, primarily for knowledge-based tests, and clearly lays out the steps required to accomplish the Level 1 processes.
It will easily adapt to any of the popular engines in use, including those designed to assist with performance-based exams.
Figure 2: Level 2 Process Map for Cognitive Assessment Design and Development using Questionmark (QM) Test Engine
We’ll lead off Part 2 of Step 9 next week with a discussion of two important attributes of a good exam: validity and reliability.