Task Criticality and Scoring
A high percentage of standard knowledge-based tests are scored so that each question or item counts for the same number of points toward the final score. In knowledge-based testing, this scoring method is appropriate, and it simplifies the test-design process.
As testing methodology moves along the spectrum from knowledge- to performance-based, however, it is often appropriate to give items different weight, depending on the criticality of the task being tested. In addition, an item might be partially scored based on the correct steps that were taken to perform a task, or no credit might be given for the item unless a correct end state is achieved.
I’ll put on my techie hat again and use some examples from the realm of information technology to explain what I’m talking about. Suppose I design a performance-based item that tests for the ability to create a user on a database application and assign that user privileges to access specific data. Many steps are required to complete this task.
In the real world, this is a fundamental competency for the database administrator job role, but strictly for purposes of this example, let’s treat it as one that is not critical. For this item, I might decide to give partial credit for the correct steps taken toward creating the user, assigning a password, giving access and assigning privileges. In addition, each subtask can be scored equally or weighted differently.
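To make the partial-credit idea concrete, here is a minimal sketch in Python. The subtask names, the weight values and the function name partial_credit are all hypothetical, chosen only to mirror the example above; they do not come from any real test-delivery system.

```python
# Hypothetical subtask weights for the "create a user" item.
# The names and values are illustrative assumptions only.
SUBTASK_WEIGHTS = {
    "create_user": 1.0,
    "assign_password": 1.0,
    "grant_access": 2.0,
    "assign_privileges": 2.0,
}


def partial_credit(completed, weights=SUBTASK_WEIGHTS):
    """Return the fraction of the item's credit earned, from 0.0 to 1.0.

    `completed` is the set of subtask names the candidate performed correctly.
    """
    total = sum(weights.values())
    earned = sum(w for name, w in weights.items() if name in completed)
    return earned / total


# A candidate who created the user and set a password, but went no further.
fraction = partial_credit({"create_user", "assign_password"})
```

With these assumed weights, a candidate who completes only the first two subtasks earns 2 of the 6 available weight points, or one third of the item's credit. Setting every weight to the same value gives the equal-score variant.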
On the other hand, a critical competency for a database administrator job role is the ability to successfully recover from a database failure without losing data. In this situation, the item can be very complex, but I will give credit for the item only if the candidate successfully completes all tasks. Because there can be many types of failure scenarios, some being more complex than others, I might decide to give items different weight within the same test.
Deciding how to score performance-based items based on the criticality of tasks can be a complex process, but in many cases, it enhances the relevancy and validity of the test.
I’m sure there are numerous methods for determining task criticality, but a few that I’ve employed in test design include conducting a thorough job-task analysis (JTA), defining a concise objective domain for tasks to be tested, engaging a large pool of subject-matter experts and conducting one or more pilot events with candidates who are representative of the target audience.
Describing the process for conducting a JTA is a long discussion in and of itself, so I’ll exclude the details in this column. A thorough JTA is extremely beneficial because it helps establish the domain of tasks that are performed in a specific job role. The JTA also helps establish the face validity of the test. (For a discussion of face validity, see my previous column, at http://www.certmag.com/articles/templates/cmag_nl_credentials_content.asp?articleid=969&zoneid=98.)
A successful JTA requires input from subject-matter experts who work in the job role that the test targets. The output of the JTA contributes to the design of the objective domain.
The objective domain defines the objectives that will be tested. In performance testing, the objectives are task-based. In the previous example about creating a user, there are multiple task-based objectives within one item: create a user, assign privileges to a user, etc. The same holds true for the failure scenario. The objective domain is used to establish the exam blueprint, which dictates which objectives will be tested and is used to determine the criticality of the tasks.
One method of determining criticality is engaging a large pool of subject-matter experts in weighting each task. The test designer then establishes a consistent weighting for each task. At this point, the test designer and developers can begin constructing the test.
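One possible way to turn a pool of expert ratings into a consistent weighting is sketched below in Python. It assumes each subject-matter expert rates each task on a 1-to-5 criticality scale and that the median rating is taken as the consensus; both the scale and the use of the median are assumptions for illustration, and other aggregation rules would serve equally well. The function name consensus_weights and the sample ratings are hypothetical.

```python
from statistics import median


def consensus_weights(ratings):
    """Reduce expert ratings to one normalized weight per task.

    `ratings` maps a task name to the list of criticality ratings
    (assumed here to be on a 1-to-5 scale) given by the experts.
    The returned weights sum to 1.0.
    """
    medians = {task: median(scores) for task, scores in ratings.items()}
    total = sum(medians.values())
    return {task: value / total for task, value in medians.items()}


# Made-up ratings from three experts for two tasks.
sample_ratings = {
    "create_user": [2, 3, 2],
    "recover_database": [5, 5, 4],
}
weights = consensus_weights(sample_ratings)
```

Here the median ratings are 2 and 5, so the recovery task ends up with five sevenths of the total weight. The median is used in this sketch because a single outlying rating does not move it much, which matters when the pool of experts is large and varied.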
For purposes of this column, I’ll skip the development process and move to the point where the test is ready for a pilot. In one high-end performance test I developed, I conducted numerous pilot events. During each test event, subject-matter experts took the test in a live environment and provided feedback on each item. This data was used to refine the scoring, determine criticality and adjust timing. My recommendation is to account for this activity in the overall exam development cycle, particularly with performance-based tests.
There are undoubtedly many methods and processes for determining item scoring and criticality. The methods and processes I use are appropriate for the types of performance tests I develop. They are adaptable, however, and can be generally applied to exam-development activities. The key point is that due diligence is required to ensure the quality, validity and relevance of the test.