Validity Considerations for Performance Testing


Assume you have decided to launch a new certification or you are enhancing your current program with performance testing. You know there is a demand to test for “real world” scenarios and an application of skills, problem analysis and resolution. The challenge is determining the domain of skills and competencies that should be tested. Remember that performance testing, as espoused in the Performance Testing Council’s motto, is “Testing by doing.”


One of the major challenges in designing performance tests is ensuring validity. I have found that determining validity requires due diligence to ensure that the appropriate people are involved in the design and development process. There are four types of validity that I’ll address here: face validity, content validity, concurrent validity and predictive validity.


Since performance testing is associated with performing tasks, test takers should perceive the test as a measurement of what it is supposed to measure. This is face validity. For example, database administrators who take a performance test will perceive the test as having face validity if they are measured on a common, critical task, such as restoring and recovering from a database failure. This is a task that all database administrators must master. On the other hand, if the performance test contains tasks that are obscure and outside the scope of common tasks, then this separates test takers from one another and consequently jeopardizes face validity. Therefore, it’s important in the design phase that enough representative target test takers are involved to help define the domain of skills that are tested.


Subject-matter experts play a key role in the design phase to ensure that the subject matter of the test does an adequate job of measuring performance. This is called content validity. Getting back to the database failure scenario, subject-matter experts will help develop the content that is included on the test. For example, they may define the specific outcome(s) of the task to be performed and determine what tools should be available to complete those tasks. Getting consensus from a group of “experts” can be challenging, so my recommendation is to engage a sufficient number of experts in the design, development, testing and even early implementation of the test.


Performance tests are typically geared toward testing for mastery. They should do an appropriate and fair job of weeding out individuals who have not mastered tasks. Testing for mastery is concurrent validity. Performance tests are often composed of both mastery and non-mastery items, but at a minimum, the candidate should complete the majority of mastery items in order to pass. This will vary across performance tests and is determined by a number of factors. Depending on the test, determining concurrent validity may require multiple pilot tests where each event includes a test taker population made up of both “experts” and “non-experts.” This was the process I used when piloting the performance test that included the database failure scenario. Actually determining concurrent validity is an involved process, but it is a critical consideration when developing performance tests. If you do not have someone on your development team who is familiar with determining concurrent validity, this is a resource consideration for your development effort.
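
As a rough illustration of the pilot idea above, the following Python sketch applies a “majority of mastery items” pass rule to a mixed pilot group and compares pass rates for “experts” and “non-experts.” Everything in it is an assumption made for illustration: the record layout, the function names, the 50 percent threshold and the pilot numbers are all invented, and a real concurrent validity study involves considerably more than comparing two pass rates.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    """One pilot test taker (hypothetical record layout)."""
    is_expert: bool          # classified before the pilot, e.g. by subject-matter experts
    mastery_completed: int   # mastery items completed successfully
    mastery_total: int       # mastery items on the test


def passes(result: PilotResult, threshold: float = 0.5) -> bool:
    """Pass when more than `threshold` of the mastery items are completed."""
    return result.mastery_completed > threshold * result.mastery_total


def separation(results: list[PilotResult]) -> tuple[float, float]:
    """Return (expert pass rate, non-expert pass rate) for one pilot event."""
    experts = [r for r in results if r.is_expert]
    non_experts = [r for r in results if not r.is_expert]
    if not experts or not non_experts:
        raise ValueError("a pilot needs both experts and non-experts")
    expert_rate = sum(passes(r) for r in experts) / len(experts)
    non_expert_rate = sum(passes(r) for r in non_experts) / len(non_experts)
    return expert_rate, non_expert_rate


# Invented pilot data: three "experts" and three "non-experts", ten mastery items each.
pilot = [
    PilotResult(True, 9, 10),
    PilotResult(True, 7, 10),
    PilotResult(True, 4, 10),
    PilotResult(False, 6, 10),
    PilotResult(False, 3, 10),
    PilotResult(False, 2, 10),
]
expert_rate, non_expert_rate = separation(pilot)
print(f"expert pass rate: {expert_rate:.2f}, non-expert pass rate: {non_expert_rate:.2f}")
```

A wide gap between the two rates suggests the test is separating the groups; a narrow gap across several pilot events is a signal to revisit the mastery items or the pass rule.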


The interesting characteristic of performance testing is that while it tests for mastery through performance, a well-designed and robust performance test will also account for knowledge, experience, best practices and competency. Therefore, not only is current mastery tested, but future competency is tested as well because the test incorporates higher-order cognitive thinking into the testing process. The ability to test for future competency is called predictive validity. Let’s get back to my database failure scenario once again. If the failure scenario challenges the higher-order cognitive skills of the test taker and the test taker demonstrates mastery, this should be an indication that the test taker can competently take on more challenging tasks. As with concurrent validity, if you do not have someone on your development team who is familiar with predictive validity, you will need to take this into consideration. In my opinion, a performance test that has high predictive validity also has a very high value to both the test taker and the hiring managers.


In summary, test validity is a critical consideration when designing performance tests. Face validity ensures that what is being measured is valid in the eyes of the test taker. Content validity ensures that what is being measured is valid in the eyes of the content experts. Concurrent validity ensures that the test adequately and fairly separates “experts” from “non-experts.” Predictive validity ensures that the test is a measurement of future competency. Taking these factors into consideration and giving them due diligence throughout the design and development process will help ensure the success of your performance test.


James A. DiIanni is the director of assessment and certification exam development at Microsoft Learning and supports the Microsoft Certified Professional program. His experience with performance testing started in 1986 developing simulators for the U.S. Navy, and he has been involved in the IT certification industry since 1997. He can be reached at


