*This semester, students in READ 6329.10 were asked to contribute a post to this blog.*

by I. Martinez

So, let’s say it’s the beginning of the school year, and you’d like to improve your test construction skills while also getting an idea of which objectives are giving your students the most trouble. What a great idea! A quick item analysis of your tests will help you accomplish both of these goals.

To simplify things, let’s start from the beginning and go through the steps of how to analyze a teacher-made test for difficulty and discrimination. The first step is to quickly plan the number and types of questions you will ask on your test, as illustrated in this chart:

This hypothetical chart organizes the objectives in order, links them to Bloom’s Taxonomy, and plots out the number of questions to be constructed. Notice that in the “Applying” row, the students will be answering two questions for objectives 1, 3, and 4, and three questions for objectives 2 and 5. Also, keep track of which questions on the test target which objectives. For example, note that you constructed test item number 7 to cover objective 1 at the “remembering” level. This record will help you discover which objectives are more difficult for your students.

The test is then developed, administered and scored. After this, the students’ names, grades and responses are plugged into an Item Analysis Worksheet like the one below:

In this example, the students are listed in descending order of score, with the highest scores first, which automatically separates them into high and low groups. The two groups are distinguished by white and yellow shading for clarity. The two groups must be equal in size; therefore, when there is an odd number of students, the middle-scoring student must be left out.
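If your grades are already in electronic form, the sorting and splitting step can be sketched in a few lines of Python. This is only a minimal sketch: the function name `split_groups` and the five students and grades below are invented for illustration and are not taken from the worksheet.

```python
def split_groups(students):
    """Rank (name, grade) pairs from highest to lowest grade and
    return equal-sized upper and lower groups."""
    ranked = sorted(students, key=lambda s: s[1], reverse=True)
    half = len(ranked) // 2
    upper = ranked[:half]
    # With an odd class size, this slice skips the middle-scoring student.
    lower = ranked[len(ranked) - half:]
    return upper, lower


# Hypothetical class of five students (names and grades are invented).
students = [("Student A", 90), ("Student B", 70), ("Student C", 80),
            ("Student D", 60), ("Student E", 50)]
upper, lower = split_groups(students)
print([name for name, _ in upper])  # ['Student A', 'Student C']
print([name for name, _ in lower])  # ['Student D', 'Student E']
```

Because this made-up class has five students, Student B (the middle score of 70) is left out of both groups.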

Marking the right and wrong answers can be time-consuming, so it is helpful to have someone assist you by calling out the students’ names (in the order of their listing) along with their “right” or “wrong” answers, so you can quickly mark a “1” or a “0” across the table. For example, in the first row we have a student named Andres who has a grade of 100, so his responses are all entered as a “1” because he answered every test question correctly. In contrast, the student named Delmar answered only questions 2, 6, and 9 correctly, so all of his other questions are marked “0”.

Now, to calculate the difficulty index for question #1, look at the first column of the chart, add up the number of students who answered the question correctly, and divide this number by the total number of students. For question #1, 6 students out of 10 got it correct, so the difficulty index is 6 ÷ 10 = 0.60.
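The same arithmetic can be written as a tiny Python function. This is a sketch, not part of the worksheet: the name `difficulty_index` is my own, and the order of the 1s and 0s in the sample list is illustrative (only the count of 6 correct out of 10 comes from the example).

```python
def difficulty_index(responses):
    """Proportion of students who answered one question correctly.

    `responses` holds one mark per student: 1 for right, 0 for wrong.
    """
    return sum(responses) / len(responses)


# Question #1 from the example: 6 of the 10 students answered correctly.
# The order of the marks below is illustrative, not copied from the worksheet.
question_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(difficulty_index(question_1))  # 0.6
```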

It is important to remember that the difficulty index measures the proportion of students who responded correctly to a test question, which makes it counter-intuitive. For example, at face value, you may think that a test question with a difficulty index of 0.90 is very difficult, but it is actually too easy, because the index measures the share of students who got it right. Normally, any test question with an index of 0.90 or above is considered an easy, giveaway test item, and any question with an index below 0.20 is considered a very difficult test item.
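To keep the direction of the scale straight, the cutoffs above can be turned into a small labeling helper. This is a hedged sketch: the function name `describe_difficulty` is hypothetical, and the "moderate" label for values between the two cutoffs is my own wording, since the post only names the two extremes.

```python
def describe_difficulty(index):
    """Label a difficulty index using the cutoffs from the text."""
    if index >= 0.90:
        return "easy giveaway"
    if index < 0.20:
        return "very difficult"
    # Label assumed for illustration; the text does not name this range.
    return "moderate"


print(describe_difficulty(0.95))  # easy giveaway
print(describe_difficulty(0.60))  # moderate
print(describe_difficulty(0.10))  # very difficult
```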

Next, the questions are checked for their discrimination index. The number of “lower” students who answered a test item correctly is subtracted from the number of “upper” students who answered that same test item correctly, and the result is divided by half of the total number of students. With discrimination scores, the higher the value, the more discriminating the item, meaning that more students with high test grades got it correct than students with low test grades. Discrimination scores for test questions should be 0.20 or higher. If a test question has a negative discrimination score, it was answered correctly by more low testers than high testers and is not a well-constructed test question. A negative score signals a question that was most likely worded ambiguously and should be eliminated. An example of this anomaly is question #9. A score of 0 or close to it means the item does not separate the two groups at all; one way this happens is when both the high and low groups answered it correctly, as is the case with questions #2 and #3.
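Here is a minimal Python sketch of that calculation, assuming the upper and lower groups are the same size so that each one is half of the students counted. The function name `discrimination_index` and the three sample items are invented for illustration; they are not the worksheet's actual responses.

```python
def discrimination_index(upper, lower):
    """(upper correct - lower correct) / size of one group.

    `upper` and `lower` are lists of 1/0 marks for a single question,
    one mark per student in each group.
    """
    if len(upper) != len(lower) or not upper:
        raise ValueError("groups must be the same, nonzero size")
    return (sum(upper) - sum(lower)) / len(upper)


# Hypothetical items with five students in each group.
print(discrimination_index([1, 1, 1, 1, 1], [1, 0, 0, 0, 0]))  # 0.8
print(discrimination_index([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]))  # 0.0
print(discrimination_index([0, 0, 0, 0, 1], [1, 1, 0, 0, 0]))  # -0.2
```

The first item discriminates well, the second is answered correctly by everyone and so discriminates not at all, and the third is the kind of negative result that suggests an ambiguous question.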

In closing, you may want to pose these questions to yourself:

- Which question was the easiest? Answer: Question #2
- Which question was the most difficult? Answer: Question #9
- Which question has the poorest discrimination? Answer: Question #9
- Which question(s) should be eliminated? Answer: Question #2 (too easy) AND Question #9 (for its negative discrimination score, not because of its difficulty index)
- Which objectives are the most problematic? Answers will vary
- Are the questions from the higher levels of Bloom’s Taxonomy the most problematic? Answers will vary
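
For the question about problematic objectives, one possible approach (my own assumption, not a method prescribed above) is to average the difficulty index of the items written for each objective and look for the lowest average. In this sketch, only the pairing of item 7 with objective 1 comes from the earlier example; the other item numbers, objective assignments, and all difficulty values are invented.

```python
from collections import defaultdict


def difficulty_by_objective(item_difficulty, item_objective):
    """Average the difficulty index of the items tied to each objective."""
    grouped = defaultdict(list)
    for item, difficulty in item_difficulty.items():
        grouped[item_objective[item]].append(difficulty)
    return {obj: sum(vals) / len(vals) for obj, vals in grouped.items()}


# Hypothetical values: item number -> difficulty index.
item_difficulty = {7: 0.5, 8: 0.25, 12: 1.0}
# Item number -> objective (item 7 -> objective 1 is from the text).
item_objective = {7: 1, 8: 1, 12: 2}

averages = difficulty_by_objective(item_difficulty, item_objective)
print(averages)                             # {1: 0.375, 2: 1.0}
print(min(averages, key=averages.get))      # 1
```

Remember that a lower average means fewer students answered correctly, so in this made-up data objective 1 is the one giving students the most trouble.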

For more on item analysis, please visit the following sites:

http://www.utexas.edu/academic/ctl/assessment/iar/students/report/itemanalysis.php

http://fcit.usf.edu/assessment/selected/responsec.html
