Testing is a core component of many of our projects. We use a wide range of testing methods, each with its own goals, limitations, and benefits. One method we use quite often is impression testing (often using Chalkmark) because it’s a great way to get quick, inexpensive feedback on single-interaction tasks. In an impression test, we give a user a task, show them a non-interactive screenshot, and record their first click.
We found ourselves in a bit of a dilemma on a recent project: we wanted to do some impression testing but we also wanted to test two levels of navigation (including A/B testing one of the labels), and we only had time and budget for one more round of testing. We couldn’t find any tools that would allow us to run two-page impression testing, let alone one that would also let us randomize the label we needed to A/B test. So we built our own.
For each task, we showed participants a wireframed version of the starting screen, usually the homepage. If the participant clicked anywhere in the main body of the screen, i.e., anywhere but the menu, their click would be recorded and they would advance to the next task, just like a regular impression test. However, if the participant clicked one of the menu labels, they would be shown that menu choice expanded (though this was technically a separate second wireframe, it was unlikely that they realized this). They would be prompted to click again on this second page, and that click would be recorded as their final answer.
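The flow above can be sketched in a few lines. This is a simplified illustration of our custom tool’s logic, not Chalkmark’s API; the region names (“body”, “menu”) and page names are hypothetical:

```python
# Minimal sketch of the two-page impression test flow.
# Hypothetical region/page names; not any real testing tool's API.

clicks = []  # every recorded click, in order

def handle_click(task_id, page, region, x, y):
    """Record a click and return the next page to show.

    page:   "start" (the homepage wireframe) or "expanded"
            (the second wireframe showing an opened menu).
    region: "menu" if the click hit a menu label, else "body".
    """
    clicks.append({"task": task_id, "page": page,
                   "region": region, "x": x, "y": y})
    if page == "start" and region == "menu":
        # First click hit the menu: show the expanded-menu
        # wireframe and prompt for a second click.
        return "expanded"
    # Any body click, or any click on the expanded page, ends the task.
    return "next_task"
```

A body click on the start page advances straight to the next task, so from the participant’s point of view it behaves exactly like a regular impression test.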
For the most part, the test went well! We got useful and interesting feedback and were able to make improvements to our design (successful impression testing) and to our labels (successful A/B navigation testing). We also learned a few lessons for next time:
1. Don’t forget about the back button.
One of the rules of good web design is ‘don’t break the back button’. Years of usability testing have shown that it’s one of people’s favourite ways to navigate. Unfortunately, because of the way we built our system, use of the back button also skewed our stats.
We had considered that some people might click the menu, see their expanded choices, realize they had made a poor choice on their first click, and want to go ‘back’ to amend their first decision. We had two choices in dealing with this:
- Allow the second guess to override and erase the first guess. This would probably have resulted in an inflated success rate, since the second guess was more likely to be correct after the user ‘learned’ from the expanded menu displayed after the first click.
- Include the first and second clicks in the final tallies. This would mean that in the end we would have some numbers that didn’t add up neatly (e.g., 90 clicks on one task, 95 clicks on another, etc.) since some users would essentially be counted twice.
Awkward totals are better than misleading stats, so we went with the second option. Though the instructions asked users to refrain from using their back button, many did, many times. Next time we should highlight this instruction or include it on every task page.
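The tally we chose can be sketched as a simple count over the raw click log. The click records here are hypothetical examples, but they show why per-task totals can exceed the number of participants:

```python
from collections import Counter

# Sketch of the tally we chose: every recorded click counts, so a
# participant whose first click opened a menu contributes two clicks
# to that task's total. Example records are hypothetical.
clicks = [
    {"task": 1, "participant": "p1", "target": "menu:About"},
    {"task": 1, "participant": "p1", "target": "About > Team"},  # second click
    {"task": 1, "participant": "p2", "target": "body:hero"},
]

totals = Counter(c["task"] for c in clicks)
print(totals[1])  # 3 clicks from only 2 participants
```

Those “awkward totals” are exactly what we saw: 2 participants, 3 clicks, and numbers that don’t sum neatly across tasks.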
2. Start with some practice tests.
We saw some behaviour on the first task (and, to a lesser extent, on the second and third) that we think can be attributed to either confusion or curiosity about how the test worked, e.g., clicking outside the wireframe border and using the back button to explore every single menu dropdown. Each instance of this behaviour was recorded as a failure, which meant that the first few tasks had artificially high failure rates. Again, we had two options:
- Randomize the order in which tasks are presented. This way, different participants would exercise their urge to explore on different tasks. One downside is that every task would then have a slightly inflated failure rate, which is better than a few tasks having very inflated failure rates, but still not ideal. Another downside is that it would have been quite complicated to build, given that we were already randomizing whether the user saw label A or label B on several tasks.
- Start with a trial task or two. This will allow the participants to settle down and get their testing jitters out before starting on tasks on which we need accurate success/failure rates.
Next time, we think we will start with a trial task of a wireframed version of a very familiar site, such as Google, followed by a very easy throwaway task for the site we are testing. Then, once they are habituated to the testing environment, we can get started on the real tasks.
3. Think through all the wrong answers.
We had one menu label that we couldn’t quite settle on, so we wrote two tasks where the “right” answer involved clicking that menu option, and we randomized whether participants would see label A or label B. It was a great way to A/B test, and the results showed label B to be the better choice.
In all the other tasks, we had decided to show label A, thinking that nobody would even glance at it since those tasks didn’t involve that menu option at all. Or at least, we didn’t think they did. When the results came in, we were surprised to discover that quite a few people selected that menu option anyway! We have a hunch that if participants had seen label B, the contrast between it and the “right” answer would have been stronger and success on that task would have been higher, but because we didn’t think to A/B test that task, we’re unable to confirm our hunch.
When we do this again, we will be careful to add the A/B label randomization to every task for which it might be relevant. One way to discover which tasks to A/B test would be to run a pre-test with colleagues who aren’t working on the project, as they would be more likely to notice the things to which the project team has become blind.
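Applying the label randomization to every task is a small piece of logic. This is a sketch under our own assumptions (a fixed pair of label variants, randomized independently per task for each participant); the label and task names are hypothetical:

```python
import random

# Sketch: pick which label variant each task shows to one participant.
# Hypothetical label names; a real study might instead assign one
# variant per participant across all tasks to avoid mid-test switches.
LABELS = ("Label A", "Label B")

def assign_labels(task_ids, rng=random):
    """Randomly choose a label variant for every task."""
    return {task: rng.choice(LABELS) for task in task_ids}

assignments = assign_labels(range(1, 11))  # one variant per task
```

Seeding the generator (e.g., `assign_labels(tasks, random.Random(participant_id))`) would make each participant’s assignment reproducible, which helps when matching clicks back to the variant they actually saw.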