On the research side of SimBio, we spend a lot of time writing questions to test whether students are learning from our labs. We also publish papers on these tests; making a really good test takes enough work to merit its own publication. So even though research is supposed to be an objective, emotionless search for the truth, I'll admit to a sinking feeling while reading a new paper by Ross Nehm and Minsu Ha on how students respond differently to questions about evolution depending on the context in which you ask them.
In a nutshell, Nehm's research says that the stories you use in questions about evolution have a lot to do with the responses you get. Ask about a single species gaining a trait, for instance, and students give a relatively high rate of correct responses and a lower rate of what the authors call "naive" responses (i.e., misconceptions). Ask instead about cross-species comparisons in which species are losing traits, and students give many more naive responses and many fewer correct ones. From a teaching perspective, it seems students have more trouble understanding trait losses across related species than trait gains within a single species.
Looking back at our own tests in this light, I'm not sure how I would reinterpret our results. The natural selection test we developed for research on our Darwinian Snails lab used quantitative changes in traits rather than gains and losses, a scenario Nehm and colleagues did not examine. Similarly, other evolutionary tests we've built and published (on tree-thinking and on population genetics, for instance) don't fit neatly into the categories Nehm's group looked at. But that highlights the really worrying thing about these results.
Although we already spend months designing each new test to capture the same understanding we find in interviews and essays, we don't systematically vary the elements of the questions to see what effect that has: changing species, traits, whether we ask about gains vs. losses, and so on across many other possible features of each question. Doing all that for a single test would take years of work. Basically, it's an impossible barrier for a single group building teaching tools - or for a single teacher who is just trying to assess her class.
I think this further highlights the need for a real effort to produce high-quality tests across different fields of biology that can be used to assess new teaching techniques. It has to be a community effort, because each test item is a lot of work. There should be a central repository where these items are deposited along with the research behind them, and there should be an ongoing conversation within this community about how to improve test items, informed by research like that of Nehm's group. It would be nice to link this repository to the concept inventory efforts that are starting to gain traction; I would say those inventories are only useful to the extent that they assess understanding accurately. Developing all this will be a bit of a slog, but our own group will be thrilled the day we can just lift a test off the shelf and use it, letting us focus on how to teach better rather than on how to know whether we're teaching better. That would be a real gain, and as Nehm showed, gains are easier than losses.