Last week I was at a meeting sponsored by NSF that brought together people who have worked on projects funded by their CCLI program (Course, Curriculum, and Laboratory Improvement). They fund an impressively large number of projects through that program; there must have been 300 people at the meeting. It turned out to be a really fun meeting, as everyone there was doing something cool involving science education at the college level. I’ll make a separate post about some of the interesting projects I saw while there, but the session that applied most directly to our current work at SimBiotic was on concept inventories and assessing misconceptions.
We want to test the labs we make to see whether they are educationally effective – in other words, do students learn from doing our labs? It turns out, as we discovered (and we’re not alone), that this is a deceptively hard question to answer. All of us involved in teaching write test questions all the time, and we think we do an OK job. So to test a lab on, say, the evidence for evolution (a new lab we’re working on now), one approach would be simply to write some standard test questions on evolution and fossils and so on, and see whether students answer those questions better after doing our lab than before. The problem is that standard questions written off the top of your head tend to test content knowledge: can students remember what they read and repeat it back? They don’t tend to test conceptual knowledge: do students have a working understanding of the concepts that they can apply in a novel situation? Students tend to have “misconceptions” – conceptual ideas that are wrong in general but nevertheless explain parts of their world, which makes them hard for the student to replace with a correct conception of how the world works.
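As a side note on the before/after comparison: one common way education researchers summarize pre/post results (not mentioned in the session, but standard in the physics education literature) is the normalized gain – the fraction of the possible improvement that students actually achieved. A minimal sketch, with invented class averages:

```python
# Normalized gain ("Hake gain"): how much of the room for improvement
# was actually used. Scores are percentages; the numbers below are
# made-up example data, not real results.

def normalized_gain(pre_pct: float, post_pct: float) -> float:
    """g = (post - pre) / (100 - pre)."""
    if pre_pct >= 100:
        return 0.0  # perfect pre-test score leaves no room to improve
    return (post_pct - pre_pct) / (100 - pre_pct)

# Hypothetical class averages on a concept test, before and after a lab:
pre, post = 45.0, 67.0
g = normalized_gain(pre, post)
print(f"normalized gain: {g:.2f}")  # 22 points gained out of 55 possible -> 0.40
```

The point of normalizing is that a class starting at 45% and a class starting at 80% can be compared fairly, since raw point gains are capped by how much room each class had to improve.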
So we think, along with many others, that one of the things a good teaching tool should do is overcome students’ misconceptions on a topic and move them toward using more scientifically accurate concepts. The problem is that it’s not easy to write test questions that really get at conceptual knowledge. And often, even very experienced teachers don’t know where their students’ misconceptions are (at least, among our group, our initial predictions of where students will have trouble always seem to be very incomplete). As we’ve worked on our labs in OsmoBeaker and EvoBeaker, we developed something of a protocol for finding student misconceptions and developing test questions around them. At the CCLI session, it was nice to hear that what we’ve developed seems in line with what people with more formal training in educational research recommend.
As part of a session on concept inventories (listing the concepts that are important to understanding a topic, and the misconceptions that students have around each), Julie Libarkin from Michigan State gave a nice concrete presentation on how to make a good test. She talked about two basic principles behind test design – validity and reliability. Validity is whether the test is really measuring what you think it is. Reliability is whether it measures that consistently, both across different questions in the test addressing the same topic, and when you give the test to different groups of students. To get validity, she talked about using think-aloud interviews with students to find out why they answer questions in a certain way, and giving essay questions to students and using those essay answers to tease out the misconceptions and develop the test questions. She also talked about having experts critique the test questions (both scientists and education experts). And she mentioned using focus groups to probe for misconceptions. With our tests, we’ve been doing most of those other than the focus groups. We come up with some essay or drawing questions that we think will hit on concepts students have trouble with, then ask students to explain their answers and probe their explanations in an interview. We then redesign the questions and add new ones where we think we missed something with the first set. We continue doing that iteratively until our questions seem to be capturing most of the common thoughts we hear from the students, and the written answers seem to match the explanations. Then we select student-written answers to each question that hit on each misconception we’ve seen, and use those as the distractors in making the question multiple choice. Finally, we give these to some outside experts to critique.
We haven’t thought as much about reliability, so that’s something we should do. And there are some statistical analyses we should be using as well, along with accumulated wisdom on how to write a good test question. So lots to learn. But still, it was interesting to hear that the process we came up with from a little reading in the literature and just trying things out is not that far from accepted practice. It’s encouraging to know that somewhat independent groups coming from different disciplines converge on a similar process. This whole process takes a seriously large effort, though. We’ve spent months writing a test, almost as much time as we spend writing the lab we’re trying to test. It would be great if there were some organized repository of misconceptions in various areas of biology, along with tests that have been validated and checked for reliability, that we could just grab and run. Of course then our labs (and others) would be teaching to those tests, but if the tests are good, I don’t think that’s a bad thing. At the meeting, I heard snippets about various groups that are trying to make concept inventories and conceptual tests for different areas in biology and make them available; I’m hoping those efforts really take off, as it would be such a time-saver for anyone trying to make a really good teaching tool.
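For the curious, one of the standard reliability statistics alluded to above is Cronbach’s alpha, which measures internal consistency – whether the items on a test hang together. A minimal sketch (the talk didn’t name a specific statistic, and the response data here are invented for illustration):

```python
# Cronbach's alpha: a common internal-consistency (reliability) statistic.
# Rows = students, columns = test items (1 = correct, 0 = incorrect).
# A rule of thumb is alpha >= 0.7 for an acceptable test.

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
    k = len(scores[0])  # number of items
    
    def variance(xs):   # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Five students' answers on a four-item test (invented data):
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")  # alpha = 0.74
```

Intuitively, alpha is high when students who do well on one item tend to do well on the others, which is what you’d expect if the items are all tapping the same underlying concept.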