Many students struggle with statistical concepts such as interaction. Participants in an experimental group took a paper-and-pencil test and were then trained to form equivalence classes representing four different statistical interactions. All participants formed the equivalence classes and maintained class-indicative responding when probes contained novel negative exemplars. Thereafter, participants took a second paper-and-pencil test. Participants in a control group received two versions of the paper-and-pencil test without equivalence-based instruction. All participants in the experimental group showed increased paper-and-pencil test scores after forming the interaction-indicative equivalence classes. Class-indicative responding also generalized to novel exemplars and to the novel question format used in the paper-and-pencil test. Test scores did not change with repetition for control group participants. Implications for behavioral diagnostics and teaching technology are discussed.