The pre-match hype
Fans of good typography are like fans of anything else – they love a good fight and take sides easily. Sometimes the comment thread under my review of serif/sans-serif legibility heats up as if people were arguing about religion, politics or even climate change.
Now, some commenters on the thread have claimed they've found research showing undeniable evidence of a massive difference in legibility between serif and sans-serif fonts. This is despite the overwhelming body of evidence showing that there is either no difference or, if there is one, it is too small to worry about.
The research they're talking about is from Colin Wheildon's 1984 report, "Communicating or just making pretty shapes" (reprinted in 1990). The report later formed part of his 1995 book "Type and layout: How typography and design can get your message across–or get in your way".
When you hear claims which are radically different from the established body of research, you should rightly be sceptical, especially when they haven’t been published in a peer-reviewed scholarly journal. Nevertheless, being sceptical means examining the merits of any research even if it goes against the consensus view…
Round 1: Down but not out
A few years after Wheildon's book came out, it was savaged by researcher Ole Lund in a book review and in his PhD thesis, including what looked like personal attacks. But Lund failed to mention the basic problem with the study: it is very badly designed, and the conclusions drawn from it are not credible.
Round 2: It’s a knockout?
Let’s take a closer look at the study as it is described in the copy of the report I have…
In Wheildon's experiment, people were shown a newspaper article set in a sans-serif font and asked about their comprehension of the article and for any other comments about difficulty in reading. They were then shown an article set in a serif font and asked the same questions.
Immediately we can see a problem: the purpose of the experiment is revealed after one test condition but before the other, which biases the second condition. People taking part in research studies are notoriously open to bias and leading questions, so the volunteers may simply have been saying what they thought the experimenters wanted to hear. This alone has the potential to invalidate the test (see the Hawthorne effect).
It looks like there may have been only one or two rounds of testing, which isn't enough to produce a valid result either. If the same article content was shown each time, then it will obviously be easier to read the second time round. If you solve that by randomising which version is shown first, you're going to need a lot more rounds of testing. If you solve it by using different article content, how do you make sure the articles are equal in reading difficulty, so that you isolate the effect of the font?
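To make the order problem concrete, here is a minimal sketch of how a study could balance it out. Nothing here comes from Wheildon's report: the two articles ("article A" and "article B") are hypothetical and assumed to be equally difficult, and the sketch swaps in full counterbalancing (every combination of font order and article pairing used equally often, then shuffled across participants) rather than pure randomisation. Even this simplest balanced design has four different sessions, so participant numbers have to come in multiples of four.

```python
import random
from itertools import product

FONTS = ("sans-serif", "serif")
ARTICLES = ("article A", "article B")  # hypothetical articles, assumed equally difficult

def other(options, chosen):
    """Return the item in a two-item tuple that was not chosen."""
    return options[1 - options.index(chosen)]

def sequences():
    """The four possible sessions: which font is seen first, and which article it is set in."""
    return [
        ((font, article), (other(FONTS, font), other(ARTICLES, article)))
        for font, article in product(FONTS, ARTICLES)
    ]

def assign(n_participants, seed=None):
    """Give each session type to an equal share of participants, then shuffle who gets which."""
    seqs = sequences()
    if n_participants % len(seqs):
        raise ValueError("use a multiple of 4 participants so the design stays balanced")
    plan = [seqs[i % len(seqs)] for i in range(n_participants)]
    random.Random(seed).shuffle(plan)  # shuffling changes who gets what, not the totals
    return plan

if __name__ == "__main__":
    plan = assign(40, seed=1)
    serif_first = sum(1 for first, _ in plan if first[0] == "serif")
    print(f"{serif_first} of {len(plan)} participants saw serif first")
```

With 40 participants this prints "20 of 40 participants saw serif first", and each font appears in each article equally often, so any "easier the second time" effect and any difference between the articles cancel out instead of landing on one font.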
Measuring the results
There is a table listing comprehension levels, but we don't get to see the questions that were asked. And why is comprehension the only measure used?
What about these other recognised measures of legibility or readability?
- speed of reading
- speed of perception
- fatigue in reading
- backtracking and other eye movements
- perceptibility at a distance
- perceptibility in peripheral vision
A lot of space is given to participants' comments about how they felt, the actions they performed or what they thought they understood when reading, but these are only anecdotal claims. No objective observations were made with stopwatches or eye trackers to see what really happened.
The final bell
The research is poorly designed, wide open to unintentional (and perhaps intentional) bias, and doesn't provide credible objective data.
However, perhaps I've got it wrong: the write-up I'm working from describes the study very poorly. If anyone can send me what they consider to be the definitive version of the report, I'd be glad to take another look.
But for now, I hope these 28-year-old rogue claims about serifs are finally out for the count.