Fighting bad typography research

The pre-match hype

Fans of good typography are like any other fans: they love a good fight and take sides easily. Sometimes the comment thread under my review of serif/sans-serif legibility heats up as if people were arguing about religion, politics or even climate change.

Now, some commenters on the thread have claimed to have found research providing undeniable evidence of a massive difference in legibility between serif and sans-serif fonts, despite the overwhelming body of evidence showing either no difference at all, or a difference too small to worry about.

The research they’re talking about is Colin Wheildon’s 1984 report, “Communicating or just making pretty shapes” (here reprinted in 1990). The report later formed part of his 1995 book “Type and layout: How typography and design can get your message across–or get in your way“.

When you hear claims that are radically different from the established body of research, you should rightly be sceptical, especially when they haven’t been published in a peer-reviewed scholarly journal. Nevertheless, being sceptical means examining the merits of any research, even when it goes against the consensus view…

Round 1: Down but not out

A few years after Wheildon’s book came out, it was savaged by researcher Ole Lund in a book review and a PhD thesis, including what looked like personal attacks. But Lund failed to mention the basic problem with the study: it is very badly designed, and the conclusions drawn from it are not credible.

Round 2: It’s a knockout?

Let’s take a closer look at the study as it is described in the copy of the report I have…

The set-up

In Wheildon’s experiment, people were shown a newspaper article set in a sans-serif font, asked questions about their comprehension of the article and invited to comment on any difficulty in reading; they were then shown an article set in a serif font and asked the same questions.

Immediately we can see a problem: the purpose of the experiment is revealed after one test condition but before the other, biasing the second condition. People taking part in research studies are notoriously open to bias and leading questions, so the volunteers may simply have been saying what they thought the experimenters wanted to hear. This alone has the potential to invalidate the test (see the Hawthorne effect).

It also looks like there may have been only one or two rounds of testing, which isn’t enough to produce a valid result either. If the same article content was shown each time, then it will obviously be easier to read the second time round. If you solve that by randomising the article order, you are going to need many more rounds of testing. If you solve it by using different article content, how do you make sure the articles are equal in reading difficulty, so that the effect of the fonts is isolated? The standard remedy is a counterbalanced design, sketched below.
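To make the order problem concrete, here is a minimal sketch (in Python, purely illustrative; the font and article names are my own stand-ins, not anything from Wheildon’s study) of how a counterbalanced design assigns presentation order so that order effects cancel out across participants:

```python
FONTS = ["serif", "sans-serif"]
ARTICLES = ["article_A", "article_B"]  # assumed to be matched for reading difficulty

def assign_conditions(participant_ids):
    """Counterbalance font order and article order across participants:
    over every group of four, each font appears equally often first and
    second, and is paired equally often with each article. This removes
    the 'easier the second time round' confound."""
    assignments = []
    for i, pid in enumerate(participant_ids):
        font_order = FONTS if i % 2 == 0 else FONTS[::-1]
        article_order = ARTICLES if (i // 2) % 2 == 0 else ARTICLES[::-1]
        assignments.append({
            "participant": pid,
            "trials": list(zip(article_order, font_order)),  # (article, font) per trial
        })
    return assignments

# Four participants cover all four order combinations:
for a in assign_conditions(["p1", "p2", "p3", "p4"]):
    print(a["participant"], a["trials"])
```

With two fonts and two matched articles, every group of four participants covers the full set of order combinations, so a sample the size of Wheildon’s would have been ample for this.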

Measuring the results

There is a table listing comprehension levels, but we don’t get to see the questions themselves. And why is comprehension the only measure used?

What about these other recognised measures of legibility and readability?

  • speed of reading
  • speed of perception
  • fatigue in reading
  • backtracking and other eye movements
  • perceptibility at a distance
  • perceptibility in peripheral vision

A lot of space is given over to comments from people about how they felt, the actions they performed, or what they thought they understood while reading, but these are only anecdotal claims – there were no objective observations made with stopwatches or eye trackers to see what really happened.
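For contrast, even the crudest objective measure – reading speed – is easy to capture. Here is a minimal sketch (Python; the console prompt is my own stand-in for the presentation step, not anything described in the report):

```python
import time

def timed_reading(text_id, present_text):
    """Time how long a participant spends on a passage. present_text is
    assumed to block until the participant signals they have finished."""
    start = time.monotonic()
    present_text(text_id)
    return time.monotonic() - start

def words_per_minute(word_count, seconds):
    return word_count / (seconds / 60.0)

# Example run, using a console prompt as the stand-in presentation step:
elapsed = timed_reading("serif_article",
                        lambda t: input(f"Read {t}, then press Enter... "))
print(f"Reading speed: {words_per_minute(450, elapsed):.0f} wpm (assuming a 450-word article)")
```

Comparing timings like these across fonts, alongside comprehension scores, would give the objective data the report lacks.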

The final bell

The research is poorly designed, wide open to unintentional (and perhaps intentional) bias, and doesn’t provide credible objective data.

However, perhaps I’ve got it wrong – the write-up I’m working from describes the study very poorly. If anyone can send me what they consider to be the definitive version of the report, I’d be glad to take another look.

But for now, I hope these 28-year-old rogue claims about serifs are finally out for the count.


20 Responses to Fighting bad typography research

  1. Pingback: Which Are More Legible: Serif or Sans Serif Typefaces? – alexpoole.info

  2. I’ve read and considered Wheildon’s work in the broader context of legibility and readability studies since I first came across it in about 1990. I have worked in publishing and printing since the early 1970s, being just old enough to have direct experience of letterpress [trained as a compositor, ho, ho, ho] and then to see the complete transformation of my craft over the next twenty years or so. I advise large numbers of thesis writers about the use of production technologies, and help them actually design their theses to communicate their findings.
    I disagree with those who say that Wheildon’s study was poorly designed. He used over 200 subjects to test five main hypotheses, sufficient to demonstrate the necessary content and construct validity. The material he was using [text and pages from a popular automobile club magazine] was at an appropriate level for the audience. He only measured comprehension, no other fashionable concepts, which is what makes it such a powerful study. When I advise thesis students, who are writing material far above that used by Wheildon but for an expert audience, I treat his work as a classic empirical study. Very few other studies are as clearly defined, and very few have findings as clear-cut. Too many are compromised at the outset by theory. The purpose of typography, invisible as it should be, is to let the meaning of the words flow effortlessly to the reader.
    Objections based on literary theory founder, in my view, on Wheildon’s empirical rocks. I have lost count of, and lost patience with, people who say things like ‘I hate Times New Roman’. TNR was designed to be legible from 4 points to 196 points; it is the quintessential ‘legible’ font. When students give me a page of Arial [or Garamond], I convert it to Times and print it for comparison, and say ‘what do you think now that you see it on the page?’: grey versus black, a function of ascenders, descenders and counters. To paraphrase Wheildon: black is the best colour, the blacker the better.
    Michael McBain
    University of Melbourne, Australia

    • Alex Poole says:

      Hi Michael,

      Thanks for taking the time to comment.

      It’s good that he used so many test subjects (224 to be exact), but if he presented the test conditions as described in the report – i.e. probably not randomised, and perhaps with leading questions – then that number is irrelevant: it simply isn’t a fair test, and the results cannot be taken seriously.

      But as I said, the copy of the report I have perhaps isn’t the most complete one – but it sounds like you have access to the full write-up. If you post a scan or link here or send it to me by email, I’d be glad to update my post.

  3. Drayton Bird says:

    And where is your research – other than a series of generalisations – that indicates Wheildon is wrong?

    • Alex Poole says:

      Hi there. The criticisms are very specific about the methodology of the research. Do you have a copy of the original research so I can check these points definitively?

  4. Pingback: Which Are More Legible: Serif or Sans Serif Typefaces? | Alex Poole

  5. Ener Hax says:

    excellent post – i am a typographic nut (yes, early bauhaus lack of capitalisation here).

    what are your thoughts on the interesting work just done by Errol Morris, despite its self-admitted lack of methodology?

    http://opinionator.blogs.nytimes.com/2012/08/08/hear-all-ye-people-hearken-o-earth/

    i think it was very “fun” of him to do his study in such a non-academic manner, and his results do speak to something – although i am not certain what, and thus the question to you.

    thanks! =)

  6. David says:

    From my memory of reading Wheildon’s full study, the test was objective as to comprehension. Would subjects mask their comprehension of a sans-serif typeface? Maybe, but it seems unlikely. Still, it would be interesting to re-run the study. Michael, would you have the resources to do it? It could be done without telling students it was to test typefaces, which would be simple.

  7. Typography and design are battlefields strewn with the wreckage left behind by the life-and-death struggles of many theorists. Wheildon’s study does seem to get the hackles up for a lot of people. I like Karen Schriver’s book for its well-referenced studies of meaning and rhetorical purpose, and even though it is now getting a little bit old, I still draw on it on the three or four occasions a year when I speak about the value of design. Schriver doesn’t like Wheildon, either.
    Partly because of the enmity Wheildon invokes, I’ve read most of what has been written about design, rhetorical purpose and measures of communicative value. As I said in my April comment, much of the wheildonschmerz is based on an incorrect pre-judgement of what such studies should be discovering; critics make quite large assumptions about the design and purpose of Wheildon’s study. It makes me wonder if they ever had access to the original study, which admittedly was published privately by a New South Wales industry organisation – it doesn’t get much more obscure than that.

    What makes Wheildon’s study so seminal for me is that he eschewed theory. He took his 224 volunteers, and mixed up the fonts, the leading, the justification, the capitalisation and the measures, and then tested comprehension – that’s all. “Read this, and then answer these questions about the thing you’ve just read”. It’s a classic independent/dependent variable study, and virtually unencumbered by theory. The number of variables being tweaked, and the number of participants tested, is well within reasonable limits; it’s a sound and robust experimental design.

    • Alex Poole says:

      Hello again Michael,
      Do you have a copy of the original study, or a version of it that describes the study design in more detail than the version I have access to?

  8. Pingback: Exploring Typography « PWs at USF

  9. Chris Dean says:

    The link to “Communicating, or just making pretty shapes?” also returns a 404.

  10. doknir says:

    Links from this article to your blog are not working (blog/ is missing). BTW, thanks for your analysis!

  11. Pingback: Best fonts for business documents

  12. Pingback: Battle of the century: grotesques versus serif typefaces – Мультимедиа в Linux

  13. Sam Doctor says:

    Has anyone presented a link to the actual study yet? I am curious to read it in as close to its original form as possible. I have seen many other studies that were inconclusive about other aspects of readability for fonts with and without serifs.

    • Alex Poole says:

      No, every time I ask for a copy, these ardent supporters suddenly shy away from sharing the actual study… I wonder why?

  14. dunerat says:

    Based on the comments I’ve seen here, the reason they aren’t sending you a copy is that they don’t have one either – hence their inability to respond to your comments about experimental design.
