ETAOIN SRHLDCU, or: What are the most common words and letters in English?

Most of us know that ‘e’ is the most common letter in English and ‘the’ is the most common word. Many are familiar with ETAOIN SHRDLU, the nonsense string that used to appear in print because of the design of early-20th-century typesetting machines and that now serves as shorthand for the most popular letters.

Beyond prevailing lore and trivia, we’re generally less certain about the English language’s most common words and letters. Different studies over the years have produced varying results, depending on the datasets and methods used.

Now Google’s director of research Peter Norvig has used the vast data from the Google Books corpus – over 743 billion words – to produce updated word- and letter-frequency tables. Here’s his letter count:

[Image: Peter Norvig’s English letter-frequency table]

As you can see, it violates ETAOIN SHRDLU only slightly, becoming ETAOIN SRHLDCU.
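For readers who want to try this at home, here is a minimal sketch of how a letter ranking like ETAOIN SRHLDCU can be produced from a piece of text. It assumes that only the letters A to Z count (case-folded, with digits, punctuation and spaces ignored); the function names are made up for illustration and are not taken from Norvig’s own code.

```python
from collections import Counter

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, percentage) pairs for A-Z, most frequent first."""
    # Fold to upper case and keep only A-Z; everything else is ignored.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    if total == 0:
        return []
    counts = Counter(letters)
    # most_common() sorts by count; ties keep first-seen order.
    return [(letter, 100 * n / total) for letter, n in counts.most_common()]

def ranking_string(text: str) -> str:
    """Concatenate letters in descending order of frequency, ETAOIN-style."""
    return "".join(letter for letter, _ in letter_frequencies(text))

sample = "The quick brown fox jumps over the lazy dog, and the dog sleeps."
for letter, pct in letter_frequencies(sample)[:6]:
    print(f"{letter}: {pct:.2f}%")
print(ranking_string(sample))
```

On a short sample like this the ranking is noisy; it takes a very large corpus for the order to settle down.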

The 50 most common words, in order of frequency, are: the, of, and, to, in, a, is, that, for, it, as, was, with, be, by, on, not, he, I, this, are, or, his, from, at, which, but, have, an, had, they, you, were, their, one, all, we, can, her, has, there, been, if, more, when, will, would, who, so, no.
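A word list like this is, at heart, a tally of tokens. The sketch below assumes a simple definition of a “word” (a run of letters, optionally with internal apostrophes, case-folded); Norvig’s tokenization of the Google Books data may well differ, so treat it as an illustration rather than a reconstruction of his method.

```python
import re
from collections import Counter

# Assumption: a word is a run of a-z letters with optional internal apostrophes.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def top_words(text: str, n: int = 50) -> list[tuple[str, int]]:
    """Return the n most common words (case-folded) with their counts."""
    # Normalize the typographic apostrophe (U+2019) to a plain one first.
    normalized = text.lower().replace("\u2019", "'")
    words = WORD_RE.findall(normalized)
    return Counter(words).most_common(n)

sample = "It was the best of times, it was the worst of times."
print(top_words(sample, 3))  # [('it', 2), ('was', 2), ('the', 2)]
```

Because everything is lower-cased, ‘I’ is counted as ‘i’; words with equal counts are listed in the order they were first seen.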

Norvig also investigated the most common word lengths, sequences of letters (“n-grams”), letters in various positions in words, and much more. It’s a fascinating page – a feast for data fiends and word nerds alike. (And they are often alike.)
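Two of those extra measures, word lengths and letter n-grams, are easy to sketch. This version assumes that n-grams are counted only inside words (never across a space) and that words are plain runs of a-z letters; both are simplifying assumptions, not a description of how Norvig did it.

```python
import re
from collections import Counter

def words_in(text: str) -> list[str]:
    """Split text into lower-case runs of a-z letters."""
    return re.findall(r"[a-z]+", text.lower())

def word_lengths(text: str) -> Counter:
    """Map word length -> number of words of that length."""
    return Counter(len(w) for w in words_in(text))

def letter_ngrams(text: str, n: int = 2) -> Counter:
    """Count letter n-grams inside words (they never span a word boundary)."""
    counts = Counter()
    for word in words_in(text):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

sample = "The other thing is that the thin thread held."
print(word_lengths(sample).most_common(3))
print(letter_ngrams(sample, 2).most_common(3))
```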


19 Responses to ETAOIN SRHLDCU, or: What are the most common words and letters in English?

  1. Shaun Downey says:

    Always good to be a delving nerdy type, Stan!

  2. alexmccrae1546 says:

    Norvig’s letter count stats, i.e., frequency of use, seem to reflect the reality of our English usage pretty accurately, although I was a tad surprised to see how low in the rankings the letter “B” stood.

    In light of the fact that there were four “B” words in the alleged top 50 most frequently used words, i.e., “be”, “by”, “but” and “been”, I was a bit puzzled by its being so low down in the list, nestled as it was between the letters “Y” and “V”.

    I see few other anomalies that jump out within the other relative rankings.

  3. Stan says:

    Shaun: It keeps us busy!

    Alex: That’s an interesting anomaly all right. I guess it goes to show how comparatively little the letter ‘b’ occurs elsewhere.
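    One way to see why a few frequent ‘b’ words need not lift B very high is to compute letter totals from a word-frequency table, weighting each word’s letters by how often the word occurs: B’s share then depends on every other word’s letters too. The sketch below does that; the word counts in it are made-up toy numbers for illustration only, not Norvig’s data.

```python
from collections import Counter

def letters_from_word_counts(word_counts: dict[str, int]) -> Counter:
    """Total letter occurrences, each word's letters weighted by its count."""
    letters = Counter()
    for word, count in word_counts.items():
        for ch in word.upper():
            if "A" <= ch <= "Z":
                letters[ch] += count
    return letters

def share(letters: Counter, letter: str) -> float:
    """Percentage of all counted letters that are `letter`."""
    total = sum(letters.values())
    return 100 * letters[letter] / total if total else 0.0

# Toy, made-up word counts (NOT real corpus figures).
toy = {"the": 60, "of": 30, "be": 8, "by": 6, "but": 5, "been": 3, "everything": 4}
letters = letters_from_word_counts(toy)
print(f"B share: {share(letters, 'B'):.1f}%")
```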

  4. A random observation: Isn’t it peculiar that in the list of the most common words we find he, I, his, we, and her (in that order) – but no “she”? Does it mean that in written language, men typically are portrayed as active agents and women as passive (as in “he told her”, “he held her hand” etc.)?

    • Jez Flores says:

      Yes.
      I have used the examples of “He hit her” and “He had her”… object. It’s so telling.

    • dainichi says:

      I think this only proves that male pronouns/determiners are used more often than female ones. The list doesn’t seem to distinguish between ‘her’ as an oblique/objective pronoun and ‘her’ as a possessive determiner, or between ‘his’ as a possessive determiner and ‘his’ as a possessive pronoun. (Not sure if this is the preferred nomenclature. What I mean is: ‘her’ corresponds to both ‘his’ and ‘him’, and ‘his’ corresponds to both ‘her’ and ‘hers’.)

  5. […] retweets @StanCarey on the most common words and letters in English, linking to a post with some very interesting data from Peter Norvig, director of research at […]

  6. wisewebwoman says:

    You come up with the most interesting studies, Stan!
    XO
    WWW

  7. Stan says:

    Christian: That’s probably one factor. Also, historically the greater part of printed prose has, I think, been men writing about men (where human subjects appear). And ‘he’ has traditionally been the generic third-person singular personal pronoun; only in recent decades has there been a widespread effort to use gender-neutral alternatives.

    WWW: Glad you found it interesting! It’s fine work by Norvig.

  8. marc leavitt says:

    Stan:
    I think it’s statistically discriminatory that “e”’s cousin, schwa, isn’t included.
    Also interesting: “I” is only #19, while “you” is #32.

  9. Stan says:

    Marc: The schwa is the unsung (and unstressed) hero of speech. Of the top 50 words, ‘more’ is the one whose inclusion surprised me most.

  10. […] ETAOIN SRHLDCU, or: What are the most common words and letters in English? […]

  11. LanceR says:

    How do th, sh, ck, ch, etc. figure in this? I wonder, if they were considered letters in their own right, how the order would change. When we were kids, a friend and I created a code that replaced those with symbols. It also replaced ‘the’, ‘of’ and ‘and’. Crazy that we happened upon the most common words.

    • Stan says:

      Lance: Norvig examined the frequencies of two-letter sequences (bigrams); ‘th’ was top of the chart, with ‘ch’ also featuring in the top 50. You can see the rest of that list on his page. Your code sounds interesting.
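      Lance’s question (how would the order change if digraphs counted as letters?) can be explored by tokenizing each word into symbols, reading the chosen digraphs greedily from left to right, and counting the symbols. The digraph set below is taken from his comment and is purely a hypothetical choice; greedy reading is also an assumption, since a sequence like ‘sch’ could be split more than one way.

```python
from collections import Counter

# Hypothetical digraph set, taken from the comment above; change as you like.
DIGRAPHS = ("th", "sh", "ck", "ch")

def tokenize_with_digraphs(word: str, digraphs=DIGRAPHS) -> list[str]:
    """Split a word into symbols, reading listed digraphs greedily left to right."""
    symbols, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in digraphs:  # tuple membership: exact two-letter match only
            symbols.append(pair)
            i += 2
        else:
            symbols.append(word[i])
            i += 1
    return symbols

def symbol_frequencies(text: str, digraphs=DIGRAPHS) -> Counter:
    """Count single letters and digraphs as if each digraph were one letter."""
    counts = Counter()
    for word in text.lower().split():
        letters_only = "".join(ch for ch in word if ch.isalpha())
        counts.update(tokenize_with_digraphs(letters_only, digraphs))
    return counts

print(symbol_frequencies("The thick ship").most_common(3))
```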

  12. Canehan says:

    I’m surprised nobody mentioned Pogo…

  13. John says:

    No one has mentioned it, so I will: the ascension of “C” into what must have been a fairly well-studied and exclusive group of letters, and not by a hair, 0.61%.
    I’d guess that the printers who originally started the stamp signature, who handled actual letters daily, would know whether C should be in the group.
    What’s anyone’s guess: some general tide in the written language, a miscalculation on the part of the printers, or some third thing?

    I guess “Google Books corpus” isn’t a very specific definition of the sample. Is there more information on the textual selection?

    Great fun ~

    • Stan says:

      John: I don’t know what lies behind the apparent rise of C, but I’d be interested in any educated guesses. You’ll find some background information on the Google Books corpora here.
