Most of us know that ‘e’ is the most common letter in English and ‘the’ is the most common word. Many are familiar with ETAOIN SHRDLU, the nonsense string that used to appear in print because of early-20th-century printer design and now serves as shorthand for the most popular letters.
Beyond prevailing lore and trivia, we’re generally less certain about the English language’s most common words and letters. Different studies over the years have produced varying results, depending on the datasets and methods used.
Now Google’s director of research Peter Norvig has used the vast data from the Google Books corpus – over 743 billion words – to produce updated word- and letter-frequency tables. Here’s his letter count:
As you can see, it violates ETAOIN SHRDLU only slightly, becoming ETAOIN SRHLDCU.
The 50 most common words, in order of frequency, are: the, of, and, to, in, a, is, that, for, it, as, was, with, be, by, on, not, he, I, this, are, or, his, from, at, which, but, have, an, had, they, you, were, their, one, all, we, can, her, has, there, been, if, more, when, will, would, who, so, no.
Norvig also investigated the most common word lengths, sequences of letters (“n-grams”), letters in various positions in words, and much more. It’s a fascinating page – a feast for data fiends and word nerds alike. (And they are often alike.)
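For anyone who wants to try this sort of counting on their own text, here is a minimal Python sketch. It is not Norvig's code or method, just an illustration of the general idea: the function names (`letter_counts`, `letter_ngrams`, `frequency_order`) and the sample sentence are made up for this example, and it assumes plain English text where only the letters A to Z matter.

```python
from collections import Counter
import re


def letter_counts(text):
    """Count each letter A-Z, ignoring case and anything that isn't a letter."""
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")


def letter_ngrams(text, n):
    """Count sequences of n letters that occur inside words (never across spaces)."""
    counts = Counter()
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def frequency_order(counts):
    """Join the letters into one string, most common first (ETAOIN-style)."""
    return "".join(letter for letter, _ in counts.most_common())


# Hypothetical sample text; a real study would use a far larger corpus.
sample = "the cat sat on the mat"
print(frequency_order(letter_counts(sample)))
print(letter_ngrams(sample, 2).most_common(3))
```

On a sentence this short the ordering looks nothing like ETAOIN SRHLDCU, of course. The pattern only emerges with a large and varied body of text, which is why the size of the Google Books data matters.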