Word frequency game

August 13, 2014

The Red Words Game from Macmillan Dictionary is a new and addictive bit of fun that tests your awareness of word frequencies. It’s named after a feature of the dictionary, the so-called red words and stars.

The idea is that the core vocabulary of English has 7500 ‘red words’, comprising 90% of the language in Macmillan’s huge general corpus.¹ Macmillan Dictionary gives red words special treatment, describing their grammar, collocations, register, and so on. Three-star words are the 2500 most common, two-star words are next, then one-star words.
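To make the banding concrete, here is a minimal sketch that maps a word's frequency rank to a star rating. It assumes the 7500 red words split into three equal bands of 2500. The post states that size only for the three-star band, so the even split of the rest is an assumption. The function name `stars_for_rank` is hypothetical and is not anything Macmillan provides.

```python
def stars_for_rank(rank: int, band_size: int = 2500) -> int:
    """Return the star rating for a 1-based frequency rank (0 = not a red word).

    Assumes three equal bands of `band_size` words; the equal split is an
    illustration, not Macmillan's documented method.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    band = (rank - 1) // band_size  # 0 for ranks 1-2500, 1 for 2501-5000, ...
    return max(3 - band, 0)
```

Under those assumptions, a word ranked 2500th gets three stars, the 2501st gets two, and anything past 7500 gets none.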

To play, you have 90 seconds to guess how many stars each word in a random series has. I’ve been scoring 225–300, but to get more than 300 I’d need more luck and free time than I have at the moment. It’s just maddening enough to make you feel hard done by and want another go, like when I had 250 points with 30 seconds to go and got every answer wrong after that.

There are bonus points for fast answers, so don’t dally. The tricky bit is not letting the answers distract you (‘implication’ has three stars, ‘anonymous’ just one!?).² Watch out too for grammatical class, which appears under the word, as it will sometimes affect your answer. For example, the verb ‘find’ has three stars but the noun has just one.

If you want to pass a few entertaining minutes, go play. It’s even subliminally educational.

*

¹ Link and description updated for accuracy.

² I suspect ‘anonymous’ will gain a star or two when more recent data are included in the categorisation.


ETAOIN SRHLDCU, or: What are the most common words and letters in English?

January 7, 2013

Most of us know that ‘e’ is the most common letter in English and ‘the’ is the most common word. Many are familiar with ETAOIN SHRDLU, the nonsense string that used to appear in print because of early-20th-century printer design and now serves as shorthand for the most popular letters.

Beyond prevailing lore and trivia, we’re generally less certain about the English language’s most common words and letters. Different studies over the years have produced varying results, depending on the datasets and methods used.

Now Google’s director of research Peter Norvig has used the vast data from the Google Books corpus – over 743 billion words – to produce updated word- and letter-frequency tables. Here’s his letter count:

[Image: Peter Norvig’s English-language letter-frequency table]

As you can see, it violates ETAOIN SHRDLU only slightly, becoming ETAOIN SRHLDCU.
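For anyone curious how such an ordering is derived, here is a minimal sketch that counts letters in a piece of text and lists them from most to least frequent. It is only an illustration on a toy sample, not Norvig's method or data. The function name `letter_order` is made up for this example, and a short sample will not reproduce the corpus-wide order.

```python
from collections import Counter


def letter_order(text: str) -> str:
    """Return the letters A-Z that occur in `text`, most frequent first."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    # Sort by descending count, breaking ties alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(letter for letter, _ in ranked)


if __name__ == "__main__":
    sample = "Most of us know that e is the most common letter in English"
    print(letter_order(sample))
```

Run over a large enough body of ordinary English, the start of the result should drift towards something like ETAOIN, though the exact order depends on the text used.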

The 50 most common words, in order of frequency, are: the, of, and, to, in, a, is, that, for, it, as, was, with, be, by, on, not, he, I, this, are, or, his, from, at, which, but, have, an, had, they, you, were, their, one, all, we, can, her, has, there, been, if, more, when, will, would, who, so, no.

Norvig also investigated the most common word lengths, sequences of letters (“n-grams”), letters in various positions in words, and much more. It’s a fascinating page – a feast for data fiends and word nerds alike. (And they are often alike.)
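The same counting idea extends to letter sequences and word lengths. The sketch below is a simplified illustration, not Norvig's code. The names `letter_ngrams` and `word_lengths` are hypothetical, and words are taken to be unbroken runs of the letters a–z, so ‘don’t’ is split in two.

```python
import re
from collections import Counter


def letter_ngrams(text: str, n: int = 2) -> Counter:
    """Count sequences of n adjacent letters that fall inside a single word."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def word_lengths(text: str) -> Counter:
    """Count how many words there are of each length."""
    return Counter(len(word) for word in re.findall(r"[a-z]+", text.lower()))


if __name__ == "__main__":
    sample = "the theory is that the other thing matters"
    print(letter_ngrams(sample, 2).most_common(3))
    print(word_lengths(sample).most_common(3))
```

Counting n-grams within words rather than across spaces is a design choice here; counting across word boundaries would need a different tokenising step.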

