Wordable awareness

April 7, 2022

I came across an interesting word in Don DeLillo’s novel Falling Man (Picador, 2007). It appears in the middle of a conversation between an estranged couple, here discussing their son:

‘We talked about it,’ Keith said. ‘But only once.’
‘What did he say?’
‘Not much. And neither did I.’
‘They’re searching the skies.’
‘That’s right,’ he said.
She knew there was something she’d wanted to say all along and it finally seeped into wordable awareness.
‘Has he said anything about this man Bill Lawton?’
‘Just once. He wasn’t supposed to tell anyone.’
‘Their mother mentioned this name. I keep forgetting to tell you. First I forget the name. I forget the easy names. Then, when I remember, you’re never around to tell.’

Seeped into wordable awareness is a lovely phrase, and wordable is a curiously rare word, given its straightforward morphology and transparent meaning. It has virtually no presence in large language corpora:

Read the rest of this entry »


We ourself can use this pronoun

March 25, 2022

On a recent rewatch of the 1979 film The Warriors, I noticed an unusual pronoun spoken by Cleon, played by Dorsey Wright:*

Still image from The Warriors. Cleon, played by Dorsey Wright, is shown in close-up wearing a head-dress, saying, 'I think we'd better go have a look for ourself.' It's night time, and the background shows pale blurry lights.

Ourself, once in regular use, is now scarce outside of certain dialects, and many (maybe most) people would question its validity. I’ve seen it followed by a cautious editorial [sic] even in linguistic contexts. The Cambridge Grammar of the English Language (2002), describing it as the reflexive form of singular we – ‘an honorific pronoun used by monarchs, popes, and the like’ – says it is ‘hardly current’ in present-day English.

But that’s not the whole story, and it belies the word’s surprising versatility and stubborn survival outside of mainstream Englishes, which this post will outline. There are graphs and data further down, but let’s start with usage.

Read the rest of this entry »


‘Cuckquean’, abbreviations, and vocabulary change

June 22, 2017

Catching up on my column for Macmillan Dictionary Blog, I have three recent posts to share.

Golly, matey – vocabulary change is massively awesome looks at how the words we use reflect our shifting habits and preoccupations:

To look more broadly at these ripples in the collective lexicon, we can turn to big data in the form of language corpora. One of these, the Spoken British National Corpus, allows many kinds of linguistic research, such as studying how English vocabulary and regional dialects are shifting. The project was in the news recently with a story about ‘words we no longer use’. The headline exaggerates, but there are indeed words we use much less – or much more – than we did twenty years ago. The corpus data can illustrate how our lives have changed over the years.

TL;DR: Abbreviations FTW is an overview of the different types of abbreviations and the different ways we style and use them:

Efficiency is intrinsic to communication, and can drive language change. Set phrases that are used repeatedly are commonly abbreviated, as they save people time and effort. In digital communication, abbreviations may also serve as tribal markers – tfw users are in the know about internet lingo. Ikr. Sometimes, as in the case of lol, abbreviations may even undergo grammatical transformation.

Cucks, cuckolds, cuckqueans and cuckoos briefly explores the origins and applications of this nest of interconnected words:

Quean is a notable word in its own right. It comes from Old English cwene, meaning ‘woman’, from Proto-Indo-European *gwen-, which is also the root of queen, misogyny, and gynaecology. In English, cwene was originally a neutral word; but like many terms of female reference, it gradually took on negative senses and connotations, coming to mean ‘impudent woman’, ‘hussy’, and ‘prostitute’. In Scots it has retained its original neutral sense.

Each post is bite-sized, readable in 2–3 minutes. For more, you can browse the full archive.


ETAOIN SRHLDCU, or: What are the most common words and letters in English?

January 7, 2013

Most of us know that ‘e’ is the most common letter in English and ‘the’ is the most common word. Many are familiar with ETAOIN SHRDLU, the nonsense string that used to appear in print because of early-20thC printer design and now serves as shorthand for the most popular letters.

Beyond prevailing lore and trivia, we’re generally less certain about the English language’s most common words and letters. Different studies over the years have produced varying results, depending on the datasets and methods used.

Now Google’s director of research Peter Norvig has used the vast data from the Google Books corpus – over 743 billion words – to produce updated word- and letter-frequency tables. Here’s his letter count:

Peter Norvig - English language letter count frequency table

As you can see, it violates ETAOIN SHRDLU only slightly, becoming ETAOIN SRHLDCU.

The 50 most common words, in order of frequency, are: the, of, and, to, in, a, is, that, for, it, as, was, with, be, by, on, not, he, I, this, are, or, his, from, at, which, but, have, an, had, they, you, were, their, one, all, we, can, her, has, there, been, if, more, when, will, would, who, so, no.

Norvig also investigated the most common word lengths, sequences of letters (“n-grams”), letters in various positions in words, and much more. It’s a fascinating page – a feast for data fiends and word nerds alike. (And they are often alike.)
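For anyone who wants to try a letter count on their own text, here is a minimal Python sketch. It is not Norvig's method, and the function name letter_order is my own invention; it simply tallies the letters A to Z in a string and lists them from most to least frequent, breaking ties alphabetically.

```python
from collections import Counter


def letter_order(text: str) -> str:
    """Return the letters A-Z found in `text`, most frequent first.

    Hypothetical helper for illustration only; ties are broken
    alphabetically so the result is deterministic.
    """
    # Uppercase the text and keep only the 26 unaccented letters.
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    # Sort by descending count, then alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(letter for letter, _ in ranked)


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog"
    print(letter_order(sample))
```

A short sample will not reproduce ETAOIN SRHLDCU, of course; the ordering only settles down with a large and varied body of text.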


Google’s Ngram Viewer and wild treacle

November 21, 2012

I have two new posts up at Macmillan Dictionary Blog. If you subscribe to it, or follow me on Twitter, you may already’ve seen them, in which case please indulge or disregard.

The first is a report on the new features of a recently relaunched linguistic corpus tool: Google’s Ngram Viewer 2.0:

It has improved the datasets and publisher metadata and added many more books to the corpus, so the results are more accurate and comprehensive than before. The interface remains much the same – you can modify searches by timeframe, degree of detail, and corpus type, including several different languages – but it comes with a whole new bag of tricks.

A significant innovation is the ability to search by part of speech. Say you want to look for a word as a verb, but it also functions as a noun. Just append “_VERB” to your search term – the capital letters are essential – and the Ngram Viewer filters accordingly.

You can also now compare BrE and AmE in the same graph. Here’s one I did of color vs. colour on both sides of the Atlantic:

See colour’s conspicuous double-dip in early-19th-century U.S.? Read on for my interpretation of this shift.
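The part-of-speech search described above amounts to a naming convention, which a few lines of Python can illustrate. This is only a sketch: the helper tag_term is hypothetical, and the set of tags is the one I understand the Ngram Viewer's documentation to list, so treat it as an assumption. It builds the search string and does not contact Google.

```python
# Part-of-speech tags as I understand the Ngram Viewer documentation
# lists them (an assumption; check the current documentation).
POS_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "NUM", "CONJ", "PRT"}


def tag_term(word: str, pos: str) -> str:
    """Return `word` with an upper-case part-of-speech suffix, e.g. 'tackle_VERB'.

    Hypothetical helper: it only formats the search term.
    """
    tag = pos.upper()  # the capital letters are essential
    if tag not in POS_TAGS:
        raise ValueError(f"unknown part-of-speech tag: {pos!r}")
    return f"{word}_{tag}"


if __name__ == "__main__":
    # Compare a word as a verb and as a noun in one comma-separated query.
    print(",".join([tag_term("tackle", "verb"), tag_term("tackle", "noun")]))
```

The resulting string can be pasted into the Ngram Viewer's search box.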

*

My latest piece, Getting ‘treacle’ from wild animals, traces the strange origins of treacle, beginning with the Proto-Indo-European root *ghwer– “wild”, from which we get Latin ferus (→ fierce, feral) and ferox (→ ferocious).

*Ghwer– also gave rise to the Greek word thēr, meaning “beast” or “wild animal”, whence the diminutive thērion – a word Aristotle used to refer to vipers. We see the same root in Theropoda (“beast feet”), a category of dinosaurs . . . . From thērion came thēriakos (adj.) “of a wild animal”, which led to thēriakē “antidote for poisonous wild animals”.

Latin borrowed this as theriaca, which became *triacula in Vulgar Latin. From this we get Old French triacle “antidote”, subsequently imported into Middle English and later to become treacle. Treacle was used especially against venomous bites such as snakes’ – the remedy often included snake flesh – then gradually the word’s meaning shifted from antidote to general cure or prophylactic. Sir Thomas More mentions “a most strong treacle against those venomous heresies”. Eventually the medicinal connotations faded.

You can read the rest of this peculiar etymology at Macmillan Dictionary Blog, and older posts are available here.

Edit: Something else I meant to mention. A couple of weeks ago Macmillan announced it would be phasing out its printed dictionaries. Editor-in-Chief Michael Rundell writes about the decision here. “[E]xiting print is a moment of liberation,” he says, “because at last our dictionaries have found their ideal medium.”


Would of, could of, might of, must of

October 23, 2012

When we say would have, could have, should have, must have, might have, may have and ought to have, we often put some stress on the modal auxiliary and none on the have. We may show this in writing by abbreviating to could’ve, must’ve, etc. (Would can contract further by merging with the subject: We would have → We’d’ve.)

Unstressed ’ve is phonetically identical (/əv/) to unstressed of: hence the widespread misspellings would of, could of, should of, must of, might of, may of, and ought to of. Negative forms also appear: shouldn’t of, mightn’t of, etc. This explanation – that misanalysis of the notorious schwa lies behind the error – has general support among linguists.
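To hunt for these spellings in a text, a regular expression will do as a rough first pass. The Python sketch below is illustrative only: the names MODALS, PATTERN and find_modal_of are my own, and the pattern is deliberately simple. It will also flag legitimate sequences such as could of course, so any hits need checking by eye.

```python
import re

# Modal forms listed above, with optional n't (straight or curly apostrophe).
MODALS = r"(?:would|could|should|must|might|may)(?:n't|n’t)?|ought(?:n't|n’t)? to"

# A modal followed by the whole word "of"; case-insensitive.
PATTERN = re.compile(rf"\b({MODALS}) of\b", re.IGNORECASE)


def find_modal_of(text: str) -> list[str]:
    """Return every 'modal + of' sequence in `text`, in order of appearance.

    Hypothetical helper; false positives such as 'could of course' are
    expected, because the pattern knows nothing about grammar.
    """
    return [match.group(0) for match in PATTERN.finditer(text)]


if __name__ == "__main__":
    print(find_modal_of("I would of gone, but she shouldn't of said it."))
```

Requiring a word boundary after of keeps would often and the like out of the results.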

The mistake dates to at least 1837, according to the OED, so it has probably been infuriating pedants for almost 200 years. Common words spelt incorrectly provoke particular ire, sometimes accompanied by aspersions cast on the writer’s intelligence, fitness for society, degree of evolution, and so on. But there’s no need for any of that.

Read the rest of this entry »


Tweetolife: visualising gender differences in language

August 8, 2011

Michael Rundell, a lexicographer at Macmillan Dictionary, wrote last year about a new area of linguistic research “based not on conventional corpora, but on Twitter feeds”. The demo website he linked to has since been updated, and is worth another look.

Now called Tweetolife (grandiosely subtitled “the science of human life in Twitter messages”), it offers a slick and simple interface that shows how words and phrases used on Twitter break down according to gender, or time of day. Like this:

Read the rest of this entry »