Earlier this year there was a minor media flurry about the English language supposedly attaining its millionth word. This (non-)event, which was dragged out for a few years, was such raiméis that I decided not to write about it at all, apart from a curt dismissal on Twitter. However, it did serve some useful purposes. For one thing, it prompted many interested parties – amateurs and academics alike – to write thoughtful criticisms. It got people writing, talking, and thinking about language. It also showed the extent of credulity or cynicism, or both, in many commercial news organisations. Shocking, I know.
I could get carried away posting hyperlinks to worthwhile criticisms of the project (or ‘publicity stunt’, if you prefer), but out of mercy for my readers I’ll limit the links to a handful, each of which I heartily recommend: Ben Zimmer, Jesse Sheidlower, Grant Barrett, Michael Covarrubias, and David Crystal, whose post attracted a response from the man behind the millionth-word shenanigans. Language Log, an outstanding linguistics blog, covered the pseudo-event in considerable detail. To one of those posts, written by Zimmer, I added a comment with an excerpt from James A. H. Murray’s introduction to the first volume of the original Oxford English Dictionary. The main purpose of this post is to repeat that excerpt here on Sentence first.
Before that, though, a few words about Murray (pictured, in his Scriptorium). It is possible that no one else could have done the job he did; it is almost certain that no one could have done it so well and with such commitment to quality over haste. He embodied a rare combination of knowledge, ability, temperament and dedication that made him ideal for the job, though over its many years he had more than occasional cause to doubt his suitability for it and the probability of its eventual completion.
As well as being a great lexicographer, Murray was a polymath whose keen interests included astronomy, botany, archaeology, mathematics and geology. These interests prepared him well for the decades he spent working on the OED. They gave him expertise in a wide range of subjects, which enabled him to draw useful analogies between different disciplines. This is beautifully apparent in the excerpt below. As his granddaughter K. M. Elisabeth Murray wrote in Caught in the Web of Words, “because he never compartmentalised his interests, he never missed seeing something because he had allowed himself to become preoccupied with another line of research.”
Here is Murray’s own diagram of the structure of the English vocabulary, followed by his exceptionally lucid explanation of it.
The centre is occupied by the ‘common’ words, in which literary and colloquial usage meet. ‘Scientific’ and ‘foreign’ words enter the common language mainly through literature; ‘slang’ words ascend through colloquial use; the ‘technical’ terms of crafts and processes, and the ‘dialect’ words, blend with the common language both in speech and literature. Slang also touches on one side of the technical terminology of trades and occupations, as in ‘nautical slang’, ‘Public School slang’, ‘the slang of the Stock Exchange’, and on another passes into true dialect. Dialects similarly pass into foreign languages. Scientific terminology passes on one side into purely foreign words, on another it blends with the technical vocabulary of art and manufactures. It is not possible to fix the point at which the ‘English Language’ stops, along any of these diverging lines.
The Vocabulary of a widely-diffused and highly-cultivated living language is not a fixed quantity circumscribed by definite limits. That vast aggregate of words and phrases which constitutes the Vocabulary of English-speaking men presents, to the mind that endeavours to grasp it as a definite whole, the aspect of one of those nebulous masses familiar to the astronomer, in which a clear and unmistakable nucleus shades off on all sides, through zones of decreasing brightness, to a dim marginal film that seems to end nowhere, but to lose itself imperceptibly in the surrounding darkness. In its constitution it may be compared to one of those natural groups of the zoologist or botanist, wherein typical species forming the characteristic nucleus of the order, are linked on every side to other species, in which the typical character is less and less distinctly apparent, till it fades away in an outer fringe of aberrant forms, which merge imperceptibly in various surrounding orders, and whose own position is ambiguous and uncertain. For the convenience of classification, the naturalist may draw the line, which bounds a class or order, outside or inside of a particular form; but Nature has drawn it nowhere. So the English Vocabulary contains a nucleus or central mass of many thousand words whose ‘Anglicity’ is unquestioned; some of them only literary, some of them only colloquial, the great majority at once literary and colloquial,- they are the Common Words of the language. But they are linked on every side with other words which are less and less entitled to this appellation, and which pertain ever more and more distinctly to the domain of local dialect, of the slang and cant of ‘sets’ and classes, of the peculiar technicalities of trades and processes, of the scientific terminology common to all civilized nations, of the actual languages of other lands and peoples. 
And there is absolutely no defining line in any direction: the circle of the English language has a well-defined centre but no discernible circumference.
A wonderful passage, which I had occasion to quote during my BBC Radio 4 appearance.
Ben: Yes, it is wonderful. Murray’s description is sensible, helpful, authoritative and quite poetic.
Thank you for the link. I recall listening to it some weeks ago and being struck anew by Payack’s emphasis on his “proprietary algorithm”. The BBC’s description – “Word experts Paul Payack and Benjamin Zimmer” – is too kind to Payack: it was a bogus project to begin with, but the more he talks about it the sillier he makes it sound!
Fascinating stuff, Stan. Thanks for posting. I agree that the whole millionth-word thing is utterly preposterous, and I must take time to read the various links about it. Even just thinking about it here, I’m wondering: do you include slang, American English, Old English or even Anglo-Saxon, obsolete words, computer-speak, and foreign words that are so prevalent as to be part of the language, not forgetting the strange gibberish spoken by certain of our more rural elected representatives? I’m sure these questions are answered in your links, though…
My pleasure, Doubtful. Thanks for your comment! Yes, your questions are largely answered in the links, the essential point being that there is no clear dividing line between a word that is part of the English language and one that isn’t. Because who decides, and how? To contend that this can be done is preposterous, as you put it. And by an algorithm? Nonsense. The precise extent of the English lexicon is non-computable.
I’m curious about your mention of American English, though. Do you consider it less standard than British English or Irish English?
What I meant by American English in this context was that if you’re trying to calculate the millionth word in the English language, one of the questions would be about whether you restrict your selection to words used primarily in England (or the UK). Would you count the American usage of ‘gas’ meaning ‘petrol’ as a separate word, or ‘fall’ for ‘autumn’? What about ‘freeway’, ‘icebox’, ‘motel’, ‘quarterback’, ‘deadbeat’, ‘televangelist’ or the odious ‘guesstimate’? (These are just off the top of my head.) I’m all for expanding the langauge as much as possible, but I’m just thinking of the type of pedant, a kind of linguistic Amish, that can barely tolerate any word coined after 1900…
By the by, watching the TV last night I saw an early 20th century cookbook which had as its byline “Suggestive Recipes…”. How meaning does change over time!
Doubtful: If you’re trying to calculate the millionth word in the English language, chances are you would not rule out words that are standard in American usage. To do so would require a perverse form of purism, with little to do with pedantry as I understand it.
While the lexical status of some terms is subject to reasonable doubt – not least the supposed millionth ‘word’ itself: web 2.0 – I can’t see how an argument could be made against any of the words you listed. Even guesstimate is in the OED now.
“Suggestive recipes” is a fine notion! Even as a malapropism it reflects the way the chief connotation of epicureanism has changed from pleasure to sensual pleasure to gustatory pleasure.
Although I don’t use it, I think ‘fall’ for autumn has a certain poetry about it (and is also the title of a very beautiful Miles Davis piece (on Nefertiti)). I have no problem with any of the above words either (or Americanisms in general!) although I do feel that ‘guesstimate’ is just wrong, wrong, wrong! It’s ugly, unnecessary, and self-consciously clever (like ‘sheeple’, another neologism I loathe for the same reason). Correct me if I’m wrong, but either you guess something because you don’t have enough data, or you estimate it because you do. There isn’t really an in-between area. It’s a personal thing, I suppose. (I also misspelled ‘language’ in the comment above. Oops!)
Doubtful: I agree with you about fall: I think it’s a lovely word, though I don’t use it either. Guesstimate doesn’t exactly fill me with joy, but neither does it rouse me to rage. (Personal and professional equanimity preclude this!) Like stagflation it’s a fairly cumbersome portmanteau, but I think there’s a place for it. Estimates are usually based on calculation, whereas guesstimates are estimates based more on guesswork than on calculation. A guess could be wildly arbitrary; a guesstimate could not, or at least ought not to be. In other words I think there is an in-between area. But as you say it’s a personal thing – at least to some degree. Everyone has different ideas about what words mean!
It is not possible to fix the point at which the ‘English Language’ stops, along any of these diverging lines.
I think this goes for all languages.
As for word-creations like ‘guesstimate’: the purist in Sean shares D.E.’s opinion/feelings; on the other hand I do myself enjoy playing with words/language, and sometimes – the more so in a certain context – it’s nice and refreshing to create – or hear/read – a ‘new’ word.
The emphasis is on sometimes. It’s similar to what we have been discussing recently.
A nice tiny aprosdoketon (the unexpected) from time to time is fine, be it an uncommon word or a ‘creation’. Five aprosdoketa in one sentence are four too many.
By the way, I myself sometimes run the risk of what we in German (sic! ha ha ha) call “overgag(ging)”.
Conclusion: sometimes less is more. And simplicity sometimes has its own beauty.
Thanks for another great post, Stan. And to you, D.E., and all commenters to come: a simply magic weekend.
‘I think this goes for all languages.’
I imagine it does, Sean.
My attitude to neologisms tends to be informed primarily by their usefulness, and more shallowly their attractiveness. Guesstimate is no spirogyra but it’s not an especially bad-looking or -sounding creation, and it has its own semantic spot. Whether or not five aprosdoketa in one sentence is four too many, I will not answer in good faith until I have read Finnegans Wake!
I just noticed your link to my post, Stan. My thanks (belated as they are).
But I must say that my inclusion alongside those other gentlemen reminds me of George Gobel’s line from his visit to the Tonight Show with Johnny Carson:
“Did you ever get the feeling that the world was a tuxedo and you were a pair of brown shoes?”
(video available here: http://www.youtube.com/watch?v=vsEkR5WFlw0 )
My pleasure, Michael. I noticed that you had written about Payack a few times; the post I linked to was both a robust and entertaining dismissal of his millionth-word claims, and a handy point from which to visit your other posts on the matter.
Thanks for the clip! I had fun with it. Gobel has some great lines, not least the one you quoted. To re-work an old phrase: if the shoes fit, I’ll link to them.
This goes to show the way in which the language has changed throughout the ages.
It certainly does, Derek. Languages change continuously, though this fact is sometimes obscured or resisted.