‘Smuggle plot tomatoes’ and other distant compounds

June 27, 2012

I’ve written before about noun pileups, where nouns pile up to form strange or baffling strings, typically in headlines, such as “Slough sausage choke baby death woman jailed”. Some, like “Ben Douglas Bafta race row hairdresser James Brown ‘sorry’”, are almost parse-proof.

There are also noun compounds that don’t grow to great length, but still manage to be obscure unless you’re already following the story they relate to. Today’s BBC News website contains the following headline:



News website headline noun pile-up amusement

June 6, 2010

A common characteristic of headlinese — the form of English used in news headlines — is the presence of noun pile-ups (aka noun piles, noun clusters, noun stacks, etc.). The BBC excels at them, offering many modest pile-ups every day, and occasionally a more eye-catching example.

Browsing the site yesterday, I saw among its “most read” stories a teaser that was simultaneously horrific and hilarious: “Sausage baby death woman jailed”. Upon clicking through, the headline grew and became a little clearer: “Slough sausage choke baby death woman jailed”.

The syntax may be dubious, but the sad and gruesome gist of the “Slough sausage choke baby death woman jailed” story is easy to guess from these seven words, and the bizarre juxtapositions in this keyword-heavy phrase probably enticed a few readers who might not have been tempted by a blander headline (or “hed”, in journalist jargon). It may seem like tabloidese, but the convention is well established in reputable news agencies, especially with developing stories that presume some familiarity on the part of the reader.

The press on this side of the Atlantic indulge in headline noun pile-ups much more than their American counterparts. Headsup: the blog, which monitors these matters closely, observed that “British hed writers can pull an attributive noun across a lot more barriers than we can [in the U.S.]”, and noted an exception in the Rupert-Murdoch-owned foxnews.com, which offers such far-out formulations as “Pregnant frying pan attack teen surrenders”. Language Log considers the transatlantic difference a sociological and linguistic puzzle.

You might say, hed noun pile-up geography difference puzzle.


37 per cent of my favourite things

May 24, 2010

Have you ever wondered what proportion of words are nouns, verbs, prepositions, adjectives and adverbs? According to the word specialists at Oxford Dictionaries,

The Second Edition of the Oxford English Dictionary contains full entries for 171,476 words in current use, and 47,156 obsolete words. To this may be added around 9,500 derivative words included as subentries. Over half of these words are nouns, about a quarter adjectives, and about a seventh verbs; the rest is made up of interjections, conjunctions, prepositions, suffixes, etc. These figures take no account of entries with senses for different parts of speech (such as noun and adjective).

This is an interesting estimate, but it is a crude one based on a small data pool, especially given the extent of derivation, the size of English vocabulary, and the significant variation between types of text. Just as I began to wonder in earnest about all this, I spotted a tantalising datum in A Student’s Introduction to English Grammar:

In any language, the nouns make up by far the largest category in terms of number of dictionary entries, and in texts we find more nouns than words of any other category (about 37 per cent of the words in almost any text).

Evidently there had been systematic investigation. I soon found what I presume was the authors’ source: “About 37% of Word-Tokens are Nouns”, a paper by Richard Hudson that appeared in Language in 1994. Hudson, an emeritus professor of linguistics at University College London, drew on related studies and examined two major corpora: the Brown University corpus of a million words of written American English, and the Lancaster–Oslo–Bergen (LOB) corpus of a million words of British English.

Both corpora divide texts into the same 15 genre categories — press reportage, science fiction, religion, learned & scientific writings, humour, mystery & detective, etc. — and tag words according to their word class. This enabled Hudson to cross-compare, by category, between the corpora. The genres can readily be arranged into two self-explanatory supergenres that Hudson calls “informational” and “imaginative”. He observes that “the similarities are sufficiently striking to suggest an underlying constancy”:

More detailed examination of the word types in genre categories reveals that the proportion of nouns varies, as we would expect it to, but only between 33% (in learned & scientific writings) and a more “nounful” 42% (in press reportage). Hudson speculates on possible causes for the observable patterns and variations, notes that the generalisation in his paper’s title is an “oversimplification of a system which is complex but quite regular”, and describes the trends as “facts . . . in search of a theory”.
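Because every word token in corpora like Brown and LOB already carries a word-class tag, the figures above boil down to counting tags and dividing by the total number of tokens, genre by genre. The sketch below illustrates only that counting step on a tiny invented sample. The tag labels (borrowed loosely from the abbreviations Hudson uses, plus "Det" and "Conj", which are made up here), the choice to count pronouns as nouns, and the data are all assumptions for illustration; this is not Hudson’s procedure and the output is not real corpus data.

```python
from collections import Counter
from typing import Iterable

# Assumption: simplified, hypothetical tags. Real corpora such as Brown and LOB
# use their own, much finer-grained tag sets.
NOUN_TAGS = frozenset({"cN", "nN", "pN"})  # assumption: pronouns counted as nouns


def noun_share(tagged_tokens: Iterable[tuple[str, str]],
               noun_tags: frozenset[str] = NOUN_TAGS) -> float:
    """Return the fraction of word tokens whose tag is one of noun_tags."""
    counts = Counter(tag for _word, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no tokens, so no meaningful proportion
    return sum(counts[tag] for tag in noun_tags) / total


def noun_share_by_genre(corpus: dict[str, list[tuple[str, str]]]) -> dict[str, float]:
    """Compute the noun share separately for each genre category."""
    return {genre: noun_share(tokens) for genre, tokens in corpus.items()}


# Invented toy sample, NOT real corpus data.
toy_corpus = {
    "press reportage": [("Police", "cN"), ("arrested", "V"), ("a", "Det"),
                        ("woman", "cN"), ("in", "P"), ("Slough", "nN")],
    "learned": [("We", "pN"), ("show", "V"), ("that", "Conj"),
                ("it", "pN"), ("is", "V"), ("small", "Adj")],
}

for genre, share in noun_share_by_genre(toy_corpus).items():
    print(f"{genre}: {share:.0%}")
```

On this toy sample the loop prints 50% for the press snippet and 33% for the learned one. Keeping the set of noun tags as a parameter makes it easy to see how much the headline figure depends on whether pronouns and proper nouns are counted as nouns.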

Much to my delight, the paper also contains data on the proportions of other word classes in written and spoken English and other languages from various sources, including children’s speech at different ages:

(P stands for prepositions, cN for common nouns, nN for proper nouns, pN for pronouns, V for verbs, Adj for adjectives, and Adv for adverbs.)

Hudson concludes that language has “regularities which involve the statistical probability of any randomly selected word belonging to a particular word-class”. I haven’t looked for follow-up studies yet, so I don’t know what has been made of these and similar data since 1994, but it’s a fascinating paper in its own right.

