The OED Text Visualizer is an amazing new research tool from OED Labs based on a powerful data engine that automatically annotates text. The Visualizer displays etymological information in an attractive visual format that can ‘open up new areas of questioning and means of discovery’.
It works like this: Paste up to 500 words into the box on this page, add the text’s date, click the button, and you get an instant display of word origins, helpfully colour-coordinated, along a 1,000-year timeline.
Here’s what I got with the first eight paragraphs of my post on the word culchie:
[Image: Text Visualizer display for the culchie text]
Each bubble represents a word in the inputted text, its size proportionate to its frequency in the text. When you hover the cursor over a bubble, you get information about the word. The x-axis is a timeline from Old English to today, and the y-axis shows a word’s frequency in modern English on a logarithmic scale.
The Germanic clump in the top left consists of the, be, and, to, and in. The big blue words are of, or, a, its, and Irish – this one an obvious indicator of the topic of my text. The Latin cluster from the 15th century on has a scholarly, multisyllabic flavour: significance, dictionary, equivalent, indicate, speculate, synonym, connotation. Purple was a good choice.
The sole word identified as Celtic is bog. Culchie is unquestionably Irish, but the Annotator categorizes it as ‘place name’, hence the big yellow bubble in the bottom right. Below the display are fuller breakdowns of each word in tables of tokens and lexemes; these can be exported at a click as CSV or JSON files.
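For anyone who wants to play with the exported data, a few lines of Python are enough to tally words by origin. This is only a rough sketch: the column names (`lexeme`, `origin`) and the sample rows are assumptions loosely based on the results described above, not the Visualizer's actual export format, so adjust them to match whatever the real CSV contains.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for an exported lexeme table.
# The real export's column names and values may differ.
SAMPLE_EXPORT = """lexeme,origin
the,Germanic
be,Germanic
of,English
bog,Celtic
significance,Latin
dictionary,Latin
culchie,place name
"""


def count_by_origin(csv_text, origin_column="origin"):
    """Count how many rows fall under each language of origin."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[origin_column] for row in reader)


counts = count_by_origin(SAMPLE_EXPORT)
for origin, n in counts.most_common():
    print(f"{origin}: {n}")
```

With a real export you would read the downloaded file instead of the inline sample and pass the actual name of the origin column.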
The OED says a fully optimized version of the Text Visualizer – which is currently in beta – will be along soon. In the meantime it invites feedback: ‘You are welcome to trial different types of text, play with the visualization and raw data, test out the tool functionality, and then share your thoughts.’
I highly recommend giving it a spin. It’s a fun, fascinating, and intuitive tool.
I’m trying to understand their distinction between Germanic and English, and between Latin and Romance. The former is explained as ‘Germanic for inherited words … and English for all internal formations (e.g. compounds, derivatives, shortenings, etc.)’ but that doesn’t leave me any the wiser. How is a compound of Germanic words not Germanic?
I think it has to do with the ‘language of immediate origin’, which is what the colours indicate, i.e. ‘the language from which the word has been either inherited or borrowed or within which it has been formed’. So a compound of Germanic words that was formed – compounded – in English gets the ‘English’ tag. But if you find it unclear, I’m sure they’d appreciate your feedback.
It was discussed at the Hattery beginning here, and the commenter who brought it up wrote:
Then the commenter quoted a bit from the Visualizer page that says: “this will be Germanic for inherited words, a variety of foreign languages for borrowings, and English for all internal formations (e.g. compounds, derivatives, shortenings, etc.),” and I said “Ah, so of is ‘English’ because it’s a low-stress variant of æf. Hm.” Still seems odd to me; your thoughts?
Ah, I hadn’t seen that exchange on your site – thank you. Given that the matter is confusing or unclear to several users, and to people with a background in language, no less, it would seem to warrant better explanation on the Visualizer page. Whether it needs a deeper redirection, I can’t say, and I don’t have enough expertise in etymology to weigh in with any authority on the finer points of Old English derivation.
Fascinating! Thanks for sharing.
I can see how the visualisation of your ‘cluchie’ might throw up questions that might not otherwise occur – is there a reason for the gap between 1675 and 1750, for example?
I tried Shakespeare’s ‘plea for strangers’. Clicking between the frequency of usage in 1600 and in modern-day English shows remarkably little difference (though I guess the log scale might be slightly misleading here). I wonder if this reflects the immediacy Shakespeare intended, as opposed to some of his more poetic passages.
* ‘culchie’ not ‘cluchie’ . . .
Glad you’re enjoying it, Edward. I don’t know if there’s a reason for the gap – if relatively fewer words were coined in that period – or if it’s just an arbitrary outcome of the particular text I used.