A thousand monkeys typing Hamlet: what does “good” mean?

Nick Fisher

4 October 2018


At Lexico, we’re really focused on automating paperwork. Our goal is pretty simple – we want people to focus on questions and answers, not typing and editing.

As you’d expect, we’re looking at how machine learning can help do this. This means we’re constantly experimenting with natural language models – generating sentences, guessing at answers, filling in the blanks, just to name a few. We’re not releasing anything into production just yet, but we’re convinced this will play a huge role in the future workplace.

You’re probably familiar with the trope – a hall of monkeys, madly tapping away on typewriters, eventually coming up with Hamlet. Many of us will have encountered the image in any number of places, from The Simpsons to The Hitchhiker’s Guide to the Galaxy, even though the idea has its origins in statistical mechanics.
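Just how far-fetched is the trope? Here's a back-of-the-envelope sketch of our own (not anything from the original statistical mechanics argument): assume a simplified 27-key typewriter (26 letters plus a space) and ask for an exact match of a single famous line.

```python
# Back-of-the-envelope illustration (our own simplification, not a rigorous model):
# probability that one random attempt of the right length reproduces a famous line,
# assuming a 27-key typewriter (26 letters + space) and exact matching.

TARGET = "to be or not to be that is the question"  # punctuation dropped for simplicity
ALPHABET_SIZE = 27

# Each character is an independent 1-in-27 chance, so the whole line is (1/27)^n.
p_single_attempt = (1 / ALPHABET_SIZE) ** len(TARGET)

print(f"Characters to match: {len(TARGET)}")                # 39
print(f"Probability per attempt: {p_single_attempt:.3e}")   # roughly 1.5e-56

# On the order of 1 in 10^56 per attempt, which is why practical text generation
# predicts the next word from context rather than relying on blind chance.
```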

One application we’d never even considered was the automatic generation of poetry, an idea we came across at a conference in Melbourne earlier this year.

Lexico definitely won’t be releasing a poetry generator any time soon. But there are interesting similarities beneath the surface.

The creators of the poetry generator focused on form: rhyme, rhythm and meter. Judged on the metrics they built the models around, the algorithms performed very well. A “crowd” evaluation by lay people (i.e. not poetry experts) couldn’t tell the automated poetry from verse written by living, breathing humans – it turns out most people just focus on rhyme. An expert, though, was harder to fool: a Professor of English literature found the machine-generated work to “lack readability and emotion”.
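To make that gap concrete, here's a toy sketch of our own (not the researchers' code, and far cruder than a real rhyme or meter model): a surface-level rhyme check is easy to compute, but it measures form only; it says nothing about readability or emotion.

```python
# Toy illustration (our own; real rhyme models use pronunciation dictionaries):
# check whether two lines "rhyme" by comparing the trailing letters of their
# final words.

def crude_rhyme(line_a: str, line_b: str, suffix_len: int = 3) -> bool:
    """Very rough heuristic: do the last words share an ending?"""
    last_a = line_a.lower().split()[-1].strip(".,;:!?")
    last_b = line_b.lower().split()[-1].strip(".,;:!?")
    return last_a[-suffix_len:] == last_b[-suffix_len:]

print(crude_rhyme("Thou art more lovely and more temperate:",
                  "And summer's lease hath all too short a date:"))  # True
print(crude_rhyme("Shall I compare thee to a summer's day?",
                  "Thou art more lovely and more temperate:"))       # False

# The check passes or fails mechanically, but nothing here captures whether the
# lines are readable or moving, which is the dimension the expert reviewer found
# lacking in the machine-generated poetry.
```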

While our domain is certainly different from poetry, the underlying question is definitely valid – what does it mean for a piece of language to be “good”?

Routine paperwork is probably easier to assess. It aims to convey information, so it can be said to succeed as long as the grammar is correct and the meaning comes across accurately (which is easier said than done, based on some documentation we’ve come across). Of course, there are more and less elegant ways to do anything, including paperwork, but there aren’t many people who go around criticizing the literary style and emotional impact of their superannuation form.

But poetry is a subjective art. We consider how – and how much – it makes us feel, which is an intensely personal experience. Who’s to say what’s “good”?

And when a machine is doing the writing, how can we really be confident that the outputs are correct? What does “correct” even mean in this context?

Going beyond poetry, there’s a critical question of ethics if we want to rely on automated systems for high-impact tasks: running dangerous manufacturing facilities, assessing risk for insurance policies, sentencing convicted offenders, screening candidates for jobs. In contexts like these, where deep learning and big data already challenge human understanding, it’s genuinely difficult to evaluate whether the results fit within a human conception of “right”.

It is in reaction to this that we see efforts to bring more elusive, emotional human traits into AI. The poetic algorithms mentioned above were trained on 3,000 sonnets written by humans. Facebook is working to build human contextual understanding into the algorithms it is introducing to its content moderation processes – systems which could spare human moderators from exposure to images of atrocities and graphically explicit material. And the Allen Institute for Artificial Intelligence is pursuing a research initiative named Project Alexandria, which aims to give AI a fundamentally human trait: common sense.

Lexico isn’t going to pass an emotional Turing Test any time soon. It won’t need to – after all, paperwork isn’t a particularly emotional experience, unless you’re talking about frustration. But we’re definitely focused on getting the right words in the right places – and ideally, we won’t need a thousand monkeys to do so.