Anna Hjalmarsson - making dialogue systems more human

Building dialogue systems that behave like, and can interact with, humans may sound easy enough. But once you start delving into the practical details, you soon discover the complexities involved. Anna Hjalmarsson, researcher at the Department of Speech, Music and Hearing, knows just how complex such a system can be.

It was a lecture on speech technology, given at Linköping University by researchers from KTH Royal Institute of Technology, that first caught Anna Hjalmarsson's interest and led her to pursue research on machine-based dialogue systems.

“The subject covered most of the fields I was interested in and trained in, such as cognitive science, psychology, linguistics and computer science. That made for a good match when I joined the cross-disciplinary Department of Speech, Music and Hearing”, she says.

Her doctoral thesis dealt with the goals of creating a dialogue system: how such a system behaves, but also how it could and should behave in order to mimic human behaviour.

She has studied conversational prosody, our ability to vary speech through intonation, rhythm and so on. Anna Hjalmarsson now studies pauses in dialogue systems, among other things what are known as filled pauses, that is, stretches of dialogue where humans use expressions such as uh, hm and ah. The research group focuses specifically on the interactional aspects of communication, for example how to manage turn-taking between the system and the user, and on learning more about interaction between humans in general.
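
As a toy illustration of the concept, and not something drawn from the research described here, the short Python sketch below flags filled pauses in a written transcript; the token inventory and example sentence are invented for the purpose.

```python
# A toy illustration (not from the article) of what "filled pauses" look like
# in a transcript: flagging tokens such as "uh", "hm" and "ah".
import re

FILLED_PAUSES = {"uh", "um", "hm", "ah", "eh"}  # assumed inventory

def count_filled_pauses(transcript: str) -> int:
    # Split on whitespace and strip punctuation before comparing.
    tokens = [re.sub(r"\W+", "", t).lower() for t in transcript.split()]
    return sum(1 for t in tokens if t in FILLED_PAUSES)

print(count_filled_pauses("Well, uh, I think we could, hm, take the later train."))  # -> 2
```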

Anna Hjalmarsson points out that it is one thing to make a dialogue system answer questions or provide information. Systems that perform such tasks are already in use, for example for selling trips, for automatic call routing and in computer games.

An aspect that has received less attention is the interactional side of these interfaces, that is, characteristics such as turn-taking, facial expressions, nods and speech pauses.

“Actually, how you say something can sometimes be just as important as what you say.”

Research has shown that a human user will more readily give feedback to an artificial face, on a screen for example, if the face is gazing toward the user than if its gaze is turned to the side.

Humans are experienced speakers, and system designers can benefit from this experience. A dialogue system with human-like capabilities may encourage users to carry over some of the knowledge gained from a lifetime of human communication into their interaction with the system. More human-like dialogue systems are also often perceived as more intuitive and engaging to talk to.

“Many of the things we try to make dialogue systems do are already automated in humans. That’s why we find it difficult to explain how they are done, so that they can be replicated by non-humans”, says Anna Hjalmarsson.

There are many different ways to create a good dialogue system. One is to use a speech corpus such as Spontal, a database created at the Department of Speech, Music and Hearing at KTH Royal Institute of Technology, which contains a large amount of recorded speech from different people. Such a corpus can then be analysed in terms of prosody, content and so on, giving researchers clues about how to develop dialogue systems.
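
As a rough, hypothetical illustration of the kind of analysis such a corpus enables, the Python sketch below extracts a pitch (F0) contour and an energy curve from a single recording and summarises them; the file name, sampling rate and pitch range are assumptions, not details taken from Spontal or from the research described here.

```python
# A minimal sketch of prosodic analysis on one recorded utterance:
# a pitch (F0) contour and a frame-wise energy curve, plus simple summaries.
import librosa
import numpy as np

def prosody_features(wav_path: str):
    # Load the recording; 16 kHz is a common rate for speech analysis (assumption).
    y, sr = librosa.load(wav_path, sr=16000)

    # Fundamental frequency (pitch) contour via the pYIN algorithm;
    # unvoiced frames come back as NaN.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )

    # Frame-wise energy, a rough correlate of loudness and stress.
    rms = librosa.feature.rms(y=y)[0]

    # Summary statistics over the voiced frames give a first picture
    # of a speaker's intonation range.
    voiced_f0 = f0[~np.isnan(f0)]
    return {
        "mean_f0_hz": float(np.mean(voiced_f0)) if voiced_f0.size else None,
        "f0_range_hz": float(np.ptp(voiced_f0)) if voiced_f0.size else None,
        "mean_energy": float(np.mean(rms)),
    }

print(prosody_features("corpus_clip.wav"))  # hypothetical file name
```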

Another way is to use a so-called eye-tracker on people watching a recorded conversation, to investigate whom they choose to look at. As a rule, it is the person who is speaking that draws most of the viewers' attention.

So how similar to humans can dialogue systems become? According to Hjalmarsson, this depends on what the systems are expected to do: the more limited the scope, the more human-like a system can be made.

“For example, it would be possible to create a system that is specialised in filled pauses and make it almost indistinguishable from humans”, she says.

But dialogue systems are still a long way from humans in terms of interaction. Human interaction, for example, involves people adapting to each other, in terms of facial expressions and vocabulary, especially if the conversation goes on for a long time.

Research continues. Anna Hjalmarsson envisions bringing more aspects of human conversation into dialogue systems. In the longer term, this may include giving the system a physical form, such as the robotic head FurHat.

Emma Bayne
