Communication Beyond Words: Grounding Visual Body Motion with Spoken Language

Time: Fri 2021-04-09 15.15

Location: Zoom

Participating: Chaitanya Ahuja

Presenter: Chaitanya Ahuja
Affiliation: Language Technologies Institute, Carnegie Mellon University

Communication is essential in sharing knowledge and ideas. It encourages
collaboration and teamwork, an important step towards inducing positive
change in human societies. It is also a key building block for forging
new relationships through self-expression as well as understanding
others' emotions and thoughts. Communication is often categorized into
verbal and nonverbal messages, where nonverbal includes both vocal (e.g.
prosody) and visual modalities (e.g. hand gestures, facial expressions).
These three modalities have a fruitful and complex relationship with
each other when communicating. Evolving technologies for online
communication such as virtual reality have created a need for generating
high-fidelity nonverbal communication along with verbal and vocal cues
(e.g. communication in a virtual space). One key communicative cue is
visual body motions which can express a wide range of messages across
arms, hands, gait, physical skills (such as jumping, running, and so on)
and interaction with the environment. Body motions also include gestures
that accompany spoken language. These co-speech gestures allow speakers
to articulate intent and express emphasis.

The central theme of this talk is to understand the relationships
(a.k.a. grounding) between human body motions and their associated spoken
language, which includes both verbal and vocal cues. Understanding this
complex relationship will help us to both better understand the meaning
intended by body gestures and provide us with the knowledge necessary to
generate more realistic nonverbal body animations with interactive
technologies. With these motivations in mind, we study Nonverbal
Grounding in the context of two key challenges: (1) Few-shot Learning
with Long-Tail Distributions when gestures occur infrequently or the
amount of labelled data is limited and often unbalanced, and (2) Gesture
Style to better understand idiosyncrasies and commonalities in how
people gesture. Together, these challenges investigate the commonality,
uniqueness, and generalizability of visual body communication in the
presence of verbal and vocal information.

Chaitanya Ahuja is a final-year PhD student advised by Louis-Philippe
Morency in the Language Technologies Institute, School of Computer
Science at Carnegie Mellon University. He is a member of the MultiComp
Lab and works on grounding human nonverbal behaviour in speech and
language. He is one of the creators of PATS, a large benchmark
for studying the relationships between hand gestures and spoken
language. Previously, he graduated with a B.Tech in Electrical
Engineering from the Indian Institute of Technology, Kanpur. Given the
multidisciplinary nature of his work, he has published in diverse
conferences domains including Computer Vision (ECCV, 3DV), Natural
Language Processing (EMNLP), Multimodal Interaction (ICMI) and
Artificial Intelligence (AAAI). You can find out more about him on his
personal webpage:

Belongs to: Speech, Music and Hearing
Last changed: Mar 29, 2021