End-to-end listening agent for audiovisual emotional and naturalistic interactions

Kevin El Haddad, Yara Rizk, Louise Heron, Nadine Hajj, Yong Zhao, Jaebok Kim, Ngô Trọng Trung, Minha Lee, Marwan Doumit, Payton Lin, Yelin Kim, Hüseyin Çakmak

Research output: Contribution to journal › Article › peer-review


In this work, we established the foundations of a framework whose goal is to build an end-to-end, naturalistic, expressive listening agent. The project was split into four modules: recognition of the user's paralinguistic and nonverbal expressions, prediction of the agent's reactions, synthesis of the agent's expressions, and recording of data on nonverbal conversational expressions. First, a multimodal, multitask, deep-learning-based emotion classification system was built, along with a rule-based visual expression detection system. Then, several sequence prediction systems for nonverbal expressions were implemented and compared. In addition, an audiovisual concatenation-based synthesis system was implemented. Finally, a naturalistic, dyadic emotional conversation database was collected. We report here the work done on each of these modules, as well as our planned future improvements.

Original language: English
Pages (from-to): 49-61
Number of pages: 13
Journal: Journal of Science and Technology of the Arts
Issue number: 2
Publication status: Published - 8 Nov 2018


  • Dyadic conversation database
  • Emotion database
  • Eyebrow movement
  • Head movement
  • Laughter
  • Listening agent
  • Multimodal synthesis
  • Nonverbal expression detection
  • Nonverbal expression synthesis
  • Sequence-to-sequence prediction systems
  • Smile
  • Speech emotion recognition

ASJC Scopus subject areas

  • Visual Arts and Performing Arts
  • Arts and Humanities (miscellaneous)
  • Music
  • Computer Science Applications

