Automatic audio driven animation of non-verbal actions

D. Cosker, C. Holt, D. Mason, G. Whatling, D. Marshall, P.L. Rosin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

While speech-driven animation for lip-synching and facial expression synthesis has previously received much attention, there is no previous work on automatically generating non-verbal actions such as laughing and crying from an audio signal. In this article, initial results from a system designed to address this issue are presented. 3D facial data were recorded for a participant performing different actions (laughing, crying, yawning and sneezing) using a Qualisys (Sweden) optical motion-capture system while audio was recorded simultaneously. Thirty retro-reflective markers were placed on the participant's face to capture movement. Using these data, an analysis-and-synthesis machine was trained, consisting of a dual-input Hidden Markov Model (HMM) and a trellis-search algorithm that converts HMM visual states and new input audio into new 3D motion-capture data.
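
The abstract gives no implementation details, but the pipeline it describes (a dual-input HMM over joint audio-visual features, with a trellis search mapping new audio to visual states) can be sketched roughly as follows. This is a minimal illustration in Python using hmmlearn, not the authors' method: the feature dimensions (13 MFCCs per frame, 30 markers x 3 coordinates) and the per-state-mean reconstruction are assumptions.

import numpy as np
from hmmlearn.hmm import GaussianHMM

N_AUDIO = 13   # MFCC coefficients per frame (assumed)
N_VISUAL = 90  # 30 markers x 3 coordinates per frame

def train_joint_hmm(audio, visual, n_states=20):
    """Fit one Gaussian HMM on concatenated audio-visual frames."""
    joint = np.hstack([audio, visual])  # shape (T, N_AUDIO + N_VISUAL)
    model = GaussianHMM(n_components=n_states, covariance_type="diag")
    model.fit(joint)
    return model

def synthesise_motion(model, new_audio):
    """Decode states from audio alone, then emit each state's visual mean."""
    # Audio-only view of the trained model: same transition structure,
    # emission parameters restricted to the audio dimensions.
    audio_hmm = GaussianHMM(n_components=model.n_components,
                            covariance_type="diag")
    audio_hmm.startprob_ = model.startprob_
    audio_hmm.transmat_ = model.transmat_
    audio_hmm.means_ = model.means_[:, :N_AUDIO]
    idx = np.arange(N_AUDIO)
    audio_hmm.covars_ = model.covars_[:, idx, idx]  # diagonal variances
    states = audio_hmm.predict(new_audio)  # Viterbi decoding, i.e. a trellis search
    # Crude reconstruction: each frame becomes its decoded state's visual
    # mean; the paper's actual trellis algorithm is not reproduced here.
    return model.means_[states, N_AUDIO:]  # shape (T, N_VISUAL)

Decoding audio-only observations through the shared transition structure is one plausible reading of "dual-input"; the published system presumably produces smoother motion than per-state means, since its trellis search converts both the HMM visual states and the new audio into motion-capture frames.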
Original language: English
Title of host publication: IET 4th European Conference on Visual Media Production (CVMP 2007)
Publisher: IET
Pages: 16
DOI: 10.1049/cp:20070048
Publication status: Published - 2007
Event: IET 4th European Conference on Visual Media Production - London, United Kingdom
Duration: 27 Nov 2007 - 28 Nov 2007

Conference

Conference: IET 4th European Conference on Visual Media Production
Country: United Kingdom
City: London
Period: 27/11/07 - 28/11/07

Fingerprint

Hidden Markov models
Animation
Audio recordings
Data acquisition

Cite this

Cosker, D., Holt, C., Mason, D., Whatling, G., Marshall, D., & Rosin, P. L. (2007). Automatic audio driven animation of non-verbal actions. In IET 4th European Conference on Visual Media Production (CVMP 2007) (p. 16). IET. https://doi.org/10.1049/cp:20070048

