Innovation Newsletter Issue 31: Spring 2019

A database full of emotion

The largest validated multimodal database of emotional speech and song in the world was created here at Ryerson and first published in spring 2018. Since then, the database has drawn the attention of not only researchers, but also companies, leading to the start of the commercialization process.

SMART Lab director and psychology professor Frank Russo, along with former lab postdoctoral fellow Steven Livingstone, now a computer science professor at the University of Otago, New Zealand, spent years designing and developing the database. In it, 24 actors use North American English to express emotions – like calm, happy or angry – in speech and song at different levels of emotional intensity. The database of 7,356 video and audio recordings has been validated by participants who were asked to label the emotions presented.

The database was named the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and it was introduced in a paper published in the online journal PLOS ONE, which is available via open access for researchers. Industry interest soon followed from companies hoping to use the database for a variety of purposes, such as training people in security positions to recognize emotion or training artificial intelligence.

"Together, we realized there's a big demand for these kinds of things on the research side, and we'll have use for this database for years to come and so will our colleagues. But we also realized there might be an industry demand for this too," professor Russo said. "We knew the work that we were planning over the next few years, and Steven and I thought it would be a good investment of time and resources to build the biggest database on the planet and validate it."

The development of the database was inspired in part by the pair's experiences of having to build emotion databases for previous projects. Professor Russo's work includes applied research on rehabilitative technologies that use music-based interventions to help improve vocal-facial emotion perception and production. In particular, professor Russo has explored the impact of singing for communities where emotion recognition and production can be a challenge, such as among people with hearing loss or Parkinson's disease.

Funding for this project was provided by the Natural Sciences and Engineering Research Council of Canada, the Hear the World Research Chair in Music and Emotional Speech from Phonak, and the Ontario Ministry of Research, Innovation and Science.