Dr. João Paulo Cabral

Visiting Research Fellow, ADAPT

http://ie.linkedin.com/in/cabraljoao

Biography

João Cabral is a research fellow at Trinity College Dublin, in the School of Computer Science and Statistics, as part of the ADAPT Centre. He received B.Sc. and M.Sc. degrees in Electrical and Computer Engineering from Instituto Superior Técnico (IST), Lisbon, Portugal, in 2003 and 2006 respectively. He was awarded a Ph.D. degree in Computer Science and Informatics from the University of Edinburgh, U.K., in 2010, funded by a European Commission Marie Curie Fellowship under the Early Stage Research Training (EST) scheme. Before joining Trinity College Dublin in 2013, he worked from 2010 as a postdoctoral research fellow at University College Dublin, as part of the CNGL research centre.

Publications and Further Research Outputs

  • Séamus Lawless, Peter Lavin, Mostafa Bayomi, João P. Cabral and M. Rami Ghorab, Text Summarization and Speech Synthesis for the Automated Generation of Personalized Audio Presentations, 20th International Conference on Application of Natural Language to Information Systems (NLDB), Passau, Germany, June 17-19, Springer, 2015, pp. 307-320. Conference Paper, 2015, DOI, URL
  • Yuyun Huang, Christy Elias, João P. Cabral, Atul Nautiyal, Christian Saam and Nick Campbell, Towards Classification of Engagement in Human Interaction with Talking Robots, Communications in Computer and Information Science, 17th International Conference on Human-Computer Interaction, Los Angeles, USA, 2-7 August 2015, 528, Springer, 2015, pp. 741-746. Published Abstract, 2015, DOI, URL
  • Christy Elias, João P. Cabral and Nick Campbell, Audio features for the Classification of Engagement, Workshop on Engagement in Social Intelligent Virtual Agents, Delft, Netherlands, 25th August 2015, 2015, pp. 8-12. Conference Paper, 2015, URL, TARA - Full Text
  • João P. Cabral, Yuyun Huang, Christy Elias, Ketong Su and Nick Campbell, Interface for Monitoring of Engagement from Audio-Visual Cues, The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, Vienna, Austria, 11-13 September, ISCA, 2015. Poster, 2015, URL
  • Eva Vanmassenhove, João P. Cabral and Fasih Haider, Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis, 9th ISCA Workshop on Speech Synthesis, Sunnyvale, CA, USA, 13-15 September, 2016, pp. 22-27. Conference Paper, 2016, URL, TARA - Full Text
  • João P. Cabral, Christian Saam, Eva Vanmassenhove, Stephen Bradley and Fasih Haider, The ADAPT entry to the Blizzard Challenge 2016, Blizzard Challenge 2016 Workshop, Cupertino, CA, USA, 2016. Conference Paper, 2016, URL
  • João P. Cabral, Benjamin R. Cowan, Katja Zibrek and Rachel McDonnell, The Influence of Synthetic Voice on the Evaluation of a Virtual Character, INTERSPEECH 2017, Stockholm, Sweden, 20-24 August, ISCA, 2017, pp. 229-233. Conference Paper, 2017, DOI, URL, TARA - Full Text
  • Beatriz R. de Medeiros and João P. Cabral, Acoustic distinctions between speech and singing: Is singing acoustically more stable than speech?, Speech Prosody, Poznań, Poland, 13-16 June, 2018, pp. 542-546. Conference Paper, 2018, URL, TARA - Full Text
  • João P. Cabral, Estimation of the Asymmetry Parameter of the Glottal Flow Waveform Using the Electroglottographic Signal, INTERSPEECH 2018, Hyderabad, India, 2-6 September, 2018. Conference Paper, 2018
  • Leigh Clark, João Cabral and Benjamin Cowan, The CogSIS Project: Examining the Cognitive Effects of Speech Interface Synthesis, British Human Computer Interaction Conference, Belfast, 2-6 July, 2018. Conference Paper, 2018, URL, TARA - Full Text
  • Leigh Clark, Philip Doyle, Diego Garaialde, Emer Gilmartin, Stephan Schlögl, Jens Edlund, Matthew Aylett, Cosmin Munteanu, João P. Cabral and Benjamin R. Cowan, The State of Speech in HCI: Trends, Themes and Challenges, Interacting with Computers, 2019. Journal Article, 2019
  • João P. Cabral and Alexsandro R. Meireles, Transformation of voice quality in singing using glottal source features, Workshop on Speech, Music and Mind 2019 (SMM 2019), Vienna, Austria, 14 September 2019, ISCA, 2019, pp. 31-35. Conference Paper, 2019, URL
  • Benjamin R. Cowan, Philip Doyle, Justin Edwards, Diego Garaialde, Ali Hayes-Brady, Holly P. Branigan, João Cabral and Leigh Clark, What's in an accent? The impact of accented synthetic speech on lexical choice in human-machine dialogue, The 1st International Conference on Conversational User Interfaces, Dublin, Ireland, 2019. Conference Paper, 2019, URL
  • Éva Székely, Zeeshan Ahmed, João P. Cabral and Julie Carson-Berndsen, WinkTalk: A Demonstration of a Multimodal Speech Synthesis Platform Linking Facial Expressions to Expressive Synthetic Voices, The Third Workshop on Speech and Language Processing for Assistive Technologies, Montreal, Canada, 7 June, Association for Computational Linguistics, 2012, pp. 5-8. Conference Paper, 2012, TARA - Full Text
  • Beatriz Raposo de Medeiros, João Paulo Cabral, Alexsandro R. Meireles and Andre A. Baceti, A comparative study of fundamental frequency stability between speech and singing, Speech Communication, 128, 2021, pp. 15-23. Journal Article, 2021, DOI
  • Katja Zibrek, João Cabral and Rachel McDonnell, Does Synthetic Voice Alter Social Response to a Photorealistic Character in Virtual Reality?, Motion, Interaction and Games (MIG), Virtual Event, Switzerland, Association for Computing Machinery, 2021, pp. 1-6. Conference Paper, 2021, URL
  • Darragh Higgins, Katja Zibrek, João Cabral, Donal Egan and Rachel McDonnell, Sympathy for the digital: Influence of synthetic voice on affinity, social presence and empathy for photorealistic virtual humans, Computers & Graphics, 2022. Journal Article, 2022, URL
  • João P. Cabral, Nick Campbell, Sree Ganesh, Mina Kheirkhah, Emer Gilmartin, Fasih Haider, Eamonn Kenny, Andrew Murphy, Neasa Ní Chiaráin, Thomas Pellegrini and Odei Rey Orozko, MILLA: A Multimodal Interactive Language Learning Agent, SemDial 2014, Edinburgh, United Kingdom, September 1st-3rd, 2014. Conference Paper, 2014, URL
  • João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Glottal Spectral Separation for Speech Synthesis, IEEE Journal of Selected Topics in Signal Processing: Special Issue on Statistical Parametric Speech Synthesis, 8, (2), 2014, pp. 195-208. Journal Article, 2014, DOI, URL
  • Éva Székely, Zeeshan Ahmed, Shannon Hennig, João P. Cabral and Julie Carson-Berndsen, Predicting synthetic voice style from facial expressions. An application for augmented conversations, Speech Communication, 57, 2014, pp. 63-75. Journal Article, 2014, DOI, URL
  • João P. Cabral, Uniform Concatenative Excitation Model for Synthesising Speech without Voiced/Unvoiced Classification, INTERSPEECH, Lyon, France, August 2013, International Speech Communication Association (ISCA), 2013, pp. 1082-1085. Conference Paper, 2013, URL
  • Zeeshan Ahmed and João P. Cabral, HMM-Based Speech Synthesiser For The Urdu Language, Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU), Saint Petersburg, Russia, 14 May, 2014, pp. 92-97. Conference Paper, 2014, DOI, URL, TARA - Full Text
  • João P. Cabral and Julie Carson-Berndsen, Towards a Better Representation of Glottal Pulse Shape Characteristics in Modelling the Envelope Modulation of Aspiration Noise, Lecture Notes in Computer Science: Advances in Nonlinear Speech Processing, NOLISP International Conference, Mons, Belgium, 19-21 June, edited by Thomas Drugman and Thierry Dutoit, 7911, 2013, pp. 67-74. Conference Paper, 2013, DOI, URL
  • Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Explicit Duration Modelling in HMM-based Speech Synthesis using a Hybrid Hidden Markov Model-Multilayer Perceptron, SAPA - SCALE Conference, Portland, USA, 7-8 September, 2012. Conference Paper, 2012, URL
  • Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Explicit Duration Modelling in HMM-based Speech Synthesis Using Continuous Hidden Markov Model, International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2012), Montreal, Canada, 3-5 July, IEEE, 2012, pp. 700-705. Conference Paper, 2012, DOI
  • Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Using multilayer perceptron for voicing strength estimation in HMM-based speech synthesis, International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2012), Montreal, Canada, 2-5 July, IEEE, 2012, pp. 683-688. Conference Paper, 2012, DOI, URL
  • Amalia Zahra, João P. Cabral, Mark Kane and Julie Carson-Berndsen, Automatic Classification of Pronunciation Errors Using Decision Trees and Speech Recognition Technology, International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden, 6-8 June, 2012, pp. 65-69. Conference Paper, 2012, URL
  • Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based Speech Synthesis, International Conference on Speech Prosody, Shanghai, China, 22-25 May, 2012, pp. 67-70. Conference Paper, 2012, URL
  • João P. Cabral, Mark Kane, Zeeshan Ahmed, Mohamed Abou-Zleikha, Éva Székely, Amalia Zahra, Udochukwu Kalu Ogbureke, Peter Cahill, Julie Carson-Berndsen and Stephan Schlögl, Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz, International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, 21-27 May, 2012, pp. 23-25. Conference Paper, 2012, URL
  • Mark Kane, João P. Cabral, Amalia Zahra and Julie Carson-Berndsen, Introducing Difficulty-Levels in Pronunciation Learning, International Speech Communication Association Special Interest Group on Speech and Language Technology in Education (SLaTE), Venice, Italy, 24-26 August, International Speech Communication Association (ISCA), 2011, pp. 37-40. Conference Paper, 2011, TARA - Full Text
  • João P. Cabral, John Kane, Christer Gobl and Julie Carson-Berndsen, Evaluation of glottal epoch detection algorithms on different voice types, INTERSPEECH, Florence, Italy, 28-31 August, International Speech Communication Association (ISCA), 2011, pp. 1989-1992. Conference Paper, 2011
  • Éva Székely, João P. Cabral, Peter Cahill and Julie Carson-Berndsen, Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters, INTERSPEECH, Florence, Italy, International Speech Communication Association (ISCA), 2011, pp. 2409-2412. Conference Paper, 2011
  • João P. Cabral, Steve Renals, Junichi Yamagishi and Korin Richmond, HMM-based speech synthesiser using the LF-model of the glottal source, International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22-27 May, IEEE, 2011, pp. 4704-4707. Conference Paper, 2011, DOI, URL
  • João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, An HMM-based speech synthesiser using glottal post-filtering, 7th ISCA Speech Synthesis Workshop (SSW7), Kyoto, Japan, 2010, pp. 365-370. Conference Paper, 2010, URL
  • J. Sebastian Andersson, João P. Cabral, Leonardo Badino, Junichi Yamagishi and Robert A.J. Clark, Glottal source and prosodic prominence modelling in HMM-based speech synthesis for the Blizzard Challenge 2009, The Blizzard Challenge 2009, Edinburgh, UK, 4 September, 2009. Conference Paper, 2009, URL
  • João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, HMM-based speech synthesis with an acoustic glottal source model, The First Young Researchers Workshop in Speech Technology, Dublin, Ireland, 25 April, 2009. Conference Paper, 2009, TARA - Full Text
  • João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Glottal Spectral Separation for Parametric Speech Synthesis, INTERSPEECH 2008, Brisbane, Australia, 22-26 September, International Speech Communication Association (ISCA), 2008, pp. 1829-1832. Conference Paper, 2008, URL
  • João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Towards an Improved Modeling of the Glottal Source in Statistical Parametric Speech Synthesis, 6th ISCA Workshop on Speech Synthesis (SSW6), Bonn, Germany, 22-24 August, International Speech Communication Association (ISCA), 2007, pp. 113-118. Conference Paper, 2007, URL
  • Guilherme Raimundo, João P. Cabral, Celso Melo, Luís C. Oliveira, Ana Paiva and Isabel Trancoso, Telling Stories with a Synthetic Character: Understanding Inter-modalities Relations, Lecture Notes in Computer Science: Verbal and Nonverbal Communication Behaviours, COST Action 2102 International Workshop on Verbal and Nonverbal Communication Behaviours, Vietri sul Mare, Italy, 29-31 March, 4775, Springer Berlin Heidelberg, 2007, pp. 310-323. Conference Paper, 2007, DOI
  • João P. Cabral and Luís C. Oliveira, EmoVoice: A System to Generate Emotions in Speech, INTERSPEECH, Pittsburgh, USA, 17-21 September, International Speech Communication Association (ISCA), 2006, pp. 1798-1801. Conference Paper, 2006, URL
  • João P. Cabral, Luís C. Oliveira, Guilherme Raimundo and Ana Paiva, What voice do we expect from a synthetic character?, 11th International Conference on Speech and Computer (SPECOM 2006), St. Petersburg, Russia, 26-29 June, 2006. Conference Paper, 2006
  • João P. Cabral and Luís C. Oliveira, Pitch-Synchronous Time-Scaling for Prosodic and Voice Quality Transformations, INTERSPEECH, Lisbon, Portugal, 4-8 September, International Speech Communication Association (ISCA), 2005, pp. 1137-1140. Conference Paper, 2005, URL
  • João P. Cabral and Luís C. Oliveira, Pitch-Synchronous Time-Scaling for High-Frequency Excitation Regeneration, INTERSPEECH, Lisbon, Portugal, 4-8 September, International Speech Communication Association (ISCA), 2005, pp. 1513-1516. Conference Paper, 2005, URL
  • João Paulo Cabral, HMM-based Speech Synthesis Using an Acoustic Glottal Source Model, The University of Edinburgh, 2010. Thesis, 2010, URL
  • João P. Cabral, Transforming Prosody and Voice Quality to Generate Emotions in Speech, Instituto Superior Técnico (IST), 2006. Thesis, 2006, URL
  • João P. Cabral, Evaluation of Methods for Excitation Regeneration in Bandwidth Extension of Speech, Instituto Superior Técnico (IST) and Royal Institute of Technology (KTH), 2003. Thesis, 2003
  • Peter Cahill, Udochukwu Ogbureke, João Cabral, Éva Székely, Mohamed Abou-Zleikha, Zeeshan Ahmed and Julie Carson-Berndsen, UCD Blizzard Challenge 2011 Entry, Blizzard Challenge Workshop 2011, Turin, Italy, 2 September, 2011. Conference Paper

Research Expertise

My main research work is on Text-To-Speech synthesis (TTS) and the development of innovative commercial applications of this research, such as expressive AI voices for audiobooks, spoken dialogue systems, and animation. I am also interested in the analysis of emotion and affect in speech. I have extensive expertise in the analysis and modelling of glottal source parameters. These features are important in TTS for transforming voice quality, for example producing breathy or tense voices, and for conveying emotion. Other areas of expertise include speech signal processing, statistical learning algorithms for speech processing, and deep learning.
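
As a brief illustration of this line of work, the Python sketch below generates one cycle of a simplified Liljencrants-Fant (LF) glottal flow derivative, the source model used in several of the publications above. This is a minimal sketch assuming only NumPy; the default parameter values are hypothetical, and a full LF implementation would additionally solve the growth factor alpha numerically so that the glottal flow returns exactly to zero over the cycle.

    import numpy as np

    def lf_pulse(fs=16000, f0=100.0, tp=0.40, te=0.55, ta=0.03, alpha=60.0):
        """One cycle of a simplified LF glottal flow derivative.
        tp, te and ta are fractions of the pitch period T0; an earlier
        closure (smaller te) tends towards a tenser voice quality,
        while a longer return phase (larger ta) sounds breathier."""
        T0 = 1.0 / f0                     # pitch period in seconds
        t = np.arange(0.0, T0, 1.0 / fs)  # time axis for one cycle
        e = np.zeros_like(t)

        # Open phase: exponentially growing sinusoid, with peak glottal
        # flow at tp*T0 and the main excitation (negative peak) at te*T0.
        opening = t <= te * T0
        e[opening] = (np.exp(alpha * t[opening])
                      * np.sin(np.pi * t[opening] / (tp * T0)))

        # Return phase: exponential recovery towards zero; epsilon is
        # found by fixed-point iteration of eps*Ta = 1 - exp(-eps*(T0-Te)).
        Te, Ta = te * T0, ta * T0
        eps = 1.0 / Ta
        for _ in range(30):
            eps = (1.0 - np.exp(-eps * (T0 - Te))) / Ta
        Ee = e[opening][-1]               # value at the excitation instant
        closed = ~opening
        e[closed] = (Ee / (eps * Ta)) * (np.exp(-eps * (t[closed] - Te))
                                         - np.exp(-eps * (T0 - Te)))

        return e / np.max(np.abs(e))      # normalised pulse

Repeating this pulse at the target f0 and passing it through a vocal tract filter gives the familiar source-filter view of speech production that underlies the glottal-source TTS work listed above.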

  • Title
    CogSIS - Cognitive Effects of Speech Interface Synthesis
    Summary
    Through the growth of intelligent personal assistants, pervasive and wearable computing, and robot-based technologies, speech interfaces are set to become a common dialogue partner. Technological challenges around the production of natural synthetic voices have been widely researched. Yet comparatively little is understood about how synthesis affects user experience, in particular how design decisions around naturalness (e.g. accent used and expressivity) impact the assumptions we make about speech interfaces as communicative actors (i.e. our partner models). Our ground-breaking project examines the psychological consequences of synthesis design decisions on the relationship between humans and speech technology. It fuses knowledge, concepts and methods from psycholinguistics, experimental psychology, and human-computer interaction (e.g. perspective taking and partner modelling research in human-human dialogue, controlled experiments, questionnaires) and speech technology (generation of natural speech synthesis) to 1) understand how synthesis design choices, specifically accent and expressivity, impact a user's partner model, 2) examine how these choices interact with context, and 3) assess their impact on language production.
    Funding Agency
    Irish Research Council
    Date From
    2017
    Date To
    2018
  • Title
    Expressive Speech Synthesis: VoiceTune
    Summary
    A research project to develop commercial expressive Text-to-Speech applications for industry. The project aims to validate a prototype product/service and its commercial value to companies that need expressive AI voice solutions.
    Funding Agency
    Enterprise Ireland
    Date From
    2020
    Date To
    2022
  • Title
    Production, Perception and Cognition in the intersection between speech and singing
    Summary
    The issue raised in this project is that, although we intuitively know how to distinguish between speech and singing, there are portions of each in which one perceives the coexistence of both. This suggests a gradation, rather than an abrupt change, in phonation and other aspects of speech. The aim of this research project is to focus on aspects of the production and perception of speech and singing in order to answer the question: are speech and singing completely different phenomena? Experimental studies are conducted that include the collection of a corpus of spoken and sung data, measurements of acoustic differences between the two types of data, a perception test in which listeners label each presented stimulus as speech or singing, and the use of machine learning to further study the acoustic differences between the two sound categories. The results are analysed taking into account cognitive aspects of speech and song. A minimal example of one such acoustic measurement is sketched after this project list.
    Funding Agency
    São Paulo Research Foundation (FAPESP)
    Date From
    2016
    Date To
    2017
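
As a minimal illustration of the kind of acoustic measurement used in the speech/singing project above, the Python sketch below estimates an f0 contour with the pYIN algorithm and returns a simple stability measure: the standard deviation of f0, in semitones, over voiced frames. It assumes the open-source librosa library; the file names in the usage example are hypothetical placeholders, and this is one plausible measure rather than the exact method of the project.

    import numpy as np
    import librosa

    def f0_stability(path, fmin=65.0, fmax=1000.0):
        """Standard deviation of the f0 contour in semitones over voiced
        frames; a lower value indicates a more stable pitch, as would be
        expected for sustained singing compared with speech."""
        y, sr = librosa.load(path, sr=None, mono=True)
        f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
        f0 = f0[voiced & ~np.isnan(f0)]   # keep voiced frames only
        if f0.size == 0:
            return float("nan")
        # Express each frame as a deviation, in semitones, from the
        # median pitch of the recording, then take the spread.
        semitones = 12.0 * np.log2(f0 / np.median(f0))
        return float(np.std(semitones))

    # Hypothetical usage: spoken and sung renditions of the same text.
    print("speech :", f0_stability("speech.wav"))
    print("singing:", f0_stability("singing.wav"))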

Communication engineering and systems; Telecommunications; Computer science - Artificial Intelligence

Recognition

  • Member of the Marie Curie Fellows Association (MCFA)
  • Member of the International Speech Communication Association (ISCA)