João Paulo Cabral
Research Fellow, Computer Science

Biography

João Cabral is a research fellow at Trinity College Dublin, in the School of Computer Science and Statistics, as part of the ADAPT Centre. He received B.Sc. and M.Sc. degrees in Electrical and Computer Engineering from Instituto Superior Técnico (IST), Lisbon, Portugal, in 2003 and 2006 respectively. He spent the final year of his B.Sc. at the Royal Institute of Technology (KTH), Sweden, under the Socrates-Erasmus programme, where he began working on speech signal processing, funded by the Department of Signals, Sensors and Systems. During his M.Sc. he developed the Pitch-Synchronous Time-Scaling (PSTS) algorithm, which enables the transformation of glottal parameters by manipulating the estimated source signal in the time domain. PSTS has contributed to high-quality voice conversion, e.g. in generating emotions with the EmoVoice system (J.P. Cabral and L. C. Oliveira, 2006) and in speech recognition of children's speech (Shweta Ghai and Rohit Sinha, 2011). He was awarded a Ph.D. degree in Computer Science and Informatics from the University of Edinburgh, U.K., in 2010, funded by a European Commission Marie Curie Fellowship under the Early Stage Research Training (EST) scheme. His Ph.D. thesis contributed a novel integration of an acoustic glottal source model into HMM-based speech synthesis, improving speech quality and control over voice characteristics. Before joining Trinity College Dublin in 2013, he worked from 2010 as a postdoctoral research fellow at University College Dublin, as part of the CNGL research centre.

Publications and Further Research Outputs

Peer-Reviewed Publications

Beatriz R. de Medeiros and João P. Cabral, Acoustic distinctions between speech and singing: Is singing acoustically more stable than speech?, Speech Prosody, Poznań, Poland, 13-16 June, 2018 Conference Paper, 2018

João P. Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell, The Influence of Synthetic Voice on the Evaluation of a Virtual Character, Interspeech 2017, Stockholm, Sweden, 20-24 August, ISCA, 2017, pp229 - 233 Conference Paper, 2017 DOI URL TARA - Full Text

João P. Cabral, Christian Saam, Eva Vanmassenhove, Stephen Bradley, Fasih Haider, The ADAPT entry to the Blizzard Challenge 2016, Blizzard Challenge 2016 Workshop, Cupertino, CA, USA, 2016 Conference Paper, 2016 URL

Eva Vanmassenhove, João P. Cabral, Fasih Haider, Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis, 9th ISCA Workshop on Speech Synthesis, Sunnyvale, CA, USA, 13-15 September, 2016, pp22 - 27 Conference Paper, 2016 TARA - Full Text URL

João P. Cabral, Yuyun Huang, Christy Elias, Ketong Su and Nick Campbell, Interface for Monitoring of Engagement from Audio-Visual Cues, The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, Vienna, Austria, 11-13 September, ISCA, 2015 Poster, 2015 URL

Yuyun Huang, Christy Elias, João P. Cabral, Atul Nautiyal, Christian Saam and Nick Campbell, Towards Classification of Engagement in Human Interaction with Talking Robots, Communications in Computer and Information Science, 17th International Conference on Human-Computer Interaction, Los Angeles, USA, 2-7 August 2015, 528, Springer, 2015, pp741 - 746 Published Abstract, 2015 URL DOI

Séamus Lawless, Peter Lavin, Mostafa Bayomi, João P. Cabral and M. Rami Ghorab, Text Summarization and Speech Synthesis for the Automated Generation of Personalized Audio Presentations, 20th International Conference on Application of Natural Language to Information Systems (NLDB), Passau, Germany, June 17-19, Springer, 2015, pp307 - 320 Conference Paper, 2015 URL DOI

Christy Elias, João P. Cabral and Nick Campbell, Audio Features for the Classification of Engagement, Workshop on Engagement in Social Intelligent Virtual Agents, Delft, Netherlands, 25th August 2015, 2015, pp8 - 12 Conference Paper, 2015 TARA - Full Text URL

João P. Cabral and Zeeshan Ahmed, HMM-Based Speech Synthesiser For The Urdu Language, Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU), Saint Petersburg, Russia, 14 May, 2014, pp92 - 97 Conference Paper, 2014 DOI URL

João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Glottal Spectral Separation for Speech Synthesis, IEEE Journal of Selected Topics in Signal Processing: Special Issue on Statistical Parametric Speech Synthesis, 8, (2), 2014, p195 - 208 Journal Article, 2014 URL DOI

Éva Székely, Zeeshan Ahmed, Shannon Hennig, João P. Cabral and Julie Carson-Berndsen, Predicting synthetic voice style from facial expressions. An application for augmented conversations, Speech Communication, 57, 2014, p63 - 75 Journal Article, 2014 URL DOI

João P. Cabral, Nick Campbell, Sree Ganesh, Mina Kheirkhah, Emer Gilmartin, Fasih Haider, Eamonn Kenny, Andrew Murphy, Neasa Ní Chiaráin, Thomas Pellegrini and Odei Rey Orozko, MILLA: A Multimodal Interactive Language Learning Agent, SemDial 2014, Edinburgh, United Kingdom, September 1st-3rd, 2014 Conference Paper, 2014 URL

João P. Cabral and Julie Carson-Berndsen, Towards a Better Representation of Glottal Pulse Shape Characteristics in Modelling the Envelope Modulation of Aspiration Noise, Lecture Notes in Computer Science: Advances in Nonlinear Speech Processing, NOLISP International Conference, Mons, Belgium, 19-21 June, edited by Thomas Drugman and Thierry Dutoit , 7911, 2013, pp67 - 74 Conference Paper, 2013 URL DOI

João P. Cabral, Uniform Concatenative Excitation Model for Synthesising Speech without Voiced/Unvoiced Classification, INTERSPEECH, Lyon, France, August 2013, edited by International Speech Communication Association (ISCA) , 2013, pp1082 - 1085 Conference Paper, 2013 URL

Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Explicit Duration Modelling in HMM-based Speech Synthesis Using Continuous Hidden Markov Model, International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2012), Montreal, Canada, 3-5 July, IEEE, 2012, pp700 - 705 Conference Paper, 2012 DOI

Amalia Zahra, João P. Cabral, Mark Kane and Julie Carson-Berndsen, Automatic Classification of Pronunciation Errors Using Decision Trees and Speech Recognition Technology, International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden, 6-8 June, 2012, pp65 - 69 Conference Paper, 2012 URL

Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based Speech Synthesis, International Conference on Speech Prosody, Shanghai, China, 22-25 May, 2012, pp67 - 70 Conference Paper, 2012 URL

Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Explicit Duration Modelling in HMM-based Speech Synthesis using a Hybrid Hidden Markov Model-Multilayer Perceptron, SAPA - SCALE Conference, Portland, USA, 7-8 September, 2012 Conference Paper, 2012 URL

João P. Cabral, Mark Kane, Zeeshan Ahmed, Mohamed Abou-Zleikha, Éva Székely, Amalia Zahra, Udochukwu Kalu Ogbureke, Peter Cahill, Julie Carson-Berndsen, Stephan Schlögl, Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz, International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, 21-27 May, 2012, pp23 - 25 Conference Paper, 2012 URL

Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Using multilayer perceptron for voicing strength estimation in HMM-based speech synthesis, International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2012), Montreal, Canada, 2-5 July, IEEE, 2012, pp683 - 688 Conference Paper, 2012 DOI URL

Éva Székely, João P. Cabral, Peter Cahill and Julie Carson-Berndsen, Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters, INTERSPEECH, Florence, Italy, International Speech Communication Association (ISCA), 2011, pp2409 - 2412 Conference Paper, 2011

Mark Kane, João P. Cabral, Amalia Zahra and Julie Carson-Berndsen, Introducing Difficulty-Levels in Pronunciation Learning, International Speech Communication Association Special Interest Group on Speech and Language Technology in Education (SLaTE), Venice, Italy, 24-26 August, International Speech Communication Association (ISCA), 2011, pp37 - 40 Conference Paper, 2011

João P. Cabral, John Kane, Christer Gobl and Julie Carson-Berndsen, Evaluation of glottal epoch detection algorithms on different voice types, INTERSPEECH, Florence, Italy, 28-31 August, International Speech Communication Association (ISCA), 2011, pp1989 - 1992 Conference Paper, 2011

João P. Cabral, Steve Renals, Junichi Yamagishi and Korin Richmond, HMM-based speech synthesiser using the LF-model of the glottal source, International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22-27 May, IEEE, 2011, pp4704 - 4707 Conference Paper, 2011 DOI URL

João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, An HMM-based speech synthesiser using glottal post-filtering, 7th ISCA Speech Synthesis Workshop (SSW7), Kyoto, Japan, 2010, pp365 - 370 Conference Paper, 2010 URL

João Paulo Cabral, HMM-based Speech Synthesis Using an Acoustic Glottal Source Model, The University of Edinburgh, 2010 Thesis, 2010 URL

J. Sebastian Andersson, João P. Cabral, Leonardo Badino, Junichi Yamagishi and Robert A.J. Clark, Glottal source and prosodic prominence modelling in HMM-based speech synthesis for the Blizzard Challenge 2009, The Blizzard Challenge 2009, Edinburgh, UK, 4 September, 2009 Conference Paper, 2009 URL

João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, HMM-based speech synthesis with an acoustic glottal source model, The First Young Researchers Workshop in Speech Technology, Dublin, Ireland, 25 April, 2009 Conference Paper, 2009

João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Glottal Spectral Separation for Parametric Speech Synthesis, INTERSPEECH 2008, Brisbane, Australia, 22-26 September, International Speech Communication Association (ISCA), 2008, pp1829 - 1832 Conference Paper, 2008 URL

João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Towards an Improved Modeling of the Glottal Source in Statistical Parametric Speech Synthesis, 6th ISCA Workshop on Speech Synthesis (SSW6), Bonn, Germany, 22-24 August, International Speech Communication Association (ISCA), 2007, pp113 - 118 Conference Paper, 2007 URL

Guilherme Raimundo, João P. Cabral, Celso Melo, Luís C. Oliveira, Ana Paiva, Isabel Trancoso, Telling Stories with a Synthetic Character: Understanding Inter-modalities Relations, Lecture Notes in Computer Science: Verbal and Nonverbal Communication Behaviours, COST Action 2102 International Workshop on Verbal and Nonverbal Communication Behaviours, Vietri sul Mare, Italy, 29-31 March, 4775, Springer Berlin Heidelberg, 2007, pp310 - 323 Conference Paper, 2007 DOI

João P. Cabral, Luís C. Oliveira, Guilherme Raimundo, Ana Paiva, What voice do we expect from a synthetic character?, 11th International Conference on Speech and Computer (SPECOM 2006), St. Petersburg, Russia, 26-29 June, 2006 Conference Paper, 2006

João P. Cabral and Luís C. Oliveira, EmoVoice: A System to Generate Emotions in Speech, INTERSPEECH, Pittsburgh, USA, 17-21 September, International Speech Communication Association (ISCA), 2006, pp1798 - 1801 Conference Paper, 2006 URL

João P. Cabral, Transforming Prosody and Voice Quality to Generate Emotions in Speech, Instituto Superior Técnico (IST), 2006 Thesis, 2006 URL

João P. Cabral and Luís C. Oliveira, Pitch-Synchronous Time-Scaling for High-Frequency Excitation Regeneration, INTERSPEECH, Lisbon, Portugal, 4-8 September, International Speech Communication Association (ISCA), 2005, pp1513 - 1516 Conference Paper, 2005 URL

João P. Cabral and Luís C. Oliveira, Pitch-Synchronous Time-Scaling for Prosodic and Voice Quality Transformations, INTERSPEECH, Lisbon, Portugal, 4-8 September, International Speech Communication Association (ISCA), 2005, pp1137 - 1140 Conference Paper, 2005 URL

João P. Cabral, Evaluation of Methods for Excitation Regeneration in Bandwidth Extension of Speech, Instituto Superior Técnico (IST) and Royal Institute of Technology (KTH), 2003 Thesis, 2003

Research Expertise

Description

My main research topic is speech processing for improving natural interaction with computer systems. This includes expressive Text-To-Speech (TTS) synthesis and the recognition of verbal and non-verbal signals, for example the analysis and generation of social signals and paralinguistic features, such as laughter and tone of voice, which play an important role in people's engagement with interactive systems. I am also interested in the analysis of emotion and affect in speech. I have expertise in the analysis and modelling of glottal source parameters, which are strongly correlated with voice quality, such as breathy or tense voice. In addition, I am interested in combining multiple modalities (audio, video and biometric sensor data) with machine learning algorithms to improve the prediction of human cognitive states and physical context in real scenarios.
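As a minimal illustration of the kind of acoustic analysis underlying this work (a generic sketch for readers, not code from any of the systems above), the snippet below estimates the fundamental frequency (f0) of a voiced signal by autocorrelation. f0 is one of the basic prosodic parameters analysed alongside glottal source features in voice quality and expressive speech research; the signal here is synthetic and the function name is purely illustrative.

```python
import numpy as np

def estimate_f0(signal, sr, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency (Hz) via autocorrelation peak picking."""
    sig = signal - np.mean(signal)
    # Autocorrelation for non-negative lags only.
    corr = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lo = int(sr / fmax)  # shortest plausible pitch period, in samples
    hi = int(sr / fmin)  # longest plausible pitch period, in samples
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag

# Synthetic "voiced" signal: 120 Hz fundamental with a few harmonics.
sr = 16000
t = np.arange(0, 0.05, 1 / sr)
signal = sum(np.sin(2 * np.pi * 120.0 * k * t) / k for k in range(1, 6))

print(f"estimated f0: {estimate_f0(signal, sr):.1f} Hz")  # roughly 120 Hz
```

Real systems use more robust estimators (and glottal inverse filtering for source parameters), but the autocorrelation idea above is the common starting point.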

Projects

  • Title
    • CogSIS - Cognitive Effects of Speech Interface Synthesis
  • Summary
    • Through the growth of intelligent personal assistants, pervasive and wearable computing, and robot-based technologies, speech interfaces are set to become a common dialogue partner. Technological challenges around the production of natural synthetic voices have been widely researched, yet comparatively little is understood about how synthesis affects user experience, in particular how design decisions around naturalness (e.g. accent used and expressivity) impact the assumptions we make about speech interfaces as communicative actors (i.e. our partner models). Our ground-breaking project examines the psychological consequences of synthesis design decisions on the relationship between humans and speech technology. It fuses knowledge, concepts and methods from psycholinguistics, experimental psychology and human-computer interaction (e.g. perspective taking and partner modelling research in human-human dialogue, controlled experiments, questionnaires) and speech technology (generation of natural speech synthesis) to 1) understand how synthesis design choices, specifically accent and expressivity, impact a user's partner model, 2) examine how these choices interact with context, and 3) determine how they impact language production.
  • Funding Agency
    • Irish Research Council
  • Date From
    • 2017
  • Date To
    • 2018
  • Title
    • Production, Perception and Cognition in the intersection between speech and singing
  • Summary
    • The issue raised in this project is that, although we intuitively know how to distinguish between speech and singing, there are portions of each in which one perceives the coexistence of both, which suggests a gradation, rather than an abrupt change, in phonation and other aspects of speech. The aim of this research project is to focus on aspects of the production and perception of speech and singing in order to answer the question: are speech and singing completely different phenomena? The experimental studies conducted include: the collection of a corpus of spoken and sung data, measurements of acoustic differences between the two types of data, a perception test in which listeners designate a presented stimulus as speech or singing, and the use of machine learning to further study the acoustic differences between the two sound categories. The results are analysed taking into account cognitive aspects of speech and song.
  • Funding Agency
    • São Paulo Research Foundation (FAPESP)
  • Date From
    • 2016
  • Date To
    • 2017

Keywords

Audio Signal Processing; Computer-Assisted Language Learning (CALL); Machine Learning; Speech Recognition; Speech Synthesis; Statistical Parametric Speech Synthesis; Voice Quality; Voice Source; Voice Transformation

Recognition

Awards and Honours

Awarded a Commercial Case Feasibility Support Grant from Enterprise Ireland 2014

Memberships

Member of the International Speech Communication Association (ISCA) 2005

Member of the Marie Curie Fellows Association (MCFA) 2006