Dr. João Paulo Cabral
Visiting Research Fellow, ADAPT
Visiting Research Fellow, Computer Science

Biography

João Cabral is a research fellow at Trinity College Dublin, in the School of Computer Science and Statistics, as part of the ADAPT Centre. He received B.Sc. and M.Sc. degrees in Electrical and Computer Engineering from Instituto Superior Técnico (IST), Lisbon, Portugal, in 2003 and 2006 respectively. He was awarded a Ph.D. degree in Computer Science and Informatics from the University of Edinburgh, U.K., in 2010, funded by a European Commission Marie Curie Fellowship under the Early Stage Research Training (EST) scheme. Before joining Trinity College Dublin in 2013, he worked from 2010 as a postdoctoral research fellow at University College Dublin, as part of the CNGL research centre.

Publications and Further Research Outputs

Peer-Reviewed Publications

Darragh Higgins, Katja Zibrek, Joao Cabral, Donal Egan, Rachel McDonnell, Sympathy for the digital: Influence of synthetic voice on affinity, social presence and empathy for photorealistic virtual humans, Computers & Graphics, 2022 Journal Article, 2022 URL

Katja Zibrek, Joao Cabral, Rachel McDonnell, Does Synthetic Voice Alter Social Response to a Photorealistic Character in Virtual Reality?, Motion, Interaction and Games (MIG), Virtual Event, Switzerland, Association for Computing Machinery, 2021, pp1 - 6 Conference Paper, 2021 URL

Beatriz Raposo de Medeiros, João Paulo Cabral, Alexsandro R. Meireles, and Andre A. Baceti, A comparative study of fundamental frequency stability between speech and singing, Speech Communication, 128, 2021, p15 - 23 Journal Article, 2021 DOI

João P. Cabral and Alexsandro R. Meireles, Transformation of voice quality in singing using glottal source features, Workshop on Speech, Music and Mind 2019 (SMM 2019), Vienna, Austria, 14 September 2019, ISCA, 2019, pp31 - 35 Conference Paper, 2019 URL

Leigh Clark, Philip Doyle, Diego Garaialde, Emer Gilmartin, Stephan Schögl, Jens Edlund, Matthew Aylett, Cosmin Munteanu, João P. Cabral, and Benjamin R. Cowan, The State of Speech in HCI: Trends, Themes and Challenges, Interacting with Computers, 2019 Journal Article, 2019

Benjamin R. Cowan, Philip Doyle, Justin Edwards, Diego Garaialde, Ali Hayes-Brady, Holly P. Branigan, João Cabral, Leigh Clark, What's in an accent? The impact of accented synthetic speech on lexical choice in human-machine dialogue, 1st International Conference on Conversational User Interfaces, Dublin, Ireland, 2019 Conference Paper, 2019 URL

João P. Cabral, Estimation of the Asymmetry Parameter of the Glottal Flow Waveform Using the Electroglottographic Signal, INTERSPEECH 2018, Hyderabad, India, 2-6 September, 2018 Conference Paper, 2018

Beatriz R. de Medeiros and João P. Cabral, Acoustic distinctions between speech and singing: Is singing acoustically more stable than speech?, Speech Prosody, Poznań, Poland, 13-16 June, 2018, pp542 - 546 Conference Paper, 2018 TARA - Full Text URL

Leigh Clark, João Cabral, Benjamin Cowan, The CogSIS Project: Examining the Cognitive Effects of Speech Interface Synthesis, British Human Computer Interaction Conference, Belfast, 2-6 July, 2018 Conference Paper, 2018 URL TARA - Full Text

João P. Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell, The Influence of Synthetic Voice on the Evaluation of a Virtual Character, Interspeech 2017, Stockholm, Sweden, 20-24 August, ISCA, 2017, pp229 - 233 Conference Paper, 2017 TARA - Full Text DOI URL

João P. Cabral, Christian Saam, Eva Vanmassenhove, Stephen Bradley, Fasih Haider, The ADAPT entry to the Blizzard Challenge 2016, Blizzard Challenge 2016 Workshop, Cupertino, CA, USA, 2016 Conference Paper, 2016 URL

Eva Vanmassenhove, João P. Cabral, Fasih Haider, Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis, 9th ISCA Workshop on Speech Synthesis, Sunnyvale, CA, USA, 13-15 September, 2016, pp22 - 27 Conference Paper, 2016 TARA - Full Text URL

João P. Cabral, Yuyun Huang, Christy Elias, Ketong Su and Nick Campbell, Interface for Monitoring of Engagement from Audio-Visual Cues, The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, Vienna, Austria, 11-13 September, ISCA, 2015 Poster, 2015 URL

Séamus Lawless, Peter Lavin, Mostafa Bayomi, João P. Cabral and M. Rami Ghorab, Text Summarization and Speech Synthesis for the Automated Generation of Personalized Audio Presentations, 20th International Conference on Application of Natural Language to Information Systems (NLDB), Passau, Germany, June 17-19, Springer, 2015, pp307 - 320 Conference Paper, 2015 URL DOI

Christy Elias, João P. Cabral and Nick Campbell, Audio Features for the Classification of Engagement, Workshop on Engagement in Social Intelligent Virtual Agents, Delft, Netherlands, 25th August 2015, 2015, pp8 - 12 Conference Paper, 2015 TARA - Full Text URL

Yuyun Huang, Christy Elias, João P. Cabral, Atul Nautiyal, Christian Saam and Nick Campbell, Towards Classification of Engagement in Human Interaction with Talking Robots, Communications in Computer and Information Science, 17th International Conference on Human-Computer Interaction, Los Angeles, USA, 2-7 August 2015, 528, Springer, 2015, pp741 - 746 Published Abstract, 2015 DOI URL

Éva Székely, Zeeshan Ahmed, Shannon Hennig, João P. Cabral and Julie Carson-Berndsen, Predicting synthetic voice style from facial expressions. An application for augmented conversations, Speech Communication, 57, 2014, p63 - 75 Journal Article, 2014 DOI URL

Zeeshan Ahmed and João P. Cabral, HMM-Based Speech Synthesiser For The Urdu Language, Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU), Saint Petersburg, Russia, 14 May, 2014, pp92 - 97 Conference Paper, 2014 TARA - Full Text URL

João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Glottal Spectral Separation for Speech Synthesis, IEEE Journal of Selected Topics in Signal Processing: Special Issue on Statistical Parametric Speech Synthesis, 8, (2), 2014, p195 - 208 Journal Article, 2014 DOI URL

João P. Cabral, Nick Campbell, Sree Ganesh, Mina Kheirkhah, Emer Gilmartin, Fasih Haider, Eamonn Kenny, Andrew Murphy, Neasa Ní Chiaráin, Thomas Pellegrini and Odei Rey Orozko, MILLA: A Multimodal Interactive Language Learning Agent, SemDial 2014, Edinburgh, United Kingdom, September 1st-3rd, 2014 Conference Paper, 2014 URL

João P. Cabral, Uniform Concatenative Excitation Model for Synthesising Speech without Voiced/Unvoiced Classification, INTERSPEECH, Lyon, France, August 2013, International Speech Communication Association (ISCA), 2013, pp1082 - 1085 Conference Paper, 2013 URL

João P. Cabral and Julie Carson-Berndsen, Towards a Better Representation of Glottal Pulse Shape Characteristics in Modelling the Envelope Modulation of Aspiration Noise, Lecture Notes in Computer Science: Advances in Nonlinear Speech Processing, NOLISP International Conference, Mons, Belgium, 19-21 June, edited by Thomas Drugman and Thierry Dutoit , 7911, 2013, pp67 - 74 Conference Paper, 2013 DOI URL

Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Explicit Duration Modelling in HMM-based Speech Synthesis using a Hybrid Hidden Markov Model-Multilayer Perceptron, SAPA - SCALE Conference, Portland, USA, 7-8 September, 2012 Conference Paper, 2012 URL

Éva Székely, Zeeshan Ahmed, João P. Cabral and Julie Carson-Berndsen, WinkTalk: A Demonstration of a Multimodal Speech Synthesis Platform Linking Facial Expressions to Expressive Synthetic Voices, the Third Workshop on Speech and Language Processing for Assistive Technologies, Montreal, Canada, 7 June, Association for Computational Linguistics, 2012, pp5 - 8 Conference Paper, 2012 TARA - Full Text

Amalia Zahra, João P. Cabral, Mark Kane and Julie Carson-Berndsen, Automatic Classification of Pronunciation Errors Using Decision Trees and Speech Recognition Technology, International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden, 6-8 June, 2012, pp65 - 69 Conference Paper, 2012 URL

João P. Cabral, Mark Kane, Zeeshan Ahmed, Mohamed Abou-Zleikha, Éva Székely, Amalia Zahra, Udochukwu Kalu Ogbureke, Peter Cahill, Julie Carson-Berndsen, Stephan Schlögl, Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz, International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, 21-27 May, 2012, pp23 - 25 Conference Paper, 2012 URL

Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Explicit Duration Modelling in HMM-based Speech Synthesis Using Continuous Hidden Markov Model, International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2012), Montreal, Canada, 3-5 July, IEEE, 2012, pp700 - 705 Conference Paper, 2012 DOI

Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based Speech Synthesis, International Conference on Speech Prosody, Shanghai, China, 22-25 May, 2012, pp67 - 70 Conference Paper, 2012 URL

Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Using multilayer perceptron for voicing strength estimation in HMM-based speech synthesis, International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2012), Montreal, Canada, 2-5 July, IEEE, 2012, pp683 - 688 Conference Paper, 2012 URL DOI

Mark Kane, João P. Cabral, Amalia Zahra and Julie Carson-Berndsen, Introducing Difficulty-Levels in Pronunciation Learning, International Speech Communication Association Special Interest Group on Speech and Language Technology in Education (SLaTE), Venice, Italy, 24-26 August, International Speech Communication Association (ISCA), 2011, pp37 - 40 Conference Paper, 2011 TARA - Full Text

João P. Cabral, John Kane, Christer Gobl and Julie Carson-Berndsen, Evaluation of glottal epoch detection algorithms on different voice types, INTERSPEECH, Florence, Italy, 28-31 August, International Speech Communication Association (ISCA), 2011, pp1989 - 1992 Conference Paper, 2011

Éva Székely, João P. Cabral, Peter Cahill and Julie Carson-Berndsen, Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters, INTERSPEECH, Florence, Italy, International Speech Communication Association (ISCA), 2011, pp2409 - 2412 Conference Paper, 2011

João P. Cabral, Steve Renals, Junichi Yamagishi and Korin Richmond, HMM-based speech synthesiser using the LF-model of the glottal source, International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22-27 May, IEEE, 2011, pp4704 - 4707 Conference Paper, 2011 URL DOI

João Paulo Cabral, HMM-based Speech Synthesis Using an Acoustic Glottal Source Model, The University of Edinburgh, 2010 Thesis, 2010 URL

João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, An HMM-based speech synthesiser using glottal post-filtering, 7th ISCA Speech Synthesis Workshop (SSW7), Kyoto, Japan, 2010, pp365 - 370 Conference Paper, 2010 URL

J. Sebastian Andersson, João P. Cabral, Leonardo Badino, Junichi Yamagishi and Robert A.J. Clark, Glottal source and prosodic prominence modelling in HMM-based speech synthesis for the Blizzard Challenge 2009, The Blizzard Challenge 2009, Edinburgh, UK, 4 September, 2009 Conference Paper, 2009 URL

João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, HMM-based speech synthesis with an acoustic glottal source model, The First Young Researchers Workshop in Speech Technology, Dublin, Ireland, 25 April, 2009 Conference Paper, 2009 TARA - Full Text

João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Glottal Spectral Separation for Parametric Speech Synthesis, INTERSPEECH 2008, Brisbane, Australia, 22-26 September, International Speech Communication Association (ISCA), 2008, pp1829 - 1832 Conference Paper, 2008 URL

Guilherme Raimundo, João P. Cabral, Celso Melo, Luís C. Oliveira, Ana Paiva, Isabel Trancoso, Telling Stories with a Synthetic Character: Understanding Inter-modalities Relations, Lecture Notes in Computer Science: Verbal and Nonverbal Communication Behaviours, COST Action 2102 International Workshop on Verbal and Nonverbal Communication Behaviours, Vietri sul Mare, Italy, 29-31 March, 4775, Springer Berlin Heidelberg, 2007, pp310 - 323 Conference Paper, 2007 DOI

João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Towards an Improved Modeling of the Glottal Source in Statistical Parametric Speech Synthesis, 6th ISCA Workshop on Speech Synthesis (SSW6), Bonn, Germany, 22-24 August, International Speech Communication Association (ISCA), 2007, pp113 - 118 Conference Paper, 2007 URL

João P. Cabral, Transforming Prosody and Voice Quality to Generate Emotions in Speech, Instituto Superior Técnico (IST), 2006 Thesis, 2006 URL

João P. Cabral and Luís C. Oliveira, EmoVoice: A System to Generate Emotions in Speech, INTERSPEECH, Pittsburgh, USA, 17-21 September, International Speech Communication Association (ISCA), 2006, pp1798 - 1801 Conference Paper, 2006 URL

João P. Cabral, Luís C. Oliveira, Guilherme Raimundo, Ana Paiva, What voice do we expect from a synthetic character?, 11th International Conference on Speech and Computer (SPECOM 2006), St. Petersburg, Russia, 26-29 June, 2006 Conference Paper, 2006

João P. Cabral and Luís C. Oliveira, Pitch-Synchronous Time-Scaling for High-Frequency Excitation Regeneration, INTERSPEECH, Lisbon, Portugal, 4-8 September, International Speech Communication Association (ISCA), 2005, pp1513 - 1516 Conference Paper, 2005 URL

João P. Cabral and Luís C. Oliveira, Pitch-Synchronous Time-Scaling for Prosodic and Voice Quality Transformations, INTERSPEECH, Lisbon, Portugal, 4-8 September, International Speech Communication Association (ISCA), 2005, pp1137 - 1140 Conference Paper, 2005 URL

João P. Cabral, Evaluation of Methods for Excitation Regeneration in Bandwidth Extension of Speech, Instituto Superior Técnico (IST) and Royal Institute of Technology (KTH), 2003 Thesis, 2003

Non-Peer-Reviewed Publications

Peter Cahill, Udochukwu Ogbureke, João Cabral, Éva Székely, Mohamed Abou-Zleikha, Zeeshan Ahmed and Julie Carson-Berndsen, UCD Blizzard Challenge 2011 Entry, Blizzard Challenge Workshop 2011, Turin, Italy, 2 September, 2011 Conference Paper, 2011

Research Expertise

Description

My main research work is on Text-To-Speech synthesis (TTS) and the development of innovative commercial applications of this research, such as expressive AI voices for audiobooks, spoken dialogue systems, and animation. I am also interested in the analysis of emotion and affect in speech. I have extensive expertise in the analysis and modelling of glottal source parameters. These features are important in TTS for transforming voice quality, for example to produce breathy or tense voices, and for conveying emotion. Other areas of expertise include speech signal processing, statistical learning algorithms for speech processing, and deep learning.
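As a rough, self-contained illustration of the glottal source modelling described above (a sketch of my own, not code from any of the publications listed on this page), the snippet below synthesises a periodic voiced excitation using the Rosenberg glottal pulse model, a simpler stand-in for the LF model used in this research; the sample rate, F0, open quotient and speed quotient are arbitrary assumed values.

    import numpy as np

    def rosenberg_pulse(T0, sr, open_quotient=0.6, speed_quotient=2.0):
        """One Rosenberg glottal flow pulse of duration T0 seconds at rate sr."""
        n = int(round(T0 * sr))
        t_open = open_quotient * T0                             # open phase duration
        t1 = t_open * speed_quotient / (1.0 + speed_quotient)   # rising branch
        t2 = t_open - t1                                        # falling branch
        t = np.arange(n) / sr
        g = np.zeros(n)
        rise = t < t1
        fall = (t >= t1) & (t < t1 + t2)
        g[rise] = 0.5 * (1.0 - np.cos(np.pi * t[rise] / t1))
        g[fall] = np.cos(np.pi * (t[fall] - t1) / (2.0 * t2))
        return g  # the closed phase stays at zero

    sr = 16000
    f0 = 120.0                                  # fundamental frequency in Hz
    pulse = rosenberg_pulse(1.0 / f0, sr)
    excitation = np.tile(pulse, int(0.5 * f0))  # ~0.5 s of periodic excitation

Reshaping the pulse through the open and speed quotients is the kind of control that glottal source parameters provide over perceived voice quality (e.g. towards breathier or tenser settings) when driving a synthesiser.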

Projects

  • Title
    • Expressive Speech Synthesis: VoiceTune
  • Summary
    • Research project to develop expressive Text-to-Speech commercial applications for industry. The project aims to validate a prototype product/service and demonstrate its commercial value to companies that need expressive AI voice solutions.
  • Funding Agency
    • Enterprise Ireland
  • Date From
    • 2020
  • Date To
    • 2022
  • Title
    • CogSIS - Cognitive Effects of Speech Interface Synthesis
  • Summary
    • Through the growth of intelligent personal assistants, pervasive and wearable computing, and robot-based technologies, speech interfaces are set to become a common dialogue partner. Technological challenges around the production of natural synthetic voices have been widely researched. Yet comparatively little is understood about how synthesis affects user experience, in particular how design decisions around naturalness (e.g. accent used and expressivity) impact the assumptions we make about speech interfaces as communicative actors (i.e. our partner models). Our ground-breaking project examines the psychological consequences of synthesis design decisions on the relationship between humans and speech technology. It fuses knowledge, concepts and methods from psycholinguistics, experimental psychology, and human-computer interaction (e.g. perspective taking and partner modelling research in human-human dialogue, controlled experiments, questionnaires) and speech technology (generation of natural speech synthesis) to 1) understand how synthesis design choices, specifically accent and expressivity, impact a user's partner model, 2) examine how these choices interact with context, and 3) assess their impact on language production.
  • Funding Agency
    • Irish Research Council
  • Date From
    • 2017
  • Date To
    • 2018
  • Title
    • Production, Perception and Cognition in the intersection between speech and singing
  • Summary
    • Although we intuitively know how to distinguish between speech and singing, there are passages in which one perceives the coexistence of both, which suggests a gradation rather than an abrupt change in phonation and other aspects of production. The aim of this research project is to focus on some aspects of the production and perception of speech and singing in order to answer the question: are speech and singing completely different phenomena? Experimental studies are conducted that include: collection of a corpus of spoken and sung data, measurement of acoustic differences between the two types of data, a perception test in which listeners designate the presented stimulus as speaking or singing, and the use of machine learning to further study the acoustic differences between the two sound categories. The results are analysed by taking into account cognitive aspects of speech and song. (A minimal sketch of the kind of F0-stability measurement involved follows this list.)
  • Funding Agency
    • São Paulo Research Foundation (FAPESP)
  • Date From
    • 2016
  • Date To
    • 2017
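Purely as an illustration of the F0-stability comparison described in the last project above (my sketch under assumed tooling, not the project's actual pipeline), the snippet below estimates F0 with the pYIN tracker from the librosa library and compares its relative variability in a spoken versus a sung recording; the file names are placeholders.

    import numpy as np
    import librosa

    def f0_stability(path):
        """Coefficient of variation of F0: lower means a steadier pitch."""
        y, sr = librosa.load(path, sr=None)
        f0, voiced, _ = librosa.pyin(y,
                                     fmin=librosa.note_to_hz('C2'),
                                     fmax=librosa.note_to_hz('C6'),
                                     sr=sr)
        f0 = f0[voiced & ~np.isnan(f0)]  # keep voiced frames only
        return np.std(f0) / np.mean(f0)

    print("speech :", f0_stability("speech.wav"))    # placeholder file
    print("singing:", f0_stability("singing.wav"))   # placeholder file

On this measure one would expect sung pitch to be held more steadily within notes than spoken pitch, which is the kind of acoustic difference between the two categories that the project set out to quantify.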

Keywords

Audio Signal Processing; Computer Assisted Language Learning (CALL); Deep Learning; Machine Learning; Natural Language Processing; Speech Emotions; Speech Synthesis; Statistical Parametric Speech Synthesis; Voice Quality; Voice Source; Voice Transformation

Recognition

Memberships

Member of the International Speech Communication Association (ISCA) 2005

Member of the Marie Curie Fellows Association (MCFA) 2006