
Trinity College Dublin, The University of Dublin


Corpas na Gaeilge Labhartha

Researchers: Elaine Uí Dhonnchadha and Alessio Frenda,
with Brian Vaughan (recording) and Daniel Jettka (XSLT)
Funding: Foras na Gaeilge (GaLa 2011, Comhrá 2012)


Background

The aim of this project is to create a comprehensive corpus of spoken Irish.
This corpus will provide valuable material for linguistic research, the teaching of Irish and language technology such as automatic speech recognition.

It will be a diachronic* corpus: we are collecting the earliest audio material available to us (going back seventy years or more) and making new contemporary recordings, both in the Speech Communication Laboratory and in the Gaeltacht regions around the country. We will endeavour to balance the corpus with respect to speaker dialect, gender and age.

Creating a corpus of spoken language requires transcribing audio or video recordings (e.g. spoken conversations, interviews, speeches, etc.). We use specialised transcription software that formats the transcripts as XML and time-aligns each transcript with its audio or video recording.
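To illustrate what a time-aligned XML transcript makes possible, here is a minimal Python sketch that reads speaker, timing and text from a transcript. The element and attribute names (`transcript`, `segment`, `speaker`, `start`, `end`) and the sample content are assumptions made for illustration only; they are not the project's actual schema or the Transcriber file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified transcript. The element and attribute names are
# illustrative assumptions, not the project's real XML schema.
SAMPLE = """
<transcript>
  <segment speaker="A" start="0.00" end="2.40">Dia duit.</segment>
  <segment speaker="B" start="3.10" end="5.25">Dia is Muire duit.</segment>
</transcript>
"""

def read_segments(xml_text):
    """Return (speaker, start, end, text) tuples in document order.

    Start and end times are seconds from the beginning of the recording.
    """
    root = ET.fromstring(xml_text.strip())
    segments = []
    for seg in root.iter("segment"):
        segments.append((
            seg.get("speaker"),
            float(seg.get("start")),
            float(seg.get("end")),
            (seg.text or "").strip(),
        ))
    return segments

if __name__ == "__main__":
    for speaker, start, end, text in read_segments(SAMPLE):
        print(f"{speaker} [{start:.2f}-{end:.2f}] {text}")
```

Because each segment carries its own start and end time, any stretch of the transcript can be matched back to the corresponding part of the recording.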

Transcription Guidelines for Irish

These guidelines were designed by Alessio Frenda, Elaine Uí Dhonnchadha and Pauline Welby (CNRS, France). If you have any queries or suggestions regarding the guidelines, please e-mail us at uidhonne@tcd.ie.

Some aspects of speech, such as the length of pauses, do not need to be recorded in the transcription because they can be generated automatically at a later stage. Apart from a list of pre-defined exceptions (details below), dialectal pronunciation is not represented in the orthographic transcription, because it can be represented more accurately in a separate phonetic transcription. A large percentage of dialectal pronunciations can be generated automatically from the standard orthography using dialect-specific letter-to-sound rules.
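As a rough illustration of why pause lengths need not be transcribed by hand, the following Python sketch derives them from the start and end times of consecutive time-aligned segments. The input format (a list of `(start, end)` pairs in seconds) and the `min_pause` threshold are assumptions made for this example, not part of the project's guidelines or tooling.

```python
def pause_lengths(segments, min_pause=0.0):
    """Compute the silent gaps between consecutive speech segments.

    segments:  list of (start, end) times in seconds, sorted by start time.
               This input shape is a hypothetical simplification.
    min_pause: gaps of this length or shorter are ignored; overlapping
               speech (a negative gap) is therefore never reported.
    """
    pauses = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        gap = next_start - prev_end
        if gap > min_pause:
            pauses.append(round(gap, 3))
    return pauses

if __name__ == "__main__":
    # Three segments: a 0.7 s pause, then a segment that follows immediately.
    print(pause_lengths([(0.0, 2.4), (3.1, 5.25), (5.25, 7.0)]))
```

Raising `min_pause` (for example to 0.2) would restrict the output to pauses long enough to be of interest for a given analysis.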

Further information regarding standardised spelling of various aspects of spoken language may be found in the following lists:

Transcription Software

To download the Transcriber software, go to the following web page and install the software on your computer: http://sourceforge.net/projects/trans/files/transcriber/1.5.1/ (do not use TranscriberAG).

*A diachronic corpus is one used to study how a language changes over time.