Trinity College Dublin, The University of Dublin

LI4036 Linguistics Project


Identifying Multiword Expressions in Irish

The term multiword expression (MWE) covers a broad range of linguistic phenomena, including idioms (“once bitten, twice shy”), phrasal verbs (“make up”), light verb constructions (“take a walk”), compound nouns (“traffic light”) and many more. Processing MWEs correctly is important for many NLP tasks, such as machine translation, parsing and question answering, and it requires dedicated resources for these expressions (see Sag et al. and the PARSEME Shared Task). Such resources are lacking for Irish.

This project involves the creation of an annotated corpus of verbal MWEs in Irish (i.e. those MWEs that contain a verb) and the development of annotation guidelines for the correct identification of these MWEs. 
Possible questions to explore in this project include:

  • an investigation of the types and frequency of MWEs found in Irish
  • a comparison of MWE types and their functions in Irish and in one other language
  • an examination of the verbs that occur most frequently in MWE constructions in Irish
  • the development of a detailed annotation guideline for MWEs in Irish, including the linguistic background
  • the development of rules for the automatic identification of MWE types, based on syntactic patterns identified in the guidelines (see the MWEtoolkit for details)
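
To give a feel for what a rule based on a syntactic pattern might look like, here is a minimal sketch in Python. It is not the MWEtoolkit and does not use its API. It assumes sentences are already tokenised, lemmatised and tagged with Universal-Dependencies-style part-of-speech labels (VERB, DET, NOUN), and it uses a small hypothetical list of English light verbs purely for illustration; a real rule set for Irish would be derived from the annotation guidelines. The rule flags a light verb followed by an optional determiner and a noun as a candidate light verb construction, and the candidates are then counted by verb.

```python
from collections import Counter
from typing import List, Tuple

# A token is (surface form, lemma, part-of-speech tag).
# The tag names are an assumption (Universal-Dependencies-style labels).
Token = Tuple[str, str, str]

# Hypothetical, illustrative list of light verbs. A real list would come
# from the annotation guidelines, not from this sketch.
LIGHT_VERBS = {"take", "make", "have", "give"}


def find_light_verb_candidates(sentence: List[Token]) -> List[Tuple[str, ...]]:
    """Return lemma sequences matching: light VERB, optional DET, NOUN."""
    candidates = []
    for i, (_form, lemma, pos) in enumerate(sentence):
        if pos != "VERB" or lemma not in LIGHT_VERBS:
            continue
        j = i + 1
        # Skip one optional determiner after the verb.
        if j < len(sentence) and sentence[j][2] == "DET":
            j += 1
        # The pattern only matches if a noun comes next.
        if j < len(sentence) and sentence[j][2] == "NOUN":
            candidates.append(tuple(tok[1] for tok in sentence[i:j + 1]))
    return candidates


sentence = [
    ("She", "she", "PRON"),
    ("took", "take", "VERB"),
    ("a", "a", "DET"),
    ("walk", "walk", "NOUN"),
    ("yesterday", "yesterday", "ADV"),
]

candidates = find_light_verb_candidates(sentence)
# Count how often each verb lemma heads a candidate expression.
verb_counts = Counter(candidate[0] for candidate in candidates)
```

A pattern like this over-generates (not every "take" plus noun is an MWE), so its output is best treated as a list of candidates for manual annotation rather than as a final identification.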

This project is suitable for students with a strong command of Irish, and a keen interest in linguistic analysis or computational linguistics.

If you are interested in this project please contact Elaine Uí Dhonnchadha (