Shared Task

SHARED TASK | HACKATHON

The Organising Committee of the South African Conference for Artificial Intelligence Research 2021 invites you to participate in a Shared Task / Hackathon, hosted jointly with the organising committee of the third conference of the Digital Humanities Association of Southern Africa, DHASA 2021 (https://dh2021.digitalhumanities.org.za).

The Shared Task: NLAPOST: Nguni Languages Part of Speech Tagging challenge

Part of Speech Tagging (POS Tagging) is the process of assigning a label to each word in a text to indicate its lexical category, based on the context in which the word appears. POS tagging is considered a mostly solved problem for languages with many NLP resources, such as English. It remains an open problem, however, for languages with fewer NLP resources, such as the Nguni languages, because large amounts of labelled data for training POS tagging models are not available. The rich morphological structure and agglutinative nature of these languages make POS tagging more challenging than it is for a language like English. With this in mind, we have organised a challenge for training POS tagging models on a limited amount of data for four Nguni languages: isiZulu, siSwati, isiNdebele, and isiXhosa.

You are invited to join the challenge of developing a POS tagging model that performs well on all four languages. The models should be trained on the training dataset provided. In addition to the POS tags, the dataset is morphologically tagged to capture the agglutinative nature of the languages. You are allowed to take advantage of the morphological segmentation when developing your POS tagging models. However, the evaluation of your solutions will not take the morphological tags as input: the test set will consist of raw running text only. We decided not to include morphological tags in the test set because, to be useful in practice, your models should not depend on the availability of morphologically tagged text.
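As a rough illustration of the setting described above (train on labelled text, then tag raw running text), here is a minimal sketch of a most-frequent-tag baseline. It is not the organisers' method and does not reflect the actual dataset format: the function names, the tag labels, and the toy English sentences are all assumptions made for illustration only. The morphological tags are ignored here, since they will not be available at test time.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Learn the most frequent tag per word form, plus a fallback tag.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    This input shape is an assumption, not the shared task's file format.
    """
    word_tag_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tag_counts[word.lower()][tag] += 1
            tag_counts[tag] += 1
    # For each known word, keep only its most frequent tag.
    lexicon = {
        word: counts.most_common(1)[0][0]
        for word, counts in word_tag_counts.items()
    }
    # Unknown words fall back to the most frequent tag overall.
    default_tag = tag_counts.most_common(1)[0][0]
    return lexicon, default_tag


def tag_tokens(tokens, lexicon, default_tag):
    """Tag raw tokens using the lexicon; no morphological input is needed."""
    return [(token, lexicon.get(token.lower(), default_tag)) for token in tokens]


# Hypothetical toy data (English, with made-up tag labels), for illustration.
toy_training_data = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("chases", "VERB"), ("birds", "NOUN")],
]

lexicon, default_tag = train_baseline(toy_training_data)
tagged = tag_tokens(["The", "cat", "jumps"], lexicon, default_tag)
print(tagged)
```

A baseline like this tends to struggle on agglutinative languages, where many word forms in the test text never occur in the training data, which is one reason the morphological segmentation in the training set may be worth exploiting.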

You can participate via CodaLab at this link: https://competitions.codalab.org/competitions/33738

Important Dates:

  • Start of the competition: 1 August 2021
  • Training data release: 15 August 2021
  • Test data release: 15 September 2021
  • Competition ends: 1 October 2021
  • Conference Papers due: 15 October 2021

Contact the organisers via: nlapost2021@gmail.com

The best work on the task will be presented at the SACAIR2021 unconference on Monday, 6 December.
