Papers
Topics
Authors
Recent
Search
2000 character limit reached

Low-Resource Neural Machine Translation for Southern African Languages

Published 1 Apr 2021 in cs.CL and cs.LG | (2104.00366v2)

Abstract: Low-resource African languages have not fully benefited from the progress in neural machine translation because of a lack of data. Motivated by this challenge we compare zero-shot learning, transfer learning and multilingual learning on three Bantu languages (Shona, isiXhosa and isiZulu) and English. Our main target is English-to-isiZulu translation for which we have just 30,000 sentence pairs, 28% of the average size of our other corpora. We show the importance of language similarity on the performance of English-to-isiZulu transfer learning based on English-to-isiXhosa and English-to-Shona parent models whose BLEU scores differ by 5.2. We then demonstrate that multilingual learning surpasses both transfer learning and zero-shot learning on our dataset, with BLEU score improvements relative to the baseline English-to-isiZulu model of 9.9, 6.1 and 2.0 respectively. Our best model also improves the previous SOTA BLEU score by more than 10.

Citations (14)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.