Spoken across some of the world's northernmost territories, Yakut is an indigenous language that fewer than half a million people currently use in everyday life. Listed as an endangered language, Yakut is at risk of disappearing and taking a unique culture and group identity with it.

Thanks to the passion, dedication and enthusiasm of a single individual, as well as support from the Yandex team, Yakut was added to Yandex Translate, Yandex's automated translation service, two years ago. This year we released a film telling the story of that journey.

Yandex's translation technology not only bridges gaps and breaks down barriers, it also contributes to the preservation and promotion of languages and cultures. It offers automated translation between over a hundred languages, including rare ones such as Hill Mari, fictional ones such as Elvish, and even symbolic ones such as Emoji.

Yandex Translate's technology relies on two things: neural networks and a large number of examples. Although the neural model's complex structure lets it learn a new language on its own, it still needs to learn from a large number of so-called 'parallel' examples, that is, equivalent pieces of text in two different languages.

The problem with the Yakut language was that there was simply not enough written text for a language model to learn from. Inspired by the grassroots enthusiasm of a group of Yakut people, and with their hands-on contribution, Yandex engineers developed a way for their neural model to learn this disappearing language.

This film is about how technology can help people in their collective effort to support a vulnerable language and culture. You can watch it here or on YouTube (please enable English subtitles in Settings).

Disclaimer

Yandex NV published this content on 20 December 2021 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 20 December 2021 11:49:09 UTC.