Global Voices

A New Audio Uploading Tool for the Crowdsourced Wiktionary Project in the Odia Language

One Global Voices contributor who's passionate about the Odia language has created an open-source solution for recording and uploading words under open licenses for projects like Wiktionary.

A home recording setup for the Kathabhidhana project for Wiktionary. Image via Subhashish Panigrahi from Wikimedia Commons. CC BY-SA 4.0

Wiktionary, Wikipedia's multilingual sister project, promises a great deal, but at present there are few open-licensed audio recordings of words that you can listen to or download, especially if your mother tongue is not one of the major languages. Wiktionary is already available in many languages, and in addition to definitions, many entries include phonetic notations, at least in the International Phonetic Alphabet (IPA). Now, an Odia-language community project is helping to simplify volunteer contributions to the Odia Wiktionary.

Kathabhidhana, a community project led by Global Voices contributor and Odia Wikipedian Subhashish Panigrahi, is an open-source tool for recording large batches of words. It then uploads the recordings under open licenses so that they can be used in projects like Wiktionary.

Odia, one of India's state languages, is an Indo-Aryan language spoken mostly in eastern India by around 40 million native speakers. With over 5,000 years of literary heritage, it has been recognized as one of the oldest South Asian languages and has been given the status of a classical language by the Indian government.

But because many people type Odia with non-Unicode systems, the language's online presence still lags behind. To address this, Odia Wikipedia has incorporated several character-encoding converters that turn text typed in various non-Unicode encodings into Unicode; the encyclopedia now has more than 12,000 entries. Meanwhile, the Odia Wiktionary, a free, completely crowdsourced online dictionary in the Odia language, is also trying to bridge the gap.

The project draws its inspiration largely from earlier open-source software created by Shrinivasan T, who used the Python programming language to automate and simplify the recording and uploading process. He has also posted a video tutorial about it on YouTube.

Panigrahi was inspired to start the Kathabhidhana project because the existing method was cumbersome: you had to pronounce and record each word, export the recording in Ogg Vorbis format, and upload it through your account to Wikimedia Commons, the central repository of media files for all Wikimedia projects. Once uploaded, the file could be added to the corresponding Wiktionary entry. Apart from manual pronunciation recordings, there is also an open-source text-to-speech project called Dhvani that works for most Indian languages.

Having audio recordings of words in Wiktionary helps non-native speakers, as well as people with visual impairments, hear how different words are pronounced. The word library can also be used in natural language processing projects, such as building text-to-speech and speech-to-speech engines.

You can download a copy of Kathabhidhana and find all the audio recordings made using this software.

Originally published in Global Voices.
