Web-Based English to Yoruba Machine Translation
Akinwale O. I., Adetunmbi A. O., Obe O. O., Adesuyi A. T.
Computer Science Department, Federal University of Technology, Akure, Nigeria
To cite this article:
Akinwale O. I., Adetunmbi A. O., Obe O. O., Adesuyi A. T. Web-Based English to Yoruba Machine Translation. International Journal of Language and Linguistics. Vol. 3, No. 3, 2015, pp. 154-159. doi: 10.11648/j.ijll.20150303.17
Abstract: Globalization has increased the rate at which people interact and integrate, raising the level of international integration through the interchange of world views, products, ideas and other aspects of culture. Language differences pose a major barrier to these processes, so there is a need for systems that translate between languages. English is a West Germanic language that has become the lingua franca in Nigeria and 53 other countries; therefore, vital information in Nigeria is written and spoken in English. Meanwhile, Yoruba is largely spoken in Nigeria, with over 40 million speakers in the south-western part of the country, and also in parts of the Republic of Benin. This research deals with the translation of English text to Yoruba text using a rule-based method. Twenty-two rules were formulated for the translation and specified using context-free grammar. A bilingual dictionary data set containing English words and their corresponding Yoruba translations was used. The research model was implemented with ASP.net and the C# programming language and is hosted on http://www.naijatranslate.com. In evaluation, the translator achieved an accuracy of 90.5%.
Keywords: Machine Translation, Rule-based Machine Translation, English Language, Yoruba Language, Computational Rules, Translation System
English is the Nigerian lingua franca and is commonly spoken among the tribes in the country. This poses a threat to the survival of indigenous Nigerian languages; consequently, most children cannot speak their mother tongue. Experts are therefore concerned that if a child cannot speak his or her mother tongue today, the child's sons and daughters may lose the language within the next 20 to 25 years. This implies that within the next 50 years or more, Nigerian languages such as Yoruba could be close to extinction. The recent policy of the Nigerian Federal Ministry of Education that made the study of indigenous languages optional in Senior Secondary Schools does not help matters. This research provides a means of helping to prevent the extinction of the Yoruba language. It also supports the flow of globalization by providing a web-based, user-friendly English to Yoruba machine translation system. The system is easily accessible for learning and teaching, both for indigenes and for anyone interested in the Yoruba language. English words are easily translated into Yoruba words, which helps users understand Yoruba through English.
2. Related Works
 worked on web-based English to Yoruba machine translation. In that research, computational models were formulated using finite state automata and were used to develop a web-based system that translates English noun phrases into Yoruba. Linguists were consulted, and the syntactic structures of both languages were studied in detail with emphasis on noun phrases. Rules for generating Yoruba noun phrases from English ones were formulated and specified using context-free grammar. Also,  worked on the development of an English to Yoruba machine translation system. That work carried out a computational analysis of the English to Yoruba text translation process using a rule-based approach. The translator was modeled using context-free grammar and rewrite rules, parse trees and automata theory-based techniques, and the corresponding software was designed using UML.
Google's language translation service, Google Translate, is a statistical machine translation system that started in 2006 with two languages . It is currently probably the best-known online language translation service . It performs hundreds of millions of translations every day and, at the time of writing, offers full support for translation between 64 languages. Google Translate is a common existing tool that can translate between Yoruba and other languages . The performance of the model developed in this research is therefore evaluated against Google Translate.  worked on using statistical machine translation as a language translation tool for understanding Yoruba. In statistical machine translation, translations are generated from statistical models whose parameters are derived from the analysis of bilingual text corpora. Existing software toolkits were used, and because no English-Yoruba parallel corpus existed,  had to create one.
3. Research Model
A rule-based approach to machine translation was used for this system. Specifically, the dictionary-based type of rule-based translation was used as the main approach, as it is the most realistic of all types of machine translation. Figure 1 shows the architecture of the system and the process that every translation follows.
| S/N | English rules | Yoruba arrangement of the rule | Translator’s rule |
|---|---|---|---|
| R1 | NP = det + N | NP = N + det | Re-ordering determinants |
| R2 | NP = det + a + N | NP = N + a + det | Noun Phrase |
| R3 | NP = p + N | NP = N + p | Re-ordering determinants |
| R4 | NP = p + a + N | NP = N + a + p | Noun Phrase |
| R5 | V = "is" + a | V = ɛ + a | "is" = Empty (IsTracking) |
| R6 | V = "is" + lvc | V = "n" + LV | Continuous verb |
| R7 | V = "is" + det "a" | V = "je" + ɛ | Det "a" = empty |
| R8 | V = "is" + det "the" + N | V = "ni" + N + det | (IsTracking) |
| R9 | NP = pn | NP = "awon" + N | Plural Noun |
| R10 | NP = det + pn | NP = "awon" + N + det | Plural Noun |
| R11 | NP = det + a + pn | NP = "awon" + N + a + det | Plural Noun |
| R12 | NP = p + pn | NP = "awon" + N + p | Plural Noun |
| R13 | NP = p + a + pn | NP = "awon" + N + a + p | Plural Noun |
| R14 | V = lvs | V = "maa n" + LV | Singular verbs |
| R15 | V = lvc | V = LV | Continuous verb |
| R16 | V = "has" + det + N | V = "ni" + N + det | CheckforHas |
| R17 | V = "has" + det + a + N | V = "n" + N + a + det | |
| R18 | V = "has" + LV | V = "ti" + LV | |
| R19 | V = "has" + "to" | V = "ni" + Lati | |
| R20 | V = "to" + det + N | V = "si" + N + det | |
| R21 | V = V + "d" | V = LV | Past tense verb |
| R22 | V = V + "ed" | V = LV | Past tense verb |
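Rules R7 and R8 of table 1 can be sketched in Python as follows. This is a minimal illustration, not the Y-Translator code (which was written in C#): the function name `is_tracker` and the two-entry dictionary are hypothetical, with the Yoruba words taken from the table 2 examples ("She is a girl" gives "... je ọmọbinrin", and "The boy" gives "ọmọkunrin naa"). It assumes the verb phrase arrives already tokenized as "is", a determiner and a noun.

```python
# Hypothetical dictionary; Yoruba words taken from the table 2 examples.
DICTIONARY = {"girl": "ọmọbinrin", "the": "naa"}


def is_tracker(tokens):
    """Sketch of rules R7 and R8 for a verb phrase "is" + determiner + noun.

    R7: "is" + "a" + N    -> "je" + N          (determiner "a" becomes empty)
    R8: "is" + "the" + N  -> "ni" + N + det    (determiner moves after the noun)
    """
    if len(tokens) != 3 or tokens[0] != "is":
        raise ValueError("expected: 'is' <determiner> <noun>")
    det, noun = tokens[1], tokens[2]
    if det == "a":
        return ["je", DICTIONARY[noun]]
    if det == "the":
        return ["ni", DICTIONARY[noun], DICTIONARY[det]]
    raise ValueError(f"determiner {det!r} is not covered by R7/R8")


print(" ".join(is_tracker(["is", "a", "girl"])))    # je ọmọbinrin
print(" ".join(is_tracker(["is", "the", "girl"])))  # ni ọmọbinrin naa
```

The sketch handles only these two patterns; any other input raises an error instead of guessing, which makes it easy to see where further rules from table 1 would plug in.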
As figure 1 shows, tokenization is the first step in the translation process: the input English sentence is split into words (tokens), and each token is then tagged with its part of speech. Twenty-two computational rules were formulated, and they form the basis of the translator, which is called Y-Translator. These rules were derived from selected English grammar rules and the corresponding word order in Yoruba. The 22 computational rules are listed in table 1.
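The first two stages, tokenization and part-of-speech tagging, can be sketched in Python. Assumptions: tokens are lowercase alphabetic runs, and each tag is read from a small hypothetical lexicon whose tag names follow the symbols used in this paper and whose entries come from the table 2 sentence "The boy hit the dog". A real tagger would also need to resolve words with more than one part of speech.

```python
import re

# Hypothetical miniature lexicon: English word -> (part-of-speech tag, Yoruba).
# Tags follow the paper's symbols: det determinant, sn singular noun, LV lexical verb.
LEXICON = {
    "the": ("det", "naa"),
    "boy": ("sn", "ọmọkunrin"),
    "hit": ("LV", "lu"),
    "dog": ("sn", "aja"),
}


def tokenize(sentence):
    """Split an English sentence into lowercase word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def tag(tokens):
    """Attach a part-of-speech tag to each token; unknown words are tagged 'unk'."""
    return [(tok, LEXICON.get(tok, ("unk", tok))[0]) for tok in tokens]


tokens = tokenize("The boy hit the dog")
print(tag(tokens))
# [('the', 'det'), ('boy', 'sn'), ('hit', 'LV'), ('the', 'det'), ('dog', 'sn')]
```

Tagging unknown words as "unk" rather than failing lets later stages decide whether to pass the English word through untranslated.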
From the computational rules, production rules based on context-free grammar were also formulated for English and Yoruba sentence structure. For English structure, the production rules are as follows:
1. S → NP VP
2. NP → N | p | dN | daN | pN | paN | a | ɛ
3. VP → V NP
4. V → avLV | av | LV | LVab
5. LV → lvc | lvp | lvs
6. N → sn | pn
The production rules based on Yoruba sentence structure are as follows:
1. S → NP VP
2. NP → N | p | Nd | Nad | Np | Nap | a | ɛ
3. VP → V NP
4. V → avLV | av | LV | LVab
5. N → sn | pn
Where ‘S’ means sentence, ‘NP’ means noun phrase, ‘VP’ means verb phrase, ‘N’ means noun, ‘p’ means pronoun, ‘d’ means determinant, ‘a’ means adjective, ‘V’ means verb, ‘av’ means auxiliary verb, ‘LV’ means lexical verb, ‘lvc’ means continuous lexical verb, ‘lvp’ means plural lexical verb, ‘lvs’ means singular lexical verb, ‘sn’ means singular noun, ‘pn’ means plural noun, ‘ab’ means adverb and ‘ɛ’ represents empty.
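The correspondence between the English and Yoruba noun-phrase productions, which rules R1 to R4 of table 1 also express, amounts to a fixed permutation of the symbols: the noun moves to the front and the determinant or pronoun moves to the end. A minimal Python sketch, assuming each production is given as a tuple of the symbols d, a, N and p (the names `NP_CORRESPONDENCE` and `reorder_np` are hypothetical):

```python
# English NP right-hand sides mapped to the Yoruba order they correspond to
# (rules R1-R4 of table 1). Symbols: N noun, p pronoun, d determinant, a adjective.
NP_CORRESPONDENCE = {
    ("N",): ("N",),
    ("p",): ("p",),
    ("d", "N"): ("N", "d"),
    ("d", "a", "N"): ("N", "a", "d"),
    ("p", "N"): ("N", "p"),
    ("p", "a", "N"): ("N", "a", "p"),
    ("a",): ("a",),
    (): (),  # the empty production ɛ
}


def reorder_np(words, symbols):
    """Reorder an English noun phrase into Yoruba order.

    words and symbols are parallel lists; symbols must match one of the
    English NP productions, otherwise a ValueError is raised.
    """
    key = tuple(symbols)
    if key not in NP_CORRESPONDENCE:
        raise ValueError(f"not an NP production: {symbols}")
    # Each symbol occurs at most once per production, so a symbol -> index map is safe.
    position = {sym: i for i, sym in enumerate(key)}
    return [words[position[sym]] for sym in NP_CORRESPONDENCE[key]]


print(reorder_np(["the", "big", "dog"], ["d", "a", "N"]))  # ['dog', 'big', 'the']
```

Reordering the English words before lookup keeps the permutation logic separate from translation, so the same table serves every noun phrase regardless of vocabulary.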
The computational rules are grouped into nine components: word replacers, ContinuousVerbTracker, PluralNounTracker, SingularVerbFlag, re-ordering determinant, PastTenseVerbFlag, IsTracker, NounPhraseRule and WordTapRule. ContinuousVerbTracker recognizes and translates continuous verbs. PluralNounTracker recognizes and translates plural nouns by removing the suffix 's'. SingularVerbFlag recognizes and translates singular verbs. Re-ordering determinant recognizes determinants in the English sentence and then reorders their translations in the Yoruba sentence. PastTenseVerbFlag recognizes verbs in the past tense and translates them by retrieving the Yoruba translation of the verb's present form from the dictionary database. NounPhraseRule identifies, translates and rearranges noun phrases in a sentence. WordTapRule identifies the appropriate translation, within a sentence, of words that have more than one part of speech.
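A minimal Python sketch of PluralNounTracker combined with rule R10 (det + pn → "awon" + N + det) follows. It assumes that stripping a final "s" yields a dictionary headword, and it uses a hypothetical dictionary whose Yoruba entries come from table 2; the function names are illustrative only. Irregular plurals such as "children" would need their own dictionary entries.

```python
# Hypothetical dictionary of singular headwords; Yoruba entries from table 2.
DICTIONARY = {"boy": "ọmọkunrin", "dog": "aja", "book": "iwe", "the": "naa"}


def plural_noun_tracker(word):
    """Translate a plural noun, or return None if it is not a recognised plural.

    Strips the English plural suffix 's', looks up the singular, and
    prefixes the Yoruba plural marker 'awon' (rule R9).
    """
    if word.endswith("s") and word[:-1] in DICTIONARY:
        return ["awon", DICTIONARY[word[:-1]]]
    return None


def det_plural_np(det, noun):
    """Rule R10: det + pn -> 'awon' + N + det."""
    translated = plural_noun_tracker(noun)
    if translated is None:
        raise ValueError(f"{noun!r} is not a known plural noun")
    return " ".join(translated + [DICTIONARY[det]])


print(det_plural_np("the", "dogs"))  # awon aja naa
```

Checking that the stripped form is an actual headword keeps singular nouns that happen to end in "s" from being treated as plurals.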
Four of the rules require morphological analysis: ContinuousVerbTracker, SingularVerbFlag, PluralNounTracker and PastTenseVerbFlag. The bound morphemes (suffixes) that the translator recognizes are "-ing", "-ed", "-d" and "-s". Once the suffix is removed, the translation of the root word is retrieved from the dictionary.
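The suffix analysis can be sketched as trying each recognized bound morpheme in turn and accepting a split only when the remaining root is in the dictionary. In this Python sketch the set of roots and the function name `analyse` are hypothetical, and only the English side is shown. Trying "-ed" before "-d" lets both "walked" and "moved" reduce to dictionary roots.

```python
# Bound morphemes recognised by the translator, tried in this order.
SUFFIXES = ("ing", "ed", "d", "s")

# Hypothetical set of English root words present in the dictionary.
ROOTS = {"eat", "walk", "move", "book"}


def analyse(word):
    """Split word into (root, suffix), using the dictionary to validate the root.

    Returns (word, "") when the word is itself a dictionary entry or when
    no suffix leaves a known root.
    """
    if word in ROOTS:
        return word, ""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            root = word[: -len(suffix)]
            if root in ROOTS:
                return root, suffix
    return word, ""


for w in ["eating", "walked", "moved", "books", "hit"]:
    print(w, "->", analyse(w))
```

A word whose apparent suffix does not leave a known root, such as "bus", is returned unchanged. Spelling changes at the morpheme boundary (for example "moving" from "move") are not handled by this sketch.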
The dictionary is bilingual; that is, it contains English words with their parts of speech and their equivalents in Yoruba. Some data were extracted from the data set to exist independently of the raw data, namely the English "determinants", "auxiliary verbs" and "pronouns" with their corresponding Yoruba translations. The raw data was also edited and updated with irregular verbs that were not originally in the dictionary, and all entries were properly arranged. The data sets serve as the database structure for the machine translation. Figure 2 shows the structure of the general data set, which contains all other words arranged alphabetically, while figure 3 shows the determinant data set.
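One way to hold the two data sets in memory is sketched below in Python: a small determinant table that is consulted first, and a general table indexed by English word, where each word may carry several (part of speech, Yoruba) pairs. The rows are hypothetical examples (Yoruba words from table 2, including the irregular past form "bought"), and `lookup` is an illustrative name; the actual system stores its data sets in a database.

```python
from collections import defaultdict

# Determinants kept in their own table, mirroring the separate determinant data set.
DETERMINANTS = {"the": "naa", "a": "kan"}

# Hypothetical rows of the general data set: (English word, part of speech, Yoruba).
GENERAL_ROWS = [
    ("book", "noun", "iwe"),
    ("bought", "verb", "ra"),  # irregular verb added explicitly
    ("boy", "noun", "ọmọkunrin"),
    ("dog", "noun", "aja"),
]

# Index by English word; a word may have several parts of speech,
# so each key maps to a list of (pos, yoruba) pairs.
GENERAL = defaultdict(list)
for english, pos, yoruba in GENERAL_ROWS:
    GENERAL[english].append((pos, yoruba))


def lookup(word, pos=None):
    """Return the Yoruba translation of word, checking determinants first.

    If pos is given, only entries with that part of speech match.
    Returns None when no entry matches.
    """
    if word in DETERMINANTS:
        return DETERMINANTS[word]
    for entry_pos, yoruba in GENERAL.get(word, []):
        if pos is None or entry_pos == pos:
            return yoruba
    return None


print(lookup("the"), lookup("book", "noun"), lookup("bought"))  # naa iwe ra
```

Keeping a list of entries per word, rather than a single translation, leaves room for a rule that chooses among senses by part of speech.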
The research model was implemented in Visual Studio 2012 using ASP.net, an interface design tool that makes use of HTML, with the C# programming language used to code the design. Figure 4 shows a snapshot of the home page of the translator, while figure 5 shows a snapshot of the ADMIN page, which is used to update the data set with new English and Yoruba word pairs that do not yet exist in it.
4. System Evaluation
The evaluation was carried out by comparing the performance of this translator with Google Translate. Table 2 shows an extract of the sentences tested on Google Translate (http://www.translate.google.com/m) and on the Y-Translation model (http://www.naijatranslate.com).
| Input Sentences | Y-Translation Model output | Google Translate Output |
|---|---|---|
| She wrote easily | Obinrin naa kọẹ lainira | O kọwe awọn isoro |
| She has written in the book | Obinrin naa kọ ninu iwe naa | O ti kọ ninu awon iwe |
| The boy hit the dog | ọmọkunrin naa lu aja naa | Awọn ọmọkunrin lu awọn aja |
| She is a girl | Obinrin naa je ọmọbinrin | O ni kan ọmọbinrin |
| The lady must be in the car | Iyalode naa gbudọ wa ninu ọkọ naa | Awọn iyaafin gbọdọ wa ni awọn ọkọ |
| He bought a book for them | Okunrin naa ra iwe kan fun wọn | O si ra iwe kan fun wọn |
| She bought a weapon for herself | Obinrin naa ra ohun ija kan fun ontikarare | O ra kanija fun ara |
| Each man must eat their food | Okunrin ikọọkan gbudọ je ounje ti wọn | Kọọkan ọkunrin gbọdọ je wọn ounje |
| She bought a weapon for him | Obinrin naa ra ohun ija kan fun on | O ra kan ija fun u |
| Translator | Sentences Generated | Correctly Translated | Wrongly Translated | Partially Translated | Partial Translation % | Absolute Accuracy % |
|---|---|---|---|---|---|---|
5. Result of the Evaluation
In table 3, partially translated sentences are those that are grammatically correct but not perfectly translated; that is, the Yoruba output does not have the same meaning as the English input. Partial accuracy is the percentage of sentences that are grammatically correct. Absolute accuracy is the percentage of sentences that are grammatically correct and perfectly translate the English input. Wrongly translated sentences are those that are neither perfectly translated nor grammatically correct in Yoruba. Figure 6 shows the numbers of correctly, wrongly and partially translated sentences for Y-Translator and Google Translate, while figure 7 shows the percentage distribution of the accuracy levels of both translators.
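Under the definitions above, both accuracy figures can be computed from the three sentence counts. This Python sketch assumes that partial accuracy counts correctly and partially translated sentences together (all grammatically correct output), and the counts in the example are hypothetical, not the values in table 3.

```python
def accuracy(correct, partial, wrong):
    """Return (absolute accuracy %, partial accuracy %).

    Absolute accuracy: correctly translated sentences / all sentences.
    Partial accuracy: grammatically correct sentences (correct + partial) / all sentences.
    """
    total = correct + partial + wrong
    if total == 0:
        raise ValueError("no sentences evaluated")
    absolute = 100.0 * correct / total
    partial_acc = 100.0 * (correct + partial) / total
    return absolute, partial_acc


# Hypothetical counts for illustration only.
print(accuracy(correct=15, partial=3, wrong=2))  # (75.0, 90.0)
```

Because every grammatically correct sentence counts toward partial accuracy, partial accuracy is always at least as high as absolute accuracy.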
As the results show, the rule-based approach is still the most realizable method for machine translation. The translator is able to translate past-tense verbs, continuous verb tenses and plural nouns, all of which require additional words in Yoruba. This is an improvement on existing research on English to Yoruba translation. In particular, the translator can translate complete English sentences into Yoruba, which improves on the research carried out by Abiola et al. (2014). The translator also achieved a higher translation accuracy than Google Translate on the test sentences. There is still room for improvement in this research work. A rule is needed for English words that have more than one meaning depending on their part of speech, that is, a rule that can recognize which translation is appropriate for such a word based on its part of speech in a particular sentence. The translator sometimes has to apply more than one rule to translate a particular sentence; therefore, fewer rules are recommended to avoid confusion for the translator.