Modeling WordNet Type Thesaurus for Uzbek Language Semantic Dictionary

Creating text corpora for the Uzbek language and developing linguistic databases and search engine systems are among the crucial tasks of computational linguistics today. Electronic dictionary-thesauruses and semantic dictionaries are an important part of this work. Defining the structure of a dictionary-thesaurus for Uzbek, transferring the terminological dictionary into an electronic version, and implementing rules for establishing semantic relations between words make it possible to automate the linguistic processing of dictionary-thesauruses, which are the foundation of linguistic databases. Analyzing the logical structure of paper-based dictionary-thesauruses has made it possible to formalize that structure and to create rules, expressed in predicate language, for converting dictionary-thesaurus entries to an electronic version. A system of descriptors is proposed as a set of PROLOG rules for constructing the electronic version of dictionary entries.


Introduction
The main aim of this paper is to analyze the logical structure of the Uzbek dictionary-thesaurus, to formalize that structure, and to develop rules for converting existing paper-based Uzbek dictionaries into an electronic dictionary-thesaurus.
At present, the construction of thesaurus-level semantic dictionaries for the Uzbek language is not progressing quickly. One of the main developments in NLP in recent years is the increasingly fast growth of generic language resources. Many such resources, including both software and lingware items (lexicons, lexical databases, grammars, corpora annotated in several ways), have been made available for research and industrial applications [1]. The availability of wide-coverage ontologies is of special interest for knowledge-based NLP tasks. Princeton WordNet, BabelNet, FrameNet, and EuroWordNet are among the best-known ontologies. The construction of a WordNet for a language depends on the lexical sources available. Building the lexical source manually can be very costly; however, its accuracy will be high.
Existing dictionaries are limited to populating databases [1][2][3]. However, formalizing Uzbek linguistics and implementing linguistic processors that automate the development of electronic thesauruses remain crucial tasks. Some authors working on other languages have shown the importance of using a special extended XML-type language [5,6], which in turn makes it possible to work with structured data. The paper "Building a Wordnet for Turkish" [12] used the Princeton model to build a Turkish WordNet; this is relevant because Uzbek also belongs to the Turkic language family.
An abstract lexicographic system clearly has structure, and it has two parts: a left part (the registry) and a right part (the interpretation). In an ordinary dictionary, only the right part distinguishes meaning. A thesaurus, however, has a deeper structure in which relations can be established for both the left and right parts. A dictionary is thus a type of text in which the lexical description of one or more languages is presented in a systematic and structured way.
The challenge of building dictionaries is that it is not always possible to describe all of their elements exactly using the above method. The structure of a real dictionary contains many uncertain elements, which can sometimes be difficult to resolve.

Method
The set of structural elements of a dictionary and the associations between them constitute the dictionary's meta-language. Determining its systematic properties can serve as the foundation for developing a formal model of the thesaurus. Building a dictionary meta-language is the typical method of lexicographic description. The info-lexicographic model of any lexicographic system can be described as follows [6]:

$$V = \{\Lambda, P, H\}, \qquad H : \Lambda \to P$$

Here, $V$ is the lexicographic system, which includes the sets of dictionary units; $\Lambda$ is the set of left-part units of the dictionary; $P$ is the set of right-part units of the dictionary; and $H$ is the mapping that assigns elements of $\Lambda$ to elements of $P$. In describing a lexicographic system, the mapping $H$ establishes the correspondence between its left and right parts and provides dichotomous completeness in building the corresponding thesaurus.
Some of the elements used in the mapping $H : \Lambda \to P$ appear in the dictionary as specific terms. The left and right parts of a lexicographic system are not merely a formal arrangement; they must also be linked by functional relations. The dictionary units form a lexicographically sorted, indexed set. We can observe that current Uzbek language dictionaries are one-element sets. As in paper-based dictionaries, the structure of the lexicographic model begins with the headword, which serves as the identifier (ID) of the dictionary unit in the lexicographic system.
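The left part, right part, and mapping described above can be illustrated with a minimal Python sketch. The class name `LexicographicSystem`, its method names, and the English glosses are assumptions made for illustration only; they are not part of the model in [6].

```python
class LexicographicSystem:
    """Toy model of a lexicographic system: left part, right part, mapping H."""

    def __init__(self, mapping):
        # H: headword (left part) -> interpretation (right part).
        self.mapping = dict(mapping)

    @property
    def left(self):
        # Left part: the sorted, indexed set of headwords (entry identifiers).
        return sorted(self.mapping)

    @property
    def right(self):
        # Right part: interpretations, listed in headword order.
        return [self.mapping[headword] for headword in self.left]

    def interpret(self, headword):
        # Apply H to one headword; raises KeyError if it is not an entry.
        return self.mapping[headword]

    def index_of(self, headword):
        # Position of the headword in the sorted left part.
        return self.left.index(headword)


# Illustrative entries; the glosses are assumptions, not dictionary data.
system = LexicographicSystem({"suv": "water", "mansab": "rank, official post"})
```

Calling `system.interpret("suv")` applies the mapping to a single headword, while `system.left` exposes the headwords as the sorted index that identifies each dictionary unit.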
Because Uzbek is a strongly agglutinative language, words are constructed by adding prefixes and suffixes. For that reason, it is important to record these relations separately in the thesaurus dictionary, for example "mansab"-"mansabdor" and "suv"-"suvchi".
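As a rough illustration of how such affixal relations might be collected, the following Python sketch pairs each word with a base form found in the same word list. The function name `derivation_pairs` and the suffix list are assumptions; the suffixes are read off the two examples above, and real Uzbek morphology needs far more than string matching.

```python
def derivation_pairs(words, suffixes):
    """Return sorted (base, derived, suffix) triples where derived == base + suffix.

    A pair is reported only when both the base and the derived word
    occur in the given word list.
    """
    vocabulary = set(words)
    pairs = []
    for word in vocabulary:
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                base = word[: -len(suffix)]
                if base in vocabulary:
                    pairs.append((base, word, suffix))
    return sorted(pairs)


# Words taken from the examples in the text; suffix list is an assumption.
WORDS = ["mansab", "mansabdor", "suv", "suvchi"]
SUFFIXES = ["dor", "chi"]
PAIRS = derivation_pairs(WORDS, SUFFIXES)
```

Requiring the base to be present in the vocabulary avoids reporting words that merely happen to end in the same letters as a suffix.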

Result
We can define an automorphism $A$ of dictionary entries of the pointer type. An automorphism of this type defines a mapping of the form $A : V \to V$. As its identifier, a pointer pseudo-word is usually used, which is correspondingly assigned to an element of $V$. It should be noted that constructing an automorphism is more complex than in this example. First, there can be more than one pointer, that is, the pointer can be recursive. Moreover, an automorphic mapping can be constructed as a sequence of such pointers.
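One way to picture a recursive pointer is as a chain of cross-references that is followed until an ordinary entry is reached. The Python sketch below makes that assumption explicit; the markers `"see"` and `"def"`, the function name `resolve_pointer`, and the placeholder entries are all hypothetical and are not taken from the source.

```python
def resolve_pointer(entries, headword):
    """Follow pointer entries until a non-pointer entry is reached.

    `entries` maps a headword to ("def", text) for an ordinary entry
    or to ("see", target) for a pointer to another entry.
    Returns the final headword and its text.
    """
    visited = []
    current = headword
    while True:
        if current in visited:
            # A pointer chain that loops back on itself never terminates.
            raise ValueError("pointer cycle: " + " -> ".join(visited + [current]))
        visited.append(current)
        kind, value = entries[current]
        if kind == "def":
            return current, value
        current = value


# Placeholder entries: "a" points to "b", which points to "c".
ENTRIES = {
    "a": ("see", "b"),
    "b": ("see", "c"),
    "c": ("def", "text of entry c"),
}
```

The cycle check matters because a recursive pointer is only useful when the chain eventually ends in a real entry; a target that is missing from `entries` surfaces as a `KeyError`.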
In Uzbek, too, the elementary unit of the thesaurus structure is the descriptor dictionary unit, and the units are arranged in alphabetical order. For Uzbek, a thesaurus dictionary unit can be described as consisting of the following components:
(1) the title descriptor;
(2) the alphabetically sorted set of conditional synonyms of the title descriptor, which together with it form a conditional equivalence class;
(3) the alphabetically sorted set of descriptors connected with the title descriptor by the "tur-mansub" relation (hyperonym, hyponym);
(4) the alphabetically sorted set of descriptors connected with the title descriptor by the "butun-qism" relation (member, member-of, meronym);
(5) the alphabetically sorted set of descriptors connected with the title descriptor by at least the following paradigmatic relations: "sabab-oqibat", "oqibat-sabab", "funksional moslik" (associative relations);
(6) the alphabetically sorted set of descriptors connected with the title descriptor by the "antonim" (antonym) relation;
(7) the alphabetically sorted set of descriptors connected with the title descriptor through affixation (word formation by adding affixes), a consequence of the strong agglutinativity of Uzbek.
These relations establish relationships from a word X to a word Y, and through them the semantic network of the language is constructed. Any of these sets may contain a single element or be empty, and may even be absent from the dictionary unit.
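The structure of a dictionary unit described above can be sketched as a small Python data class. The class name `ThesaurusUnit` and the field names are hypothetical labels chosen for this illustration; the source defines the sets only abstractly. Python's default string ordering is also used here as a stand-in for Uzbek alphabetical order, which is a simplification.

```python
from dataclasses import dataclass, field

# Hypothetical field names for the six relation sets of a dictionary unit.
RELATION_NAMES = (
    "synonyms", "tur_mansub", "butun_qism",
    "associative", "antonyms", "affixal",
)


@dataclass
class ThesaurusUnit:
    title: str  # the title descriptor
    synonyms: set = field(default_factory=set)     # conditional synonyms
    tur_mansub: set = field(default_factory=set)   # hyperonym / hyponym
    butun_qism: set = field(default_factory=set)   # whole / part
    associative: set = field(default_factory=set)  # cause-effect, functional
    antonyms: set = field(default_factory=set)     # antonym relation
    affixal: set = field(default_factory=set)      # derived by affixation

    def related(self, relation):
        # Return one relation set in sorted order; empty sets are allowed.
        if relation not in RELATION_NAMES:
            raise ValueError(f"unknown relation: {relation}")
        return sorted(getattr(self, relation))

    def triples(self):
        # Flatten the unit into (title, relation, descriptor) edges
        # of the semantic network.
        return [
            (self.title, name, other)
            for name in RELATION_NAMES
            for other in self.related(name)
        ]


# A unit built from the example pair in the text; other sets stay empty.
unit = ThesaurusUnit(title="mansab", affixal={"mansabdor"})
```

Every set defaults to empty, which mirrors the remark that any of the sets may hold a single element or nothing at all.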

Discussion
The set of conditional synonyms, together with the title descriptor, forms a conditional equivalence class, and it is itself a descriptor. Since most of the information on the Internet is symbolic, it is desirable to use Prolog [8], a logic programming language, to work with symbolic structures, text files, and intelligent computer programs. Prolog makes it easy to describe objects and the relationships between them when searching for a solution.
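To give a feel for the fact-and-rule style that Prolog offers, the following Python sketch imitates a pair of Prolog rules of the form `is_a(X, Z) :- hyponym(X, Z).` and `is_a(X, Z) :- hyponym(X, Y), is_a(Y, Z).` It is only an approximation written for illustration: the facts, the predicate name `is_a`, and the example words are assumptions and do not come from the paper's rule set.

```python
# Facts: (narrower descriptor, broader descriptor) pairs for a
# "tur-mansub" style relation. The words are illustrative assumptions.
HYPONYM_OF = {
    ("qush", "hayvon"),   # bird -> animal
    ("burgut", "qush"),   # eagle -> bird
}


def is_a(x, z, facts=HYPONYM_OF, _seen=None):
    """Return True if x is a direct or indirect hyponym of z."""
    seen = set() if _seen is None else _seen
    # Base rule: a direct fact links x to z.
    if (x, z) in facts:
        return True
    # Recursive rule: x is a hyponym of some y, and y is_a z.
    seen.add(x)
    return any(
        is_a(y, z, facts, seen)
        for (child, y) in facts
        if child == x and y not in seen
    )
```

The `seen` set guards against circular facts, which Prolog itself would not do without extra care; that difference is worth keeping in mind when moving between the two styles.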

Conclusion
Analyzing the logical structure of paper-based dictionary-thesauruses has made it possible to formalize that structure and to create rules, expressed in predicate language, for converting dictionary-thesaurus entries to an electronic version. A system of descriptors is proposed as a set of PROLOG rules for constructing the electronic version of dictionary entries.
The model of the dictionary-thesaurus and the rules for converting it into electronic form can serve as a basis for constructing the lexicographic process. When a lexicographic database is formed, the elements of the system structure are defined as the database elements and its search parameters. Forming the lexicographic database on the basis of the system's structural elements leads to a fully automated procedure.
The dictionary can be used as part of the linguistic support of an automated system built for a suitable subject area.