Scopus İndeksli Yayınlar / Scopus Indexed Publications
Permanent URI for this collection: https://hdl.handle.net/11413/6358
Browsing Scopus İndeksli Yayınlar / Scopus Indexed Publications by Author "Adalı, Eşref"
Publication (metadata only)
A Comparative Study to Determine the Effective Window Size of Turkish Word Sense Disambiguation Systems
(Springer, 233 Spring Street, New York, NY 10013, United States, 2013)
Adalı, Eşref; Tantuğ, Ahmet Cüneyd; İLGEN, BAHAR; 141812; 8786; 21833
In this paper, the effect of different windowing schemes on word sense disambiguation accuracy is presented. The Turkish Lexical Sample Dataset has been used in the experiments. We took the samples of ambiguous verbs and nouns from the dataset and used bag-of-words properties as context information. The experiments have been repeated for different window sizes with several machine learning algorithms. We follow a 2/3 splitting strategy (2/3 for training, 1/3 for testing) and determine the most frequently used words in the training part. After removing stop words, we repeated the experiments using the most frequent 100, 75, 50 and 25 content words of the training data. Our findings show that using the most frequent 75 words as features improves accuracy for Turkish verbs. Similar results have been obtained for Turkish nouns when we use the most frequent 100 words of the training set. Considering this information, selected algorithms have been tested on varying window sizes {30, 15, 10 and 5}. Our findings show that the Naive Bayes and Functional Tree methods yielded better accuracy results, and that a window size of ±5 gives the best average results for both the noun and the verb groups. The best results of the two groups are 65.8 and 56% points above the most frequent sense baseline of the verb and noun groups, respectively.

Publication (metadata only)
Building up lexical sample dataset for Turkish word sense disambiguation
(2012-07-02)
Adalı, Eşref; Tantuğ, Ahmet Cüneyd; İLGEN, BAHAR; 141812; 8786; 21833
Word Sense Disambiguation (WSD) has become an even more important research area in recent years with the widespread usage of Natural Language Processing (NLP) applications. The WSD task has two variants: the “Lexical Sample” and “All Words” approaches. The Lexical Sample approach disambiguates the occurrences of a small, previously selected sample of target words, while the All Words approach disambiguates all the words in a piece of text. In the scope of this work, a Lexical Sample Dataset for Turkish has been prepared. As a first step, highly ambiguous words in Turkish have been selected. Collection of text samples for the chosen words has been completed. Five taggers have annotated the word senses. This paper summarizes the step-by-step building-up process of a Lexical Sample Dataset in Turkish and presents the results of some experiments on it.

Publication (metadata only)
Exploring feature sets for Turkish word sense disambiguation
(TUBİTAK Scientific & Technical Research Council Turkey, Ataturk Bulvarı No 221, Kavaklıdere, Ankara, 00000, Turkey, 2016)
Adalı, Eşref; Tantuğ, Ahmet Cüneyd; İLGEN, BAHAR; 141812; 8786; 21833
This paper presents an exploration and evaluation of a diverse set of features that influence word-sense disambiguation (WSD) performance. WSD has the potential to improve many natural language processing (NLP) tasks, as it is one of the most crucial steps in the area. It is known that exploiting effective features and removing redundant ones helps improve the results. There are two groups of feature sets used to disambiguate senses and select the most appropriate one among a set of candidates: collocational and bag-of-words (BoW) features. We introduce the effects of using these two feature sets on the Turkish Lexical Sample Dataset (TLSD), which comprises the most ambiguous verb and noun samples. In addition to our results, a joint setting of the feature groups has been applied to measure additional improvement in the results. Our results suggest that the joint setting of features improves accuracy by up to 7%. The effective window size of the ambiguous words has been determined for the noun and verb sets. Additionally, the suggested feature set has been investigated on a different corpus that had been used in previous studies on Turkish WSD. The results of the experiments investigating diverse morphological groups show that the word root and the case marker are significant features for disambiguating senses.
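
The abstracts listed here describe bag-of-words context features taken from a window of ±n words around the ambiguous word, restricted to the most frequent content words of the training data. The following is a minimal Python sketch of that general idea, not the authors' implementation. The function names, the (tokens, target_index) sample representation, and the tiny stop-word list are all hypothetical and introduced only for illustration.

```python
from collections import Counter

# Hypothetical stop-word list for illustration; the papers' actual list is not given.
STOP_WORDS = {"ve", "bir", "bu", "de", "da"}


def context_window(tokens, target_index, size):
    """Return up to `size` tokens on each side of the target word (target excluded)."""
    left = tokens[max(0, target_index - size):target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return left + right


def build_vocabulary(training_samples, size, top_k):
    """Pick the `top_k` most frequent non-stop-word context tokens in the training part.

    Each sample is assumed to be a (tokens, target_index) pair.
    """
    counts = Counter()
    for tokens, target_index in training_samples:
        for token in context_window(tokens, target_index, size):
            if token not in STOP_WORDS:
                counts[token] += 1
    return [word for word, _ in counts.most_common(top_k)]


def bow_features(tokens, target_index, size, vocabulary):
    """Count how often each vocabulary word occurs inside the context window."""
    window_counts = Counter(context_window(tokens, target_index, size))
    return [window_counts[word] for word in vocabulary]
```

Under these assumptions, the vocabulary would be built once from the training split (for example with `size=5` and `top_k=75`), and `bow_features` would then be applied to every training and test sample before passing the vectors to a classifier.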