Publication:
Exploring feature sets for Turkish word sense disambiguation

dc.contributor.author: Adalı, Eşref
dc.contributor.author: Tantuğ, Ahmet Cüneyd
dc.contributor.author: İLGEN, BAHAR
dc.contributor.authorID: 141812
dc.contributor.authorID: 8786
dc.contributor.authorID: 21833
dc.date.accessioned: 2018-07-19T13:33:34Z
dc.date.available: 2018-07-19T13:33:34Z
dc.date.issued: 2016
dc.description.abstract: This paper presents an exploration and evaluation of a diverse set of features that influence word-sense disambiguation (WSD) performance. As one of the most crucial steps in natural language processing (NLP), WSD has the potential to improve many NLP tasks. It is known that exploiting effective features and removing redundant ones helps improve results. Two groups of features are used to disambiguate senses and select the most appropriate sense from a set of candidates: collocational and bag-of-words (BoW) features. We examine the effects of using these two feature sets on the Turkish Lexical Sample Dataset (TLSD), which comprises the most ambiguous verb and noun samples. In addition, the feature groups have been combined in a joint setting to measure further improvement. Our results suggest that the joint setting of features improves accuracy by up to 7%. The effective window size around the ambiguous words has been determined for the noun and verb sets. Additionally, the suggested feature set has been evaluated on a different corpus that had been used in previous studies on Turkish WSD. The results of experiments investigating diverse morphological groups show that the word root and the case marker are significant features for disambiguating senses.
dc.identifier.issn: 1300-0632
dc.identifier.other: 1303-6203
dc.identifier.scopus: 2-s2.0-84978280214
dc.identifier.uri: https://doi.org/10.3906/elk-1408-77
dc.identifier.uri: https://hdl.handle.net/11413/2209
dc.identifier.wos: 378097800079
dc.language.iso: en_US
dc.publisher: TUBİTAK Scientific & Technical Research Council Turkey, Ataturk Bulvarı No 221, Kavaklıdere, Ankara, 00000, Turkey
dc.relation: Turkish Journal of Electrical Engineering and Computer Sciences
dc.subject: Bag-of-words features
dc.subject: collocational features
dc.subject: feature selection
dc.subject: supervised methods
dc.subject: word-sense disambiguation
dc.title: Exploring feature sets for Turkish word sense disambiguation
dc.type: Article
dspace.entity.type: Publication
local.indexed.at: scopus
local.indexed.at: wos
relation.isAuthorOfPublication: 21454e00-d332-448d-8e35-698b7d3cc9ee
relation.isAuthorOfPublication.latestForDiscovery: 21454e00-d332-448d-8e35-698b7d3cc9ee
