Building up lexical sample dataset for Turkish word sense disambiguation

Adalı, Eşref; Tantuğ, Ahmet Cüneyd; İLGEN, BAHAR

Date

2012-07-02

Authors

Adalı, Eşref

Tantuğ, Ahmet Cüneyd

İLGEN, BAHAR

Abstract

Word Sense Disambiguation (WSD) has become an even more important research area in recent years with the widespread use of Natural Language Processing (NLP) applications. The WSD task has two variants: the “Lexical Sample” and “All Words” approaches. The Lexical Sample approach disambiguates the occurrences of a small sample of previously selected target words, while the All Words approach disambiguates every word in a piece of text. In the scope of this work, a Lexical Sample dataset for Turkish has been prepared. As a first step, highly ambiguous words in Turkish were selected. Text samples for the chosen words were then collected, and five taggers annotated the word senses. This paper summarizes the step-by-step process of building a Lexical Sample dataset for Turkish and presents the results of some experiments on it.
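
To make the Lexical Sample setting concrete, the sketch below shows one possible shape for a single annotated instance: a pre-selected target word, a text sample containing it, and one sense label per tagger. Everything in it is an assumption made for illustration: the class and function names, the sense labels, the example sentence, and the majority-vote rule are hypothetical and are not taken from the paper, which does not state here how disagreements among the five taggers were resolved.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LexicalSampleInstance:
    """One occurrence of a pre-selected target word (hypothetical layout)."""
    target: str      # the ambiguous target word
    context: str     # text sample in which the target occurs
    tags: List[str]  # one sense label per tagger


def majority_sense(tags: List[str]) -> Tuple[str, float]:
    """Return the most frequent label and the share of taggers who chose it.

    Assumption: simple majority voting. On a tie, Counter.most_common
    returns the label that was seen first.
    """
    label, count = Counter(tags).most_common(1)[0]
    return label, count / len(tags)


# Illustrative instance only; not drawn from the actual dataset.
instance = LexicalSampleInstance(
    target="yüz",
    context="Yüz kişi geldi.",
    tags=["yüz_hundred", "yüz_hundred", "yüz_face", "yüz_hundred", "yüz_hundred"],
)

sense, agreement = majority_sense(instance.tags)
print(sense, agreement)
```

In an All Words setting, by contrast, every content word in the context would carry its own list of tags rather than a single target.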

Keywords

Natural Language Processing, Word Sense Disambiguation, Lexical Sample, Feature Selection, Machine Learning

URI

https://doi.org/10.1109/INISTA.2012.6247026
https://hdl.handle.net/11413/2936

Collections

Bilgisayar Mühendisliği Bölümü / Department of Computer Engineering
Scopus İndeksli Yayınlar / Scopus Indexed Publications
