A tree learning approach to web document sectional hierarchy extraction

Pembe, F. Canan; Güngör, Tunga

Date

2010

Authors

Pembe, F. Canan

Güngör, Tunga

Type

Book chapter

Abstract

Documents are increasingly available in electronic form due to the widespread use of the Internet. Hypertext Markup Language (HTML), which is mostly concerned with the presentation of documents, is still the most commonly used format on the Web, despite the appearance of semantically richer markup languages such as XML. Effective processing of Web documents has several uses, such as displaying content on small-screen devices and summarization. In this paper, we investigate the problem of identifying the sectional hierarchy of a given HTML document together with the headings in the document. We propose and evaluate a learning approach, based on Support Vector Machines, that is suited to tree representations.

Keywords

Machine Learning, Document Structure, World Wide Web, Hypertext Markup Language

URI

https://hdl.handle.net/11413/6307

Collections

Department of Computer Engineering
WoS Indexed Publications
