Machine Learning Based Phishing Detection from URIs

Buber, Ebubekir; Demir, Önder; Diri, Banu; ŞAHİNGÖZ, ÖZGÜR KORAY

Authors

Buber, Ebubekir

Demir, Önder

Diri, Banu

ŞAHİNGÖZ, ÖZGÜR KORAY

Date

2017-12

Type

conferenceObject

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States

Abstract

With the rapid growth of the Internet, users have shifted their preference from traditional shopping to electronic commerce. Instead of robbing banks or shops, criminals now look for their victims in cyberspace using specific tricks. Exploiting the anonymous structure of the Internet, attackers devise new techniques, such as phishing, that use false websites to deceive victims and collect sensitive information such as account IDs, usernames, and passwords. Determining whether a web page is legitimate or phishing is a very challenging problem because of the semantics-based structure of the attack, which mainly exploits the vulnerabilities of computer users. Although software companies launch new anti-phishing products that use blacklists, heuristics, and visual and machine learning-based approaches, these products cannot prevent all phishing attacks. In this paper, a real-time anti-phishing system is proposed that uses seven different classification algorithms and natural language processing (NLP) based features. The system has the following properties that distinguish it from other studies in the literature: language independence, use of a large volume of phishing and legitimate data, real-time execution, detection of new websites, independence from third-party services, and use of feature-rich classifiers. To measure the performance of the system, a new dataset is constructed and the experiments are run on it. According to the experimental and comparative results from the implemented classification algorithms, the Random Forest algorithm with only NLP-based features gives the best performance, with a 97.98% accuracy rate for the detection of phishing URLs.
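To make the idea of classifying a URL from its text alone more concrete, the following is a minimal sketch of lexical feature extraction in Python. It is not the authors' feature set or implementation: the function name `url_features`, the feature names, and the `SUSPICIOUS_WORDS` list are all hypothetical and chosen only for illustration. The sketch needs only the URL string, in the spirit of the independence from third-party services described in the abstract.

```python
import re
from urllib.parse import urlsplit

# Hypothetical keyword list for illustration only; not taken from the paper.
SUSPICIOUS_WORDS = {"login", "verify", "secure", "account", "update"}


def url_features(url: str) -> dict:
    """Compute a few simple lexical features from a URL string alone."""
    # Add a scheme if it is missing so that urlsplit can locate the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    # Split the whole URL into lowercase alphanumeric word tokens.
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(c.isdigit() for c in url),
        # A raw IPv4 address in place of a domain name (rough pattern check).
        "host_is_ipv4": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "longest_token": max((len(t) for t in tokens), default=0),
        # Count tokens that exactly match an entry in the illustrative list.
        "suspicious_words": sum(t in SUSPICIOUS_WORDS for t in tokens),
    }


if __name__ == "__main__":
    print(url_features("http://secure-login.example.com/verify/account1"))
```

A dictionary like this could be turned into a numeric vector and passed to any standard classifier, Random Forest among them. Which features and which parameters the paper actually uses is not stated in this record.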

Keywords

Cyber Security, Phishing Attack, Machine Learning, Classification Algorithms, Cyber Attack Detection
