Abstract
Online banking fraud is a significant source of criminal revenue in the modern world, causing losses of over £133M in the UK alone during 2015. We have seen a five-fold increase in the number of monthly phishing attacks reported since 2007, with 466,065 unique attacks detected in the second quarter of 2016. Current efforts to detect phishing websites draw upon a wide variety of sources for their data, including the HTML content of the website and the DNS records associated with the domain. Although such approaches are precise, the processing rate is heavily limited by network latency and bandwidth.

In this dissertation, we investigate the classification of phishing websites using only lexical features derived from the URL itself. We implement an online random forest classifier, and using a novel lexical feature set we achieve a precision of 0.98106 and a recall of 0.93594 at a processing rate that is over 660,000 times faster than current content-based approaches. In addition, the proposed system is more robust to feature manipulation by the attacker and free from any potential security risks posed by fetching a URL and analysing the contents.
| Original language | English |
|---|---|
| Place of Publication | Bath, U. K. |
| Publisher | Department of Computer Science, University of Bath |
| Number of pages | 105 |
| Publication status | Published - May 2017 |
Publication series
| Name | Department of Computer Science Technical Report Series |
|---|---|
| ISSN (Electronic) | 1740-9497 |
Bibliographical note
Supervised by Peter Hall
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 16 Peace, Justice and Strong Institutions