Feature Selection in Online Lexical Phishing URL Classification

Oliver Youle

Research output: Book/Report › Other report



Online banking fraud is a significant source of criminal revenue in the modern world, causing losses of over £133M in the UK alone during 2015. We have seen a five-fold increase in the number of monthly phishing attacks reported since 2007, with 466,065 unique attacks detected in the second quarter of 2016. Current efforts to detect phishing websites draw upon a wide variety of sources for their data, including the HTML content of the website and the DNS records associated with the domain. Although such approaches are precise, the processing rate is heavily limited by network latency and bandwidth.

In this dissertation, we investigate the classification of phishing websites using only lexical features derived from the URL itself. We implement an online random forest classifier, and using a novel lexical feature set we achieve a precision of 0.98106 and a recall of 0.93594 at a processing rate that is over 660,000 times faster than current content-based approaches. In addition, the proposed system is more robust to feature manipulation by the attacker and free from the potential security risks posed by fetching a URL and analysing its contents.
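To illustrate what "lexical features derived from the URL itself" can mean in practice, the following is a minimal sketch in Python. The feature names and the function `lexical_features` are hypothetical and chosen for illustration only; they are not the feature set used in the report, which this record does not list. The point of the sketch is that every value is computed from the URL string alone, with no network request.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text):
    """Shannon entropy (bits per character) of a string; 0.0 for empty input."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def host_is_ip(host):
    """True if the host part is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url):
    """Illustrative (hypothetical) lexical features; nothing is fetched."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip(host),
        # Number of non-empty path segments, e.g. /login/verify.php has 2.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "url_entropy": shannon_entropy(url),
    }
```

A dictionary like this could be passed, one URL at a time, to an incremental learner. The online random forest itself is not sketched here, since the record gives no details of its implementation.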
Original language: English
Place of Publication: Bath, U. K.
Publisher: Department of Computer Science, University of Bath
Number of pages: 105
Publication status: Published - May 2017

Publication series

Name: Department of Computer Science Technical Report Series
ISSN (Electronic): 1740-9497

Bibliographical note

Supervised by Peter Hall

