Online banking fraud is a significant source of criminal revenue in the modern world, causing losses of over £133M in the UK alone during 2015. We have seen a five-fold increase in the number of monthly phishing attacks reported since 2007, with 466,065 unique attacks detected in the second quarter of 2016. Current efforts to detect phishing websites draw upon a wide variety of sources for their data, including the HTML content of the website and the DNS records associated with the domain. Although such approaches are precise, their processing rate is heavily limited by network latency and bandwidth.

In this dissertation, we investigate the classification of phishing websites using only lexical features derived from the URL itself. We implement an online random forest classifier and, using a novel lexical feature set, achieve a precision of 0.98106 and a recall of 0.93594 at a processing rate that is over 660,000 times faster than current content-based approaches. In addition, the proposed system is more robust to feature manipulation by the attacker and free from the potential security risks posed by fetching a URL and analysing its contents.
|Name||Department of Computer Science Technical Report Series|