Design and development of a machine learning-based framework for phishing website detection
Phishing is a social engineering cyber attack that aims to steal personal information from users. Attackers lure individuals into clicking phishing links by sending them emails or social media messages with deceptive content. With the development and application of machine learning technology, solutions for detecting phishing links have emerged. However, the performance gains achieved by machine learning-based approaches have been limited largely by the datasets used to train the models: few open-source datasets are available, data points are poorly characterized, and existing datasets are outdated. This thesis introduces a framework that combines multiple phishing detection strategies (whitelist, blacklist, heuristic rules, and machine learning models) to improve accuracy and flexibility. For the machine learning-based strategy, three traditional models and three deep learning models are trained and their test performance is compared; the Gated Recurrent Unit (GRU) model achieves the highest accuracy, 99.18%. For the expert-driven heuristic rule-based strategy, seven new HTML-based features are proposed. Finally, a prototype has been developed, including a browser extension that displays detection results in real time.
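The way the four strategies might be combined can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the order in which the strategies are consulted, the names `classify`, `Verdict`, `example_heuristic`, and `example_model`, the 0.5 threshold, and the placeholder rule and score are all assumptions made for illustration. In particular, the placeholder rule is not one of the seven proposed HTML-based features, and the placeholder score stands in for a trained model such as the GRU.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional
from urllib.parse import urlparse


@dataclass
class Verdict:
    label: str   # "legitimate" or "phishing"
    source: str  # which strategy produced the decision


def hostname(url: str) -> str:
    """Return the lowercase host part of a URL ('' if none can be parsed)."""
    return (urlparse(url).hostname or "").lower()


def example_heuristic(url: str, html: str) -> Optional[str]:
    """Hypothetical rule: a password field served over plain HTTP is suspicious.

    Returns a label when the rule fires, or None to defer to the next strategy.
    """
    if url.lower().startswith("http://") and 'type="password"' in html.lower():
        return "phishing"
    return None


def example_model(url: str) -> float:
    """Placeholder for a trained classifier; returns a phishing score in [0, 1]."""
    return 0.9 if "@" in url else 0.1


def classify(
    url: str,
    html: str,
    whitelist: Iterable[str],
    blacklist: Iterable[str],
    heuristic: Callable[[str, str], Optional[str]] = example_heuristic,
    model: Callable[[str], float] = example_model,
    threshold: float = 0.5,  # assumed cut-off for the model score
) -> Verdict:
    """Consult each strategy in turn (assumed order) and stop at the first decision."""
    host = hostname(url)
    if host in set(whitelist):
        return Verdict("legitimate", "whitelist")
    if host in set(blacklist):
        return Verdict("phishing", "blacklist")
    rule_label = heuristic(url, html)
    if rule_label is not None:
        return Verdict(rule_label, "heuristic")
    score = model(url)
    label = "phishing" if score >= threshold else "legitimate"
    return Verdict(label, "model")
```

Under these assumptions, a call such as `classify("https://example.org/login", "", ["example.org"], [])` is decided by the whitelist alone, and the model is consulted only when no list entry or rule applies. Placing the cheap lookups first is one plausible design choice; it keeps the more expensive model off the path for known sites.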