Simple item record

dc.contributor.advisor  El-Khatib, Khalil
dc.contributor.advisor  Sankaranarayanan, Karthik
dc.contributor.author  Laughlin, Brandon
dc.date.accessioned  2021-05-25T19:28:46Z
dc.date.accessioned  2022-03-29T19:06:38Z
dc.date.available  2021-05-25T19:28:46Z
dc.date.available  2022-03-29T19:06:38Z
dc.date.issued  2021-04-01
dc.identifier.uri  https://hdl.handle.net/10155/1292
dc.description.abstract  Natural language processing (NLP) algorithms have become an essential approach for processing large amounts of textual information, with applications such as spam filtering, phishing detection and content moderation. Malicious actors can craft manipulated inputs to fool NLP classifiers into making incorrect predictions. A major challenge in evaluating these adversarial attacks is the trade-off between attack efficiency and text quality: tighter constraints on the attack search space improve text quality but reduce the attack success rate. In this thesis, I introduce a framework for the evaluation of NLP classifier robustness. Black-box attack algorithms are paired with a threat modelling system to apply a customizable set of constraints to the adversarial generation process. I introduce a mixed-method experimental design approach that weighs how many adversarial documents can be generated against the attack's impact on text quality. Attack efficiency is measured by combining the computational cost and success rate of the attack. To measure text quality, an experimental study is run in which human participants report their subjective perception of text manipulation. I present a set of equations to reconcile the trade-offs between these tests and find an optimal balance. This pairing bridges the automated evaluation of classifier decisions with the semantic insight of human reviewers. The methodology is then extended to evaluate adversarial training as a defence method using the threat modelling system. The framework is also paired with a collection of visualization tools to provide greater interpretability. Domain-agnostic tools for classifier behaviour are first presented, followed by an interactive document viewer that enables exploration of the attack search space and word-level feature importance. The framework proposed in this thesis supports any black-box attack and is model-agnostic, giving it a wide range of applicability. The end objective is a more unified, guided and transparent way to evaluate classifier robustness that is flexible and customizable.  en
dc.description.sponsorship  University of Ontario Institute of Technology  en
dc.language.iso  en  en
dc.subject  Adversarial machine learning  en
dc.subject  Robustness evaluation  en
dc.subject  Natural language processing  en
dc.subject  Text classification  en
dc.subject  Information visualization  en
dc.title  A mixed-method approach to analyze the robustness of natural language processing classifiers  en
dc.type  Dissertation  en
dc.degree.level  Doctor of Philosophy (PhD)  en
dc.degree.discipline  Computer Science  en
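The abstract above mentions equations that reconcile attack efficiency (success rate and computational cost) with human-judged text quality, but those equations are not reproduced in this record. Purely as an illustrative sketch of that kind of trade-off calculation, the toy Python scoring function below combines the three quantities the abstract names; every identifier, weight and functional form here is a hypothetical stand-in, not the thesis's actual method.

```python
# Illustrative only: a toy reconciliation of attack efficiency and text
# quality, in the spirit of the trade-off described in the abstract.
# The weights and functional form are assumptions for demonstration.

from dataclasses import dataclass


@dataclass
class AttackResult:
    succeeded: bool   # did the attack flip the classifier's prediction?
    queries: int      # black-box queries spent on this document
    quality: float    # human-rated quality of the perturbed text, in [0, 1]


def robustness_tradeoff_score(results: list[AttackResult],
                              query_budget: int = 1000,
                              quality_weight: float = 0.5) -> float:
    """Toy score that is high when attacks succeed cheaply AND the
    perturbed text still reads naturally to human participants."""
    if not results:
        return 0.0
    success_rate = sum(r.succeeded for r in results) / len(results)
    avg_cost = sum(r.queries for r in results) / len(results)
    # Efficiency discounts the success rate by relative query cost.
    efficiency = success_rate * max(0.0, 1 - avg_cost / query_budget)
    avg_quality = sum(r.quality for r in results) / len(results)
    return (1 - quality_weight) * efficiency + quality_weight * avg_quality


# Example: a loosely constrained attack (cheap but low text quality)
# versus a tightly constrained one (expensive but natural-looking text).
loose = [AttackResult(True, 120, 0.4), AttackResult(True, 90, 0.5)]
strict = [AttackResult(True, 400, 0.9), AttackResult(False, 1000, 1.0)]
print(robustness_tradeoff_score(loose), robustness_tradeoff_score(strict))
```

Varying the hypothetical quality_weight parameter shifts the optimum between the two attack configurations, which is the kind of balance the abstract's equations are described as finding.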

