Simple item record

dc.contributor.advisor  El-Khatib, Khalil
dc.contributor.advisor  Sankaranarayanan, Karthik
dc.contributor.author  Laughlin, Brandon
dc.date.accessioned  2021-05-25T19:28:46Z
dc.date.accessioned  2022-03-29T19:06:38Z
dc.date.available  2021-05-25T19:28:46Z
dc.date.available  2022-03-29T19:06:38Z
dc.date.issued  2021-04-01
dc.identifier.uri  https://hdl.handle.net/10155/1292
dc.description.abstract  Natural language processing (NLP) algorithms have become an essential approach for processing large amounts of textual information, with applications such as spam filtering, phishing detection and content moderation. Malicious actors can craft manipulated inputs to fool NLP classifiers into making incorrect predictions. A major challenge in evaluating these adversarial attacks is the trade-off between attack efficiency and text quality: tighter constraints on the attack search space improve text quality but reduce the attack success rate. In this thesis, I introduce a framework for the evaluation of NLP classifier robustness. Black-box attack algorithms are paired with a threat modelling system to apply a customizable set of constraints to the adversarial generation process. I introduce a mixed-method experimental design approach that weighs how many adversarial documents can be generated against the attack's impact on text quality. Attack efficiency is measured by combining the computational cost and success rate of the attack. To measure text quality, an experimental study is run in which human participants report their subjective perception of text manipulation. I present a set of equations to reconcile the trade-offs between these tests and find an optimal balance. This pairing bridges the automated evaluation of classifier decisions with the semantic insight of human reviewers. The methodology is then extended to evaluate adversarial training as a defence method using the threat modelling system. The framework is also paired with a collection of visualization tools to provide greater interpretability. Domain-agnostic tools for classifier behaviour are first presented, followed by an interactive document viewer that enables exploration of the attack search space and word-level feature importance. The framework proposed in this thesis supports any black-box attack and is model-agnostic, giving it a wide range of applicability. The end objective is a more unified, guided and transparent way to evaluate classifier robustness that is flexible and customizable.  en
dc.description.sponsorship  University of Ontario Institute of Technology  en
dc.language.iso  en  en
dc.subject  Adversarial machine learning  en
dc.subject  Robustness evaluation  en
dc.subject  Natural language processing  en
dc.subject  Text classification  en
dc.subject  Information visualization  en
dc.title  A mixed-method approach to analyze the robustness of natural language processing classifiers  en
dc.type  Dissertation  en
dc.degree.level  Doctor of Philosophy (PhD)  en
dc.degree.discipline  Computer Science  en
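The abstract above mentions equations that reconcile attack efficiency (success rate and computational cost) with human-judged text quality, but those equations are not reproduced in this record. Purely as an illustrative sketch of that kind of trade-off calculation, the toy Python scoring function below combines the three quantities the abstract names; every identifier, weight and functional form here is a hypothetical stand-in, not the thesis's actual method.

```python
# Illustrative only: a toy reconciliation of attack efficiency and text
# quality, in the spirit of the trade-off described in the abstract.
# The weights and functional form are assumptions for demonstration.

from dataclasses import dataclass


@dataclass
class AttackResult:
    succeeded: bool   # did the attack flip the classifier's prediction?
    queries: int      # black-box queries spent on this document
    quality: float    # human-rated quality of the perturbed text, in [0, 1]


def robustness_tradeoff_score(results: list[AttackResult],
                              query_budget: int = 1000,
                              quality_weight: float = 0.5) -> float:
    """Toy score that is high when attacks succeed cheaply AND the
    perturbed text still reads naturally to human participants."""
    if not results:
        return 0.0
    success_rate = sum(r.succeeded for r in results) / len(results)
    avg_cost = sum(r.queries for r in results) / len(results)
    # Efficiency discounts the success rate by relative query cost.
    efficiency = success_rate * max(0.0, 1 - avg_cost / query_budget)
    avg_quality = sum(r.quality for r in results) / len(results)
    return (1 - quality_weight) * efficiency + quality_weight * avg_quality


# Example: a loosely constrained attack (cheap but low text quality)
# versus a tightly constrained one (expensive but natural-looking text).
loose = [AttackResult(True, 120, 0.4), AttackResult(True, 90, 0.5)]
strict = [AttackResult(True, 400, 0.9), AttackResult(False, 1000, 1.0)]
print(robustness_tradeoff_score(loose), robustness_tradeoff_score(strict))
```

Varying the hypothetical quality_weight parameter shifts the optimum between the two attack configurations, which is the kind of balance the abstract's equations are described as finding.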

