
    A mixed-method approach to analyze the robustness of natural language processing classifiers

    File
    Laughlin_Brandon.pdf (4.605 MB)
    Date
    2021-04-01
    Author
    Laughlin, Brandon
    Abstract
    Natural language processing (NLP) algorithms have become an essential approach for processing large amounts of textual information, with applications such as spam filtering, phishing detection, and content moderation. Malicious actors can craft manipulated inputs to fool NLP classifiers into making incorrect predictions. A major challenge in evaluating these adversarial attacks is the trade-off between attack efficiency and text quality: tighter constraints on the attack search space improve text quality but reduce the attack success rate. In this thesis, I introduce a framework for evaluating NLP classifier robustness. Black-box attack algorithms are paired with a threat modelling system that applies a customizable set of constraints to the adversarial generation process. I introduce a mixed-method experimental design that weighs how many adversarial documents can be generated against the impact the attack has on text quality. Attack efficiency is measured by combining the computational cost and the success rate of the attack. To measure text quality, an experimental study is run in which human participants report their subjective perception of the text manipulation. I present a set of equations that reconcile the trade-offs between these tests to find an optimal balance. This pairing bridges the automated evaluation of classifier decisions with the semantic insight of human reviewers. The methodology is then extended to evaluate adversarial training as a defence method using the threat modelling system. The framework is also paired with a collection of visualization tools to provide greater interpretability: domain-agnostic tools for classifier behaviour are presented first, followed by an interactive document viewer that enables exploration of the attack search space and word-level feature importance. The framework proposed in this thesis supports any black-box attack and is model-agnostic, which gives it a wide range of applicability. The end objective is a more unified, guided, and transparent way to evaluate classifier robustness that is flexible and customizable.
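    The abstract describes attack efficiency as a combination of computational cost and success rate, reconciled against human-rated text quality. As a rough illustration only (the thesis's actual equations are not stated in the abstract, so the metric names, the query-count cost proxy, and the alpha weighting below are assumptions rather than the author's method), a minimal Python sketch of such a trade-off calculation might look like:

        # Illustrative sketch only: the weighting scheme and field names
        # here are assumptions, not the equations from the thesis itself.
        from dataclasses import dataclass

        @dataclass
        class AttackResult:
            succeeded: bool   # did the perturbed text flip the classifier's label?
            queries: int      # black-box queries spent on this document (cost proxy)
            quality: float    # human-rated text quality in [0, 1] (1 = unchanged)

        def efficiency(results: list[AttackResult]) -> float:
            """Attack success rate discounted by average query cost."""
            success_rate = sum(r.succeeded for r in results) / len(results)
            avg_queries = sum(r.queries for r in results) / len(results)
            return success_rate / avg_queries

        def trade_off(results: list[AttackResult], alpha: float = 0.5) -> float:
            """Hypothetical reconciliation of attack effectiveness and text
            quality as a convex combination; alpha is an assumed parameter."""
            success_rate = sum(r.succeeded for r in results) / len(results)
            avg_quality = sum(r.quality for r in results) / len(results)
            # Higher means the attack is both effective and hard to notice.
            return alpha * success_rate + (1 - alpha) * avg_quality

        # Toy usage with made-up numbers:
        results = [
            AttackResult(True, 120, 0.8),
            AttackResult(False, 300, 0.9),
            AttackResult(True, 90, 0.6),
        ]
        print(f"efficiency: {efficiency(results):.4f}")
        print(f"trade-off:  {trade_off(results, alpha=0.5):.2f}")

    In this sketch, alpha shifts the balance between attack effectiveness and human-perceived manipulation; the thesis presumably calibrates such a balance against its human-study data, but the exact formulation would come from the full text.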
    URI
    https://hdl.handle.net/10155/1292
    Collections
    • Doctoral Dissertations [67]
    • Electronic Theses and Dissertations [1336]
