Multi-character prediction using attention

Baeenh, Mohmmed

Date

2020-01-01

Abstract

We propose a computational attention approach to localize and classify the characters of a sequence in a given image. Our approach combines spatial soft attention with attention regularization and learns “where to look” in order to carry out the sequence classification task. The image is first passed through a Convolutional Neural Network (CNN) that serves as a feature extractor. Then, at each time step of a Recurrent Neural Network (RNN), the attention mechanism attends to the relevant features in sequence so that a prediction can be made. The attention mechanism also includes a start state and a stop state, which tell it when to begin looking and when to stop (e.g., when the sequence has been exhausted). We demonstrate our approach on two sequence detection tasks, multi-digit classification and CAPTCHA unlocking, using the publicly available Street View House Numbers (SVHN) dataset and a custom CAPTCHA dataset. The experiments confirm our hypothesis that the network learns to attend to relevant features by minimizing the loss between the ground-truth attention masks and the predicted attention masks.
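
To make the idea of spatial soft attention with a mask-based regularizer concrete, the following is a minimal sketch in plain Python and not the thesis implementation. It assumes dot-product scoring between the RNN hidden state and each spatial feature vector, and a mean squared error between the predicted and ground-truth attention masks; the abstract does not specify either choice, and the names soft_attention and mask_loss are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, hidden):
    """Compute the attention mask and context vector for one time step.

    features: list of N feature vectors, one per spatial location of the
              flattened CNN feature map.
    hidden:   RNN hidden state with the same dimensionality as a feature.
    Dot-product scoring is an assumption made for this sketch.
    """
    scores = [sum(f * h for f, h in zip(feat, hidden)) for feat in features]
    mask = softmax(scores)  # one weight per spatial location, sums to 1
    dim = len(features[0])
    # Context vector: attention-weighted sum of the feature vectors.
    context = [
        sum(mask[n] * features[n][d] for n in range(len(features)))
        for d in range(dim)
    ]
    return mask, context


def mask_loss(predicted, target):
    """Mean squared error between two masks (assumed form of the loss)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Toy example: 4 spatial locations with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.0, 0.0]]
hidden = [2.0, 0.0]
mask, context = soft_attention(features, hidden)

# Hypothetical ground-truth mask marking location 2 as the current character.
target = [0.0, 0.0, 1.0, 0.0]
loss = mask_loss(mask, target)
```

In a full model this step would be repeated at every RNN time step, with the mask loss added to the character classification loss so that the network is penalized for looking in the wrong place as well as for predicting the wrong character.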

URI

https://hdl.handle.net/10155/1132

Collections

Electronic Theses and Dissertations [1478]
Master Theses & Projects [326]