Multi-character prediction using attention
We propose a computational attention approach to localize and classify the characters of a sequence in a given image. Our approach combines spatial soft-attention with attention regularization and learns “where-to-look” to carry out the sequence classification task. The image is first passed through a Convolutional Neural Network (CNN) that serves as a feature extractor. Then, at each time step of a Recurrent Neural Network (RNN), the attention mechanism attends to the relevant features in turn to make a prediction. The attention mechanism also includes a start state and a stop state, which tell it when to begin looking and when to stop (e.g., when the sequence has been exhausted). We demonstrate our approach on two sequence detection tasks, multi-digit classification and CAPTCHA unlocking, using the publicly available Street View House Numbers (SVHN) dataset and a custom CAPTCHA dataset. The experiments confirm our hypothesis that the network learns to attend to relevant features by minimizing the loss between the ground truth attention masks and the predicted attention masks.
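The attention step and the mask regularization described above can be illustrated with a small sketch. This is not the authors' implementation: the dot-product scoring, the mean squared error loss, and the names `soft_attention` and `attention_mask_loss` are assumptions chosen for illustration, and the toy feature vectors merely stand in for CNN features at four spatial locations.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, query):
    """Soft attention over spatial locations.

    features: list of N feature vectors, one per spatial location of a CNN map
    query:    vector of the same length, standing in for the RNN hidden state
    Returns (weights, context): the attention mask and the attended feature.
    """
    # Assumption: dot-product scoring; the abstract does not name the scoring function.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted sum of the location features.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


def attention_mask_loss(predicted, target):
    """Assumed regularizer: mean squared error between predicted and ground truth masks."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Toy data: four locations with two-dimensional features (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
query = [2.0, 2.0]
weights, context = soft_attention(features, query)

# Ground truth mask marking the third location as the one holding the character.
target_mask = [0.0, 0.0, 1.0, 0.0]
loss = attention_mask_loss(weights, target_mask)
```

In a full model, one such step would run per RNN time step, and the mask loss would be added to the character classification loss so that training pushes the predicted masks toward the ground truth masks.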