An investigation of semantic patterns in passwords
Veras Guimaraes, Rafael
Large password leaks in recent years have exposed the security problems of passwords and enabled deeper empirical investigation of password patterns. Researchers have only scratched the surface of patterns in password creation, having characterized them in terms of frequency, length, composition rules and, to some extent, syntactic patterns. The semantics of passwords remain largely unexplored. In this thesis, we aim to fill this gap by employing Natural Language Processing techniques to extract and leverage an understanding of semantic patterns in passwords. We present the first framework for segmentation, semantic classification and semantic generalization of passwords, and a model that captures the semantic essence of password samples. The results of our investigation demonstrate that the knowledge captured by our model can be used to crack more passwords than the state-of-the-art approach. In experiments limited to 3 billion guesses, our approach guesses 67% more passwords from the LinkedIn leak and 32% more passwords from the MySpace leak than that approach. Furthermore, we explore the implications of using date patterns in guessing attacks and investigate the lexical differences between standard English and the language used in passwords.
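To make the segmentation step concrete, the following is a minimal sketch of one way a password could be split into word, digit and symbol segments. It is not the thesis's implementation: the tiny `LEXICON`, the cost values and the function names `segment_alpha` and `segment_password` are all assumptions made for illustration, and a realistic system would rely on a large corpus-derived word list and a more careful scoring scheme.

```python
import re

# Hypothetical toy lexicon (assumption); a real system would use a large word list.
LEXICON = {"i", "love", "you", "my", "dog", "blue", "sky"}

WORD_COST = 1      # cost of a segment found in the lexicon (assumed value)
UNKNOWN_COST = 10  # cost of a single letter not covered by any word (assumed value)


def segment_alpha(chunk, lexicon=LEXICON):
    """Split a lowercase letter run into segments with minimum total cost."""
    n = len(chunk)
    # best[i] holds (cost, segments) for the prefix chunk[:i].
    best = [(0, [])] + [None] * n
    for i in range(1, n + 1):
        for j in range(i):
            piece = chunk[j:i]
            if piece in lexicon:
                cost = WORD_COST
            elif len(piece) == 1:
                cost = UNKNOWN_COST
            else:
                continue  # multi-letter pieces must be dictionary words
            candidate = (best[j][0] + cost, best[j][1] + [piece])
            if best[i] is None or candidate[0] < best[i][0]:
                best[i] = candidate
    return best[n][1]


def segment_password(password):
    """Split a password into letter, digit and symbol runs, then segment letter runs."""
    parts = re.findall(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+", password)
    segments = []
    for part in parts:
        if re.fullmatch(r"[A-Za-z]+", part):
            # Case is discarded here for lookup; a fuller pipeline might record it.
            segments.extend(segment_alpha(part.lower()))
        else:
            segments.append(part)
    return segments


if __name__ == "__main__":
    print(segment_password("iloveyou2"))
    print(segment_password("MyDog1987!"))
```

Segments produced this way could then be passed to a semantic classifier, for example one that tags words with parts of speech or categories and recognizes digit runs such as dates. That later stage is outside the scope of this sketch.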