A novel deep learning architecture and large-scale benchmark for robust object detection in traffic surveillance and analysis
Upstream technologies such as traffic surveillance provide helpful information to downstream intelligent transportation systems, which aim to improve the safety, mobility, and efficiency of road users whilst reducing their environmental impact. Vision-based traffic surveillance sensors can provide richer and more robust information than other in-road or over-road sensors. However, they are more susceptible to challenges caused by environmental impacts, illumination changes, association issues, and viewpoint obstructions. In this thesis, we advance vision-based traffic surveillance for intelligent transportation systems by: (1) developing the “OTTSB Benchmark”, a new benchmark for large-scale video-based traffic surveillance; and (2) proposing STIFFNet, a novel object detection algorithm for traffic surveillance that performs Spatial and Temporal self-attentIon-based deep Feature Fusion. The OTTSB contains over 1.7 million annotations across more than 135k frames captured from 30 different regions in 12 different countries. Our model, STIFFNet, outperformed the state-of-the-art results on the UA-DETRAC benchmark by 4.89%, with an overall AP of 92.99%.