A modular interface framework for multimodal annotations and visualizations in human-AI collaboration
Modular is a web-based annotation, visualization, and inference software platform for computational language and vision research. The platform enables researchers to set up an interface for efficiently annotating language and vision datasets, visualizing the predictions of a machine learning model, and interacting with an intelligent system. Artificial intelligence (AI) research, including machine learning, computer vision, and natural language processing, requires large amounts of annotated data. In the current research and development pipeline, each group collects its own datasets using an annotation tool tailored to its specific needs, then invests further engineering effort in loading external datasets and building custom interfaces that often reimplement components of existing annotation tools. Modular addresses this redundancy with a framework that is extensible and customizable to the requirements of individual projects; it has been successfully applied to a number of research efforts in human-AI collaboration, including commonsense grounding of language and vision data, conversational AI for collaboration with human users, and explainable AI for improving the interpretability of AI systems.

Building on the Modular framework, the dissertation examines a set of opportunities for a new, productive symbiosis between human users and AI agents, in which the two parties complete a complex task together and mutually benefit, each contributing strengths the other lacks. Finally, the dissertation evaluates whether human users can establish appropriate trust in and reliance on an AI system through its explanations.