Original title: LingoQA: Video Question Answering for Autonomous Driving
Authors: Ana-Maria Marcu, Long Chen, Jan Hünermann, Alice Karnsund, Benoit Hanotte, Prajwal Chidananda, Saurabh Nair, Vijay Badrinarayanan, Alex Kendall, Jamie Shotton, Oleg Sinavski
This paper addresses the challenge of gaining public acceptance for autonomous driving, where a key obstacle is the lack of transparency in the vehicle's decision-making process. Video question answering (QA) in natural language has been explored as a way to bridge this gap, but evaluating such QA models has been difficult due to the lack of comprehensive benchmarks. To address this, the authors introduce LingoQA, a benchmark designed specifically for video QA in autonomous driving, whose evaluation metric achieves a Spearman correlation coefficient of 0.95 with human ratings. Alongside the benchmark, they release a dataset of 419k samples collected from driving in central London and establish a baseline vision-language model, accompanied by detailed studies of its performance. Together, LingoQA and the dataset aim to improve the understanding and evaluation of video QA models for autonomous driving.
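As a minimal illustrative sketch (not the paper's implementation), the reported benchmark validation amounts to measuring rank agreement between an automated judge's scores and human ratings of the same answers; the score arrays below are hypothetical placeholders:

```python
# Sketch: validating an automated QA judge against human ratings via
# Spearman rank correlation, as reported for LingoQA (rho = 0.95).
# The scores below are hypothetical placeholders, not paper data.
from scipy.stats import spearmanr

# Per-answer quality scores from the automated judge (e.g. in [0, 1]).
judge_scores = [0.91, 0.12, 0.78, 0.45, 0.88, 0.30]

# Human ratings of the same answers, on a comparable scale.
human_scores = [0.95, 0.20, 0.70, 0.50, 0.90, 0.25]

rho, p_value = spearmanr(judge_scores, human_scores)
print(f"Spearman correlation: {rho:.2f} (p = {p_value:.3f})")
```

A high rho indicates the automated judge ranks answers in nearly the same order as human evaluators, which is what makes it usable as a stand-in for costly human evaluation.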
Original article: https://arxiv.org/abs/2312.14115