Talk2Car: Taking Control of Your Self-Driving Car

The Talk2Car dataset sits at the intersection of several research domains, promoting the development of cross-disciplinary solutions that advance the state of the art in grounding natural language in visual space. The annotations were gathered with the following aspects in mind:

  1. Free-form, high-quality natural language commands that stimulate the development of solutions able to operate in the wild.
  2. A realistic task setting. Specifically, we consider an autonomous driving setting in which a passenger can control the actions of an autonomous vehicle by giving commands in natural language.
  3. An extensive suite of sensor modalities. The Talk2Car dataset was built on top of the nuScenes dataset and therefore includes semantic maps, GPS, LIDAR, RADAR, and 360-degree RGB images annotated with 3D bounding boxes. This variety of input modalities sets the object referral task on the Talk2Car dataset apart from related challenges, where additional sensor modalities are generally missing (a minimal data-loading sketch follows this list).
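
To give a concrete idea of how a single command annotation might be consumed, here is a minimal loading sketch. It assumes the commands ship as a JSON file whose records carry a command string, a token linking back to the nuScenes frame, and a 2D box for the referred object; the file name and field names below (`commands.json`, `command`, `sample_token`, `2d_box`) are illustrative assumptions, not the official schema of the release.

```python
import json

# Minimal loading sketch -- file name and field names are assumptions,
# not the official Talk2Car schema.
with open("commands.json") as f:
    commands = json.load(f)

for record in commands[:3]:
    print(record["command"])       # free-form natural language command
    print(record["sample_token"])  # hypothetical link back to the nuScenes sample
    print(record["2d_box"])        # hypothetical 2D box of the referred object
```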

arXiv · GitHub

Visual Grounding Task

In the typical self-driving car story, passengers are effectively held hostage by the vehicle while traveling to their destination, with no possibility to deviate from the original plan. Interactions between passengers and vehicle are still desirable, though: people change their minds, make unexpected stops, or prefer a different parking spot. The Talk2Car dataset was developed with these types of scenarios in mind.

The Talk2Car dataset provides natural language commands on top of the nuScenes dataset. Every command describes an action for the autonomous vehicle and is grounded in the visual plane by referring to an object visible through the front camera. We evaluate the visual grounding task by measuring the average precision (AP) of the predicted bounding boxes, where a predicted box counts as correct if its Intersection over Union (IoU) with the ground-truth box exceeds 0.5.
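
As a minimal sketch of this evaluation, assuming boxes are given as (x1, y1, x2, y2) pixel corners and one predicted box per command (the metric then reduces to the fraction of correctly grounded commands, since each command refers to exactly one object):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def ap_at_50(predictions, ground_truths):
    """Fraction of commands whose single predicted box has IoU > 0.5
    with the ground-truth box of the referred object."""
    correct = sum(iou(p, g) > 0.5 for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)
```

For example, `ap_at_50([(10, 20, 110, 220)], [(12, 25, 105, 210)])` returns 1.0, since the single prediction overlaps its ground truth with an IoU of about 0.86.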

Dataset Statistics

Examples

For every command, the referred object is indicated in bold. In the corresponding image, the referred object is marked with a red bounding box (a short sketch for drawing such a box follows the examples).

  - You can park up ahead behind **the silver car**, next to that lamp post with the orange sign on it
  - **My friend** is getting out of the car. That means we arrived at our destination! Stop and let me out too!
  - Turn around and park in front of **that vehicle in the shade**
  - Yeah that would be **my son** on the stairs next to the bus. Pick him up please
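
For completeness, here is a small sketch of how such a red box could be drawn on a front-camera frame with Pillow; the file name and box coordinates are placeholders, not values from the dataset.

```python
from PIL import Image, ImageDraw

# Placeholder file name and (x, y, width, height) box -- illustrative only.
image = Image.open("front_camera.jpg")
draw = ImageDraw.Draw(image)
x, y, w, h = 420, 310, 160, 90
draw.rectangle([x, y, x + w, y + h], outline="red", width=4)
image.save("front_camera_annotated.jpg")
```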

If you decide to use the dataset, please cite us as follows:

@inproceedings{deruyttere2019talk2car,
  title={Talk2Car: Taking Control of Your Self-Driving Car},
  author={Deruyttere, Thierry and Vandenhende, Simon and Grujicic, Dusan and Van Gool, Luc and Moens, Marie-Francine},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={2088--2098},
  year={2019}
}
