The Talk2Car dataset sits at the intersection of several research domains, encouraging cross-disciplinary solutions that advance the state of the art in grounding natural language in visual data. The annotations were gathered with the following aspects in mind:
Visual Grounding task
The self-driving car story is often one where passengers are effectively hijacked by their own vehicle: once a destination is set, there is no possibility to deviate from the original plan.
Interaction between passengers and the vehicle can still be desirable, though.
People change their minds, they make unexpected stops, or they might prefer to park in another parking spot.
The Talk2Car dataset was developed with these types of scenarios in mind.
The Talk2Car dataset provides natural language commands on top of the nuScenes dataset. Every command describes an action for the autonomous vehicle that is grounded in the visual plane by referring to an object visible through the front camera. We evaluate the Visual Grounding task by measuring the average precision (AP) on the predicted bounding box coordinates.
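Because each command refers to exactly one object, AP at a fixed IoU threshold reduces to the fraction of commands whose predicted box sufficiently overlaps the ground-truth box. A minimal sketch of that metric (the box format `(x1, y1, x2, y2)` and the 0.5 IoU threshold are assumptions here; consult the official evaluation code for the exact protocol):

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x1, y1, x2, y2) format (an assumed format)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def ap_at_iou(predicted_boxes, ground_truth_boxes, threshold=0.5):
    """With one referred object per command, AP at a fixed IoU threshold is
    simply the fraction of predictions exceeding that threshold."""
    hits = sum(iou(p, g) > threshold
               for p, g in zip(predicted_boxes, ground_truth_boxes))
    return hits / len(predicted_boxes)
```

For example, a prediction that exactly matches the ground truth scores IoU 1.0 and counts as a hit, while a completely disjoint box scores 0.0 and counts as a miss.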
For every command, the referred object is indicated in bold. For every image, the referred object from the command is indicated with a red bounding box.
You can park up ahead behind **the silver car**, next to that lamp post with the orange sign on it
**My friend** is getting out of the car. That means we arrived at our destination! Stop and let me out too!
Turn around and park in front of **that vehicle in the shade**
Yeah that would be **my son on the stairs next to the bus**. Pick him up please
If you decide to use the dataset, please cite us as follows: