arXiv 2410.23262

EMMA: End-to-End Multimodal Model for Autonomous Driving

By Jyh-Jing Hwang, Runsheng Xu, et al.

Published 2024-10-30

We introduce EMMA, an End-to-end Multimodal Model for Autonomous driving. Built on a multimodal large language model foundation such as Gemini, EMMA directly maps raw camera sensor data to various driving-specific outputs, including planner trajectories, perception objects, and road graph elements. EMMA maximizes the utility of world knowledge from pre-trained large language models by representing all non-sen…
