RoadTones: Tone Controllable Text Generation from Road Event Videos

Existing video-language models can generate factual descriptions of road events but lack control over how these events are expressed: their tone, urgency, or style. This limits deployment in communication-critical settings where the effectiveness of a message depends on both content and presentation, not just factual accuracy.

To mitigate this, we introduce a comprehensive dataset-model-evaluation suite for tone-controllable road video captioning. Our human-validated data generation pipeline expands road-video corpora with diverse tonal annotations and multi-tone captions, yielding the RoadTones-51K dataset.

We propose RoadTones-VL-CoT, a controllable video-to-text model that additionally generates tone-conditioned Chain-of-Thought intermediate drafts for interpretability. We further introduce RoadTones-Eval, an evaluation suite that jointly measures factual consistency and tone adherence. Together, these contributions lay the foundation for context-sensitive, tone-controllable video captioning.

Project page: https://roadtones.github.io/

Video information

May 20, 2026, 22:15:12

00:05:04

Siddhi Lipare
