Adversarial Tokenization

Authors:
Renato Geh,* Zilei Shao,* Guy Van den Broeck

Abstract:
Current LLM pipelines account for only one possible tokenization for a given string, ignoring exponentially many alternative tokenizations during training and inference. For example, the standard Llama tokenization of penguin is [p, enguin], yet [peng, uin] is another perfectly valid alternative. In this paper, we show that despite LLMs being trained solely on one tokenization, they still retain semantic understanding of other tokenizations, raising questions about their implications in LLM safety. Put succinctly, we answer the following question: can we adversarially tokenize an obviously malicious string to evade safety and alignment restrictions? We show that not only is adversarial tokenization an effective yet previously neglected axis of attack, but it is also competitive against existing state-of-the-art adversarial approaches without changing the text of the harmful request. We empirically validate this exploit across three state-of-the-art LLMs and adversarial datasets, revealing a previously unknown vulnerability in subword models.
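
The claim that a string has many valid tokenizations can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `TOY_VOCAB` is a hypothetical hand-picked vocabulary, not the Llama vocabulary, and `tokenizations` and `count_tokenizations` are illustrative helper names, not the authors' code. The sketch only enumerates the ways a string can be segmented into vocabulary entries.

```python
from functools import lru_cache

# Hypothetical toy vocabulary (an assumption for illustration only);
# a real subword vocabulary has tens of thousands of entries.
TOY_VOCAB = frozenset(
    {"p", "e", "n", "g", "u", "i", "pen", "peng", "enguin", "guin", "uin"}
)


def tokenizations(text, vocab=TOY_VOCAB):
    """Return every segmentation of `text` into pieces found in `vocab`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # One way to tokenize the empty suffix: the empty token list.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in vocab:
                for rest in split_from(end):
                    results.append([piece] + rest)
        return results

    return split_from(0)


def count_tokenizations(text, vocab=TOY_VOCAB):
    """Count segmentations by dynamic programming, without listing them."""
    counts = [0] * (len(text) + 1)
    counts[len(text)] = 1  # the empty suffix has exactly one segmentation
    for start in range(len(text) - 1, -1, -1):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in vocab:
                counts[start] += counts[end]
    return counts[0]


if __name__ == "__main__":
    for tokens in tokenizations("penguin"):
        print(tokens)
    print(count_tokenizations("penguin"))
```

Under this toy vocabulary both [p, enguin] and [peng, uin] appear among the segmentations. If every substring of a string is in the vocabulary, the count doubles with each added character, which is the sense in which the number of alternatives grows exponentially with length.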

Notes:
Non-archival presentation at TokShop 2025.
Archival publication at ACL 2025.

Video: Adversarial Tokenization, from the Tokenization Workshop (TokShop) channel

Video information

July 14, 2025, 22:54:56

00:10:00

Tokenization Workshop (TokShop)

Other videos from this channel

2025 Panel Discussion: Future of Tokenization

Causal Estimation of Tokenisation Bias

2025 Keynote: "Insights from Pixel Language Modeling"

MorphTok: Morphologically Grounded Tokenization for Indic languages

Pitfalls, Subtleties, and Techniques in Automata-Based Subword-Level Constrained Generation

2025 Keynote: "Beat them? Join them? Fix them? Tokenization Research in a Downstream World"

Tokenisation is NP-Complete

InCa and InDia: Inline Casing and Diacritization Preprocessing For Robust-to-Noise Tokenization ...

2025 Keynote: "Learning Dynamic Segmentation and Compression of Sequences in Transformer LLMs"

HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling

GeneticBPE: Motif-Preserving Tokenization for Robust miRNA Modeling