
Time Travel in a Text Box: Running a 13B Language Model Trained Only on Pre-1931 Text

Talkie is an Apache 2.0 licensed 13B language model trained exclusively on pre-1931 text. This post covers its VRAM requirements, the three model variants (including a modern-web control), and a working CLI toolkit for running it locally.

Read the full post: https://www.gladlabs.io/posts/time-travel-in-a-text-box-running-a-13b-language-m-1320

## What You'll Learn

* What "vintage" language models are and why training-data cutoffs change a model's voice
* The actual VRAM requirements for running a 13B model locally (and how to fit it on consumer GPUs)
* How a model trained on pre-1931 text differs from a model trained on the modern web
* Concrete use cases for historical AI in writing, linguistic research, and dataset curation

### Why a Pre-1931 Language Model Is Useful

Modern AI models are hungry for the latest information, scraping the web and ingesting real-time news. A counter-trend has emerged in the developer community that challenges the assumption that "more data" is always "better data."

Enter Talkie, a 13B parameter language model from the [talkie-lm](https://github.com/talkie-lm/talkie) project, trained exclusively on text published before 1931. Where modern Large Language Models (LLMs) hallucinate current events or default to internet-formatted prose, Talkie produces output filtered through the vocabulary, syntax, and worldview of the early 20th century.

The project is Apache 2.0 licensed and was built by Alec Radford, Nick Levine, and David Duvenaud. The repo currently lists three model variants:

* **talkie-1930-13b-base**: base model, pre-1931 corpus only
* **talkie-1930-13b-it**: instruction-tuned variant; the instruction-following dataset itself is built from pre-1931 reference works (etiquette manuals, letter-writing manuals, encyclopedias, and poetry collections)
* **talkie-web-13b-base**: same architecture trained on FineWeb (modern web data) as a control for comparison

That third variant is the most interesting research artifact. It lets you A/B-test the effect of training-data era while holding architecture and parameter count constant.

### What Makes a Model "Vintage"

A vintage model is one trained on data strictly before a specific cutoff date.
Talkie's cutoff is pre-1931: every token in the training corpus comes from books, periodicals, and documents published before that point. Ask Talkie about Python and the response will lean toward the snake. Ask it about cloud computing and you'll get something closer to weather. The model has no concept of computers, the internet, climate change, or any geopolitical event after 1930.

The architecture is the same transformer-based GPT lineage modern LLMs descend from, which is consistent with Alec Radford's involvement. What changes is the training corpus. Where contemporary models are trained on massive, mixed-era datasets to maximize general utility, Talkie is trained to simulate a specific historical era at the cost of any post-1930 knowledge. That memory hole is a feature, not a bug. It's a more deliberate, principled version of the trade-off every fine-tuned model makes.

### Hardware: What It Actually Takes to Run 13B

A 13B parameter model is significantly larger than the 7B–8B models common in casual local AI experimentation (Llama 3 8B, Mistral 7B). Memory requirements depend on the precision you load it at:

* **fp16 (half precision, unquantized):** ~26 GB VRAM. That exceeds the 24 GB on an RTX 3090 or 4090, so it needs a 32 GB RTX 5090, a 48 GB RTX A6000, or two GPUs with model parallelism.
* **int8 quantization:** ~13 GB VRAM. Fits on a 16 GB card (RTX 4060 Ti 16 GB, RTX 4080).
* **q4 quantization:** ~7-8 GB VRAM. Fits on a 12 GB card (RTX 3060 12 GB, RTX 4070).

If you don't have a GPU, llama.cpp can run a q4-quantized 13B model on CPU and system RAM, though token throughput typically drops to single digits per second. Acceptable for batch analysis, painful for interactive use.

The talkie-lm package handles model download from HuggingFace, multi-turn chat, streaming, and an interactive CLI. For developers who already have a local LLM stack, the workflow mirrors what you'd do with any other 13B model: pull the weights, point your inference engine at them, query.
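
The VRAM figures in the hardware list above follow from simple arithmetic: parameter count times bits per weight. The sketch below reproduces them for the weights alone. The names `BITS_PER_WEIGHT`, `weight_memory_gb`, and `fits` are hypothetical and not part of the talkie-lm package. The 4.5 bits per weight for q4 (4-bit values plus a per-block scale) and the 1.5 GB runtime overhead are assumptions, and real usage also depends on context length and the inference engine.

```python
"""Rough weight-memory estimate for a dense model at several precisions."""

# Assumed effective bits per weight. The q4 value of 4.5 is an assumption:
# block formats that store 4-bit weights plus a per-block scale land near it.
BITS_PER_WEIGHT = {"fp16": 16.0, "int8": 8.0, "q4": 4.5}


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the memory needed for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, precision: str, vram_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """Check whether weights plus an assumed fixed overhead fit in vram_gb.

    overhead_gb is a placeholder for KV cache, activations, and runtime
    buffers; it is a guess, not a measured value.
    """
    needed = weight_memory_gb(n_params, BITS_PER_WEIGHT[precision]) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    n_params = 13e9  # 13B parameters
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_memory_gb(n_params, bits):.1f} GB for weights")
    print("fp16 on a 24 GB card:", fits(n_params, "fp16", 24))
    print("q4 on a 12 GB card:", fits(n_params, "q4", 12))
```

For 13B parameters this gives 26.0 GB at fp16, 13.0 GB at int8, and about 7.3 GB at q4, which lines up with the ranges quoted above. Under these assumptions fp16 does not fit in 24 GB, while q4 leaves a few gigabytes of headroom on a 12 GB card.
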
If you've used Ollama or llama.cpp for modern models, the muscle memory transfers directly.

### What Actually Changes vs. a Modern Model

The technical setup is mostly the same. What's different is the output. A model trained on pre-1931 English will lean toward the vocabulary, sentence rhythm, and rhetorical patterns of that era. The training corpus included formal written prose (books, periodicals, reference works) without any internet-formatted content, modern instructional templates, or the "AI voice" that emerges from years of post-training instruction tuning on modern datasets.

That voice difference is exactly what the project optimizes for. The fact that talkie-w

Video: "Time Travel in a Text Box: Running a 13B Language Model Trained Only on Pre-1931 Text", from the Glad Labs channel