
"Directory Pipeline"—A Tool for Turning Historical Digital Collections into Structured Data

Directory Pipeline is an LLM-assisted, IIIF-native proof-of-concept tool for turning digitized collection items into structured, browsable CSV interfaces, with snippet links back to the original source material. Given any IIIF manifest URL (from the Library of Congress, Internet Archive, NYPL Digital Collections, or any institution that publishes a public IIIF manifest), it uses a meta-prompting technique: it selects a few example entry pages, evaluates them, and uses that evaluation to generate item-specific OCR or HTR instructions as well as item-specific data extraction instructions. It is built for digitized historical directories (city directories, gazetteers, trade directories) but works on just about any historical document with a regular, entry-like structure, including handwritten log entries, manuscripts, and more.
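
The "snippet links back to the original source material" can be illustrated with a minimal Python sketch. This is not the tool's actual code: it assumes a IIIF Presentation 2.x manifest already loaded as a dict, one image per canvas, and a single service object (not a list) on each image. The function names, the CSV columns, and the example.org URL are hypothetical, and the sample entry is made up for illustration. The region URL follows the IIIF Image API pattern `{service}/{x},{y},{w},{h}/{size}/0/default.jpg`; `max` is accepted as a size by Image API 2.1 and 3.0, while older 2.0 servers expect `full`.

```python
import csv
import io


def canvas_image_services(manifest):
    """Return one IIIF Image API base URL per canvas (Presentation 2.x layout assumed)."""
    services = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service", {})
                # Assumption: "service" is a single object carrying an "@id".
                if isinstance(service, dict) and "@id" in service:
                    services.append(service["@id"].rstrip("/"))
                    break  # keep only the first usable image per canvas
    return services


def snippet_url(service_id, x, y, w, h, size="max"):
    """Build an Image API URL that crops the pixel region x,y,w,h from a page image."""
    return f"{service_id}/{x},{y},{w},{h}/{size}/0/default.jpg"


def entries_to_csv(entries, services):
    """Write extracted entries as CSV text, one row per entry, with a snippet link."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "occupation", "page", "snippet"])
    for entry in entries:
        url = snippet_url(services[entry["page"]], *entry["box"])
        writer.writerow([entry["name"], entry["occupation"], entry["page"], url])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical one-page manifest and one invented directory entry.
    manifest = {
        "sequences": [{
            "canvases": [{
                "images": [{
                    "resource": {"service": {"@id": "https://example.org/iiif/page1"}}
                }]
            }]
        }]
    }
    entries = [{"name": "Doe, Jane", "occupation": "printer",
                "page": 0, "box": (10, 20, 300, 40)}]
    print(entries_to_csv(entries, canvas_image_services(manifest)))
```

Because the region coordinates contain commas, the `csv` module quotes the snippet column automatically, so the links survive a round trip through spreadsheet software.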

Josh Hadro

CNI Pre-Recorded Project Briefing Series
Spring 2026 Edition
More information: https://www.cni.org/news/ppbs_editionguide_spring26

