🤗 Hugging Face x 🌸 BigScience initiative to create open-source community resources for libraries, archives, and museums (LAMs).
📚 BigLAM
A community-run home for machine-learning-ready datasets from libraries, archives, and museums.
Most cultural-heritage data wasn't originally prepared with ML workflows in mind. It lives in catalogue systems, IIIF endpoints, METS/MODS records, and idiosyncratic formats that vary from one institution to the next. BigLAM is a place where those datasets get repackaged into formats ML practitioners can actually load and work with, contributed by the people who know the source material best.
The org started as a datasets hackathon inside the BigScience project in 2022 and has grown into a standing community for cultural-heritage ML.
What's here
The org is datasets-first: 46+ image, text, and tabular collections from libraries, archives, and museums, prepared so they load cleanly with the Hugging Face datasets library. A handful of models and spaces live here too, mostly early experiments from the BigScience-era hackathon.
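As a rough illustration of what loading one of these collections with the datasets library can look like, here is a minimal sketch. The `biglam` namespace and the dataset name `example-dataset` are assumptions made for illustration, as are the helper names `biglam_repo_id` and `load_biglam`; substitute the repo id shown on the dataset's own page, and check which splits it actually provides.

```python
def biglam_repo_id(name: str, org: str = "biglam") -> str:
    """Build a Hub repo id such as 'biglam/example-dataset'.

    The default org namespace is an assumption; check the dataset page.
    """
    return f"{org}/{name.strip('/')}"


def load_biglam(name: str, split: str = "train", streaming: bool = True):
    """Load one split of a dataset hosted on the Hub.

    Requires `pip install datasets` and network access. Streaming avoids
    downloading a whole image collection before looking at the first rows.
    The 'train' default is a guess; not every dataset has that split.
    """
    from datasets import load_dataset  # imported lazily so the helper above has no dependency

    return load_dataset(biglam_repo_id(name), split=split, streaming=streaming)


# Hypothetical usage ("example-dataset" is a placeholder, not a real repo):
# ds = load_biglam("example-dataset")
# first_row = next(iter(ds))
```

Streaming is used by default here because many cultural-heritage image collections are large; pass `streaming=False` to download and cache the split instead.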
For task-specific, deployable models built on top of these datasets, see the sibling org small-models-for-glam.
Contributing a dataset
If you've prepared a LAM dataset that other researchers might use, the best home is usually your institution's own Hugging Face organisation (e.g. NationalLibraryOfScotland). Institutional ownership signals authority over the data and makes long-term maintenance easier. Setting up a new org on the Hub is free and quick.
If your institution isn't on the Hub yet, or you'd prefer to host the dataset here, open a discussion and we'll help get it set up under BigLAM. Useful additions are typically datasets where the format conversion (METS/ALTO → parquet, IIIF manifest → loadable image splits, etc.) has already been done and the licensing is clear enough for open release.
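To give a sense of the kind of format conversion meant above, the sketch below flattens the text lines of an ALTO file into plain row dictionaries that could then be written out as a table. It is a simplified illustration rather than a recommended pipeline: the inline sample document, the `alto_to_rows` helper, and the output column names are all made up for the example, and real ALTO files carry much more (coordinates, confidence scores, hyphenation) that a production conversion would want to keep.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO-style sample, for illustration only.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="p1">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1">
            <String CONTENT="Hello"/><SP/><String CONTENT="archive"/>
          </TextLine>
          <TextLine ID="l2">
            <String CONTENT="Second"/><SP/><String CONTENT="line"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_to_rows(xml_text: str) -> list[dict]:
    """Return one row per TextLine: page id, line id, and the joined words."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for page in root.iter():
        if local_name(page.tag) != "Page":
            continue
        for line in page.iter():
            if local_name(line.tag) != "TextLine":
                continue
            # Each String element holds one word in its CONTENT attribute.
            words = [
                child.get("CONTENT", "")
                for child in line
                if local_name(child.tag) == "String"
            ]
            rows.append(
                {
                    "page_id": page.get("ID"),
                    "line_id": line.get("ID"),
                    "text": " ".join(words),
                }
            )
    return rows


rows = alto_to_rows(SAMPLE_ALTO)
```

From here, one option (assuming pandas and pyarrow are installed) is `pandas.DataFrame(rows).to_parquet("lines.parquet")`, which produces a parquet file the datasets library can load.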
Already have a dataset here that should sit under your institution's org? Open a discussion or issue on the dataset repo — we're happy to transfer ownership.
More than 60 people have contributed over the years. Day-to-day maintenance is light-touch; for help with a contribution, open a discussion and a maintainer will pick it up.