Catherine Arnett
catherinearnett
109 followers · 37 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
updated a dataset about 1 month ago: catherinearnett/bilingual-tokenizer-training-data
published a dataset about 1 month ago: catherinearnett/bilingual-tokenizer-training-data
liked a dataset about 2 months ago: commoncrawl/CommonLID
catherinearnett's datasets (4)
catherinearnett/bilingual-tokenizer-training-data • Viewer • Updated Feb 21 • 30.7M • 139
catherinearnett/montok • Updated Sep 19, 2025 • 10.1k • 3
catherinearnett/morphscore • Viewer • Updated Jul 10, 2025 • 5.09M • 323 • 4
catherinearnett/monolingual-tokenizer-data • Viewer • Updated May 15, 2025 • 139M • 251 • 1