Organization Card

ThingsAI

Building efficient, specialist Small Language Models that run on consumer hardware. Zero telemetry. Open weights. Everything from tokenizer to training script is public.

Models

Dwarf-15M (training in progress): A 15.54M-parameter shell/bash specialist. 12 layers, d_model=320, GQA 5Q/1KV, SwiGLU, RMSNorm, RoPE. Custom 8202-token vocabulary via DwarfGoToken. Training on 38.85B tokens (2500:1 token-to-parameter ratio) across 11 datasets spanning raw shell, Python, C, instruction pairs, and English web text. Target use case: a CLI tool that translates natural language into bash commands, with user review before execution.
Quark-270M: Our largest model. 252M effective parameters, 32 layers, d_model=768, GQA 12Q/4KV, 65K bilingual vocabulary (Italian + English). Trained on curated multilingual data. Available as Base and Instruct variants.
Quark-135M: Bilingual (Italian + English) general-purpose model. 135M parameters, 30 layers, 9 attention heads (3 KV heads, GQA), SwiGLU, RMSNorm, RoPE θ=10k. Trained on 15B+ tokens. Published benchmarks: HellaSwag 31.37%, ARC-Easy 41.46%, PIQA 61.26%.
Quark-72M (archived research artifact): A 71.7M-parameter model that taught us an expensive lesson. With vocab_size=65536 and d_model=512, the embedding matrix consumed ~33.5M of the 71.7M total parameters, so nearly half the budget went to a pure lookup table. Effective transformer capacity was ~35M parameters, which helps explain why it underperformed Quark-135M on every benchmark (PIQA 54.57% vs 58.32%, ARC-Easy 32.10% vs 47.73%). Additionally, zero-shot chain-of-thought prompting actively degraded performance, dropping ARC-Easy from 33% to 25.5% (random-guess level). This model remains published with its limitations honestly documented. Every architectural decision in Dwarf-15M (the compact 8K vocabulary, the syntax-aware tokenizer, the instruction data mixed into pretraining) was a direct response to what went wrong here.
Quark-Mod: Multi-label content moderation model. 9 categories: toxic, severe_toxic, obscene, threat, insult, identity_hate, cyberbullying, hate_speech, offensive.
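
The embedding arithmetic behind the Quark-72M lesson can be checked in a few lines. This is a minimal sketch that uses only the figures quoted above; the helper name `embedding_share` is hypothetical, and it assumes a single embedding matrix of shape vocab_size × d_model (whether the output head is tied to it is not stated here).

```python
def embedding_share(vocab_size: int, d_model: int, total_params: float) -> tuple[int, float]:
    """Return (embedding parameter count, fraction of total parameters).

    Assumption: one embedding matrix of shape (vocab_size, d_model).
    """
    emb = vocab_size * d_model
    return emb, emb / total_params


# Figures taken from the model descriptions above.
quark72_emb, quark72_share = embedding_share(65536, 512, 71.7e6)
dwarf_emb, dwarf_share = embedding_share(8202, 320, 15.54e6)

print(f"Quark-72M: {quark72_emb / 1e6:.2f}M embedding params ({quark72_share:.0%} of total)")
print(f"Dwarf-15M: {dwarf_emb / 1e6:.2f}M embedding params ({dwarf_share:.0%} of total)")
```

Under these assumptions the Quark-72M embedding takes roughly 47% of the parameter budget, while an 8202-token vocabulary at d_model=320 takes roughly 17% of Dwarf-15M's.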

Tokenizers

DwarfGoToken: An 8202-token BPE tokenizer built for shell/bash. Uses ByteLevel pre-tokenization with syntax-aware protected tokens for shell operators (2>&1, &&, >>, ||) that standard BPE would otherwise split. Trained on a 51MB corpus of shell, Python, C, and English text. Two critical bugs were found and fixed during Dwarf-15M training: spaces lost because of an incorrect pre-tokenizer configuration, and short shell keywords (fi, do, if) matching as substrings inside English words.
GoToken: A BPE tokenizer written in Rust with Python bindings via PyO3. Published on crates.io and PyPI. Provides syntax-aware pre-tokenization for shell/bash patterns. Used as the foundation for DwarfGoToken.
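
The two ideas described above, protected operators and whole-word keyword matching, can be illustrated with a small regex-based splitter. This is only a sketch of the concept: it is not the GoToken or DwarfGoToken implementation or API, and the names `split_shell`, `PROTECTED_OPERATORS`, and `SHELL_KEYWORDS` are hypothetical. Leading spaces are kept attached to each piece so that joining the pieces reproduces the input exactly.

```python
import re

# Operators that must survive as single pieces (list taken from the text above).
PROTECTED_OPERATORS = ["2>&1", "&&", ">>", "||"]
# Short keywords that should match only as whole words, never inside "file" or "done".
SHELL_KEYWORDS = ["fi", "do", "if"]

# Longest operators first so "2>&1" is tried before any shorter alternative.
_OPS = "|".join(re.escape(op) for op in sorted(PROTECTED_OPERATORS, key=len, reverse=True))
_KWS = "|".join(re.escape(kw) for kw in SHELL_KEYWORDS)

_PIECE = re.compile(
    rf"(?P<op> ?(?:{_OPS}))"                  # protected operator, optional leading space
    rf"|(?P<kw> ?\b(?:{_KWS})\b)"             # keyword bounded by word boundaries
    rf"|(?P<other> ?(?:(?!{_OPS})\S)+|\s+)"   # anything else, stopping before an operator
)


def split_shell(text: str) -> list[tuple[str, str]]:
    """Split text into (kind, piece) pairs without dropping any characters."""
    return [(m.lastgroup, m.group(0)) for m in _PIECE.finditer(text)]


sample = "if [ -f file ]; then do_work 2>&1 && echo done >> log || exit 1; fi"
pieces = split_shell(sample)
for kind, piece in pieces:
    print(f"{kind:5} {piece!r}")
```

In this sketch the word-boundary anchors keep `fi`, `do`, and `if` from matching inside `file`, `do_work`, or `done`, and the round-trip property (joined pieces equal the input) is the kind of check that would catch a space-loss bug early.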

What We Focus On

Specialist over generalist: A 15M model can't do everything, but it can excel at one thing. Dwarf targets shell/bash; future models will target math/physics.
Honest failure documentation: When something doesn't work (72M vocabulary problem, zero-shot CoT degradation, tokenizer bugs), we publish the failure and what we learned.
Extreme overtraining for small models: Following the Phi/SmolLM philosophy, small models need more tokens per parameter, not fewer. Dwarf trains at 2500:1, about 125x the roughly 20:1 Chinchilla-optimal ratio.
Custom tooling from scratch: Tokenizers (gotoken, DwarfGoToken), training scripts with multi-source streaming, and inference tools — all built in-house, all open.
Consumer hardware: Everything runs on an RTX 3070 (8GB) or equivalent. No datacenter required.
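
The overtraining figures in the list above follow from simple arithmetic. A minimal sketch, assuming the commonly cited rule of thumb of about 20 training tokens per parameter for Chinchilla-optimal training (that constant is an assumption, not a figure from this card):

```python
# Assumption: ~20 tokens per parameter as the Chinchilla-optimal rule of thumb.
CHINCHILLA_TOKENS_PER_PARAM = 20

params = 15.54e6   # Dwarf-15M parameter count, from the model description
tokens = 38.85e9   # planned training tokens, from the model description

ratio = tokens / params
multiple = ratio / CHINCHILLA_TOKENS_PER_PARAM

print(f"{ratio:.0f} tokens per parameter, {multiple:.0f}x the assumed Chinchilla ratio")
```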

Links

Models and tokenizers: HuggingFace
Scripts and tools: GitHub
Website: things-ai.org
GoToken: crates.io · PyPI

Collections 1

ThingAI/Quark-270m-Instruct

ThingAI/Quark-270m-Base

ThingAI/Quark-135m-Bilingual

ThingAI/Quark-Mod

models 12

ThingAI/Dwarf-15M

ThingAI/DwarfGoToken

ThingAI/Quark-72M

ThingAI/Quark-135m

ThingAI/Quark2Tokenizer

ThingAI/Quark-270m-Instruct

ThingAI/Quark-270m-Base

ThingAI/Quark-135m-Bilingual

ThingAI/Quark-135m-v0.2-intermediate-step

ThingAI/Quark-50m

datasets 1

ThingAI/OmniBook
