TheArtist Music Transformer: F3 (Pop 2.5K Mix), balanced sweet spot
Jazz-adapted chord model with a 2,500-sequence pop rehearsal buffer (≈1.65× the jazz training volume). Pop top-1 preserved within 0.04 points of the Phase 0 baseline. Jazz top-1 +8.13 points. The paper's recommended balanced default.
One of six checkpoints released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.
Model summary
| Field | Value |
|---|---|
| Architecture | Music Transformer with relative positional attention |
| Parameters | 25,661,440 |
| Vocabulary size | 351 tokens |
| Max sequence length | 256 |
| d_model / heads / FFN / layers | 512 / 8 / 2048 / 8 |
| Fine-tune resumed from | Phase 0 pop baseline |
| Best epoch | 9 |
Training data
All 1,513 jazz training sequences plus 2,500 pop rehearsal sequences (seed 42). Pop:jazz ≈ 1.65:1. The paper identifies this ratio as the minimum rehearsal volume that fully preserves pop fluency while delivering essentially the full jazz gain (paper §7.1).
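The mix described above can be sketched in a few lines. This is an illustration only, not the project's training pipeline: `build_rehearsal_mix`, the placeholder tuples, and the pop pool size of 10,000 are all assumptions introduced here; only the counts (1,513 jazz, 2,500 pop) and the seed (42) come from this card.

```python
import random


def build_rehearsal_mix(jazz_seqs, pop_seqs, n_pop=2500, seed=42):
    """Return every jazz sequence plus a seeded random subset of pop sequences.

    Hypothetical helper: the real sampling code may differ.
    """
    rng = random.Random(seed)                  # local RNG, global state untouched
    pop_subset = rng.sample(pop_seqs, n_pop)   # sample without replacement
    mix = list(jazz_seqs) + pop_subset
    rng.shuffle(mix)                           # interleave the two genres
    return mix


# Placeholder data standing in for tokenized chord sequences.
jazz = [("jazz", i) for i in range(1513)]
pop = [("pop", i) for i in range(10000)]       # pool size is made up

mix = build_rehearsal_mix(jazz, pop)
n_pop = sum(1 for genre, _ in mix if genre == "pop")
n_jazz = len(mix) - n_pop
print(len(mix), round(n_pop / n_jazz, 2))      # prints: 4013 1.65
```

Using a local `random.Random(seed)` keeps the subset reproducible without affecting any other random state in the process.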
Evaluation (held-out per-genre test sets)
| Metric | Pop test | Jazz test |
|---|---|---|
| Top-1 accuracy | 84.20% | 80.99% |
| Top-5 accuracy | 96.87% | 92.63% |
| Perplexity | 1.82 | 2.29 |
| Δ top-1 vs. Phase 0 baseline (points) | −0.04 | +8.13 |
Qualitative samples from F3 introduce secondary dominants, chromatic passing diminished chords, and other jazz voice-leading vocabulary that the Phase 0 baseline does not produce. See paper §6.4 for representative continuations.
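For readers who want to relate the numbers in the evaluation table to each other, here is a small arithmetic sketch. It assumes two things that this card does not state explicitly: that the baseline-delta row is expressed in top-1 percentage points (consistent with the summary at the top of the card), and that perplexity is the exponential of the mean cross-entropy in nats. The implied baseline values are derived here, not reported; the paper remains the authoritative source.

```python
import math

# Values copied from the evaluation table above.
f3 = {
    "pop": {"top1": 84.20, "ppl": 1.82, "delta": -0.04},
    "jazz": {"top1": 80.99, "ppl": 2.29, "delta": +8.13},
}


def implied_baseline_top1(top1, delta):
    """Baseline top-1 implied by F3 top-1 and its delta (assumed to be in points)."""
    return round(top1 - delta, 2)


def cross_entropy_from_perplexity(ppl):
    """Mean cross-entropy in nats, assuming ppl = exp(mean cross-entropy)."""
    return math.log(ppl)


for genre, row in f3.items():
    baseline = implied_baseline_top1(row["top1"], row["delta"])
    loss = cross_entropy_from_perplexity(row["ppl"])
    print(f"{genre}: implied baseline top-1 = {baseline}%, loss = {loss:.3f} nats")
```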
Intended use
Recommended default for chord-composition workflows that need fluency in both pop and jazz registers. F1 (ft-pop80) and F4 (ft-pop29) are the stylistic endpoints when a more committed pop-leaning or jazz-leaning identity is desired.
Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.
Usage
The repo bundles the project's model.py and tokenizer.py at the repo
root, so external users can load the checkpoint end-to-end without
cloning anything from GitHub. snapshot_download materializes the full
repo on disk; inserting that directory into sys.path makes the bundled
model.py / tokenizer.py importable.
Required dependencies: torch, huggingface_hub.
import sys

import torch
from huggingface_hub import snapshot_download

# Download the full repo (model.py, tokenizer.py, best.pt, config.json).
ckpt_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop50")
sys.path.insert(0, ckpt_dir)  # so the next two imports resolve

from model import MusicTransformer
from tokenizer import ChordTokenizer

tokenizer = ChordTokenizer()
ckpt = torch.load(f"{ckpt_dir}/best.pt", map_location="cpu", weights_only=False)
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Prompt = ii-V-I in C major; ask for a jazz-flavoured continuation.
song = {
    "key": "Cmaj", "time_signature": "4/4", "genre": "jazz",
    "bars": [["Dm7", "G7"], ["Cmaj7"]],
}
prompt_ids = tokenizer.encode_sequence(song)[:-1]  # drop the trailing EOS
ids = torch.tensor([prompt_ids])

with torch.no_grad():
    for _ in range(32):
        logits = model(ids)
        # Sample the next token from the last position at temperature 0.8.
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1,
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break

print(tokenizer.decode(ids[0].tolist()))
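The sampling loop above divides the last-position logits by 0.8 before the softmax. The stand-alone sketch below shows what that temperature does to a distribution. It uses only the standard library and made-up logits, so it runs without the checkpoint; `softmax_with_temperature` is a name introduced here for illustration and is not part of the bundled code.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                          # subtract the max before exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]                        # made-up scores for three tokens
p_plain = softmax_with_temperature(logits, temperature=1.0)
p_sharp = softmax_with_temperature(logits, temperature=0.8)

print([round(p, 3) for p in p_plain])
print([round(p, 3) for p in p_sharp])
```

A temperature below 1 moves probability mass toward the highest-scoring token, so continuations become more conservative; values above 1 flatten the distribution and increase variety.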
For per-genre adaptation beyond pop and jazz, see the 11 LoRA adapter repos at PearlLeeStudio; they chain on top of this base.
Per-genre real-song eval (held-out 130-song set, 2026-05)
First per-genre evaluation of ft-pop50 beyond the pop/jazz split that the original paper reports.
Eval results
| Genre | n_songs | Top-1 (%) | Top-5 (%) | val_loss |
|---|---|---|---|---|
| pop | 10 | 86.31 | 95.80 | 0.5821 |
| rock | 10 | 87.05 | 97.40 | 0.4669 |
| jazz | 10 | 71.01 | 86.70 | 1.3335 |
| blues | 10 | 81.86 | 93.90 | 0.8056 |
| bossa | 10 | 82.02 | 95.81 | 0.7311 |
| classical | 10 | 49.73 | 83.61 | 2.1032 |
| country | 10 | 85.96 | 98.24 | 0.5182 |
| electronic | 10 | 87.02 | 98.29 | 0.5093 |
| folk | 10 | 84.80 | 98.70 | 0.5285 |
| funk | 10 | 83.99 | 96.27 | 0.6901 |
| gospel | 10 | 79.76 | 96.71 | 0.7359 |
| hip_hop | 10 | 89.92 | 98.62 | 0.4002 |
| rnb_soul | 10 | 84.99 | 97.06 | 0.5885 |
On this eval set F3 peaks on hip_hop (89.92%) and struggles most on classical (49.73%).
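The summary above can be reproduced directly from the table. The sketch below copies the table rows and recomputes the best and worst genre, plus an unweighted (macro) average of top-1 across genres. The macro average and the perplexity conversion are additions for illustration: the latter assumes val_loss is a mean cross-entropy in nats, which this card does not state.

```python
import math

# (genre, top-1 %, top-5 %, val_loss) copied from the eval results table.
rows = [
    ("pop", 86.31, 95.80, 0.5821),
    ("rock", 87.05, 97.40, 0.4669),
    ("jazz", 71.01, 86.70, 1.3335),
    ("blues", 81.86, 93.90, 0.8056),
    ("bossa", 82.02, 95.81, 0.7311),
    ("classical", 49.73, 83.61, 2.1032),
    ("country", 85.96, 98.24, 0.5182),
    ("electronic", 87.02, 98.29, 0.5093),
    ("folk", 84.80, 98.70, 0.5285),
    ("funk", 83.99, 96.27, 0.6901),
    ("gospel", 79.76, 96.71, 0.7359),
    ("hip_hop", 89.92, 98.62, 0.4002),
    ("rnb_soul", 84.99, 97.06, 0.5885),
]

best = max(rows, key=lambda r: r[1])            # highest top-1
worst = min(rows, key=lambda r: r[1])           # lowest top-1
macro_top1 = sum(r[1] for r in rows) / len(rows)

# Assumption: val_loss is mean cross-entropy in nats, so perplexity = exp(loss).
perplexity = {genre: math.exp(loss) for genre, _, _, loss in rows}

print("best:", best[0], best[1])
print("worst:", worst[0], worst[1])
print("macro top-1:", round(macro_top1, 2))
```

Because every genre contributes 10 songs of different lengths, this macro average weights genres equally and will generally differ from a token-weighted average over the whole set.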
This is an auxiliary signal: the 11 per-genre LoRAs (sister lora-* repos) are the recommended path for production use on the 9 non-pop, non-jazz genres. F-series results on those genres show what the base model produces under [GENRE:none] conditioning, because the F-series vocabulary (vocab=351) has no [GENRE:X] token for the 9 new genres.
Eval dataset composition
130 songs total: 10 per genre × 13 genres. Drawn from the same splits/val.jsonl + splits/test.jsonl partitions that every F-series model was held out from during training, so there is no train-set leakage. Built by ai/training/build_eval_real_songs.py --seed 42 --per-genre 10 (deterministic).
| Genre | n | Source(s) | Bar range | Avg duration · named |
|---|---|---|---|---|
| pop | 10 | billboard | 58–116 | 189s · 10/10 named |
| rock | 10 | chordonomicon_rock | 52–87 | 127s · 0/10 named |
| jazz | 10 | choco:jazz-corpus, choco:real-book, jazzstandards, jht | 16–89 | 72s · 10/10 named |
| blues | 10 | chordonomicon_blues | 24–46 | 93s · 0/10 named |
| bossa | 10 | chordonomicon_bossa | 24–78 | 88s · 0/10 named |
| classical | 10 | chordonomicon_classical | 11–40 | 60s · 10/10 named |
| country | 10 | chordonomicon_country | 30–81 | 110s · 0/10 named |
| electronic | 10 | chordonomicon_electronic | 25–84 | 89s · 0/10 named |
| folk | 10 | chordonomicon_folk | 33–82 | 114s · 0/10 named |
| funk | 10 | chordonomicon_funk | 30–60 | 92s · 0/10 named |
| gospel | 10 | chordonomicon_gospel | 24–85 | 98s · 0/10 named |
| hip_hop | 10 | chordonomicon_hip_hop | 24–81 | 136s · 0/10 named |
| rnb_soul | 10 | chordonomicon_rnb_soul | 34–82 | 128s · 0/10 named |
Source license summary: McGill Billboard (CC0, named pop songs), Jazz Harmony Treebank / JazzStandards / WJazzD (Public / community-redistributed, named jazz standards), Bach chorales via music21 (public domain, named pieces), Chordonomicon per-genre subsets (CC BY-NC 4.0; titles are Spotify track IDs by upstream dataset policy; progressions are real songs). See docs/EVAL.md for full breakdown.
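A deterministic "10 per genre, seed 42" selection like the one described in this section can be sketched as follows. This is not the contents of ai/training/build_eval_real_songs.py, which is not reproduced in this card; the function name `sample_per_genre`, the record fields `id` and `genre`, and the synthetic pool are all hypothetical.

```python
import random
from collections import defaultdict


def sample_per_genre(songs, per_genre=10, seed=42):
    """Pick a fixed number of songs per genre, reproducibly.

    Hypothetical sketch: the project's actual build script may differ.
    """
    by_genre = defaultdict(list)
    for song in songs:
        by_genre[song["genre"]].append(song)

    rng = random.Random(seed)
    picked = []
    for genre in sorted(by_genre):              # fixed genre order
        pool = sorted(by_genre[genre], key=lambda s: s["id"])  # fixed pool order
        picked.extend(rng.sample(pool, per_genre))
    return picked


# Synthetic held-out pool: 13 genres with 30 placeholder songs each.
genres = [
    "pop", "rock", "jazz", "blues", "bossa", "classical", "country",
    "electronic", "folk", "funk", "gospel", "hip_hop", "rnb_soul",
]
pool = [{"id": f"{g}-{i:03d}", "genre": g} for g in genres for i in range(30)]

subset = sample_per_genre(pool)
print(len(subset))                              # prints: 130
```

Sorting both the genres and each genre's pool before sampling means the result depends only on the seed and the data, not on the order in which records were read from disk.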
Methodology
Metrics are teacher-forced next-token cross-entropy, top-1, and top-5, computed over each song's token sequence (BOS + key + time_sig + genre + bars + EOS, truncated to max_seq_len=256). This is the same evaluate() call that produced ai/results/f1_per_genre_baseline.csv, narrowed to the curated 130-song subset. These are token-level metrics, not a generation-quality eval; the free-generation comparison with R1 Sethares + R2 theory RAG rerank is documented separately in ai/results/eval_report.md.
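To make the three token-level metrics concrete, here is a minimal pure-Python sketch of how they can be computed from per-position scores. It is not the project's evaluate() function, whose implementation is not shown in this card; `teacher_forced_metrics` and the toy six-token vocabulary are assumptions for illustration. Under teacher forcing, the scores at each position are produced from the gold prefix, never from the model's own earlier predictions.

```python
import math


def teacher_forced_metrics(logits, targets, k=5):
    """Mean cross-entropy (nats), top-1 and top-k accuracy over positions.

    logits[t] holds the model's scores for the token at position t,
    computed from the gold tokens before t; targets[t] is the gold token id.
    """
    total_nll = 0.0
    top1_hits = 0
    topk_hits = 0
    for scores, target in zip(logits, targets):
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total_nll += log_z - scores[target]     # negative log-probability of gold
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        top1_hits += ranked[0] == target
        topk_hits += target in ranked[:k]
    n = len(targets)
    return {"loss": total_nll / n, "top1": top1_hits / n, "topk": topk_hits / n}


# Toy example: 3 positions over a 6-token vocabulary (made-up scores).
toy_logits = [
    [3.0, 1.0, 0.0, -1.0, -2.0, -3.0],          # gold token 0 is ranked 1st
    [0.0, 2.0, 1.0, -1.0, -2.0, -3.0],          # gold token 2 is ranked 2nd
    [5.0, 4.0, 3.0, 2.0, 1.0, 0.0],             # gold token 5 is ranked 6th
]
toy_targets = [0, 2, 5]

metrics = teacher_forced_metrics(toy_logits, toy_targets)
print(metrics)
```

In the toy data one of three positions is a top-1 hit and two of three are top-5 hits, which shows why top-5 is always at least as high as top-1 in the tables above.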
Caveats:
- The classical val partition is intrinsically small (37 sequences in the full eval); the 10-song subset here has even wider confidence bands. The directional finding (LoRA helps a lot on Bach harmony) is robust, but exact percentage-point deltas are noisy.
- F-series numbers on the 9 LoRA-only genres are conditioned without a genre tag (vocab=351 has no [GENRE:country] token, etc.). This is the realistic "F-series alone" condition, not a controlled ablation.
Source CSV: ai/results/real_song_eval.csv (17 models × 130 songs, long format).
Training-data licenses
| Dataset | License |
|---|---|
| Chordonomicon | Public (user-generated) |
| McGill Billboard | CC0 |
| Jazz Harmony Treebank | Public |
| JazzStandards (iReal Pro) | Community redistribution |
| Weimar Jazz Database | ODbL |
| JAAH | Research-use public |
Citation
Cite the original mix-ratio paper. The companion per-genre LoRA paper (chord-symbol time-series adaptation) is in preparation; its arXiv ID will be added here once posted.
@misc{lee2026chordmix,
title = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
author = {Lee, Jinju},
year = {2026},
eprint = {2605.04998},
archivePrefix = {arXiv}
}
@misc{lee2026chordtimeseries,
title = {How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity?},
author = {Lee, Jinju},
year = {2026},
note = {arXiv preprint, ID TBD},
}