TheArtist Music Transformer: F3 (Pop 2.5K Mix), balanced sweet spot

Jazz-adapted chord model trained with a 2,500-sequence pop rehearsal buffer (≈1.65× the jazz training volume). Pop top-1 stays within 0.04 points of the Phase 0 baseline, while jazz top-1 improves by 8.13 points. This is the paper's recommended balanced default.

One of six checkpoints released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.

Model summary

Field                            Value
Architecture                     Music Transformer with relative positional attention
Parameters                       25,661,440
Vocabulary size                  351 tokens
Max sequence length              256
d_model / heads / FFN / layers   512 / 8 / 2048 / 8
Fine-tune resumed from           Phase 0 pop baseline
Best epoch                       9

Training data

All 1,513 jazz training sequences plus 2,500 pop rehearsal sequences (seed 42), giving pop:jazz ≈ 1.65:1. The paper identifies this ratio as the minimum rehearsal volume that fully preserves pop fluency while delivering essentially the full jazz gain (paper §7.1).
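A minimal sketch of how a rehearsal mix of this shape could be assembled, assuming a uniform random draw of pop sequences using Python's random module. The card only says "seed 42"; the project's actual sampler, RNG library, and data format are not shown, so this will not reproduce the exact 2,500 sequences. build_rehearsal_mix is a hypothetical helper, and the toy pop pool size is made up.

```python
import random

def build_rehearsal_mix(jazz_seqs, pop_seqs, n_pop=2500, seed=42):
    """Hypothetical helper (not the project's sampler): all jazz + a seeded pop sample."""
    if n_pop > len(pop_seqs):
        raise ValueError("not enough pop sequences to sample from")
    rng = random.Random(seed)          # local RNG so the draw is reproducible
    pop_sample = rng.sample(pop_seqs, n_pop)
    mixed = list(jazz_seqs) + pop_sample
    rng.shuffle(mixed)                 # interleave genres within the training set
    return mixed, n_pop / len(jazz_seqs)

# Toy stand-ins for tokenized songs; the pop pool size here is arbitrary.
jazz = [("jazz", i) for i in range(1513)]
pop = [("pop", i) for i in range(20000)]
mixed, ratio = build_rehearsal_mix(jazz, pop)
print(len(mixed), round(ratio, 2))  # 4013 1.65
```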

Evaluation (held-out per-genre test sets)

Metric                                   Pop test   Jazz test
Top-1 accuracy                           84.20%     80.99%
Top-5 accuracy                           96.87%     92.63%
Perplexity                               1.82       2.29
Δ top-1 vs. Phase 0 baseline (points)    −0.04      +8.13
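To compare these perplexities with the val_loss column in the per-genre table further down, a small conversion helps. This assumes perplexity is exp of the mean per-token cross-entropy in nats (the usual convention; the card does not state it explicitly).

```python
import math

# Assumption: perplexity = exp(mean per-token cross-entropy in nats).
def ppl_to_loss(ppl: float) -> float:
    """Per-token cross-entropy (nats) implied by a perplexity value."""
    return math.log(ppl)

def loss_to_ppl(loss: float) -> float:
    """Perplexity implied by a per-token cross-entropy (nats)."""
    return math.exp(loss)

for split, ppl in {"pop test": 1.82, "jazz test": 2.29}.items():
    print(f"{split}: {ppl_to_loss(ppl):.3f} nats/token")
# pop test: 0.599 nats/token
# jazz test: 0.829 nats/token
```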

Qualitative samples from F3 introduce secondary dominants, chromatic passing diminished chords, and other jazz voice-leading vocabulary that the Phase 0 baseline does not produce. See paper §6.4 for representative continuations.

Intended use

Recommended default for chord-composition workflows that need fluency in both pop and jazz registers. F1 (ft-pop80) and F4 (ft-pop29) are the stylistic endpoints when a more committed pop-leaning or jazz-leaning identity is desired.

Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.

Usage

The repo bundles the project's model.py and tokenizer.py at the repo root, so external users can load the checkpoint end-to-end without cloning anything from GitHub. snapshot_download materializes the full repo on disk; sys.path makes the bundled model.py / tokenizer.py importable.

Required dependencies: torch, huggingface_hub.

import sys
import torch
from huggingface_hub import snapshot_download

# Download the full repo (model.py, tokenizer.py, best.pt, config.json).
ckpt_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop50")
sys.path.insert(0, ckpt_dir)  # make the bundled model.py / tokenizer.py importable

from model import MusicTransformer
from tokenizer import ChordTokenizer

MAX_SEQ_LEN = 256   # context length the checkpoint was trained with
TEMPERATURE = 0.8   # <1.0 sharpens the next-token distribution, >1.0 flattens it

tokenizer = ChordTokenizer()
# weights_only=False unpickles arbitrary objects: only load checkpoints you trust.
ckpt = torch.load(f"{ckpt_dir}/best.pt", map_location="cpu", weights_only=False)
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=MAX_SEQ_LEN, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Prompt = ii-V-I in C major; ask for a jazz-flavoured continuation.
song = {
    "key": "Cmaj", "time_signature": "4/4", "genre": "jazz",
    "bars": [["Dm7", "G7"], ["Cmaj7"]],
}
prompt_ids = tokenizer.encode_sequence(song)[:-1]  # drop trailing EOS so generation continues
ids = torch.tensor([prompt_ids])
with torch.no_grad():
    for _ in range(32):
        logits = model(ids[:, -MAX_SEQ_LEN:])  # never feed more than the trained context
        probs = torch.softmax(logits[:, -1, :] / TEMPERATURE, dim=-1)
        next_id = torch.multinomial(probs, 1)  # sample one token, shape (1, 1)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break
print(tokenizer.decode(ids[0].tolist()))

For per-genre adaptation beyond pop and jazz, see the 11 LoRA adapter repos at PearlLeeStudio; they chain on top of this base.

Per-genre real-song eval (held-out 130-song set, 2026-05)

This is the first per-genre evaluation of ft-pop50 beyond the pop/jazz split reported in the original paper.

Eval results

Genre        n_songs  Top-1 (%)  Top-5 (%)  val_loss
pop          10       86.31      95.80      0.5821
rock         10       87.05      97.40      0.4669
jazz         10       71.01      86.70      1.3335
blues        10       81.86      93.90      0.8056
bossa        10       82.02      95.81      0.7311
classical    10       49.73      83.61      2.1032
country      10       85.96      98.24      0.5182
electronic   10       87.02      98.29      0.5093
folk         10       84.80      98.70      0.5285
funk         10       83.99      96.27      0.6901
gospel       10       79.76      96.71      0.7359
hip_hop      10       89.92      98.62      0.4002
rnb_soul     10       84.99      97.06      0.5885

On this eval set, F3 scores highest on hip_hop (89.92% top-1) and lowest on classical (49.73%). Treat this as auxiliary signal: the 11 per-genre LoRAs (sister lora-* repos) are the recommended path for production use on the 9 non-pop, non-jazz genres. F-series results on those genres show what the base model produces under [GENRE:none] conditioning, because the F-series vocabulary (vocab=351) has no [GENRE:X] token for the 9 new genres.
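The best/worst genres can be re-derived from the table above with the rows copied in by hand. The perplexity column is an extra, computed under the assumption that val_loss is a mean per-token cross-entropy in nats; the card does not state the log base.

```python
import math

# genre -> (top-1 %, top-5 %, val_loss), copied from the per-genre table above.
RESULTS = {
    "pop":        (86.31, 95.80, 0.5821),
    "rock":       (87.05, 97.40, 0.4669),
    "jazz":       (71.01, 86.70, 1.3335),
    "blues":      (81.86, 93.90, 0.8056),
    "bossa":      (82.02, 95.81, 0.7311),
    "classical":  (49.73, 83.61, 2.1032),
    "country":    (85.96, 98.24, 0.5182),
    "electronic": (87.02, 98.29, 0.5093),
    "folk":       (84.80, 98.70, 0.5285),
    "funk":       (83.99, 96.27, 0.6901),
    "gospel":     (79.76, 96.71, 0.7359),
    "hip_hop":    (89.92, 98.62, 0.4002),
    "rnb_soul":   (84.99, 97.06, 0.5885),
}

best = max(RESULTS, key=lambda g: RESULTS[g][0])
worst = min(RESULTS, key=lambda g: RESULTS[g][0])
print(f"best top-1: {best}, worst top-1: {worst}")  # best top-1: hip_hop, worst top-1: classical

# Assumption: val_loss is in nats, so exp() gives per-token perplexity.
implied_ppl = {g: math.exp(loss) for g, (_, _, loss) in RESULTS.items()}
for genre in sorted(implied_ppl, key=implied_ppl.get):
    print(f"{genre:<11} ppl~{implied_ppl[genre]:.2f}")
```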

Eval dataset composition

130 songs total: 10 per genre × 13 genres. The songs are drawn from the same splits/val.jsonl + splits/test.jsonl partitions that every F-series model was held out from during training, so there is no train-set leakage. Built by ai/training/build_eval_real_songs.py --seed 42 --per-genre 10 (deterministic).
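A minimal sketch of deterministic per-genre subsampling in the spirit of the build command above. It is not build_eval_real_songs.py (whose code is not shown): build_eval_subset and load_jsonl are hypothetical helpers, and the "id"/"genre" record fields are an assumed schema. Determinism here comes from a fixed seed plus sorting both the genres and each genre's pool before sampling, so input order does not matter.

```python
import json
import random
from collections import defaultdict

def build_eval_subset(records, per_genre=10, seed=42):
    """Hypothetical helper: pick `per_genre` records per genre, reproducibly.

    Each record is assumed to be a dict with "id" and "genre" keys.
    """
    by_genre = defaultdict(list)
    for rec in records:
        by_genre[rec["genre"]].append(rec)
    rng = random.Random(seed)
    subset = []
    for genre in sorted(by_genre):  # fixed genre order keeps the RNG stream stable
        pool = sorted(by_genre[genre], key=lambda r: r["id"])  # fixed pool order
        subset.extend(rng.sample(pool, min(per_genre, len(pool))))
    return subset

def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

# Usage (paths from this card; record schema assumed):
# records = load_jsonl("splits/val.jsonl") + load_jsonl("splits/test.jsonl")
# subset = build_eval_subset(records, per_genre=10, seed=42)
```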

Genre       n   Source(s)                                                 Bar range  Avg duration  Named titles
pop         10  billboard                                                 58–116     189s          10/10
rock        10  chordonomicon_rock                                        52–87      127s          0/10
jazz        10  choco:jazz-corpus, choco:real-book, jazzstandards, jht    16–89      72s           10/10
blues       10  chordonomicon_blues                                       24–46      93s           0/10
bossa       10  chordonomicon_bossa                                       24–78      88s           0/10
classical   10  chordonomicon_classical                                   11–40      60s           10/10
country     10  chordonomicon_country                                     30–81      110s          0/10
electronic  10  chordonomicon_electronic                                  25–84      89s           0/10
folk        10  chordonomicon_folk                                        33–82      114s          0/10
funk        10  chordonomicon_funk                                        30–60      92s           0/10
gospel      10  chordonomicon_gospel                                      24–85      98s           0/10
hip_hop     10  chordonomicon_hip_hop                                     24–81      136s          0/10
rnb_soul    10  chordonomicon_rnb_soul                                    34–82      128s          0/10

Source license summary: McGill Billboard (CC0, named pop songs); Jazz Harmony Treebank / JazzStandards / WJazzD (public / community-redistributed, named jazz standards); Bach chorales via music21 (public domain, named pieces); Chordonomicon per-genre subsets (CC BY-NC 4.0; titles are Spotify track IDs per upstream dataset policy, but the progressions come from real songs). See docs/EVAL.md for the full breakdown.

Methodology

Teacher-forced next-token cross-entropy, top-1, and top-5 over each song's token sequence (BOS + key + time_sig + genre + bars + EOS, truncated to max_seq_len=256). This uses the same evaluate() call that produced ai/results/f1_per_genre_baseline.csv, narrowed to the curated 130-song subset. These are token-level metrics, not a generation-quality eval; the free-generation comparison with R1 Sethares + R2 theory RAG rerank is documented separately in ai/results/eval_report.md.
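A minimal plain-Python sketch of the three token-level metrics for a single sequence. It is not the project's evaluate() (not shown on this card). It assumes logits[t] scores the token at position t+1 (targets are the inputs shifted by one) and that cross-entropy uses the natural log, as PyTorch's cross_entropy does. How the project pools across songs (per token or per song) is not stated.

```python
import math

def teacher_forced_metrics(logits, targets, k=5):
    """Return (mean cross-entropy in nats, top-1 accuracy, top-k accuracy).

    logits[t]: score vector the model emits at position t.
    targets[t]: the token that actually comes next at that position.
    """
    if not targets or len(logits) != len(targets):
        raise ValueError("need one score vector per target token")
    total_nll, hits1, hitsk = 0.0, 0, 0
    for scores, tgt in zip(logits, targets):
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total_nll += log_z - scores[tgt]  # -log softmax(scores)[tgt]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        hits1 += ranked[0] == tgt
        hitsk += tgt in ranked[:k]
    n = len(targets)
    return total_nll / n, hits1 / n, hitsk / n
```

With teacher forcing the model always sees the ground-truth prefix, so one forward pass yields every position's prediction; that is why these numbers measure next-token fit rather than the quality of free-running generations.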

Caveats:

  • The classical val partition is intrinsically small (37 sequences in the full eval), and the 10-song subset here has even wider confidence intervals. The directional finding (LoRA helps a lot on Bach harmony) is robust; exact pp deltas are noisy.
  • F-series numbers on the 9 LoRA-only genres are conditioned without genre tag (vocab=351 has no [GENRE:country] token etc.). This is the realistic "F-series alone" condition, not a controlled ablation.
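The small-sample caveat can be made concrete with a percentile bootstrap over songs. This is a generic technique, not something the card says the project used; bootstrap_ci is a hypothetical helper, and per-song scores would have to be extracted from ai/results/real_song_eval.csv, whose column layout is not described here.

```python
import random
import statistics

def bootstrap_ci(per_song_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean per-song score.

    Resamples songs (not tokens) with replacement, since songs are the
    independent units in a 10-songs-per-genre eval.
    """
    rng = random.Random(seed)
    n = len(per_song_scores)
    means = sorted(
        statistics.fmean(rng.choices(per_song_scores, k=n))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

With only 10 songs per genre, a single unusual song moves the resampled means substantially, which is why per-genre pp deltas should be read directionally.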

Source CSV: ai/results/real_song_eval.csv (17 models Γ— 130 songs, long format).

Training-data licenses

Dataset                      License
Chordonomicon                Public (user-generated)
McGill Billboard             CC0
Jazz Harmony Treebank        Public
JazzStandards (iReal Pro)    Community redistribution
Weimar Jazz Database         ODbL
JAAH                         Research-use public

Citation

Cite the original mix-ratio paper. The companion per-genre LoRA paper (chord-symbol time-series adaptation) is in preparation; its arXiv ID will be added here once posted.

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}

@misc{lee2026chordtimeseries,
  title         = {How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity?},
  author        = {Lee, Jinju},
  year          = {2026},
  note          = {arXiv preprint, ID TBD},
}