TheArtist Music Transformer — LoRA Adapter (Funk)

LoRA adapter that conditions the F1 base (PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80) toward funk chord progressions. One of eleven per-genre adapters released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). This release is the best-rank snapshot from a 5-point rank sweep (r ∈ {4, 8, 16, 32, 64}); see §Rank sweep below for the full table and selection criterion.

Adapter summary

| Field | Value |
|---|---|
| Base model | PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80 (F1, 25.6M params) |
| Adapter type | LoRA (Q/K/V projections) |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Target modules | w_q, w_k, w_v |
| Trainable parameters | 204,800 (0.79% of base) |
| Adapter file size | ~0.8 MB |
| Base vocabulary | 351 tokens (jazz/pop) |
| Vocabulary extension | +8 genre tokens (embedding_extension.pt) |
| Training epochs | 8 |
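One plausible accounting that reproduces the 204,800 trainable-parameter figure (an assumption, not taken from the training code): rank-8 A/B factors on the three attention projections across the 8 layers of the base model (dimensions from the Usage section's model config), plus the 8 new vocabulary rows trained in both the input embedding and the output projection.

```python
d_model, n_layers, r, new_tokens = 512, 8, 8, 8

# LoRA A (r x d_model) and B (d_model x r) on each of w_q, w_k, w_v per layer
lora_params = n_layers * 3 * (r * d_model + d_model * r)

# hypothetical accounting: the 8 new token rows are trained in both the
# input embedding and an untied output projection
embed_params = new_tokens * d_model * 2

print(lora_params + embed_params)  # 204800
```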

Training data

Source

2,837 chord-progression sequences in the funk subset of the Chordonomicon dataset. Chordonomicon is licensed CC BY-NC 4.0; see the dataset card for full terms.

Filter rule

genres contains any of {funk}

(See ai/training/extract_genre_subsets.py:GENRE_FILTERS for the full extraction logic — main matches the main_genre column, genres_any substring-matches the free-form genres column. Each song is assigned to its first matching genre so it never double-counts.)
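For illustration, the first-match rule described above can be sketched as follows. The dict shape, column names, and the `assign_genre` helper are assumptions for this sketch; the authoritative logic is `GENRE_FILTERS` in ai/training/extract_genre_subsets.py.

```python
# Hypothetical mirror of the GENRE_FILTERS structure described on this card.
GENRE_FILTERS = {
    "funk": {"genres_any": ["funk"]},
    # ... other genres' rules follow in the real script
}

def assign_genre(row):
    """Return the first genre whose filter matches, so no song double-counts."""
    for genre, rule in GENRE_FILTERS.items():
        if "main" in rule and row.get("main_genre") == rule["main"]:
            return genre
        if "genres_any" in rule and any(
            g in row.get("genres", "") for g in rule["genres_any"]
        ):
            return genre
    return None

print(assign_genre({"main_genre": "soul", "genres": "p-funk, disco"}))  # funk (substring match)
```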

Splits (song-level, seed=42, 80/10/10)

| Partition | Songs | Used for |
|---|---|---|
| train | 2,269 | this LoRA's training (12-key augmented → 27,228 sequences) |
| val | 283 | rank-sweep eval + best-epoch selection during training |
| test | 285 | held aside for future paired analysis |
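A sketch of a deterministic song-level split consistent with the seed and ratios above; with floor rounding it reproduces the 2,269/283/285 counts, though the repo's actual split code may differ in detail.

```python
import random

def split_songs(song_ids, seed=42, ratios=(0.8, 0.1, 0.1)):
    """Deterministic song-level 80/10/10 split (illustrative, not the repo's code)."""
    ids = sorted(song_ids)              # fix order before shuffling for reproducibility
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder
    }

splits = split_songs(range(2837))
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))  # 2269 283 285
```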

Vocabulary

  • Base: 351 tokens (jazz/pop chord vocab from the F1 base model)
  • Extension: +8 [GENRE:X] tokens covering 8 new genres (this LoRA adds the [GENRE:funk] token)
  • Final vocab: 359 tokens (stored alongside the adapter in embedding_extension.pt)
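A minimal sketch of how an 8-row extension like embedding_extension.pt could be grafted onto the 351-token base embedding. The authoritative apply-extension recipe lives in model/README.md; weight tying with the output projection is not handled here.

```python
import torch
import torch.nn as nn

def extend_embedding(embed, new_rows):
    """Append new token rows (e.g. 8 [GENRE:X] vectors) to an embedding table."""
    old_vocab, dim = embed.weight.shape
    extended = nn.Embedding(old_vocab + new_rows.shape[0], dim,
                            padding_idx=embed.padding_idx)
    with torch.no_grad():
        extended.weight[:old_vocab] = embed.weight  # keep base rows intact
        extended.weight[old_vocab:] = new_rows      # graft the new genre rows
    return extended

base = nn.Embedding(351, 512)
ext = extend_embedding(base, torch.zeros(8, 512))
print(ext.weight.shape)  # torch.Size([359, 512])
```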

Reproducibility

# 1. Pull Chordonomicon raw csv into ai/data/raw/chordonomicon/
# 2. Extract this genre subset
uv run python ai/training/extract_genre_subsets.py --genres funk --merge

# 3. Train the LoRA at the released rank
uv run python ai/training/lora_train.py --config ai/training/configs/lora/funk.yaml

Hyperparameters: 8 epochs · batch 32 × accum 2 · lr 3e-4 · 1-epoch warmup · AMP fp16 · best.pt selected by min val_loss.
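For reference, the adapter hyperparameters in the summary table map onto a peft LoraConfig roughly like this; the exact wrapping and task type used by lora_train.py are not shown on this card, so treat this as a sketch.

```python
from peft import LoraConfig

# Values taken from the Adapter summary table above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,                         # alpha = 2 * r, as in the rank sweep
    lora_dropout=0.05,
    target_modules=["w_q", "w_k", "w_v"],  # Q/K/V projections only
    bias="none",
)
```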

Genre character

Funk grooves with extended dominants and altered sevenths.

Rank sweep

The released adapter is the best-rank snapshot from training the same LoRA recipe at five different ranks. Every cell uses the same F1 base, same val split, same evaluate() call, and the same [GENRE:none]-initialized embedding extension — only lora_r (and lora_alpha = 2 × lora_r) changes. Numbers are validation-set token-level metrics (no key augmentation).

| Rank | val_loss | val_top1 (%) | val_top5 (%) | Δtop1 vs F1 |
|---|---|---|---|---|
| r=4 | 0.5694 | 84.62 | 96.05 | +2.08 |
| r=8 | 0.5688 | 84.64 | 96.05 | +2.10 ← selected |
| r=16 | 0.5688 | 84.62 | 96.05 | +2.08 |
| r=32 | 0.5714 | 84.60 | 95.23 | +2.06 |
| r=64 | 0.5697 | 84.72 | 96.07 | +2.18 |

Selection criterion: minimum validation cross-entropy loss, with val_top1 as tiebreaker. val_loss is what the training loop optimizes and what selects each rank's best.pt epoch, so using it for cross-rank selection stays consistent with how each individual checkpoint was chosen.
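The cross-rank selection rule can be made concrete with the sweep's own numbers: minimize val_loss, break ties on higher val_top1.

```python
# Validation numbers copied from the rank-sweep table above.
results = {
    4:  {"val_loss": 0.5694, "val_top1": 84.62},
    8:  {"val_loss": 0.5688, "val_top1": 84.64},
    16: {"val_loss": 0.5688, "val_top1": 84.62},
    32: {"val_loss": 0.5714, "val_top1": 84.60},
    64: {"val_loss": 0.5697, "val_top1": 84.72},
}

# Min loss first; negate top1 so the tiebreak prefers HIGHER accuracy.
best_rank = min(results, key=lambda r: (results[r]["val_loss"], -results[r]["val_top1"]))
print(best_rank)  # 8
```

Note that r=64 has the best val_top1 (84.72) but loses on val_loss, which is why r=8 ships.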

Full 11-genre × 5-rank sweep + full-FT anchor table: ai/results/lora_rank_sweep.md in the repo.

Evaluation

Validation token-level metrics on the genre-specific val split (283 sequences, no key augmentation). The F1 base column uses the same val split, same dataloader, and the same [GENRE:none]-initialized embedding-extension setup as the LoRA run — only the LoRA parameters and the trained embedding rows differ.

| Metric | F1 base alone | F1 + this LoRA | Δ |
|---|---|---|---|
| Top-1 accuracy (%) | 82.54 | 84.64 | +2.10 |
| Top-5 accuracy (%) | 94.38 | 96.05 | +1.67 |
| Cross-entropy loss | 0.7878 | 0.5688 | -0.2190 |

Source: ai/results/f1_per_genre_baseline.csv + ai/results/lora_rank_sweep.csv. Higher top-1/top-5 and lower loss are better.
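The teacher-forced metrics in these tables follow a common contract: mask pad positions, then compute next-token cross-entropy and top-k accuracy over the surviving tokens. A minimal sketch of that contract (not the repo's evaluate() implementation):

```python
import torch
import torch.nn.functional as F

def token_metrics(logits, targets, pad_id):
    """CE / top-1 / top-5 over non-pad positions.

    logits: (batch, seq, vocab) teacher-forced next-token logits
    targets: (batch, seq) gold token ids
    """
    mask = targets != pad_id                      # drop pad positions
    flat_logits, flat_targets = logits[mask], targets[mask]
    ce = F.cross_entropy(flat_logits, flat_targets)
    top1 = (flat_logits.argmax(-1) == flat_targets).float().mean()
    top5 = (flat_logits.topk(5, dim=-1).indices
            == flat_targets.unsqueeze(-1)).any(-1).float().mean()
    return ce.item(), 100 * top1.item(), 100 * top5.item()
```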

Real-song eval

Mean validation top-1/top-5/cross-entropy on 10 held-out real funk songs from ai/data/eval_real_songs.jsonl (drawn from ai/data/splits/{val,test}.jsonl; see docs/EVAL.md for dataset composition and methodology). Teacher-forced eval — the same evaluate() call as the full-val rank-sweep eval above, narrowed to a curated 10-song subset.

| Model | Top-1 (%) | Top-5 (%) | val_loss |
|---|---|---|---|
| F1 base alone | 83.85 | 96.03 | 0.6811 |
| F1 + this LoRA | 84.71 | 96.37 | 0.5912 |
| Δ | +0.87 | +0.33 | -0.0899 |

Evaluation data

This adapter is evaluated on two complementary held-out sets, both drawn from the same val + test splits the LoRA never saw during training:

1. Full val split — used for the rank sweep table above

  • Size: 283 validation sequences (this genre's val partition)
  • Methodology: teacher-forced next-token CE / top-1 / top-5 with pad_id masking, batch 32, no key augmentation
  • Comparison fairness: same evaluate() call as ai/results/f1_per_genre_baseline.csv, same dataloader, same [GENRE:none]-initialised embedding-extension setup. Only the LoRA's adapter weights + the 8 new genre embedding rows differ.
  • Output: ai/results/lora_rank_sweep.csv (long format, one row per (genre, rank) cell)

2. Curated 130-song real-song eval — used for the Real-song eval section above

  • Size: 10 songs from this genre (10 per genre × 13 genres = 130 total)
  • Source partition: drawn from splits/val.jsonl + splits/test.jsonl only (no train leakage)
  • Per-genre sources: chordonomicon_funk
  • Title coverage (this genre): 0 of 10 are named real songs; all 10 are Chordonomicon entries whose title field is a Spotify track ID, per upstream dataset policy
  • Bar range (this genre): 30–60 bars (≈ 92s avg at typical tempo for this genre)
  • Build script: ai/training/build_eval_real_songs.py --seed 42 --per-genre 10 — deterministic, re-runnable
  • Output: ai/results/real_song_eval.csv (17 models × 130 songs, long format)
  • Full dataset composition + per-source license + methodology: see docs/EVAL.md
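The deterministic per-genre sampling that build_eval_real_songs.py performs with --seed 42 --per-genre 10 can be sketched as follows; the script's actual logic is not shown on this card, so the function below is an illustration only.

```python
import random

def sample_eval_songs(pool_by_genre, per_genre=10, seed=42):
    """Seeded, re-runnable per-genre sampling (illustrative sketch).

    Sorting genres and song ids before sampling makes the draw
    deterministic for a given seed, mirroring the card's claim.
    """
    rng = random.Random(seed)
    return {
        genre: rng.sample(sorted(songs), per_genre)
        for genre, songs in sorted(pool_by_genre.items())
    }
```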

License and use

The adapter weights are released under CC BY-NC 4.0 (matching Chordonomicon, the upstream training corpus). Permitted: research, paper replication, portfolio, demo. Not permitted: commercial deployment without separate licensing of upstream data.

Usage

import os

import torch
from huggingface_hub import hf_hub_download
from peft import PeftModel
from model import MusicTransformer
from tokenizer import ChordTokenizer

# 1. Load the F1 base
base_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80",
    filename="best.pt",
)
base_ckpt = torch.load(base_path, map_location="cpu", weights_only=False)
tokenizer = ChordTokenizer()
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(base_ckpt["model_state_dict"])

# 2. Extend the embedding to fit the LoRA's expanded vocabulary (351 -> 359 tokens).
#    Apply the extension before inference so the [GENRE:funk] token resolves;
#    see model/README.md for the apply-extension recipe.
ext_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-funk",
    filename="embedding_extension.pt",
)
ext = torch.load(ext_path, map_location="cpu", weights_only=False)

# 3. Apply the LoRA adapter (PeftModel.from_pretrained expects the adapter's directory)
adapter_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-funk",
    filename="adapter_model.safetensors",
)
model = PeftModel.from_pretrained(model, os.path.dirname(adapter_path))
model.eval()

Citation

Preprint: arXiv:2605.04998.

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}