TheArtist Music Transformer — LoRA Adapter (Funk)

LoRA adapter that conditions the F1 base (PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80) toward funk chord progressions. One of eleven per-genre adapters released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). This release is the best-rank snapshot from a 5-point rank sweep (r ∈ {4, 8, 16, 32, 64}); see §Rank sweep below for the full table and selection criterion.

Adapter summary

| Field | Value |
|---|---|
| Base model | PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80 (F1, 25.6M params) |
| Adapter type | LoRA (Q/K/V projections) |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Target modules | w_q, w_k, w_v |
| Trainable parameters | 204,800 (0.79% of base) |
| Adapter file size | ~0.8 MB |
| Base vocabulary | 351 tokens (jazz/pop) |
| Vocabulary extension | +8 genre tokens (embedding_extension.pt) |
| Training epochs | 8 |
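One plausible accounting that reproduces the 204,800 trainable-parameter figure (an assumption, not taken from the training code): rank-8 A/B factors on the three attention projections across the 8 layers of the base model (dimensions from the Usage section's model config), plus the 8 new vocabulary rows trained in both the input embedding and the output projection.

```python
d_model, n_layers, r, new_tokens = 512, 8, 8, 8

# LoRA A (r x d_model) and B (d_model x r) on each of w_q, w_k, w_v per layer
lora_params = n_layers * 3 * (r * d_model + d_model * r)

# hypothetical accounting: the 8 new token rows are trained in both the
# input embedding and an untied output projection
embed_params = new_tokens * d_model * 2

print(lora_params + embed_params)  # 204800
```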

Training data

Source

2,837 chord-progression sequences in the funk subset of the Chordonomicon dataset. Chordonomicon is licensed CC BY-NC 4.0; see the dataset card for full terms.

Filter rule

genres contains any of {funk}

(See ai/training/extract_genre_subsets.py:GENRE_FILTERS for the full extraction logic — main matches the main_genre column, genres_any substring-matches the free-form genres column. Each song is assigned to its first matching genre so it never double-counts.)
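For illustration, the first-match rule described above can be sketched as follows. The dict shape, column names, and the `assign_genre` helper are assumptions for this sketch; the authoritative logic is `GENRE_FILTERS` in ai/training/extract_genre_subsets.py.

```python
# Hypothetical mirror of the GENRE_FILTERS structure described on this card.
GENRE_FILTERS = {
    "funk": {"genres_any": ["funk"]},
    # ... other genres' rules follow in the real script
}

def assign_genre(row):
    """Return the first genre whose filter matches, so no song double-counts."""
    for genre, rule in GENRE_FILTERS.items():
        if "main" in rule and row.get("main_genre") == rule["main"]:
            return genre
        if "genres_any" in rule and any(
            g in row.get("genres", "") for g in rule["genres_any"]
        ):
            return genre
    return None

print(assign_genre({"main_genre": "soul", "genres": "p-funk, disco"}))  # funk (substring match)
```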

Splits (song-level, seed=42, 80/10/10)

| Partition | Songs | Used for |
|---|---|---|
| train | 2,269 | this LoRA's training (12-key augmented → 27,228 sequences) |
| val | 283 | rank-sweep eval + best-epoch selection during training |
| test | 285 | held aside for future paired analysis |
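A sketch of a deterministic song-level split consistent with the seed and ratios above; with floor rounding it reproduces the 2,269/283/285 counts, though the repo's actual split code may differ in detail.

```python
import random

def split_songs(song_ids, seed=42, ratios=(0.8, 0.1, 0.1)):
    """Deterministic song-level 80/10/10 split (illustrative, not the repo's code)."""
    ids = sorted(song_ids)              # fix order before shuffling for reproducibility
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder
    }

splits = split_songs(range(2837))
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))  # 2269 283 285
```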

Vocabulary

  • Base: 351 tokens (jazz/pop chord vocab from the F1 base model)
  • Extension: +8 [GENRE:X] tokens covering 8 new genres (this LoRA adds the [GENRE:funk] token)
  • Final vocab: 359 tokens (stored alongside the adapter in embedding_extension.pt)
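A minimal sketch of how an 8-row extension like embedding_extension.pt could be grafted onto the 351-token base embedding. The authoritative apply-extension recipe lives in model/README.md; weight tying with the output projection is not handled here.

```python
import torch
import torch.nn as nn

def extend_embedding(embed, new_rows):
    """Append new token rows (e.g. 8 [GENRE:X] vectors) to an embedding table."""
    old_vocab, dim = embed.weight.shape
    extended = nn.Embedding(old_vocab + new_rows.shape[0], dim,
                            padding_idx=embed.padding_idx)
    with torch.no_grad():
        extended.weight[:old_vocab] = embed.weight  # keep base rows intact
        extended.weight[old_vocab:] = new_rows      # graft the new genre rows
    return extended

base = nn.Embedding(351, 512)
ext = extend_embedding(base, torch.zeros(8, 512))
print(ext.weight.shape)  # torch.Size([359, 512])
```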

Reproducibility

# 1. Pull Chordonomicon raw csv into ai/data/raw/chordonomicon/
# 2. Extract this genre subset
uv run python ai/training/extract_genre_subsets.py --genres funk --merge

# 3. Train the LoRA at the released rank
uv run python ai/training/lora_train.py --config ai/training/configs/lora/funk.yaml

Hyperparameters: 8 epochs · batch 32 × accum 2 · lr 3e-4 · 1-epoch warmup · AMP fp16 · best.pt selected by min val_loss.
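For reference, the adapter hyperparameters in the summary table map onto a peft LoraConfig roughly like this; the exact wrapping and task type used by lora_train.py are not shown on this card, so treat this as a sketch.

```python
from peft import LoraConfig

# Values taken from the Adapter summary table above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,                         # alpha = 2 * r, as in the rank sweep
    lora_dropout=0.05,
    target_modules=["w_q", "w_k", "w_v"],  # Q/K/V projections only
    bias="none",
)
```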

Genre character

Funk grooves with extended dominants and altered sevenths.

Rank sweep

The released adapter is the best-rank snapshot from training the same LoRA recipe at five different ranks. Every cell uses the same F1 base, same val split, same evaluate() call, and the same [GENRE:none]-initialized embedding extension — only lora_r (and lora_alpha = 2 × lora_r) changes. Numbers are validation-set token-level metrics (no key augmentation).

| Rank | val_loss | val_top1 (%) | val_top5 (%) | Δtop1 vs F1 |
|---|---|---|---|---|
| r=4 | 0.5694 | 84.62 | 96.05 | +2.08 |
| r=8 | 0.5688 | 84.64 | 96.05 | +2.10 ← selected |
| r=16 | 0.5688 | 84.62 | 96.05 | +2.08 |
| r=32 | 0.5714 | 84.60 | 95.23 | +2.06 |
| r=64 | 0.5697 | 84.72 | 96.07 | +2.18 |

Selection criterion: minimum validation cross-entropy loss, with val_top1 as tiebreaker. val_loss is what the training loop optimizes and what selects each rank's best.pt epoch, so using it for cross-rank selection stays consistent with how each individual checkpoint was chosen.
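The cross-rank selection rule can be made concrete with the sweep's own numbers: minimize val_loss, break ties on higher val_top1.

```python
# Validation numbers copied from the rank-sweep table above.
results = {
    4:  {"val_loss": 0.5694, "val_top1": 84.62},
    8:  {"val_loss": 0.5688, "val_top1": 84.64},
    16: {"val_loss": 0.5688, "val_top1": 84.62},
    32: {"val_loss": 0.5714, "val_top1": 84.60},
    64: {"val_loss": 0.5697, "val_top1": 84.72},
}

# Min loss first; negate top1 so the tiebreak prefers HIGHER accuracy.
best_rank = min(results, key=lambda r: (results[r]["val_loss"], -results[r]["val_top1"]))
print(best_rank)  # 8
```

Note that r=64 has the best val_top1 (84.72) but loses on val_loss, which is why r=8 ships.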

Full 11-genre × 5-rank sweep + full-FT anchor table: ai/results/lora_rank_sweep.md in the repo.

Evaluation

Validation token-level metrics on the genre-specific val split (283 sequences, no key augmentation). The F1 base column uses the same val split, same dataloader, and the same [GENRE:none]-initialized embedding-extension setup as the LoRA run — only the LoRA parameters and the trained embedding rows differ.

| Metric | F1 base alone | F1 + this LoRA | Δ |
|---|---|---|---|
| Top-1 accuracy (%) | 82.54 | 84.64 | +2.10 |
| Top-5 accuracy (%) | 94.38 | 96.05 | +1.67 |
| Cross-entropy loss | 0.7878 | 0.5688 | -0.2190 |

Source: ai/results/f1_per_genre_baseline.csv + ai/results/lora_rank_sweep.csv. Higher top-1/top-5 and lower loss are better.
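The teacher-forced metrics in these tables follow a common contract: mask pad positions, then compute next-token cross-entropy and top-k accuracy over the surviving tokens. A minimal sketch of that contract (not the repo's evaluate() implementation):

```python
import torch
import torch.nn.functional as F

def token_metrics(logits, targets, pad_id):
    """CE / top-1 / top-5 over non-pad positions.

    logits: (batch, seq, vocab) teacher-forced next-token logits
    targets: (batch, seq) gold token ids
    """
    mask = targets != pad_id                      # drop pad positions
    flat_logits, flat_targets = logits[mask], targets[mask]
    ce = F.cross_entropy(flat_logits, flat_targets)
    top1 = (flat_logits.argmax(-1) == flat_targets).float().mean()
    top5 = (flat_logits.topk(5, dim=-1).indices
            == flat_targets.unsqueeze(-1)).any(-1).float().mean()
    return ce.item(), 100 * top1.item(), 100 * top5.item()
```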

Real-song eval

Mean validation top-1/top-5/cross-entropy on 10 held-out real funk songs from ai/data/eval_real_songs.jsonl (drawn from ai/data/splits/{val,test}.jsonl; see docs/EVAL.md for dataset composition and methodology). Teacher-forced eval — the same evaluate() call as the full-val rank-sweep eval above, narrowed to a curated 10-song subset.

| Model | Top-1 (%) | Top-5 (%) | val_loss |
|---|---|---|---|
| F1 base alone | 83.85 | 96.03 | 0.6811 |
| F1 + this LoRA | 84.71 | 96.37 | 0.5912 |
| Δ | +0.87 | +0.33 | -0.0899 |

Evaluation data

This adapter is evaluated on two complementary held-out sets, both drawn from the same val + test splits the LoRA never saw during training:

1. Full val split — used for the rank sweep table above

  • Size: 283 validation sequences (this genre's val partition)
  • Methodology: teacher-forced next-token CE / top-1 / top-5 with pad_id masking, batch 32, no key augmentation
  • Comparison fairness: same evaluate() call as ai/results/f1_per_genre_baseline.csv, same dataloader, same [GENRE:none]-initialised embedding-extension setup. Only the LoRA's adapter weights + the 8 new genre embedding rows differ.
  • Output: ai/results/lora_rank_sweep.csv (long format, one row per (genre, rank) cell)

2. Curated 130-song real-song eval — used for the Real-song eval section above

  • Size: 10 songs from this genre (10 per genre × 13 genres = 130 total)
  • Source partition: drawn from splits/val.jsonl + splits/test.jsonl only (no train leakage)
  • Per-genre sources: chordonomicon_funk
  • Title coverage (this genre): 0 of 10 are named real songs; all 10 are Chordonomicon entries whose title field is a Spotify track ID, per upstream dataset policy
  • Bar range (this genre): 30–60 bars (≈ 92s avg at typical tempo for this genre)
  • Build script: ai/training/build_eval_real_songs.py --seed 42 --per-genre 10 — deterministic, re-runnable
  • Output: ai/results/real_song_eval.csv (17 models × 130 songs, long format)
  • Full dataset composition + per-source license + methodology: see docs/EVAL.md
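The deterministic per-genre sampling that build_eval_real_songs.py performs with --seed 42 --per-genre 10 can be sketched as follows; the script's actual logic is not shown on this card, so the function below is an illustration only.

```python
import random

def sample_eval_songs(pool_by_genre, per_genre=10, seed=42):
    """Seeded, re-runnable per-genre sampling (illustrative sketch).

    Sorting genres and song ids before sampling makes the draw
    deterministic for a given seed, mirroring the card's claim.
    """
    rng = random.Random(seed)
    return {
        genre: rng.sample(sorted(songs), per_genre)
        for genre, songs in sorted(pool_by_genre.items())
    }
```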

License and use

The adapter weights are released under CC BY-NC 4.0 (matching Chordonomicon, the upstream training corpus). Permitted: research, paper replication, portfolio, demo. Not permitted: commercial deployment without separate licensing of upstream data.

Usage

import os

import torch
from huggingface_hub import hf_hub_download
from peft import PeftModel
from model import MusicTransformer
from tokenizer import ChordTokenizer

# 1. Load the F1 base
base_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80",
    filename="best.pt",
)
base_ckpt = torch.load(base_path, map_location="cpu", weights_only=False)
tokenizer = ChordTokenizer()
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(base_ckpt["model_state_dict"])

# 2. Extend the embedding to fit the LoRA's expanded vocabulary (351 -> 359 tokens).
#    Apply the extension before inference so the [GENRE:funk] token resolves;
#    see model/README.md for the apply-extension recipe.
ext_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-funk",
    filename="embedding_extension.pt",
)
ext = torch.load(ext_path, map_location="cpu", weights_only=False)

# 3. Apply the LoRA adapter (PeftModel.from_pretrained expects the adapter's directory)
adapter_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-funk",
    filename="adapter_model.safetensors",
)
model = PeftModel.from_pretrained(model, os.path.dirname(adapter_path))
model.eval()

Citation

Preprint: arXiv:2605.04998.

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}