MeloTTS-English-MLX
MLX port of myshell-ai/MeloTTS-English for Apple Silicon.
Architecture
VITS2-based end-to-end TTS with:
- Transformer text encoder with relative position attention
- Stochastic + deterministic duration predictors
- TransformerCouplingLayer normalizing flow (4 layers)
- HiFi-GAN vocoder
- BERT prosodic features (bert-base-uncased)
Sample rate: 44,100 Hz
Speakers: EN-US, EN-BR, EN_INDIA, EN-AU, EN-Default
Usage
from mlx_audio.tts.models.melotts import Model, ModelConfig
from mlx_audio.tts.models.melotts.text import process_text
import mlx.core as mx
import json, soundfile as sf, numpy as np
from safetensors.numpy import load_file
from huggingface_hub import snapshot_download
from pathlib import Path
# Download model
model_path = Path(snapshot_download("mlx-community/MeloTTS-English-MLX"))
# Load
with open(model_path / "config.json") as f:
    config_data = json.load(f)
model = Model(ModelConfig.from_dict(config_data))
weights = load_file(str(model_path / "model.safetensors"))
# Convert the NumPy weights to MLX arrays and load them as (name, array) pairs
model.load_weights([(k, mx.array(v)) for k, v in weights.items()])
Model.post_load_hook(model, model_path)
# Generate
for result in model.generate("Hello, this is MeloTTS running on MLX!"):
    audio = np.array(result.audio)
    sf.write("output.wav", audio, result.sample_rate)
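The example above writes a single output.wav, so if generate yields more than one result (an assumption; this card does not say how many it yields), only the last one would be kept. A minimal sketch of joining all segments into one file, using only the Python standard library. write_wav_segments is a hypothetical helper, not part of mlx_audio, and it assumes each segment is mono float audio in the range [-1, 1].

```python
import struct
import wave


def write_wav_segments(path, segments, sample_rate=44100):
    """Join float audio segments into one mono 16-bit PCM WAV file.

    Hypothetical helper, not part of mlx_audio.
    Returns the duration of the written audio in seconds.
    """
    samples = []
    for segment in segments:
        # A segment can be any iterable of floats, e.g. a list or a 1-D NumPy array.
        samples.extend(float(x) for x in segment)

    # Clip to [-1, 1], then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, x)) * 32767) for x in samples]

    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)   # mono (assumption)
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack("<%dh" % len(pcm), *pcm))

    return len(pcm) / sample_rate
```

With the model loaded as above, this could be called as write_wav_segments("output.wav", [np.array(r.audio) for r in model.generate(text)]), assuming each result.audio is a 1-D array. The default of 44100 matches the sample rate listed under Architecture.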
Converted from
myshell-ai/MeloTTS-English using mlx_audio.tts.models.melotts.convert.