Dev372/Cardiology_Medical_STT_Dataset
How to use khazarai/Cardiology-TTS with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("unsloth/csm-1b")
model = PeftModel.from_pretrained(base_model, "khazarai/Cardiology-TTS")

How to use khazarai/Cardiology-TTS with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-to-speech", model="khazarai/Cardiology-TTS")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("khazarai/Cardiology-TTS", dtype="auto")

This model is a fine-tuned version of the Conversational Speech Model (CSM-1B), adapted with LoRA for parameter-efficient fine-tuning. It was trained on a 1,530-sample dataset of cardiology texts and is designed to generate high-quality speech from cardiology-related text. It builds on the text-to-speech capabilities of the original CSM-1B model and extends them to domain-specific cardiology terminology. It is intended for speech generation in English, especially in clinical and educational contexts.
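To illustrate why LoRA counts as parameter-efficient, the sketch below compares the number of trainable parameters in a full weight matrix with the number in a rank-r LoRA adapter for that same matrix. The layer size and rank are hypothetical values chosen for illustration; the model card does not state the rank or which layers were adapted.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical values for illustration only (not taken from the model card).
d_in = d_out = 2048
rank = 16

full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)
print(f"full: {full}, lora: {lora}, ratio: {lora / full:.4f}")
```

Under these assumed sizes the adapter trains under 2% of the parameters of the matrix it modifies, which is why the adapter is loaded on top of the unchanged base model rather than replacing it.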
Use the code below to get started with the model.
import torch
from transformers import CsmForConditionalGeneration, AutoProcessor
import soundfile as sf
from peft import PeftModel
model_id = "unsloth/csm-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(model_id)
base_model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)
model = PeftModel.from_pretrained(base_model, "khazarai/Cardiology-TTS")
text = "The coronary arteries are patent with no significant stenosis."
speaker_id = 0
conversation = [
    {"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
]
audio_values = model.generate(
    **processor.apply_chat_template(
        conversation,
        tokenize=True,
        return_dict=True,
    ).to(device),  # use the detected device instead of hard-coding "cuda"
    max_new_tokens=200,
    # play with these parameters to tweak results
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    #########################################################
    output_audio=True,
)
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example.wav", audio, 24000)
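The conversation structure and the 24 kHz sample rate in the example can be wrapped in two small helpers, which is convenient when synthesizing several sentences in a loop. This is a minimal sketch: `build_conversation` and `duration_seconds` are hypothetical helper names, not part of Transformers or PEFT, and the sample sentence is illustrative.

```python
from typing import Dict, List

SAMPLE_RATE_HZ = 24_000  # the rate passed to sf.write in the example


def build_conversation(text: str, speaker_id: int = 0) -> List[Dict]:
    """Return the single-turn chat structure used in the example (hypothetical helper)."""
    return [
        {"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
    ]


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Length of a mono clip in seconds, given its sample count (hypothetical helper)."""
    return num_samples / sample_rate


conversation = build_conversation("Mild mitral regurgitation is noted.")
print(conversation[0]["role"])   # the speaker id is stored as a string
print(duration_seconds(48_000))  # 48,000 samples at 24 kHz
```

The list returned by `build_conversation` can be passed to `processor.apply_chat_template` in place of the hand-written `conversation`, and `duration_seconds(len(audio))` gives a quick check that `max_new_tokens` was large enough for the full sentence.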
The training data consists of 1,530 samples of cardiology-related text paired with audio.