How to use bofenghuang/parakeet-tdt-0.6b-v3-hybrid with NeMo:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("bofenghuang/parakeet-tdt-0.6b-v3-hybrid")
transcriptions = asr_model.transcribe(["file.wav"])

This checkpoint extends nvidia/parakeet-tdt-0.6b-v3 from TDT to a hybrid TDT-CTC model.
The sanity check below passed: TDT decoding produces the same transcriptions as nvidia/parakeet-tdt-0.6b-v3, while the reinitialized CTC decoder produces gibberish:
from nemo.collections.asr.models import ASRModel
nemo_model_path = "bofenghuang/parakeet-tdt-0.6b-v3-hybrid"
asr_model = ASRModel.from_pretrained(model_name=nemo_model_path)
audio_path = "example.wav"
result = asr_model.transcribe([audio_path])[0]
print(result.text)
# expect the same output as nvidia/parakeet-tdt-0.6b-v3
# switch the hybrid model from its default TDT decoder to the CTC decoder
asr_model.change_decoding_strategy(decoder_type="ctc")
result = asr_model.transcribe([audio_path])[0]
print(result.text)
# expect gibberish output, since the CTC decoder was reinitialized
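
The check above is done by eye. A minimal sketch of how it could be automated is shown here, assuming you have three transcript strings for the same audio file: one from nvidia/parakeet-tdt-0.6b-v3, one from this checkpoint with TDT decoding, and one from this checkpoint with CTC decoding. The helper names (`word_similarity`, `check_sanity`) and the `ctc_max_similarity` threshold of 0.5 are hypothetical and not part of NeMo or of this model card.

```python
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] of matching words between two transcripts."""
    words_a = a.lower().split()
    words_b = b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def check_sanity(reference: str, tdt_text: str, ctc_text: str,
                 ctc_max_similarity: float = 0.5) -> dict:
    """Summarize the sanity check.

    reference: transcript from the original TDT model (assumed input).
    tdt_text:  transcript from the hybrid checkpoint with TDT decoding.
    ctc_text:  transcript from the hybrid checkpoint with CTC decoding.
    ctc_max_similarity: hypothetical threshold below which the CTC output
        is treated as gibberish relative to the reference.
    """
    ctc_similarity = word_similarity(reference, ctc_text)
    return {
        # TDT weights are expected to be unchanged, so require an exact match.
        "tdt_matches": tdt_text == reference,
        "ctc_similarity": ctc_similarity,
        # A reinitialized CTC decoder should share few words with the reference.
        "ctc_looks_untrained": ctc_similarity < ctc_max_similarity,
    }
```

In practice the three strings would be the `result.text` values from the `transcribe` calls above, plus one from a separate run of the original model on the same audio.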