Orpheus-3b-FT-Q4_K_M
This is a quantised version of canopylabs/orpheus-3b-0.1-ft.
Orpheus is a high-performance Text-to-Speech model fine-tuned for natural, emotional speech synthesis. This repository hosts the 4-bit (Q4_K_M) quantised version of the 3B parameter model, optimised for efficiency while maintaining high-quality output.
Model Description
Orpheus-3b-FT-Q4_K_M is a 3 billion parameter Text-to-Speech model that converts text inputs into natural-sounding speech with support for multiple voices and emotional expressions. The model has been quantised to the 4-bit GGUF Q4_K_M format for efficient inference, making it accessible on consumer hardware.
Key features:
- 8 distinct voice options with different characteristics
- Support for emotion tags like laughter, sighs, etc.
- Optimised for CUDA acceleration on RTX GPUs
- Produces high-quality 24kHz mono audio
- Fine-tuned for conversational naturalness
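To make the efficiency claim concrete, here is a rough back-of-the-envelope estimate of weight storage. Q4_K_M is a 4-bit k-quant format in llama.cpp, and its average cost per weight sits somewhat above 4 bits because some tensors are kept at higher precision. The 4.85 bits-per-weight figure below is an assumed average, not a measurement of this file, and the parameter count is the rounded "~3 billion" from this card.

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 3e9        # "~3 billion" parameters, as stated in this card
FP16_BPW = 16.0       # unquantised half-precision baseline
Q4_K_M_BPW = 4.85     # ASSUMPTION: typical average for Q4_K_M, varies by model

print(f"FP16:   ~{estimated_size_gb(N_PARAMS, FP16_BPW):.1f} GB")
print(f"Q4_K_M: ~{estimated_size_gb(N_PARAMS, Q4_K_M_BPW):.1f} GB")
```

The estimate covers weights only; the KV cache and runtime buffers add to the memory actually needed at inference time.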
How to Use
This model is designed to be loaded into an LLM inference server. The Orpheus-FastAPI frontend connects to that server and provides both a web UI and OpenAI-compatible API endpoints.
Compatible Inference Servers
This quantised model can be loaded into any of these LLM inference servers:
- GPUStack - GPU optimised LLM inference server (My pick) - supports LAN/WAN tensor split parallelisation
- LM Studio - Load the GGUF model and start the local server
- llama.cpp server - Run with the appropriate model parameters
- Any other OpenAI API-compatible server
Quick Start
1. Download this quantised model from lex-au's Orpheus-FASTAPI collection.
2. Load the model in your preferred inference server and start the server.
3. Clone the Orpheus-FastAPI repository:
   git clone https://github.com/Lex-au/Orpheus-FastAPI.git
   cd Orpheus-FastAPI
4. Configure the FastAPI server to connect to your inference server by setting the ORPHEUS_API_URL environment variable.
5. Follow the complete installation and setup instructions in the repository README.
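As an illustration of the configuration step, the sketch below shows one way a client could resolve and sanity-check `ORPHEUS_API_URL` before use. The helper name `resolve_api_url` and the fallback address are assumptions made for this example; the repository README remains the reference for the exact value your inference server expects.

```python
import os
from urllib.parse import urlparse

# ASSUMPTION: a local inference server exposing a completions endpoint.
# Replace with the address your own server reports on startup.
DEFAULT_API_URL = "http://127.0.0.1:1234/v1/completions"


def resolve_api_url(env=None) -> str:
    """Return ORPHEUS_API_URL from the environment, or the assumed default."""
    env = os.environ if env is None else env
    url = env.get("ORPHEUS_API_URL", DEFAULT_API_URL)
    parts = urlparse(url)
    # Reject values such as "localhost:1234" that lack an http(s) scheme.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"ORPHEUS_API_URL is not a valid HTTP URL: {url!r}")
    return url
```

Passing a plain dict instead of `os.environ` makes the check easy to try out, for example `resolve_api_url({"ORPHEUS_API_URL": "http://192.168.1.10:8080/v1/completions"})`.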
Audio Samples
Listen to the model in action with different voices and emotions:
Default Voice Sample
Leah (Happy)
Tara (Sad)
Zac (Contemplative)
Available Voices
The model supports 8 different voices:
- tara: Female, conversational, clear
- leah: Female, warm, gentle
- jess: Female, energetic, youthful
- leo: Male, authoritative, deep
- dan: Male, friendly, casual
- mia: Female, professional, articulate
- zac: Male, enthusiastic, dynamic
- zoe: Female, calm, soothing
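The voice names above can be validated on the client side before a request is sent. The sketch below builds a request body using the field names of the OpenAI speech API (`model`, `input`, `voice`, `response_format`). That the Orpheus-FastAPI frontend accepts exactly this shape is an assumption based on its stated OpenAI compatibility, and the model name `"orpheus"` and the default voice are placeholders chosen for illustration.

```python
import json

# Voice names and descriptions as listed in this card.
VOICES = {
    "tara": "Female, conversational, clear",
    "leah": "Female, warm, gentle",
    "jess": "Female, energetic, youthful",
    "leo": "Male, authoritative, deep",
    "dan": "Male, friendly, casual",
    "mia": "Female, professional, articulate",
    "zac": "Male, enthusiastic, dynamic",
    "zoe": "Female, calm, soothing",
}

MODEL_NAME = "orpheus"  # ASSUMPTION: placeholder model identifier


def build_speech_payload(text: str, voice: str = "tara") -> dict:
    """Build an OpenAI-style speech request body after validating the voice."""
    if voice not in VOICES:
        raise ValueError(f"Unknown voice {voice!r}; choose from {sorted(VOICES)}")
    if not text.strip():
        raise ValueError("Input text must not be empty")
    return {
        "model": MODEL_NAME,
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }


# Serialise the body; sending it is left to the HTTP client of your choice.
print(json.dumps(build_speech_payload("Hello there!", voice="leah"), indent=2))
```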
Emotion Tags
You can add expressiveness to speech by inserting tags:
- <laugh>, <chuckle>: For laughter sounds
- <sigh>: For sighing sounds
- <cough>, <sniffle>: For subtle interruptions
- <groan>, <yawn>, <gasp>: For additional emotional expression
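Since a mistyped tag would likely be treated as plain text, it can help to check input before synthesis. The following sketch extracts tags with a regular expression and reports any that are not in the list above. The helper names are hypothetical and not part of the Orpheus-FastAPI code.

```python
import re

# Tags listed in this card.
SUPPORTED_TAGS = {
    "laugh", "chuckle", "sigh", "cough",
    "sniffle", "groan", "yawn", "gasp",
}

# Matches lowercase tags written as <name>.
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def find_tags(text: str) -> list:
    """Return every tag name found in the text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str) -> list:
    """Return tag names that are not in the supported set."""
    return [tag for tag in find_tags(text) if tag not in SUPPORTED_TAGS]


line = "Well <sigh> that was actually pretty funny <laugh>"
print(find_tags(line))
print(unknown_tags("Oh no <giggle> not again <gasp>"))
```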
Technical Specifications
- Architecture: Specialised token-to-audio sequence model
- Parameters: ~3 billion
- Quantisation: 4-bit (GGUF Q4_K_M format)
- Audio Sample Rate: 24kHz
- Input: Text with optional voice selection and emotion tags
- Output: High-quality WAV audio
- Language: English
- Hardware Requirements: CUDA-compatible GPU (recommended: RTX series)
- Integration Method: External LLM inference server + Orpheus-FastAPI frontend
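For readers post-processing the output, the sketch below shows what a 24 kHz mono WAV looks like when handled with Python's standard `wave` module. It writes a short test tone and reads its duration back. The 16-bit PCM sample width is an assumption, since the card states only the sample rate and channel count, and the tone stands in for real model output.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000   # 24 kHz, as stated in this card
CHANNELS = 1           # mono, as stated in this card
SAMPLE_WIDTH = 2       # ASSUMPTION: 16-bit PCM (2 bytes per sample)


def tone_wav_bytes(seconds: float = 0.5, freq: float = 440.0) -> bytes:
    """Return an in-memory WAV file containing a sine tone."""
    n_frames = int(seconds * SAMPLE_RATE)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return buf.getvalue()


def wav_duration_seconds(data: bytes) -> float:
    """Read the frame count and rate from WAV bytes and return the duration."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


clip = tone_wav_bytes(0.5)
print(f"{len(clip)} bytes, {wav_duration_seconds(clip):.2f} s")
```

Under these assumptions one second of audio takes 48,000 bytes of sample data, which gives a quick way to estimate clip length from file size.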
Limitations
- Currently supports English text only
- Best performance achieved on CUDA-compatible GPUs
- Generation speed depends on GPU capability
License
This model is available under the Apache License 2.0.
Citation & Attribution
The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server.
If you use this quantised model in your research or applications, please cite:
@misc{orpheus-tts-2025,
author = {Canopy Labs},
title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}
@misc{orpheus-quantised-2025,
author = {Lex-au},
title = {Orpheus-3b-FT-Q4_K_M: Quantised TTS Model with FastAPI Server},
note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q4_K_M.gguf}}
}