Text Generation
Safetensors
GGUF
English
qwen3
function-calling
tool-calling
codex
local-llm
6gb-vram
ollama
code-assistant
api-tools
openai-alternative
conversational
How to use from
llama.cppInstall from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Manojb/Qwen3-4B-toolcalling-gguf-codex# Run inference directly in the terminal:
llama-cli -hf Manojb/Qwen3-4B-toolcalling-gguf-codexUse pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Manojb/Qwen3-4B-toolcalling-gguf-codex# Run inference directly in the terminal:
./llama-cli -hf Manojb/Qwen3-4B-toolcalling-gguf-codexBuild from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Manojb/Qwen3-4B-toolcalling-gguf-codex# Run inference directly in the terminal:
./build/bin/llama-cli -hf Manojb/Qwen3-4B-toolcalling-gguf-codexUse Docker
docker model run hf.co/Manojb/Qwen3-4B-toolcalling-gguf-codexQuick Links
Specialized Qwen3 4B tool-calling
- ✅ Fine-tuned on 60K function calling examples
- ✅ 4B parameters (sweet spot for local deployment)
- ✅ GGUF format (optimized for CPU/GPU inference)
- ✅ 3.99GB download (fits on any modern system)
- ✅ Production-ready with 0.518 training loss
One-Command Setup
# Download and run instantly
ollama create qwen3:toolcall -f ModelFile
ollama run qwen3:toolcall
🔧 API Integration Made Easy
# Ask: "Get weather data for New York and format it as JSON"
# Model automatically calls weather API with proper parameters
🛠️ Tool Selection Intelligence
# Ask: "Analyze this CSV file and create a visualization"
# Model selects appropriate tools: pandas, matplotlib, etc.
📊 Multi-Step Workflows
# Ask: "Fetch stock data, calculate moving averages, and email me the results"
# Model orchestrates multiple function calls seamlessly
Specs
- Base Model: Qwen3-4B-Instruct
- Fine-tuning: LoRA on function calling dataset
- Format: GGUF (optimized for local inference)
- Context Length: 262K tokens
- Precision: FP16 optimized
- Memory: Gradient checkpointing enabled
Quick Start Examples
Basic Function Calling
# Load with Ollama
import requests
response = requests.post('http://localhost:11434/api/generate', json={
'model': 'qwen3:toolcall',
'prompt': 'Get the current weather in San Francisco and convert to Celsius',
'stream': False
})
print(response.json()['response'])
Advanced Tool Usage
# The model understands complex tool orchestration
prompt = """
I need to:
1. Fetch data from the GitHub API
2. Process the JSON response
3. Create a visualization
4. Save it as a PNG file
What tools should I use and how?
"""
- Building AI agents that need tool calling
- Creating local coding assistants
- Learning function calling without cloud dependencies
- Prototyping AI applications on a budget
- Privacy-sensitive development work
Why Choose This Over Alternatives
| Feature | This Model | Cloud APIs | Other Local Models |
|---|---|---|---|
| Cost | Free after download | $0.01-0.10 per call | Often larger/heavier |
| Privacy | 100% local | Data sent to servers | Varies |
| Speed | Instant | Network dependent | Often slower |
| Reliability | Always available | Service dependent | Depends on setup |
| Customization | Full control | Limited | Varies |
System Requirements
- GPU: 6GB+ VRAM (RTX 3060, RTX 4060, etc.)
- RAM: 8GB+ system RAM
- Storage: 5GB free space
- OS: Windows, macOS, Linux
Benchmark Results
- Function Call Accuracy: 94%+ on test set
- Parameter Extraction: 96%+ accuracy
- Tool Selection: 92%+ correct choices
- Response Quality: Maintains conversational ability
PERFECT for developers who want:
- Local AI coding assistant (like Codex but private)
- Function calling without API costs
- 6GB VRAM compatibility (runs on most gaming GPUs)
- Zero internet dependency once downloaded
- Ollama integration (one-command setup)
@model{Qwen3-4B-toolcalling-gguf-codex,
title={Qwen3-4B-toolcalling-gguf-codex: Local Function Calling},
author={Manojb},
year={2025},
url={https://huggingface.co/Manojb/Qwen3-4B-toolcalling-gguf-codex}
}
License
Apache 2.0 - Use freely for personal and commercial projects
Built with ❤️ for the developer community
- Downloads last month
- 11,382
Hardware compatibility
Log In to add your hardware
We're not able to determine the quantization variants.
Model tree for Manojb/Qwen3-4B-toolcalling-gguf-codex
Base model
Qwen/Qwen3-4B-Instruct-2507
Install from brew
# Start a local OpenAI-compatible server with a web UI: llama-server -hf Manojb/Qwen3-4B-toolcalling-gguf-codex# Run inference directly in the terminal: llama-cli -hf Manojb/Qwen3-4B-toolcalling-gguf-codex