Collection of GGUFs for inference with vla.cpp, a unified C++ inference engine for Vision-Language-Action models.
VinRobotics - Edge AI & Model Optimization
We optimize and deploy LLM, ASR, VLM, and VLA (Vision-Language-Action) models on real-world systems.
Featured Projects
- vla.cpp: Native C++ inference runtime for Vision-Language-Action models, built for low-latency robotic deployment.
- Model Quantization Recipes: Practical recipes for quantizing and deploying LLM, ASR, VLM, and VLA models on real-world systems.
What we do
- Optimization: quantization (INT8/INT4/FP8/NVFP4), pruning, distillation, ...
- Deployment: vLLM, TensorRT, ONNX Runtime, edge runtimes
- Systems: real-time pipelines (vision, audio, language, action)
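
To make the quantization item above concrete, here is a minimal sketch of symmetric per-tensor INT8 weight quantization in plain Python. It is an illustration only, not one of the organization's published recipes: the function names `quantize_int8` and `dequantize_int8` are hypothetical, and production toolchains typically use per-channel or per-group scales and calibration data rather than a single scale per tensor.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization, so that w ~= q * scale.

    Hypothetical helper for illustration; not part of any library.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to scale 1.0 for an empty or all-zero tensor to avoid
    # dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    # The rounding error per weight is at most half a quantization step.
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print("quantized:", q)
    print("scale:", scale)
    print("max abs error:", max_err)
```

The trade-off is visible in the small weight `0.003`, which rounds to zero: one scale for the whole tensor is cheap to store but loses small values when a large outlier sets the range, which is one reason finer-grained scales are common in practice.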
Focus
- Edge devices (Jetson, SoCs)
- Robotics & VLA systems
- Latency, stability, deployability
Philosophy
Optimization = model + runtime + system
models 24
vrfai/gr00tn1d5-libero-object-gguf
Robotics • 2B • Updated
vrfai/gr00tn1d6-libero-gguf
Robotics • 3B • Updated
vrfai/Qwen3-ASR-1.7B-int8
Automatic Speech Recognition • 2B • Updated • 3
vrfai/Qwen3-ASR-1.7B-int4
Automatic Speech Recognition • 2B • Updated • 3
vrfai/Qwen3-ASR-1.7B-fp8
Automatic Speech Recognition • 2B • Updated • 2.39k • 5
vrfai/Qwen3-ASR-1.7B-nvfp4
Automatic Speech Recognition • 1B • Updated • 228 • 5
vrfai/gemma-4-E4B-it-fp8
Text Generation • 8B • Updated • 982 • 4
vrfai/Qwen3.6-35B-A3B-NVFP4
Image-Text-to-Text • 34B • Updated • 436 • 3
vrfai/Qwen3.6-27B-FP8
Image-Text-to-Text • 27B • Updated • 1.91k • 2
vrfai/Qwen3.6-27B-NVFP4
Image-Text-to-Text • 19B • Updated • 9.52k • 6