How to use RepublicOfKorokke/GLM-OCR-oQ8-fp16 with MLX:
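    # Download the model from the Hub
    # (the extra is quoted so that zsh, the default macOS shell, does not treat the brackets as a glob)
    pip install "huggingface_hub[hf_xet]"
    huggingface-cli download --local-dir GLM-OCR-oQ8-fp16 RepublicOfKorokke/GLM-OCR-oQ8-fp16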
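For scripted setups, the same download can be sketched in Python with `huggingface_hub.snapshot_download`. The helper name `download_model` and the two constants are illustrative and not part of the model card; the repository ID and target directory simply mirror the command above.

```python
REPO_ID = "RepublicOfKorokke/GLM-OCR-oQ8-fp16"
LOCAL_DIR = "GLM-OCR-oQ8-fp16"


def download_model(repo_id: str = REPO_ID, local_dir: str = LOCAL_DIR) -> str:
    """Fetch all files of the repo into local_dir and return the local path."""
    # Imported lazily so this snippet loads even before huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Hypothetical usage (requires network access):
# path = download_model()
```

Calling `download_model()` should leave the weights in the same `GLM-OCR-oQ8-fp16` directory that the CLI command creates.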
GLM-OCR-oQ8-fp16
This model was quantized using oQ mixed-precision quantization.
float16 gives roughly 20% faster prefill on M1/M2 Apple Silicon, which supports fp16 natively. bfloat16 is the safer choice on M3/M4 and when numerical stability matters.
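The model card does not say why bfloat16 is the more stable choice. One common reason is the difference in bit layout: float16 has 5 exponent bits and 10 mantissa bits, while bfloat16 has 8 exponent bits and 7 mantissa bits, so bfloat16 covers a far wider range at lower precision. The sketch below illustrates this with the standard library only. The helpers `to_fp16` and `to_bf16` are hypothetical names, and the bfloat16 emulation truncates rather than rounds, which real hardware may not do.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE float16; values out of range become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the float16 maximum (65504).
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding.

    Assumption: plain truncation, not round-to-nearest.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# Range: 70000 exceeds the float16 maximum but fits easily in bfloat16.
print(to_fp16(70000.0), to_bf16(70000.0))

# Precision: float16 resolves 1.001 more finely than bfloat16 does.
print(to_fp16(1.001), to_bf16(1.001))
```

Large intermediate values that overflow in float16 stay finite in bfloat16, at the cost of about three fewer bits of precision.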
Benchmark (on M1 Max)
| Model variant | Prompt processing, PP (tokens/s) | Token generation, TG (tokens/s) |
|---|---|---|
| Original (bf16) | 4,684 | 104.8 |
| oQ8-fp16 | 3,806 | 99.0 |
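To put the two rows in proportion, the short sketch below computes the quantized variant's throughput relative to the original, using only the figures from the table. The dictionary layout and the function name `relative_throughput` are illustrative choices.

```python
# Throughput in tokens per second, copied from the benchmark table (M1 Max).
BENCH = {
    "Original (bf16)": {"pp": 4684.0, "tg": 104.8},
    "oQ8-fp16": {"pp": 3806.0, "tg": 99.0},
}


def relative_throughput(variant: str, baseline: str, metric: str) -> float:
    """Return the variant's throughput as a fraction of the baseline's."""
    return BENCH[variant][metric] / BENCH[baseline][metric]


pp_ratio = relative_throughput("oQ8-fp16", "Original (bf16)", "pp")
tg_ratio = relative_throughput("oQ8-fp16", "Original (bf16)", "tg")

print(f"PP: {pp_ratio:.1%} of the original")
print(f"TG: {tg_ratio:.1%} of the original")
```

On these numbers, the oQ8-fp16 variant reaches about 81% of the original's PP throughput and about 94% of its TG throughput on the M1 Max.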
- Model size: 0.6B params
- Tensor type: F16, U32
- Quantization: 8-bit
- Base model: zai-org/GLM-OCR