---
library_name: mlx
pipeline_tag: image-text-to-text
license: cc-by-nc-4.0
tags:
- mlx
- apple-silicon
- multimodal
- multilingual
- vlm
- vision-language
- qwen3
- siglip2
language:
- en
- zh
- ar
- pt
- ru
- tr
- de
- es
- fr
- it
- ja
- ko
- vi
- th
- id
- hi
- bn
- nl
- pl
- sv
- fi
- da
- "no"
- cs
- el
- he
- uk
- ro
- hu
- multilingual
base_model: jinaai/jina-vlm
base_model_relation: quantized
inference: false
---

# jina-vlm-mlx

Native MLX port of [jina-vlm](https://huggingface.co/jinaai/jina-vlm) for Apple Silicon with 4-bit quantization.

## Model Size

**2.0 GB** (down from 9.2 GB fp32, 79% compression)

### Quantization Strategy

- **4-bit weights** (group_size=64): lm_head, vision encoder, VL connector, language model layers 1-27
- **bfloat16 weights**: embeddings, layer norms, language model layer 0

## Installation

> [!IMPORTANT]
> jina-vlm support is already merged into mlx-vlm master but not yet released. Until the next release, install from the main branch:

```bash
pip install git+https://github.com/Blaizzy/mlx-vlm.git@main
```

## Usage

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load model
model, processor = load("jinaai/jina-vlm-mlx")
config = load_config("jinaai/jina-vlm-mlx")

# Prepare input
image = ["photo.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate
output = generate(model, processor, formatted_prompt, image, max_tokens=200)
print(output.text)
```

## CLI Usage

```bash
python -m mlx_vlm.generate \
  --model jinaai/jina-vlm-mlx \
  --image photo.jpg \
  --prompt "Describe this image." \
  --max-tokens 200
```

## License

CC BY-NC 4.0. Commercial use: [contact Jina AI](https://jina.ai/contact-sales/).
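## Size Estimate

The per-weight storage cost behind the quantization strategy above can be sketched with simple arithmetic. This is a rough back-of-envelope estimate, assuming an MLX-style affine quantization layout with one fp16 scale and one fp16 bias per group of 64 weights; the overhead figures are illustrative assumptions, not exact model internals:

```python
def quantized_bytes_per_param(bits=4, group_size=64, scale_bytes=2, bias_bytes=2):
    """Storage per weight: packed quantized bits plus per-group scale/bias overhead."""
    return bits / 8 + (scale_bytes + bias_bytes) / group_size

bpp = quantized_bytes_per_param()  # 0.5 B of packed 4-bit data + 0.0625 B overhead
fp32_bpp = 4.0                     # one fp32 weight

print(f"4-bit bytes/param: {bpp}")                        # 0.5625
print(f"reduction vs fp32: {1 - bpp / fp32_bpp:.0%}")     # 86%
```

The per-tensor reduction (~86%) is higher than the card's overall figure because the embeddings, layer norms, and language model layer 0 remain in bfloat16.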
## Citation

```bibtex
@misc{koukounas2025jinavlm,
      title={Jina-VLM: Small Multilingual Vision Language Model},
      author={Andreas Koukounas and Georgios Mastrapas and Florian Hönicke and Sedigheh Eslami and Guillaume Roncari and Scott Martens and Han Xiao},
      year={2025},
      eprint={2512.04032},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.04032},
}
```