ChemVLR: Prioritizing Reasoning in Perception for Chemical Vision-Language Understanding
Paper • 2604.06685 • Published
How to use xxxllz/ChemVLR-8B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="xxxllz/ChemVLR-8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("xxxllz/ChemVLR-8B")
model = AutoModelForImageTextToText.from_pretrained("xxxllz/ChemVLR-8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use xxxllz/ChemVLR-8B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "xxxllz/ChemVLR-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "xxxllz/ChemVLR-8B",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'

# Alternatively, run the model with Docker:
docker model run hf.co/xxxllz/ChemVLR-8B
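The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `chat` are illustrative and not part of vLLM or of this model's tooling, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to /v1/chat/completions and return the reply text.

    Assumes an OpenAI-compatible response layout.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server from the previous step running, a call would look like `chat("http://localhost:8000", build_chat_payload("xxxllz/ChemVLR-8B", "Describe this image in one sentence.", image_url))`, where `image_url` is any image URL the server can reach.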
How to use xxxllz/ChemVLR-8B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "xxxllz/ChemVLR-8B" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "xxxllz/ChemVLR-8B",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'

# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "xxxllz/ChemVLR-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "xxxllz/ChemVLR-8B",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'

How to use xxxllz/ChemVLR-8B with Docker Model Runner:
docker model run hf.co/xxxllz/ChemVLR-8B
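The server examples above pass images by public URL. For a local image file, such as a drawing of a molecular structure, one common approach is to embed the image as a base64 data URL in the `image_url` field. The sketch below assumes the serving backend accepts `data:` URLs in that field; the helper names are illustrative only.

```python
import base64
import mimetypes
from pathlib import Path


def image_bytes_to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_file_to_data_url(path: str) -> str:
    """Read an image file and encode it, guessing the MIME type from the name."""
    mime_type, _ = mimetypes.guess_type(path)
    # Fall back to a generic type when the extension is not recognized.
    return image_bytes_to_data_url(
        Path(path).read_bytes(), mime_type or "application/octet-stream"
    )
```

The returned string can be used in place of the `"url"` value inside `"image_url"` in the request bodies shown earlier.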
ChemVLR is a chemical Vision-Language Model (VLM) designed to prioritize reasoning within the perception process. Unlike conventional chemical VLMs, which often function as "black-box" systems, ChemVLR analyzes visual inputs at a fine-grained level, explicitly identifying chemical descriptors such as functional groups before generating an answer. This yields explicit, interpretable reasoning paths for complex visual chemical problems.
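The generic usage snippets on this page use a placeholder image and question. For a chemistry task, the chat message might be built as in the sketch below. The helper `build_chem_messages`, the example URL, and the question wording are assumptions made for illustration; the prompt format ChemVLR was trained on is not specified here.

```python
def build_chem_messages(image_url: str, question: str) -> list:
    """Build a single-turn chat message in the format used by the
    Transformers examples on this page (one image plus one text part)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


# Hypothetical inputs: replace with a real molecule image and question.
messages = build_chem_messages(
    "https://example.com/molecule.png",
    "Which functional groups are present in this molecule?",
)
```

The resulting `messages` list can be passed to `pipe(text=messages)` or to `processor.apply_chat_template(...)` in the same way as in the Transformers examples.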
@misc{zhao2026chemvlrprioritizingreasoningperception,
title={ChemVLR: Prioritizing Reasoning in Perception for Chemical Vision-Language Understanding},
author={Xuanle Zhao and Xinyuan Cai and Xiang Cheng and Xiuyi Chen and Bo Xu},
year={2026},
eprint={2604.06685},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.06685},
}
ChemVLR is built upon the open-source work of Qwen2.5-VL and Qwen3-VL.