I have an error
docker exec -it ollama ollama run hf.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive:Q4_K_M
pulling manifest
pulling 2ca636d9e81d: 100% ██████████████████████████████ 5.6 GB
pulling 05f662501f8b: 100% ██████████████████████████████ 921 MB
pulling e8b41bd7e9bc: 100% ██████████████████████████████ 481 B
verifying sha256 digest
writing manifest
success
Error: 500 Internal Server Error: unable to load model: /root/.ollama/models/blobs/sha256-2ca636d9e81d3d23ca9b60c234fe185d30ec082eeba69ce770fdb0c76559a4f5
How to fix?
Getting the same error, here are some logs:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
llama_model_load_from_file_impl: failed to load model
M4, 16 GB, Q6_K
Sounds like you need to update to the latest version. I'd also strongly suggest moving off Ollama to something like LM Studio if you don't want to use llama.cpp directly.
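The `unknown model architecture: 'qwen35'` error usually means the bundled runtime predates that architecture, so updating the container is the first thing to try. A sketch for the Docker setup from the first post (container and volume names assume the `ollama` container shown there; adjust ports and volumes to your setup):

```shell
# Pull the latest Ollama image and recreate the container
docker pull ollama/ollama
docker stop ollama && docker rm ollama
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# Confirm the new version, then retry the model
docker exec -it ollama ollama --version
```

The model blobs live in the `ollama` volume, so recreating the container doesn't re-download the 5.6 GB pull.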
I've already updated to the latest version, but it still errors. The same model downloaded from the official Ollama site works without issues, though.
I'm also using Ollama and hitting the same error. The model needs about 16.2 GB of memory; when there isn't enough available, it throws a 500 error.
In my testing the same problem occurs even with a 4B model. Testing with Jan doesn't reproduce the issue, though, so the problem isn't the hardware but likely how Ollama handles the model configuration.
The solution has already been given: don't use Ollama.
Swap to something like LM Studio if you want it user-friendly; otherwise use llama.cpp directly.
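If you do go the llama.cpp route, recent builds can fetch a GGUF straight from Hugging Face with the `-hf` flag, bypassing the Ollama layer entirely. A sketch using the repo tag from the original post (assumes a llama.cpp release new enough to know the model's architecture):

```shell
# Download (if not cached) and run the same quant directly with llama.cpp
llama-cli -hf HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive:Q4_K_M -p "Hello"
```

If llama.cpp prints the same "unknown model architecture" error, the model really is newer than every runtime you have installed, and no amount of reconfiguring Ollama will fix it.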