Instructions to use AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF",
	filename="Hy-MT1.5-1.8B-2bit.gguf",
)

llm.create_chat_completion(
	messages = "\"Меня зовут Вольфганг и я живу в Берлине\""
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF
# Run inference directly in the terminal:
llama-cli -hf AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF
# Run inference directly in the terminal:
llama-cli -hf AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF
# Run inference directly in the terminal:
./llama-cli -hf AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF
# Run inference directly in the terminal:
./build/bin/llama-cli -hf AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF

Use Docker

docker model run hf.co/AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF

LM Studio
Jan
Ollama
How to use AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF with Ollama:
```
ollama run hf.co/AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF
```

Unsloth Studio new

How to use AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF to start chatting

Docker Model Runner
How to use AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF with Docker Model Runner:
```
docker model run hf.co/AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF
```

Lemonade

How to use AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF

Run and chat with the model

lemonade run user.Hy-MT1.5-1.8B-2bit-GGUF-{{QUANT_TAG}}

List all available models

lemonade list

it can not be loaded by the newest version llama.cpp..... which version do you use when developing ?

by JamesYdAtJ3 - opened 26 days ago

Discussion

JamesYdAtJ3

26 days ago

rm -f /wwwFS.out/unix.socket.llama.sock ; /ai02/binLLM/llama-server --host /wwwFS.out/unix.socket.llama.sock --timeout 3609 -m /ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf --threads 5 --parallel 2
build_info: b8985-27aef3dd9
system_info: n_threads = 5 (n_threads_batch = 5) / 6 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 6 threads for HTTP server
start: setting address family to AF_UNIX
main: loading model
srv load_model: loading model '/ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
common_params_fit_impl: getting device memory data for initial parameters:
gguf_init_from_file_ptr: tensor 'blk.0.attn_k_norm.weight' has offset 203248672, expected 203129888
gguf_init_from_file_ptr: failed to read tensor data
llama_model_load: error loading model: llama_model_loader: failed to load model from /ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf
llama_model_load_from_file_impl: failed to load model
common_fit_params: encountered an error while trying to fit params to free device memory: failed to load model
common_fit_params: fitting params to free memory took 0.04 seconds
gguf_init_from_file_ptr: tensor 'blk.0.attn_k_norm.weight' has offset 203248672, expected 203129888
gguf_init_from_file_ptr: failed to read tensor data
llama_model_load: error loading model: llama_model_loader: failed to load model from /ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf'
srv load_model: failed to load model, '/ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error

JamesYdAtJ3

26 days ago

The 2-bit GGUF file is corrupt — the tensor offset table doesn't match the actual data layout. Re-downloading or re-quantizing it should resolve the issue entirely, I think.

HongHuang

AngelSlim org 26 days ago

We have used our custom kernel for llama.cpp, which will be released soon.

HongHuang

AngelSlim org 17 days ago

We have released STQ1_0 kernel for 1.25-bit model and given a PR to llama.cpp PR #22836 ! If you have any questions or suggestions for STQ_0, welcome to comment under the PR ! 🔥🔥🔥
2-bit kernel is on the way.

Gray430

12 days ago

We have released STQ1_0 kernel for 1.25-bit model and given a PR to llama.cpp PR #22836 ! If you have any questions or suggestions for STQ_0, welcome to comment under the PR ! 🔥🔥🔥
2-bit kernel is on the way.

干得漂亮。等不及要去尝试推理这个模型用llama.cpp了！下载了模型，等待PR合并

young96

10 days ago

手机上效果很好也很快，要不先放一个win版的llama呢，感觉是很棒的模型

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment