Instructions to use PhysiQuanty/Binary-Addition-LLM-POC with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use PhysiQuanty/Binary-Addition-LLM-POC with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PhysiQuanty/Binary-Addition-LLM-POC", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("PhysiQuanty/Binary-Addition-LLM-POC", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use PhysiQuanty/Binary-Addition-LLM-POC with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "PhysiQuanty/Binary-Addition-LLM-POC"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PhysiQuanty/Binary-Addition-LLM-POC",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/PhysiQuanty/Binary-Addition-LLM-POC
```
- SGLang
How to use PhysiQuanty/Binary-Addition-LLM-POC with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "PhysiQuanty/Binary-Addition-LLM-POC" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PhysiQuanty/Binary-Addition-LLM-POC",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "PhysiQuanty/Binary-Addition-LLM-POC" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PhysiQuanty/Binary-Addition-LLM-POC",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use PhysiQuanty/Binary-Addition-LLM-POC with Docker Model Runner:
```shell
docker model run hf.co/PhysiQuanty/Binary-Addition-LLM-POC
```
Binary-Calculator-LLM (Proof of Concept)
A tiny tokenizer-free / bit-level (base-2) calculator proof of concept.
This repository ships custom modeling_*.py / configuration_*.py, so you must load it with trust_remote_code=True.
What it does
The model is trained to read two integers encoded as 10-bit binary inside a structured prompt, and to emit an answer inside a BOR ... EOR block (binary output, variable-length).
Vocab (size = 8)
- Bits: `0`, `1`
- Specials:
  - `BOS=2`, `EOS=3`
  - `BOI=4`, `EOI=5` (integer input blocks)
  - `BOR=6`, `EOR=7` (integer result block)
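As an illustration of this vocabulary, the prompt layout used for inference (`BOS t0 t1 BOI <10 bits> EOI BOI <10 bits> EOI`) can be reproduced in plain Python. This is a sketch: `to_bits` and `build_prompt` are hypothetical helper names, not functions shipped with the repo:

```python
# Token ids from the 8-symbol vocab above.
BOS, EOS, BOI, EOI, BOR, EOR = 2, 3, 4, 5, 6, 7

def to_bits(n, width=10):
    # Big-endian, fixed-width binary digits (inputs are 10-bit, i.e. 0..1023).
    return [int(b) for b in format(n, f"0{width}b")]

def build_prompt(a, b, t0=0, t1=0):
    # BOS t0 t1 BOI <10b a> EOI BOI <10b b> EOI
    return [BOS, t0, t1, BOI, *to_bits(a), EOI, BOI, *to_bits(b), EOI]
```

For example, `build_prompt(20, 68)` yields the 27-id prompt shown in the example output further down.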
Load (Python)
```python
from transformers import AutoModelForCausalLM

m = AutoModelForCausalLM.from_pretrained(
    "PhysiQuanty/Binary-Calculator-LLM-POC",
    trust_remote_code=True,
)
m.eval()
```
Inference (CLI)
This repo is typically used with the companion inference script inference_binary_calculator3.py (manual token-by-token loop, no .generate()), supporting:
- `--prompt_int "int,int"` → builds: `BOS t0 t1 BOI <10b int1> EOI BOI <10b int2> EOI`
- `--print_int` → extracts the first `BOR ... EOR` block and prints the decoded integer
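A manual token-by-token sampling loop of the kind the script implements might look like the sketch below. This is an illustrative reconstruction, not the script's actual code: `next_logits_fn` stands in for a forward pass through the model that returns the 8 next-token logits for the current id sequence.

```python
import math
import random

def sample_top_k(logits, temperature=0.7, top_k=50):
    # Keep the top_k highest logits, rescale by temperature, sample from the softmax.
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in idx]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    r = random.random() * sum(weights)
    for i, w in zip(idx, weights):
        r -= w
        if r <= 0:
            return i
    return idx[-1]

def generate_ids(next_logits_fn, prompt_ids, max_new_tokens=64,
                 temperature=0.7, top_k=50, eos_id=3, stop_on_eos=True):
    # Manual loop (no .generate()): append one sampled id at a time.
    ids = list(prompt_ids)
    out = []
    for _ in range(max_new_tokens):
        nxt = sample_top_k(next_logits_fn(ids), temperature, top_k)
        ids.append(nxt)
        out.append(nxt)
        if stop_on_eos and nxt == eos_id:  # mirrors --stop_on_eos
            break
    return out
```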
Command
```shell
python3 inference_binary_calculator3.py \
  --repo "PhysiQuanty/Binary-Calculator-LLM-POC" \
  --prompt_int "20,68" \
  --seed -1 \
  --stop_on_eos \
  --max_new_tokens 64 \
  --temperature 0.7 \
  --top_k 50 \
  --print_int
```
Example output
```
[Seed] 1011554894
[Device] cuda
[Model] loaded from PhysiQuanty/Binary-Calculator-LLM-POC | vocab_size=8
[Prompt Origin] prompt_int="20,68" (t0,t1=0,0)
[Prompt IDs] len=27 first32=[2, 0, 0, 4, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 5, 4, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 5]
[Generated RAW IDS]
[6, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 7, 6, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 3]
[Generated RAW IDS (as digits)]
600001011000760000000100073
[PrintInt] First BOR..EOR
[PrintInt] pos=27 nbits=11 bits=00001011000 int=88
```
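Decoding the first `BOR ... EOR` block from the raw ids, as `--print_int` does, can be sketched as follows (the function name is illustrative, not from the repo):

```python
BOR, EOR = 6, 7

def decode_first_result(ids):
    # Locate the first BOR ... EOR block and read its bits as a big-endian integer.
    start = ids.index(BOR) + 1
    end = ids.index(EOR, start)
    bits = ids[start:end]
    return int("".join(str(b) for b in bits), 2)
```

Applied to the raw ids above, the first block is `00001011000`, which decodes to 88 (= 20 + 68).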
Notes
- Inputs are 10-bit integers (0..1023). The output can exceed 10 bits (e.g. addition overflow), so the `BOR ... EOR` block is decoded with a variable bit-length.
- The model is tokenizer-free in the sense that it operates directly on bits and a tiny set of structural tokens.
- This is a POC: sampling settings (`temperature`, `top_k`) can affect stability. For deterministic behavior, you can lower the temperature and/or increase constraints.
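As a concrete illustration of the determinism point, replacing sampling with an argmax over the logits (the temperature → 0 limit) makes generation fully deterministic. This is a generic sketch, not a flag the script necessarily exposes:

```python
def greedy_next(logits):
    # Argmax over next-token logits: equivalent to the temperature -> 0 limit,
    # so repeated runs on the same prompt produce the same output.
    return max(range(len(logits)), key=lambda i: logits[i])
```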