Instructions to use PhysiQuanty/Binary-Addition-LLM-POC with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use PhysiQuanty/Binary-Addition-LLM-POC with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PhysiQuanty/Binary-Addition-LLM-POC", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("PhysiQuanty/Binary-Addition-LLM-POC", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use PhysiQuanty/Binary-Addition-LLM-POC with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "PhysiQuanty/Binary-Addition-LLM-POC"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PhysiQuanty/Binary-Addition-LLM-POC",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/PhysiQuanty/Binary-Addition-LLM-POC
```
- SGLang
How to use PhysiQuanty/Binary-Addition-LLM-POC with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "PhysiQuanty/Binary-Addition-LLM-POC" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PhysiQuanty/Binary-Addition-LLM-POC",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "PhysiQuanty/Binary-Addition-LLM-POC" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PhysiQuanty/Binary-Addition-LLM-POC",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use PhysiQuanty/Binary-Addition-LLM-POC with Docker Model Runner:
```shell
docker model run hf.co/PhysiQuanty/Binary-Addition-LLM-POC
```
Binary-Calculator-LLM (Proof of Concept)
A tiny tokenizer-free / bit-level (base-2) calculator proof of concept.
This repository ships custom modeling_*.py / configuration_*.py, so you must load it with trust_remote_code=True.
What it does
The model is trained to read two integers encoded as 10-bit binary inside a structured prompt, and to emit an answer inside a BOR ... EOR block (binary output, variable-length).
Vocab (size = 8)
- Bits: `0`, `1`
- Specials:
  - `BOS=2`, `EOS=3`
  - `BOI=4`, `EOI=5` (integer input blocks)
  - `BOR=6`, `EOR=7` (integer result block)
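As an illustration of this vocabulary, the prompt layout used for inference (`BOS t0 t1 BOI <10 bits> EOI BOI <10 bits> EOI`) can be reproduced in plain Python. This is a sketch: `to_bits` and `build_prompt` are hypothetical helper names, not functions shipped with the repo:

```python
# Token ids from the 8-symbol vocab above.
BOS, EOS, BOI, EOI, BOR, EOR = 2, 3, 4, 5, 6, 7

def to_bits(n, width=10):
    # Big-endian, fixed-width binary digits (inputs are 10-bit, i.e. 0..1023).
    return [int(b) for b in format(n, f"0{width}b")]

def build_prompt(a, b, t0=0, t1=0):
    # BOS t0 t1 BOI <10b a> EOI BOI <10b b> EOI
    return [BOS, t0, t1, BOI, *to_bits(a), EOI, BOI, *to_bits(b), EOI]
```

For example, `build_prompt(20, 68)` yields the 27-id prompt shown in the example output further down.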
Load (Python)
```python
from transformers import AutoModelForCausalLM

m = AutoModelForCausalLM.from_pretrained(
    "PhysiQuanty/Binary-Calculator-LLM-POC",
    trust_remote_code=True,
)
m.eval()
```
Inference (CLI)
This repo is typically used with the companion inference script inference_binary_calculator3.py (manual token-by-token loop, no .generate()), supporting:
- `--prompt_int "int,int"` → builds: `BOS t0 t1 BOI <10b int1> EOI BOI <10b int2> EOI`
- `--print_int` → extracts the first `BOR ... EOR` block and prints the decoded integer
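A manual token-by-token sampling loop of the kind the script implements might look like the sketch below. This is an illustrative reconstruction, not the script's actual code: `next_logits_fn` stands in for a forward pass through the model that returns the 8 next-token logits for the current id sequence.

```python
import math
import random

def sample_top_k(logits, temperature=0.7, top_k=50):
    # Keep the top_k highest logits, rescale by temperature, sample from the softmax.
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in idx]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    r = random.random() * sum(weights)
    for i, w in zip(idx, weights):
        r -= w
        if r <= 0:
            return i
    return idx[-1]

def generate_ids(next_logits_fn, prompt_ids, max_new_tokens=64,
                 temperature=0.7, top_k=50, eos_id=3, stop_on_eos=True):
    # Manual loop (no .generate()): append one sampled id at a time.
    ids = list(prompt_ids)
    out = []
    for _ in range(max_new_tokens):
        nxt = sample_top_k(next_logits_fn(ids), temperature, top_k)
        ids.append(nxt)
        out.append(nxt)
        if stop_on_eos and nxt == eos_id:  # mirrors --stop_on_eos
            break
    return out
```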
Command
```shell
python3 inference_binary_calculator3.py \
  --repo "PhysiQuanty/Binary-Calculator-LLM-POC" \
  --prompt_int "20,68" \
  --seed -1 \
  --stop_on_eos \
  --max_new_tokens 64 \
  --temperature 0.7 \
  --top_k 50 \
  --print_int
```
Example output
```
[Seed] 1011554894
[Device] cuda
[Model] loaded from PhysiQuanty/Binary-Calculator-LLM-POC | vocab_size=8
[Prompt Origin] prompt_int="20,68" (t0,t1=0,0)
[Prompt IDs] len=27 first32=[2, 0, 0, 4, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 5, 4, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 5]
[Generated RAW IDS]
[6, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 7, 6, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 3]
[Generated RAW IDS (as digits)]
600001011000760000000100073
[PrintInt] First BOR..EOR
[PrintInt] pos=27 nbits=11 bits=00001011000 int=88
```
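Decoding the first `BOR ... EOR` block from the raw ids, as `--print_int` does, can be sketched as follows (the function name is illustrative, not from the repo):

```python
BOR, EOR = 6, 7

def decode_first_result(ids):
    # Locate the first BOR ... EOR block and read its bits as a big-endian integer.
    start = ids.index(BOR) + 1
    end = ids.index(EOR, start)
    bits = ids[start:end]
    return int("".join(str(b) for b in bits), 2)
```

Applied to the raw ids above, the first block is `00001011000`, which decodes to 88 (= 20 + 68).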
Notes
- Inputs are 10-bit integers (0..1023). The output can exceed 10 bits (e.g. addition overflow), so the `BOR ... EOR` block is decoded with a variable bit-length.
- The model is tokenizer-free in the sense that it operates directly on bits and a tiny set of structural tokens.
- This is a POC: sampling settings (`temperature`, `top_k`) can affect stability. For deterministic behavior, you can lower the temperature and/or increase constraints.
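As a concrete illustration of the determinism point, replacing sampling with an argmax over the logits (the temperature → 0 limit) makes generation fully deterministic. This is a generic sketch, not a flag the script necessarily exposes:

```python
def greedy_next(logits):
    # Argmax over next-token logits: equivalent to the temperature -> 0 limit,
    # so repeated runs on the same prompt produce the same output.
    return max(range(len(logits)), key=lambda i: logits[i])
```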