Cognica-BP-v1.0-1.3B-base

Paper: Product of Experts as Scalable Local Learning: Modular Construction at 1.3B Parameters (Jeong, 2026)

A 1.384 B-parameter causal language model pretrained from scratch with standard end-to-end backprop (no PoE local learning). This is the control arm of the d24 r20 Chinchilla 3-way experiment that the paper reports, released so downstream researchers and practitioners can reproduce the exact BPB gap between standard backprop and PoE-based local learning at this scale.

This repo is a companion to cognica/Cognica-PoE-v1.0-1.3B-base, which is the PoE-trained version of the same architecture on the same data. They differ only in loss construction.

TL;DR

1.384 B params, d24, 1536-dim, 12 heads, 4 clustered stages × 6 layers (same architecture as the PoE release; only the training loss differs).
27.7 B tokens from ClimbMix (Chinchilla-20 ratio, matching the PoE run exactly).
Final val bpb: 0.6768 (the best-of-3-way run; PoE α=0.0 finished at 0.7209, a 6.52 % BPB gap).
No PoE inference modes. Because there are no per-stage losses, the intermediate layers do not produce valid predictors. Stage prefix pruning, WAND adaptive depth, speculative drafting from stage 0, and post-hoc specialist attach all require the PoE-trained base — they do not apply here.
Released as a reference baseline for scaling-law / local-learning research. Not intended as a production instruction-tuned model.
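The token budget can be cross-checked from the figures on this card. The sketch below only redoes the arithmetic (20 tokens per parameter versus steps × batch size, with the batch and step counts taken from the Training table); the variable names are illustrative and nothing here is measured.

```python
# Figures copied from this card; this is arithmetic, not a measurement.
N_PARAMS = 1.384e9         # parameter count
TOKENS_PER_PARAM = 20      # "Chinchilla-20" ratio
BATCH_TOKENS = 1_048_576   # tokens per optimizer step (2**20)
SEQ_LEN = 2048             # tokens per sequence
STEPS = 26_430             # optimizer steps in the run

target_tokens = N_PARAMS * TOKENS_PER_PARAM   # budget implied by the ratio
trained_tokens = BATCH_TOKENS * STEPS         # tokens actually consumed
sequences_per_step = BATCH_TOKENS // SEQ_LEN  # sequences in one batch

print(f"target : {target_tokens / 1e9:.2f} B tokens")
print(f"trained: {trained_tokens / 1e9:.2f} B tokens")
print(f"sequences per step: {sequences_per_step}")
```

Both numbers round to the 27.7 B tokens quoted above.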

When to use which base

| Use case | Pick |
| --- | --- |
| You want the best BPB-per-FLOP at this scale and don't care about staged inference | This repo (BP) |
| You want early-exit, WAND, speculative drafting, or the ability to attach SFT specialists | `Cognica-PoE-v1.0-1.3B-base` |
| You want to reproduce the paper's BP-vs-PoE comparison | Load both and run side by side |

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "cognica/Cognica-BP-v1.0-1.3B-base",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval().cuda()

tok = AutoTokenizer.from_pretrained(
    "cognica/Cognica-BP-v1.0-1.3B-base",
    trust_remote_code=True,
)

# IMPORTANT: base models in this family expect <|bos|> as the first token.
# Without it, generation quality degrades sharply (see the companion PoE base's
# README for the historical anomaly / verification that led to this rule).
prompt = "<|bos|>The capital of France is"
ids = tok.encode(prompt, return_tensors="pt").cuda()

out = model.generate(ids, max_new_tokens=40, do_sample=False,
                     repetition_penalty=1.15, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))

Greedy decoding with the BP baseline produces coherent continuations on scientific and geographic probes (Paris / planets of the solar system / photosynthesis). The BP baseline model does not add any specialist head; its lm_head is a single projection from the final-layer hidden state.
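Because a missing `<|bos|>` degrades output, it can help to normalize prompts before encoding. A minimal sketch, assuming the tokenizer does not add `<|bos|>` on its own and parses the literal string as the special token (as the snippet above implies); `ensure_bos` is a hypothetical helper, not part of this repo.

```python
BOS = "<|bos|>"  # token string taken from the usage snippet on this card


def ensure_bos(prompt: str, bos: str = BOS) -> str:
    """Return `prompt` with exactly one leading BOS marker.

    Hypothetical helper: it works on the raw string only and assumes the
    tokenizer does not prepend its own BOS during encoding.
    """
    stripped = prompt
    # Drop any BOS markers already present so the result never has two.
    while stripped.startswith(bos):
        stripped = stripped[len(bos):]
    return bos + stripped


print(ensure_bos("The capital of France is"))
```

The result can be passed to `tok.encode(...)` in place of the hand-written prompt.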

Training

| Setting | Value |
| --- | --- |
| Architecture | d24, hidden 1536, 12 heads (MHA), SSSL window |
| Activation | relu² (squared ReLU) |
| Vocab | 32768 (byte-level BPE, same as the PoE base) |
| Optimizer | MuonAdamW hybrid (Muon for matrices, AdamW for embeddings / scalars) |
| Matrix LR | 0.02 |
| Embedding LR | 0.30 |
| Unembedding LR | 0.008 |
| Batch | 1 048 576 tokens / step, seq len 2048 |
| Steps | 26 430 (27.7 B tokens, Chinchilla-20) |
| Warmup | 40 steps |
| Warmdown | 65 % of run |
| Final LR frac | 0.05 |
| Dataset | ClimbMix 400 B mirror |
| Hardware | 4 × A100 80 GB (Google Cloud, us-central1) |
| Wall time | ~67 h |
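The table gives the warmup length, warmdown fraction, and final LR fraction, but not the shape of the schedule. A minimal sketch, assuming linear warmup, a flat plateau, and a linear warmdown to the final fraction; the linear shapes and the `lr_multiplier` name are assumptions of this example, not taken from the training code.

```python
def lr_multiplier(step: int, total_steps: int = 26_430, warmup_steps: int = 40,
                  warmdown_frac: float = 0.65, final_frac: float = 0.05) -> float:
    """Multiplier applied to each peak LR at `step` (assumed trapezoid shape)."""
    warmdown_steps = int(warmdown_frac * total_steps)
    warmdown_start = total_steps - warmdown_steps
    if step < warmup_steps:
        # Assumed linear ramp that reaches 1.0 on the last warmup step.
        return (step + 1) / warmup_steps
    if step <= warmdown_start:
        return 1.0
    # Assumed linear decay from 1.0 down to final_frac at the last step.
    remaining = (total_steps - step) / warmdown_steps
    return remaining + (1.0 - remaining) * final_frac


# Peak learning rates per parameter group, copied from the table.
PEAK_LR = {"matrix": 0.02, "embedding": 0.30, "unembedding": 0.008}

for step in (0, 40, 10_000, 20_000, 26_430):
    scale = lr_multiplier(step)
    lrs = {name: round(peak * scale, 5) for name, peak in PEAK_LR.items()}
    print(step, lrs)
```

Under these assumptions the plateau ends around step 9 251 and every group finishes at 5 % of its peak rate.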

Validation bpb trajectory (evaluated every 1000 steps after warmup; selected steps shown)

Final: 0.6768 @ step 26 430.

| Step | Val bpb |
| --- | --- |
| 5000 | 0.8823 |
| 10000 | 0.7795 |
| 15000 | 0.7358 |
| 20000 | 0.7025 |
| 25000 | 0.6827 |
| 26430 | 0.6768 |
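For readers unfamiliar with the metric, bits per byte (bpb) normalizes cross-entropy by the byte length of the text rather than by token count, so it is comparable across tokenizers. A minimal sketch of the usual definition, assuming per-token losses in nats and a known UTF-8 byte length for each token; `bits_per_byte` is a hypothetical helper, and the exact evaluation code behind the numbers above may differ.

```python
import math


def bits_per_byte(token_nll_nats, token_byte_lengths):
    """Total cross-entropy in bits divided by total bytes of the target text."""
    if len(token_nll_nats) != len(token_byte_lengths):
        raise ValueError("need one byte length per token loss")
    total_bytes = sum(token_byte_lengths)
    if total_bytes == 0:
        raise ValueError("no bytes to normalize by")
    total_bits = sum(token_nll_nats) / math.log(2)  # convert nats to bits
    return total_bits / total_bytes


# Toy example: two tokens, each costing ln(2) nats (1 bit), spanning 1 and 3 bytes.
example = bits_per_byte([math.log(2), math.log(2)], [1, 3])
print(example)
```

In the toy example, 2 bits spread over 4 bytes gives 0.5 bpb.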

Comparison to PoE α=0.0 (same architecture, same data)

| Run | Loss | Final val bpb | Δ vs baseline |
| --- | --- | --- | --- |
| This repo (BP, α=∅) | Standard final-layer CE | 0.6768 | n/a |
| `Cognica-PoE-v1.0-1.3B-base` | PoE flat α=0.0 (per-stage detached CE through shared head) | 0.7209 | +0.0441 (+6.52 %) |

The 6.52 % gap is the cost of the PoE loss construction at this exact scale and data budget: the architecture, optimizer, LR schedule, data, and seed are all matched, so only the loss differs. The PoE base is the more capable artifact in practice because it also supports a 1.82× WAND inference speedup, a 1.87× speculative-decoding speedup, and post-hoc specialist attach (see the PoE base repo). The BP baseline is released so that the comparison can be reproduced exactly.
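The gap figures follow directly from the two final validation numbers. The short check below recomputes them, taking the BP run as the baseline; variable names are illustrative.

```python
BP_BPB = 0.6768   # this repo, final val bpb
POE_BPB = 0.7209  # PoE alpha=0.0 run, final val bpb

abs_gap = POE_BPB - BP_BPB   # absolute difference in bpb
rel_gap = abs_gap / BP_BPB   # relative to the BP baseline

print(f"absolute gap: {abs_gap:+.4f} bpb")
print(f"relative gap: {rel_gap:+.2%}")
```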

A third arm (PoE α=0.5 sqrt(n) scaling) was launched in the same experiment but killed at step 14 000 (53 % of run) when it was tracking slightly worse than α=0.0 at the same step — the sqrt(n) hypothesis (P0 in the paper) is rejected by this data. Checkpoints for that run are not released.

Files

model.safetensors — 2.6 GB bf16 weights (175 tensors).
config.json — HF config; poe_mode="none" (vs "flat" on the PoE repo).
configuration_cognica_poe.py, modeling_cognica_poe.py, tokenization_cognica_poe.py — identical to the PoE base repo (same architecture class); the forward pass simply doesn't compute the PoE aggregate when poe_mode="none".
tokenizer.pkl, tokenizer_config.json, special_tokens_map.json, token_bytes.pt — tokenizer (identical to base).
generation_config.json — default generation params.

Limitations

Not instruction-tuned. Pure causal-LM continuation only. To get chat / SFT behavior, attach a specialist stage (which requires the PoE base, not this one) or do your own full-model SFT.
No early-exit capability. Stage 0 / stage 1 / stage 2 outputs from this model are not valid predictors because they were never supervised — they only appear in the frozen residual chain. Do not use head_mode=base / stage pruning patterns from the PoE repo on this one.
English-only. ClimbMix is predominantly English; non-English generation may be poor.
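Since both repos share the same model class, a script that handles either one can guard against applying staged-inference code to this checkpoint by checking `poe_mode` in `config.json` first. A minimal sketch: the `poe_mode` key and its `"none"` / `"flat"` values come from the Files section, while the helper names and the choice to treat a missing key as `"none"` are assumptions of this example.

```python
import json
from pathlib import Path


def supports_staged_inference(config: dict) -> bool:
    """True only if the config declares a PoE training mode other than "none".

    Hypothetical helper. A missing key is treated as "none", which is a
    conservative assumption of this sketch rather than documented behavior.
    """
    return config.get("poe_mode", "none") != "none"


def load_config(path) -> dict:
    """Read a local config.json (for example from a downloaded snapshot)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Values as described on this card: "none" here, "flat" on the PoE repo.
print(supports_staged_inference({"poe_mode": "none"}))
print(supports_staged_inference({"poe_mode": "flat"}))
```

A caller would skip stage pruning, early exit, and drafting from stage 0 whenever the check returns `False`.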

Citation

@article{jeong2026poe,
  title  = {Product of Experts as Scalable Local Learning: Modular Construction at 1.3B Parameters},
  author = {Jeong, Jaepil},
  year   = {2026},
  institution = {Cognica, Inc.},
  doi    = {10.5281/zenodo.19547653},
  url    = {https://doi.org/10.5281/zenodo.19547653}
}

@misc{cognica-baseline-2026,
  title  = {Cognica-BP-v1.0-1.3B-base: Standard backprop reference for the PoE 3-way experiment},
  author = {{Cognica, Inc.}},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/cognica/Cognica-BP-v1.0-1.3B-base}}
}

License

Apache 2.0 — see LICENSE and NOTICE. Same terms as the PoE base. Training data (ClimbMix) carries its own license (see ClimbMix dataset card).
