jaepil committed on Apr 21

Commit 14dbc69 · verified · 1 Parent(s): 071829a

Upload configuration_cognica_poe.py with huggingface_hub

Files changed (1)

configuration_cognica_poe.py +87 -0

configuration_cognica_poe.py ADDED

@@ -0,0 +1,87 @@

+"""Cognica-PoE configuration class (HF transformers PretrainedConfig subclass).
+Mirrors the `GPTConfig` dataclass inside the nanochat GPT implementation while
+exposing the canonical HF field names so the model loads via
+`AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)`.
+"""
+from transformers import PretrainedConfig
+class CognicaPoEConfig(PretrainedConfig):
+    model_type = "cognica_poe"
+    keys_to_ignore_at_inference = ["past_key_values"]
+    def __init__(
+        self,
+        hidden_size: int = 1536,
+        intermediate_size: int = 6144,
+        num_hidden_layers: int = 24,
+        num_attention_heads: int = 12,
+        num_key_value_heads: int = 12,
+        head_dim: int = 128,
+        max_position_embeddings: int = 2048,
+        vocab_size: int = 32768,
+        padded_vocab_size: int = 32768,
+        hidden_act: str = "relu_squared",
+        rms_norm_eps: float = 1e-6,
+        rope_theta: float = 100000.0,
+        tie_word_embeddings: bool = False,
+        window_pattern: str = "SSSL",
+        use_cache: bool = True,
+        poe_mode: str = "flat",
+        poe_every: int = 6,
+        poe_alpha: float = 0.0,
+        poe_head_count: int = 4,
+        base_model_name_or_path: str = None,
+        new_layers: int = 0,
+        frozen_layers: int = 0,
+        dual_head: bool = False,
+        stage_depth: int = 0,
+        stage_training: dict = None,
+        **kwargs,
+    ):
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.num_key_value_heads = num_key_value_heads
+        self.head_dim = head_dim
+        self.max_position_embeddings = max_position_embeddings
+        self.vocab_size = vocab_size
+        self.padded_vocab_size = padded_vocab_size
+        self.hidden_act = hidden_act
+        self.rms_norm_eps = rms_norm_eps
+        self.rope_theta = rope_theta
+        self.window_pattern = window_pattern
+        self.use_cache = use_cache
+        # PoE-specific metadata (training-time, no effect at inference)
+        self.poe_mode = poe_mode
+        self.poe_every = poe_every
+        self.poe_alpha = poe_alpha
+        self.poe_head_count = poe_head_count
+        # Stage extension metadata (paper Section 8.8 Elastic Depth + 6.5 Dual-Head).
+        # base_model_name_or_path: HF repo id of the parent model (another stage, or the base).
+        #   - None: this IS the base model (leaf of the cascade).
+        #   - str: this is a stage repo; `from_pretrained` will cascade-load the parent first.
+        # new_layers: number of layers this stage adds on top of its parent.
+        #   - 0 at base leaf.
+        # frozen_layers: index boundary at which the parent's layers end.
+        #   - At a stage with N_parent + N_new layers, frozen_layers = N_parent.
+        # dual_head: if True, this stage carries an additive specialist `lm_head_stage`
+        #   that is summed with the frozen base `lm_head` at the final projection.
+        # stage_depth: how deep this stage is in the cascade (0 = base, 1 = first SFT, 2 = stacked, ...).
+        # stage_training: optional dict of hyperparameters used to train this stage.
+        self.base_model_name_or_path = base_model_name_or_path
+        self.new_layers = new_layers
+        self.frozen_layers = frozen_layers
+        self.dual_head = dual_head
+        self.stage_depth = stage_depth
+        self.stage_training = stage_training if stage_training is not None else {}
+        super().__init__(
+            tie_word_embeddings=tie_word_embeddings,
+            **kwargs,
+        )
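
The stage-extension comments in the diff describe how `base_model_name_or_path`, `new_layers`, `frozen_layers`, and `stage_depth` relate across a cascade. A minimal sketch of that bookkeeping follows. It rests on two assumptions the commit does not confirm: that a stage's `num_hidden_layers` is its parent's layer count plus `new_layers`, and that a base model keeps all three stage fields at 0. `StageMeta`, `make_stage`, and `check_stage` are hypothetical names, and a plain dataclass stands in for `CognicaPoEConfig` so the example runs without `transformers`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageMeta:
    """Hypothetical stand-in for the stage fields of CognicaPoEConfig."""
    num_hidden_layers: int = 24
    base_model_name_or_path: Optional[str] = None  # None means "this is the base"
    new_layers: int = 0
    frozen_layers: int = 0
    stage_depth: int = 0


def make_stage(parent: StageMeta, parent_repo: str, new_layers: int) -> StageMeta:
    """Derive a child stage's metadata from its parent (assumed layout)."""
    return StageMeta(
        num_hidden_layers=parent.num_hidden_layers + new_layers,
        base_model_name_or_path=parent_repo,
        new_layers=new_layers,
        # Index boundary at which the parent's layers end.
        frozen_layers=parent.num_hidden_layers,
        stage_depth=parent.stage_depth + 1,
    )


def check_stage(meta: StageMeta) -> None:
    """Raise ValueError if the stage fields contradict each other."""
    if meta.base_model_name_or_path is None:
        # Base model: no parent, so no added or frozen layers.
        if (meta.new_layers, meta.frozen_layers, meta.stage_depth) != (0, 0, 0):
            raise ValueError("base model must have zeroed stage fields")
    elif meta.frozen_layers + meta.new_layers != meta.num_hidden_layers:
        raise ValueError("frozen_layers + new_layers must equal num_hidden_layers")


base = StageMeta()
stage1 = make_stage(base, "cognica/Cognica-BP-v1.0-1.3B-base", new_layers=4)
check_stage(base)
check_stage(stage1)
print(stage1.frozen_layers, stage1.num_hidden_layers, stage1.stage_depth)  # 24 28 1
```

Here `new_layers=4` is an arbitrary illustrative value. A check like this could run before any cascade load, so that inconsistent stage metadata fails early rather than at weight-loading time.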