# TENNs LLM 1B

A 1-billion-parameter causal language model built on gate-mode SSM (state space model) layers from TENNs Core. It uses recurrent inference instead of attention, which makes it efficient for streaming and long-context generation.
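To see why recurrence helps here: an SSM-style layer carries a fixed-size state from token to token, so per-token compute and memory stay constant no matter how long the stream gets, whereas attention's KV cache grows with context length. A toy gated diagonal recurrence illustrating the idea (not the actual TENNs layer):

```python
import torch

# Toy diagonal linear recurrence: the state h has a fixed size, so each new
# token costs O(d) time and memory regardless of stream length. Illustrative
# only; the real TENNs SSM layers are more elaborate.
d = 8
a = torch.rand(d) * 0.9           # per-channel decay
h = torch.zeros(d)                # recurrent state
for x in torch.randn(16, d):      # a stream of 16 token embeddings
    h = a * h + (1 - a) * x       # constant-time state update per token
```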
## Architecture
| Component | Details |
|---|---|
| Layers | 24 × TENNsBlock (gate mode) |
| Hidden dim | 2048 |
| Inner dim | 4096 |
| Vocabulary | 32,000 (Mistral-7B tokenizer) |
| Parameters | ~1B |
Each TENNsBlock: RMSNorm → in_proj → causal_conv(4) → SSM(gate) → out_proj → residual
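As a rough sketch of how those pieces fit together, here is an illustrative, deliberately simplified PyTorch skeleton. The real block ships in the bundled `tenns_core/` package; the gated scan below is a stand-in for the actual TENNs SSM(gate) recurrence, not a reimplementation of it:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TENNsBlockSketch(nn.Module):
    """Illustrative skeleton only; the real TENNsBlock lives in tenns_core/."""

    def __init__(self, d_model=2048, d_inner=4096, kernel=4):
        super().__init__()
        self.norm = nn.RMSNorm(d_model)          # needs PyTorch >= 2.4
        self.in_proj = nn.Linear(d_model, d_inner)
        # Depthwise conv, left-padded below so position t never sees t+1.
        self.conv = nn.Conv1d(d_inner, d_inner, kernel, groups=d_inner)
        self.kernel = kernel
        # Stand-in for SSM(gate): per-channel decay plus a sigmoid gate.
        self.decay = nn.Parameter(torch.rand(d_inner) * 0.9)
        self.gate = nn.Linear(d_inner, d_inner)
        self.out_proj = nn.Linear(d_inner, d_model)

    def forward(self, x):                        # x: (batch, seq, d_model)
        residual = x
        h = self.in_proj(self.norm(x))
        h = F.pad(h.transpose(1, 2), (self.kernel - 1, 0))   # causal padding
        h = self.conv(h).transpose(1, 2)
        state, outs = torch.zeros_like(h[:, 0]), []
        for t in range(h.size(1)):               # toy gated linear recurrence
            state = self.decay * state + h[:, t]
            outs.append(state * torch.sigmoid(self.gate(h[:, t])))
        return residual + self.out_proj(torch.stack(outs, dim=1))
```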
## Quick Start (Google Colab / any environment)
```python
!pip install transformers torch einops opt_einsum safetensors

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BrainChipInc/tenns-llm-1b")
model = AutoModelForCausalLM.from_pretrained(
    "BrainChipInc/tenns-llm-1b",
    trust_remote_code=True,
)

output = model.generate_text("The history of artificial intelligence", tokenizer, max_new_tokens=100)
print(output)
```
**Do not use `pipeline()`.** This model uses a custom recurrent architecture that is not compatible with Hugging Face's standard text-generation pipeline.
## Installation

```bash
pip install transformers torch einops opt_einsum safetensors
```
## Usage
**Note:** Do not use `pipeline()`; this model requires `model.generate_text()` instead of Hugging Face's standard `generate()`. The recurrent SSM architecture is not compatible with the attention KV-cache pipeline.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BrainChipInc/tenns-llm-1b")
model = AutoModelForCausalLM.from_pretrained(
    "BrainChipInc/tenns-llm-1b",
    trust_remote_code=True,
)

output = model.generate_text("The history of artificial intelligence", tokenizer, max_new_tokens=100)
print(output)
```
### Generation options
```python
prompt = "The history of artificial intelligence"

# Greedy decoding (default)
output = model.generate_text(prompt, tokenizer, max_new_tokens=50)

# Top-k sampling with temperature
output = model.generate_text(prompt, tokenizer, max_new_tokens=100, temperature=0.8, top_k=50)
```
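For reference, this is roughly what top-k sampling with temperature does under the hood: a generic sketch of the standard technique, not the model's actual sampler:

```python
import torch

def sample_top_k(logits, temperature=0.8, top_k=50):
    # Generic top-k + temperature sampling over a 1-D logits vector:
    # scale the logits, keep the k largest, renormalize, draw one token id.
    values, indices = torch.topk(logits / temperature, top_k)
    probs = torch.softmax(values, dim=-1)
    return indices[torch.multinomial(probs, 1)].item()
```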
## `trust_remote_code=True`

This model uses custom modeling code bundled in the repository (`modeling_tenns_llm.py`, `configuration_tenns_llm.py`, `tenns_core/`). Loading requires `trust_remote_code=True`. The bundled `tenns_core/` package is a snapshot of the TENNs Core SSM library; no separate installation is needed.
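Since `trust_remote_code=True` executes code from the repository, one common precaution is to pin the repo to a specific revision so the code you audited is the code that runs. A minimal sketch; the commit hash shown is a placeholder, not a real revision of this repo:

```python
from transformers import AutoModelForCausalLM

# Pin the download to a fixed commit so later pushes to the repo cannot
# change the remote code that gets executed. "abc1234" is a placeholder;
# use an actual commit hash from the model's revision history on the Hub.
model = AutoModelForCausalLM.from_pretrained(
    "BrainChipInc/tenns-llm-1b",
    trust_remote_code=True,
    revision="abc1234",
)
```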
## Training
Fine-tuned from a base TENNs gate-mode model using LoRA adapters on English instruction data. LoRA adapters are merged into base weights at export time.
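For context on what "merged at export time" means: the standard LoRA merge folds the low-rank update into the frozen base weight, so inference needs no adapter machinery. A generic sketch of that arithmetic (shapes and scaling here are illustrative, not taken from the TENNs export code):

```python
import torch

# Standard LoRA merge: W' = W + (alpha / r) * B @ A. After merging, the
# adapter matrices A and B can be discarded entirely.
d_out, d_in, r, alpha = 4096, 2048, 16, 32
W = torch.randn(d_out, d_in)        # frozen base weight
A = torch.randn(r, d_in) * 0.01     # LoRA down-projection
B = torch.zeros(d_out, r)           # LoRA up-projection (zero-initialized)

W_merged = W + (alpha / r) * (B @ A)
```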
## Limitations

- English only
- No system prompt or chat template; this is a plain completion model
- Recurrent state resets between calls to `generate_text()` (see the sketch below)
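Because the recurrent state does not persist across calls, continuing a passage means re-feeding everything generated so far as the next prompt. A minimal sketch, assuming `generate_text()` returns the prompt together with its continuation (check the actual return convention in the repo's modeling code):

```python
# State resets on every call, so each round re-processes the full text so far.
text = "The history of artificial intelligence"
for _ in range(3):
    text = model.generate_text(text, tokenizer, max_new_tokens=50)
print(text)
```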