TENNs LLM 1B

A 1-billion-parameter causal language model built on gate-mode SSM (State Space Model) layers from TENNs Core. Uses recurrent inference instead of attention, making it efficient for streaming and long-context generation.
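The practical difference from attention is that generation carries a fixed-size recurrent state instead of a growing key/value cache, so per-token cost and memory stay constant with sequence length. The toy linear state-space recurrence below illustrates the idea; the variable names and parameterization are illustrative only, not the actual TENNs Core kernel.

import torch

def ssm_step(h, x_t, A, B, C):
    # One recurrent update per token: the state h is the only thing carried
    # forward, so memory does not grow with context length.
    h = A * h + B * x_t          # fold the new input into the state
    y_t = (C * h).sum()          # read out this step's contribution
    return h, y_t

d_state = 16
A = torch.rand(d_state) * 0.9    # per-channel decay (toy values)
B = torch.randn(d_state)
C = torch.randn(d_state)

h = torch.zeros(d_state)
for x_t in torch.randn(8):       # stream inputs one token at a time
    h, y_t = ssm_step(h, x_t, A, B, C)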

Architecture

Component     Details
Layers        24 × TENNsBlock (gate mode)
Hidden dim    2048
Inner dim     4096
Vocabulary    32,000 (Mistral-7B tokenizer)
Parameters    ~1B
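These dimensions can be confirmed locally by loading only the configuration. The attribute names are defined by the repository's custom configuration class, so printing the whole object is safer than guessing field names.

from transformers import AutoConfig

# Loads configuration_tenns_llm.py from the repo; attribute names come from
# that custom class, so print the whole config rather than assuming fields.
config = AutoConfig.from_pretrained("BrainChipInc/tenns-llm-1b", trust_remote_code=True)
print(config)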

Each TENNsBlock: RMSNorm → in_proj → causal_conv(4) → SSM(gate) → out_proj → residual
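For readers who think in code, the sketch below mirrors that dataflow in PyTorch. It is an illustration only: the gated SSM scan is replaced by a placeholder, and the real modeling_tenns_llm.py will differ in details such as gating, biases, and state handling.

import torch
import torch.nn as nn

class TENNsBlockSketch(nn.Module):
    """Illustrative dataflow only; not the implementation in modeling_tenns_llm.py."""
    def __init__(self, d_model=2048, d_inner=4096, kernel_size=4):
        super().__init__()
        self.norm = nn.RMSNorm(d_model)            # requires torch >= 2.4
        self.in_proj = nn.Linear(d_model, d_inner)
        # Depthwise convolution, left-padded so position t never sees t+1.
        self.conv = nn.Conv1d(d_inner, d_inner, kernel_size,
                              padding=kernel_size - 1, groups=d_inner)
        self.ssm = nn.Identity()                   # stand-in for the gated SSM scan
        self.out_proj = nn.Linear(d_inner, d_model)

    def forward(self, x):                          # x: (batch, seq, d_model)
        residual = x
        x = self.in_proj(self.norm(x))
        x = self.conv(x.transpose(1, 2))[..., :x.shape[1]].transpose(1, 2)
        x = self.ssm(x)
        return self.out_proj(x) + residual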

Quick Start (Google Colab / any environment)

!pip install transformers torch einops opt_einsum safetensors

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BrainChipInc/tenns-llm-1b")
model = AutoModelForCausalLM.from_pretrained(
    "BrainChipInc/tenns-llm-1b",
    trust_remote_code=True,
)

output = model.generate_text("The history of artificial intelligence", tokenizer, max_new_tokens=100)
print(output)

Do not use pipeline(): this model uses a custom recurrent architecture that is not compatible with HuggingFace's standard text-generation pipeline.

Installation

pip install transformers torch einops opt_einsum safetensors

Usage

Note: Do not use pipeline(). This model requires model.generate_text() instead of HuggingFace's standard generate(); the recurrent SSM architecture is not compatible with the attention KV-cache pipeline.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BrainChipInc/tenns-llm-1b")
model = AutoModelForCausalLM.from_pretrained(
    "BrainChipInc/tenns-llm-1b",
    trust_remote_code=True,
)

output = model.generate_text("The history of artificial intelligence", tokenizer, max_new_tokens=100)
print(output)

Generation options

prompt = "The history of artificial intelligence"

# Greedy decoding (default)
output = model.generate_text(prompt, tokenizer, max_new_tokens=50)

# Top-k sampling with temperature
output = model.generate_text(prompt, tokenizer, max_new_tokens=100, temperature=0.8, top_k=50)
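For reference, temperature rescales the logits and top-k restricts sampling to the k most likely tokens. The helper below sketches that logic in plain PyTorch; it is illustrative only, not the implementation inside generate_text().

import torch

def sample_top_k(logits, temperature=0.8, top_k=50):
    # Illustrative top-k/temperature sampling over a 1-D logits tensor.
    logits = logits / temperature                    # <1.0 sharpens, >1.0 flattens
    topk_vals, topk_idx = torch.topk(logits, top_k)  # keep the k most likely tokens
    probs = torch.softmax(topk_vals, dim=-1)
    return topk_idx[torch.multinomial(probs, 1)]     # draw one token id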

trust_remote_code=True

This model uses custom modeling code bundled in the repository (modeling_tenns_llm.py, configuration_tenns_llm.py, tenns_core/). Loading requires trust_remote_code=True. The bundled tenns_core/ package is a snapshot of the TENNs Core SSM library; no separate installation is needed.
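Since trust_remote_code=True executes code downloaded from the repository, it is reasonable to review that code and pin the revision you reviewed. revision is a standard from_pretrained argument; the hash below is a placeholder.

from transformers import AutoModelForCausalLM

# Pinning a revision ensures the remote code you reviewed is the code that runs.
# Replace the placeholder with an actual commit hash from the repository.
model = AutoModelForCausalLM.from_pretrained(
    "BrainChipInc/tenns-llm-1b",
    trust_remote_code=True,
    revision="<commit-hash>",
)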

Training

Fine-tuned from a base TENNs gate-mode model using LoRA adapters on English instruction data. LoRA adapters are merged into base weights at export time.
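Merging a LoRA adapter into a base weight amounts to adding the low-rank update to the frozen matrix, after which the adapter can be discarded. The sketch below uses the standard LoRA formulation with illustrative shapes; it is not the project's actual export script.

import torch

# Standard LoRA merge: W_merged = W + (alpha / r) * (B @ A); inference then
# uses W_merged alone, with no adapter weights kept at runtime.
d_out, d_in, r, alpha = 2048, 2048, 16, 32   # illustrative sizes
W = torch.randn(d_out, d_in)                 # frozen base weight
A = torch.randn(r, d_in) * 0.01              # LoRA down-projection
B = torch.zeros(d_out, r)                    # LoRA up-projection (zero-initialized)
W_merged = W + (alpha / r) * (B @ A)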

Limitations

  • English only
  • No system prompt or chat template: plain completion model
  • Recurrent state resets between calls to generate_text(), so earlier output is not remembered across calls (see the sketch below)
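Because the state starts fresh on every call, continuing a passage means feeding the earlier text back in as part of the next prompt. A minimal sketch, assuming generate_text() returns the full text including the prompt (as the Quick Start example suggests):

first = model.generate_text("The history of artificial intelligence", tokenizer, max_new_tokens=50)

# No state survives the call above, so include its output in the next prompt.
followup = model.generate_text(first + " By the following decade,", tokenizer, max_new_tokens=50)
print(followup)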