πŸ¦… Supra Mini 0.1M

Supra Mini 0.1M is a very tiny base model trained for 2 epochs on 500 million tokens of Fineweb-Edu, built to explore how much world knowledge very tiny models can pick up.

Model Config

  • Parameters: 117,648 (0.1M)
  • Architecture: Llama
  • Vocab size with custom BPE tokenizer: 250
  • Hidden Size: 48
  • Intermediate Size: 96
  • Hidden Layers: 4
  • Attention Heads: 4
  • Max Position Embeddings: 256
  • Learning rate: 6e-4
  • Weight Decay: 0.01
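For scale, here is a back-of-the-envelope parameter count derived from the config above (a sketch assuming untied input/output embeddings, bias-free projections, and two RMSNorms per layer; the reported total of 117,648 differs slightly due to implementation details):

```python
vocab, hidden, inter, layers = 250, 48, 96, 4

embed = vocab * hidden              # input token embedding
attn = 4 * hidden * hidden          # q/k/v/o projections (no bias)
mlp = 3 * hidden * inter            # gate/up/down projections
norms = 2 * hidden                  # input + post-attention RMSNorm
per_layer = attn + mlp + norms

lm_head = vocab * hidden            # untied output head (assumption)
total = embed + layers * per_layer + hidden + lm_head  # + final RMSNorm
print(total)  # 116592, within ~1% of the reported 117,648
```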

Final Loss

This model reached a final training loss of x.xxx after 2 epochs.

Benchmarks

All benchmarks were executed using lm-eval.

| Task | Value | Random baseline |
|---|---|---|
| Arc_Easy (acc) | 0.2639 | 0.25 (25%) |
| Wikitext (perplexity) | 25.1691 | - |
| BLiMP (acc) | 0.5177 | 0.5 (50%) |
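The scores above can be reproduced with the lm-evaluation-harness CLI; this is a sketch, and the exact task names and batch size used are assumptions:

```shell
pip install lm-eval

lm_eval \
  --model hf \
  --model_args pretrained=SupraLabs/Supra-Mini-0.1M \
  --tasks arc_easy,wikitext,blimp \
  --batch_size 8
```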

Examples

Prompt: "Artificial intelligence is "
Output: "Artificial intelligence is power by the leading the community, the book of the bring and in the made to the production of the back of an installing and consider in the several c"

Prompt: "The main concept of physics is "
Output: "The main concept of physics is a struggle of the development of the company of the solution of the work of the first can be some of the supply a part of the state of the management,"

Prompt: "Once upon a time, "
Output: "Once upon a time, so that he survey which is a self-described by the series of the surgery of the really a policy of the process of the southern of the material the stu"

Usage

To use our model with Hugging Face Transformers, just run:

```python
from transformers import pipeline
import torch

print("[*] Loading model from Hugging Face Hub...")
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-Mini-0.1M",
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

def generate_text(prompt, max_new_tokens=50):
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,  # discourages the repetitive loops tiny models fall into
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
    )
    return result[0]["generated_text"]

test_prompt = "The importance of education is"
print(f"\nPrompt: {test_prompt}")
print("-" * 30)
print("\nOutput:\n" + generate_text(test_prompt))
```

Training guide

We trained Supra Mini 0.1M on a single T4 GPU in ~45 minutes for 2 epochs.
The full training code can be found in this repo:

  • run.sh: runs the complete pipeline
  • train_tokenizer.py: trains the custom BPE tokenizer (vocab size 250)
  • train.py: trains the model
  • inference.py: tests the model
The model was trained on the first 500 million tokens of Sample-10BT from Fineweb-Edu using streaming tokenization.
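The tokenizer step can be sketched roughly like this, using the `tokenizers` library; this is an assumption of how train_tokenizer.py works, with a stand-in corpus where the real script streams Fineweb-Edu text:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Stand-in corpus; the real script streams Fineweb-Edu documents.
corpus = ["Education is important.", "Physics studies matter and energy."]

tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=250, special_tokens=["<unk>", "<s>", "</s>"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

print(tokenizer.get_vocab_size())  # at most 250
```

Note that a vocab of 250 is smaller than the 256-byte alphabet of byte-level BPE, which is why a character-level BPE model makes sense here.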

Overtraining

Yes, this model is heavily overtrained: it saw roughly 212x more data than the Chinchilla-optimal amount (about 20 tokens per parameter; we used ~4,250).
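The arithmetic behind those numbers:

```python
params = 117_648
tokens = 500_000_000                 # unique Fineweb-Edu tokens used

tokens_per_param = tokens / params   # ~4,250 tokens per parameter
chinchilla_optimal = 20 * params     # ~2.35M tokens would be "enough"
overtraining_factor = tokens / chinchilla_optimal  # ~212x
```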

Final thoughts

As the newly founded organization SupraLabs, we are proud to introduce our first Tiny-LLM and to show that our pipeline works end to end.
More models will be released soon...
