AkiliCode-14B Research Preview

AkiliCode-14B is an early coding model from MsingiAI released as a research preview.

This release is published as:

  • msingiai/akilicode-14b-research-preview

Release artifact directory:

  • /outputs/akilicode-14b-research-preview

Source checkpoint:

  • /outputs/akili-code-stage3-ckpt100-s50-r4/checkpoint-20

It is a post-trained Qwen2.5-Coder-14B model optimized for structured reasoning and repair-oriented coding tasks. In MsingiAI's current internal evaluation stack, it shows promising function-level coding performance, but it is not yet strong on hidden-test competitive-programming benchmarks.

Research Preview Status

This model is being released for:

  • research
  • evaluation
  • failure analysis
  • downstream experimentation

This model is not being released as a state-of-the-art coding model or as a strong LiveCodeBench model.

Key Metrics

Promoted checkpoint results:

  Benchmark                     Result
  ---------------------------   ------
  HumanEval+                     62.80
  MBPP+                          65.61
  BigCodeBench-Instruct          45.09
  CRUXEval-O                     49.75
  LiveCodeBench v6 (official)    11.37

Important Caveat on LiveCodeBench

The main remaining weakness is hidden-test algorithmic correctness, not output parsing.

On the official-style LiveCodeBench v6 run:

  • n = 1055
  • pass@1 = 11.37
  • private tests used for all 1055 problems
  • extraction_success_rate = 100.0
  • syntax_valid_rate = 98.58

Failure breakdown:

  • wrong_answer = 838
  • timeout = 61
  • runtime_error = 21
  • syntax_error = 15
  • extraction_failed = 0

This means the model is usually producing executable outputs, but it still struggles on hidden-test algorithmic generalization, especially on medium and hard competition-style problems.
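The published numbers are internally consistent; a quick sanity check (using only the figures above):

```python
# Sanity check: the failure counts account for the full LiveCodeBench v6
# run of 1055 problems, and the implied pass rate matches the reported
# pass@1 of 11.37.
n = 1055
failures = {
    "wrong_answer": 838,
    "timeout": 61,
    "runtime_error": 21,
    "syntax_error": 15,
    "extraction_failed": 0,
}
passed = n - sum(failures.values())
print(passed)                    # 120 problems solved
print(round(100 * passed / n, 2))  # 11.37
```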

Intended Output Format

AkiliCode-14B was trained to respond with two XML blocks in order:

  1. <reasoning>...</reasoning>
  2. <code>...</code>

The reasoning block is expected to contain these headings:

  • PLAN:
  • TRACE:
  • EDGE CASES:
  • COMPLEXITY:

The code block is expected to contain only executable Python.

If your downstream stack only wants runnable code, extract the contents of <code>...</code> before execution.
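A minimal extraction helper along those lines; the function name and regex are illustrative sketches (not part of the release), and they assume the model emits well-formed tags as described above:

```python
import re

def extract_code(completion: str):
    """Return the contents of the first <code>...</code> block, or None."""
    match = re.search(r"<code>(.*?)</code>", completion, re.DOTALL)
    return match.group(1).strip() if match else None

sample = (
    "<reasoning>PLAN: reverse the string.</reasoning>\n"
    "<code>\ndef rev(s):\n    return s[::-1]\n</code>"
)
print(extract_code(sample))  # prints the two-line function body only
```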

Intended Uses

Recommended uses:

  • structured code generation
  • code-repair experiments
  • benchmark research
  • reasoning-format experiments
  • evaluation harness development

Less suitable uses right now:

  • competition-style hidden-test programming
  • production-critical autonomous coding
  • benchmark marketing claims about frontier coding performance

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "msingiai/akilicode-14b-research-preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in bfloat16 and shard across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# ChatML-style prompt matching the intended <reasoning>/<code> output format.
prompt = """<|im_start|>system
You are Akili Code, an expert programming assistant built by MsingiAI.
Always respond with <reasoning> followed by <code>.
<|im_end|>
<|im_start|>user
Write a Python function that returns the longest palindromic substring of a string.
<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding; print only the newly generated tokens.
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Training Summary

AkiliCode-14B uses a three-stage post-training recipe:

  1. supervised fine-tuning
  2. white-box RL for first-pass execution accuracy
  3. repair-focused MURPHY / P-GRPO continuation

The promoted checkpoint was selected because it improved the most reliable function-level metrics relative to the Stage 2 golden checkpoint:

  • HumanEval+: 60.98 -> 62.80
  • MBPP+: 65.08 -> 65.61

It regressed slightly on BigCodeBench-Instruct:

  • 45.61 -> 45.09

Limitations

  • weak performance on hidden-test competitive programming
  • current benchmark profile is much stronger on short function-synthesis tasks than on contest-style algorithmic tasks
  • model behavior depends on downstream handling of the XML output format
  • not validated for safety-critical or production-critical use

Contact

For research or partnership inquiries:

  • korir@msingiai.com