Convergent-7B

The bigcompute.science research companion model.

Early Preview — This model is a work in progress, expressly trained to act as a research assistant with the bigcompute.science MCP server. We are still very early in the training process and the model will be updated frequently. Expect occasional bugs, incorrect tool calls, and hallucinated numerical values until we reach a GA release. If you encounter issues, please open an issue on the GitHub repo.

Convergent is part of the bigcompute.science conjecture-driven GPU research project. It is a QLoRA fine-tuned model designed to work as an agentic research companion — connecting to the bigcompute.science MCP server to reason about computational mathematics findings, write CUDA kernels, and suggest novel research directions for unsolved problems in number theory.

Repository                  Description
cahlen/Convergent-7B        This repo — model weights
cahlen/Convergent-7B-data   Training dataset
cahlen/convergent           Training code, eval, CLI toolkit

Capabilities

  • Deep number theory knowledge: continued fractions, Zaremba's conjecture, Hausdorff dimensions, Kronecker coefficients, Ramsey numbers, Cohen-Lenstra heuristics, Flint Hills series
  • CUDA kernel scaffolding: generates GPU kernel structure for number theory with architecture-specific flags (Ampere sm_86, Ada Lovelace sm_89, Hopper sm_90, Blackwell sm_100/sm_120) — output requires expert review for correctness
  • Agentic tool calling: outputs Hermes-format <tool_call> blocks to query the bigcompute.science MCP server in ReAct loops
  • Student guidance: provides specific, actionable advice for contributing to computational number theory at any level

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("cahlen/Convergent-7B", dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("cahlen/Convergent-7B")

messages = [
    {"role": "system", "content": "You are Convergent, the bigcompute.science research companion. You specialize in computational mathematics, number theory, and GPU-accelerated experiments exploring unsolved conjectures."},
    {"role": "user", "content": "How does the Hausdorff dimension of E_{1,...,5} relate to Zaremba's conjecture?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

With Tool Calling (Agentic Mode)

Convergent uses the Hermes function-calling format. Pass tools via the tools= parameter on apply_chat_template — the tokenizer automatically injects the correct format instructions:

tools = [
    {"type": "function", "function": {"name": "get_zaremba_exceptions", "description": "Get the 27 Zaremba exceptions for A={1,2,3}", "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {"name": "get_finding", "description": "Get full finding details", "parameters": {"type": "object", "properties": {"finding": {"type": "string"}}, "required": ["finding"]}}},
    {"type": "function", "function": {"name": "search_arxiv", "description": "Search arXiv for papers", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}}}
]

messages = [
    {"role": "system", "content": "You are Convergent, the bigcompute.science research companion."},
    {"role": "user", "content": "What are the Zaremba exceptions for {1,2,3}?"}
]

text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# <tool_call>
# {"name": "get_zaremba_exceptions", "arguments": {}}
# </tool_call>
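A completion like the one above can be parsed with the standard library before dispatching to the MCP server. A minimal sketch (the helper name and regex are illustrative, not part of the model or server API):

```python
import json
import re

def parse_tool_calls(completion: str) -> list[dict]:
    """Extract Hermes-format <tool_call> JSON payloads from a completion."""
    blocks = re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", completion, re.DOTALL)
    return [json.loads(block) for block in blocks]

completion = '<tool_call>\n{"name": "get_zaremba_exceptions", "arguments": {}}\n</tool_call>'
print(parse_tool_calls(completion))
# → [{'name': 'get_zaremba_exceptions', 'arguments': {}}]
```

If the model emits no `<tool_call>` block, the parser returns an empty list and the completion can be treated as the final answer.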

The full list of 23 MCP tools is available at mcp.bigcompute.science. You can fetch them dynamically:

import requests
resp = requests.post("https://mcp.bigcompute.science/", json={"jsonrpc": "2.0", "method": "tools/list", "params": {}, "id": 1})
mcp_tools = [{"type": "function", "function": t} for t in resp.json()["result"]["tools"]]
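A parsed tool call can then be executed against the same endpoint with the MCP `tools/call` method. A sketch of the request body (the payload shape follows the MCP JSON-RPC convention; the helper name is illustrative):

```python
import json

def mcp_call_payload(name: str, arguments: dict, request_id: int = 2) -> dict:
    """Build a JSON-RPC 2.0 request body for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": request_id,
    }

payload = mcp_call_payload("get_zaremba_exceptions", {})
# Execute with: requests.post("https://mcp.bigcompute.science/", json=payload)
print(json.dumps(payload))
```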

Using the CLI Toolkit (Recommended)

The easiest way to use Convergent with the MCP server:

git clone https://github.com/cahlen/convergent
cd convergent
pip install -r requirements.txt
./convergent agent

This automatically downloads the model from HuggingFace and starts an interactive agentic session that routes tool calls to mcp.bigcompute.science. To use a local model instead, set MERGED_MODEL_DIR=path/to/model.

Example Outputs

Real outputs from Convergent-7B to illustrate what the model can do.

Math Reasoning

Prompt: "How does the Hausdorff dimension of E_{1,...,5} relate to Zaremba's conjecture?"

The Hausdorff dimension δ(5) = dim_H(E_{1,...,5}) is the critical threshold: if δ(5) > 1/2, then almost every real number has a CF expansion with digits ≤ 5, which is exactly what Zaremba's conjecture predicts.

The connection: Zaremba's conjecture states that for every d ≥ 1, there exists a coprime a < d such that all partial quotients in the CF expansion of a/d are ≤ 5. This is equivalent to saying that the set of numbers with bounded CF digits has full Hausdorff dimension...

Agentic Tool Calling (Live MCP Server)

With the tools= parameter and all 23 MCP tools available:

Prompt: "What are the Zaremba exceptions for A={1,2,3}?"

<tool_call>
{"name": "get_zaremba_exceptions", "arguments": {}}
</tool_call>

After receiving the real MCP server response, the model reasons about the data:

The Zaremba exceptions for A={1,2,3} are: 6, 20, 28, 38, 42, 54, 96, 150, 156, 164, 216, 228, 318, 350, 384, 558, 770, 876, 1014, 1155, 1170, 1410, 1870, 2052, 2370, 5052, 6234. Verified to 10^10. Adding digit 4 eliminates 25 of these (only d=54 and d=150 remain). Adding digit 5 eliminates all 27.
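Claims like these are cheap to spot-check locally for small denominators. A stdlib sketch of an exception test for the digit alphabet {1, …, bound} (an illustrative reimplementation, not the server's verified-to-10^10 code; it allows the standard continued-fraction ambiguity [..., q] = [..., q-1, 1] in the final digit):

```python
from math import gcd

def partial_quotients(a: int, d: int) -> list[int]:
    """Continued-fraction partial quotients of a/d via the Euclidean algorithm."""
    pqs = []
    while d:
        q, r = divmod(a, d)
        pqs.append(q)
        a, d = d, r
    return pqs[1:]  # drop the integer part (0 for a < d)

def is_zaremba_exception(d: int, bound: int = 3) -> bool:
    """True if no coprime a < d gives a/d with all partial quotients <= bound.
    The final quotient may equal bound + 1, since [..., q] = [..., q-1, 1]."""
    for a in range(1, d):
        if gcd(a, d) != 1:
            continue
        pqs = partial_quotients(a, d)
        if all(q <= bound for q in pqs[:-1]) and pqs[-1] <= bound + 1:
            return False
    return True

print(is_zaremba_exception(6))   # True: the smallest exception for A={1,2,3}
print(is_zaremba_exception(7))   # False: e.g. 5/7 = [0; 1, 2, 2]
```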

Multi-Step Tool Chaining

When asked to "get the exceptions, then search arxiv for related recent work," the model chains tool calls naturally:

Step 1: get_zaremba_exceptions({})
  → "27 exceptions for A={1,2,3}, the largest being 6234. Let me now search
     for recent arXiv papers on Zaremba's conjecture."

Step 2: search_arxiv({"query": "Zaremba conjecture 2025 2026", "max_results": 5})
  → Synthesizes both results into a unified response
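The chaining pattern above can be driven by a small ReAct loop: generate, parse any `<tool_call>`, execute it, append the result as a tool message, and repeat until the model emits plain text. A minimal sketch with generation and tool execution stubbed out (the function names are illustrative, not part of the toolkit):

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

def react_loop(messages, generate, execute_tool, max_steps=5):
    """Run a ReAct loop: generate -> parse tool calls -> execute -> feed back."""
    completion = ""
    for _ in range(max_steps):
        completion = generate(messages)
        messages.append({"role": "assistant", "content": completion})
        calls = TOOL_CALL_RE.findall(completion)
        if not calls:
            return completion  # plain-text answer: the loop is done
        for raw in calls:
            call = json.loads(raw)
            result = execute_tool(call["name"], call["arguments"])
            messages.append({"role": "tool", "content": json.dumps(result)})
    return completion

# Stubbed two-turn demo: the first turn calls a tool, the second answers.
replies = iter([
    '<tool_call>{"name": "get_zaremba_exceptions", "arguments": {}}</tool_call>',
    "There are 27 exceptions for A={1,2,3}.",
])
answer = react_loop(
    [{"role": "user", "content": "What are the Zaremba exceptions?"}],
    generate=lambda msgs: next(replies),
    execute_tool=lambda name, args: {"exceptions": 27},
)
print(answer)  # There are 27 exceptions for A={1,2,3}.
```

In practice `generate` wraps `model.generate` with `apply_chat_template`, and `execute_tool` posts to the MCP server.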

Error Recovery

When a tool call returns an error (e.g., "Finding not found: nonexistent-finding-xyz"), the model recovers gracefully:

"That finding doesn't exist. Let me check if there's a similar one." (proceeds to call a search tool as fallback)

Student Guidance

Prompt: "I'm a math undergrad interested in contributing to bigcompute.science. I know some Python but no CUDA. Where should I start?"

Phase 1 — 2 weeks: CUDA Basics

  • Start with the NVIDIA CUDA C Programming Guide, chapters 1–3. Key concepts: threads, blocks, grids, memory hierarchy.
  • Work through the "Hello World" kernel: print thread indices, block dimensions, grid dimensions.
  • Compile with nvcc -arch=sm_XX hello.cu -o hello (replace XX with your GPU's architecture)...

Off-Topic Handling

Prompt: "What's the best recipe for chocolate cake?"

"I'm sorry, I can't answer that. I'm a research companion specializing in number theory and computational mathematics. I don't have expertise in baking or recipes."

Benchmarks

Custom Evaluation (97 questions, 20 categories)

Category               Score   N   Description
mcp_decision            90%    2   When to call tools vs. answer from knowledge
cuda_code_generation    85%    2   Writing correct CUDA kernels
agentic_tool_use        81%   12   Correct tool-call format and JSON
factual_recall          81%   10   Exact computational findings from bigcompute.science
error_recovery          80%    3   Graceful handling of tool failures
cross_domain            80%    2   Connecting findings across mathematical domains
results_to_kernel       80%    6   Interpreting findings and designing CUDA experiments
standard_math           80%    8   BK theorem, Hausdorff dimension, Kronecker, Ramsey
conjecture_depth        77%    6   Deep reasoning about unsolved problems
theoretical_frontier    77%    6   Frontier knowledge of open conjectures
gpu_architecture        73%    3   NVIDIA architecture knowledge (sm_86–sm_120)
identity                73%    5   Self-identification and platform knowledge
experiment_suggestion   68%    5   Proposing novel GPU experiments
multi_turn_react        67%    3   Full ReAct loops with tool chaining
chain_of_thought        67%    3   Multi-step mathematical reasoning
paper_comprehension     67%    6   Understanding published papers (BK, Shkredov, etc.)
novel_synthesis         63%    6   Synthesizing novel research directions from data
student_guidance        60%    2   Actionable advice for new contributors
proof_strategy           —     2   Proof strategies and sketch generation
synthesis                —     5   Synthesizing research directions from data
Overall                 74%   97   Across all 20 categories

Scores are from automated rubric evaluation. The model performs well on structured tasks (tool calling, CUDA, factual recall) and is designed to work within agentic ReAct loops with the bigcompute.science MCP server.

Standard Benchmarks (Alignment Tax)

Benchmark                 Base (Qwen2.5-7B-Instruct)   Convergent-7B   Delta
GSM8K (5-shot, 200)       80.0%                        82.0%           +2.0%
MMLU Abstract Algebra     55.0%                        55.0%            0.0%
MMLU College Math         45.0%                        45.0%            0.0%
MMLU HS Math              54.1%                        54.4%           +0.3%
ARC-Challenge (25-shot)   65.5%                        59.5%           -6.0%

Math capabilities are improved or preserved. General reasoning shows a 6-point drop on ARC-Challenge, an acceptable trade-off for a specialized research model.

Training Details

Parameter             Value
Base model            Qwen/Qwen2.5-7B-Instruct
Method                QLoRA (4-bit NF4, double quantization)
LoRA rank             128
LoRA alpha            256
LoRA dropout          0.05
Target modules        q, k, v, o, gate, up, down projections
Epochs                2
Learning rate         2e-4 (cosine schedule)
Batch size            2 (× 4 gradient accumulation = effective 8)
Max sequence length   4096
Optimizer             AdamW 8-bit
NEFTune               noise alpha = 5
Training entries      5,729
Hardware              NVIDIA RTX 5090 (32GB)
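The quantization and adapter rows above map onto a bitsandbytes + PEFT setup roughly as follows. This is a sketch under the assumption of the standard transformers/peft QLoRA recipe, not the project's actual training script:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 base weights with double quantization, per the table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters (r=128, alpha=256) on all attention and MLP projections
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```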

Training Data Composition

  • Curated domain blocks (~1,100 entries): 40 modular blocks covering identity, tool calling (23 real MCP tools), CUDA kernels, number theory, error recovery, student guidance
  • Synthetic CoT (Qwen2.5-Math-72B) (~3,100 entries): deep mathematical reasoning generated on NVIDIA H200
  • Synthetic reasoning (Gemma-4-26B) (~1,200 entries): creative synthesis and experiment design
  • External (Hermes FC dataset) (300 entries): diverse tool-calling patterns from NousResearch

Full data source documentation: DATA_SOURCES.md

The Research Flywheel

Convergent is continuously updated as bigcompute.science produces new findings:

GPU Computation → Findings → Train into Model → Reason & Discuss → New Experiments
     ↑                                                                    │
     └────────────────────────────────────────────────────────────────────┘

The training toolkit is open-source: github.com/cahlen/convergent

Limitations

  • Not a theorem prover: Can suggest proof strategies but cannot produce formal proofs
  • May hallucinate specific numbers: Always verify computational claims against the MCP server
  • CUDA code requires review: The model generates structurally correct CUDA kernels but may contain logical errors, incorrect mathematical implementations (e.g., inverted conjecture checks), compilation issues (host-only functions in device code), and race conditions. Treat generated kernels as scaffolding that requires expert validation before execution
  • Specialized domain: Optimized for number theory and GPU computation, not general-purpose assistance
  • Training data cutoff: Knowledge is current to the last training cycle

Citation

@misc{humphreys2026convergent,
  author = {Humphreys, Cahlen},
  title = {Convergent: A QLoRA-tuned Research Companion for Computational Number Theory},
  year = {2026},
  url = {https://github.com/cahlen/convergent}
}

Support

This project is maintained by a single person. If you run into issues, please file them on GitHub or HuggingFace and I will do my best to address them. I apologize in advance for any delays in response time.
