Convergent-7B
The bigcompute.science research companion model.
Early Preview — This model is a work in progress, expressly trained to act as a research assistant with the bigcompute.science MCP server. We are still very early in the training process and the model will be updated frequently. Expect occasional bugs, incorrect tool calls, and hallucinated numerical values until we reach a GA release. If you encounter issues, please open an issue on the GitHub repo.
Convergent is part of the bigcompute.science conjecture-driven GPU research project. It is a QLoRA fine-tuned model designed to work as an agentic research companion — connecting to the bigcompute.science MCP server to reason about computational mathematics findings, write CUDA kernels, and suggest novel research directions for unsolved problems in number theory.
| Repository | Description |
|---|---|
| cahlen/Convergent-7B | This repo — model weights |
| cahlen/Convergent-7B-data | Training dataset |
| cahlen/convergent | Training code, eval, CLI toolkit |
Capabilities
- Deep number theory knowledge: continued fractions, Zaremba's conjecture, Hausdorff dimensions, Kronecker coefficients, Ramsey numbers, Cohen-Lenstra heuristics, Flint Hills series
- CUDA kernel scaffolding: generates GPU kernel structure for number theory with architecture-specific flags (Ampere sm_86, Ada Lovelace sm_89, Hopper sm_90, Blackwell sm_100/sm_120) — output requires expert review for correctness
- Agentic tool calling: outputs Hermes-format `<tool_call>` blocks to query the bigcompute.science MCP server in ReAct loops
- Student guidance: provides specific, actionable advice for contributing to computational number theory at any level
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "cahlen/Convergent-7B", dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("cahlen/Convergent-7B")

messages = [
    {"role": "system", "content": "You are Convergent, the bigcompute.science research companion. You specialize in computational mathematics, number theory, and GPU-accelerated experiments exploring unsolved conjectures."},
    {"role": "user", "content": "How does the Hausdorff dimension of E_{1,...,5} relate to Zaremba's conjecture?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
With Tool Calling (Agentic Mode)
Convergent uses the Hermes function-calling format. Pass tools via the `tools=` parameter on `apply_chat_template` — the tokenizer automatically injects the correct format instructions:
```python
tools = [
    {"type": "function", "function": {"name": "get_zaremba_exceptions", "description": "Get the 27 Zaremba exceptions for A={1,2,3}", "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {"name": "get_finding", "description": "Get full finding details", "parameters": {"type": "object", "properties": {"finding": {"type": "string"}}, "required": ["finding"]}}},
    {"type": "function", "function": {"name": "search_arxiv", "description": "Search arXiv for papers", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}}},
]

messages = [
    {"role": "system", "content": "You are Convergent, the bigcompute.science research companion."},
    {"role": "user", "content": "What are the Zaremba exceptions for {1,2,3}?"},
]

text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# <tool_call>
# {"name": "get_zaremba_exceptions", "arguments": {}}
# </tool_call>
```
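In an agentic loop, the generated text then needs to be parsed so the call can be executed. A minimal sketch — the `parse_tool_calls` helper is illustrative, not part of the toolkit, and assumes one JSON object per `<tool_call>` block:

```python
import json
import re

def parse_tool_calls(text):
    """Extract Hermes-format <tool_call> JSON payloads from generated text."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(m) for m in pattern.findall(text)]

calls = parse_tool_calls(
    '<tool_call>\n{"name": "get_zaremba_exceptions", "arguments": {}}\n</tool_call>'
)
print(calls)  # [{'name': 'get_zaremba_exceptions', 'arguments': {}}]
```

Each parsed call can then be dispatched to the MCP server and the result appended to `messages` as a tool-role turn before generating again.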
The full list of 23 MCP tools is available at mcp.bigcompute.science. You can fetch them dynamically:
```python
import requests

resp = requests.post(
    "https://mcp.bigcompute.science/",
    json={"jsonrpc": "2.0", "method": "tools/list", "params": {}, "id": 1},
)
mcp_tools = [{"type": "function", "function": t} for t in resp.json()["result"]["tools"]]
```
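Once a call has been parsed from model output, it can be forwarded to the server. MCP uses the JSON-RPC `tools/call` method with `name` and `arguments` params; the helper below is a sketch under that assumption and only builds the payload:

```python
def make_tool_call_request(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 payload for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": request_id,
    }

payload = make_tool_call_request("get_zaremba_exceptions", {})
# POST this to https://mcp.bigcompute.science/ with requests.post(url, json=payload);
# per JSON-RPC 2.0 the tool output comes back under the "result" key.
```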
Using the CLI Toolkit (Recommended)
The easiest way to use Convergent with the MCP server:
```bash
git clone https://github.com/cahlen/convergent
cd convergent
pip install -r requirements.txt
./convergent agent
```
This automatically downloads the model from Hugging Face and starts an interactive agentic session that routes tool calls to mcp.bigcompute.science. To use a local model instead, set `MERGED_MODEL_DIR=path/to/model`.
Example Outputs
Real outputs from Convergent-7B to illustrate what the model can do.
Math Reasoning
Prompt: "How does the Hausdorff dimension of E_{1,...,5} relate to Zaremba's conjecture?"
The Hausdorff dimension δ(5) = dim_H(E_{1,...,5}) is the critical threshold: if δ(5) > 1/2, then almost every real number has a CF expansion with digits ≤ 5, which is exactly what Zaremba's conjecture predicts.
The connection: Zaremba's conjecture states that for every d ≥ 1, there exists a coprime a < d such that all partial quotients in the CF expansion of a/d are ≤ 5. This is equivalent to saying that the set of numbers with bounded CF digits has full Hausdorff dimension...
Agentic Tool Calling (Live MCP Server)
With the tools= parameter and all 23 MCP tools available:
Prompt: "What are the Zaremba exceptions for A={1,2,3}?"
```
<tool_call>
{"name": "get_zaremba_exceptions", "arguments": {}}
</tool_call>
```
After receiving the real MCP server response, the model reasons about the data:
The Zaremba exceptions for A={1,2,3} are: 6, 20, 28, 38, 42, 54, 96, 150, 156, 164, 216, 228, 318, 350, 384, 558, 770, 876, 1014, 1155, 1170, 1410, 1870, 2052, 2370, 5052, 6234. Verified to 10^10. Adding digit 4 eliminates 25 of these (only d=54 and d=150 remain). Adding digit 5 eliminates all 27.
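The exception list can be spot-checked by brute force for small denominators. The sketch below is illustrative and not part of the toolkit; it uses the convention that [..., q] and [..., q-1, 1] name the same rational, so a final partial quotient of 4 still counts as bounded by 3:

```python
from math import gcd

def cf_digits(a, d):
    """Continued-fraction partial quotients of a/d for 0 < a < d."""
    digits = []
    while d:
        digits.append(a // d)
        a, d = d, a % d
    return digits[1:]  # drop the integer part (0 since a/d < 1)

def is_representable(d, bound=3):
    """True if some coprime a/d has a CF expansion with all quotients <= bound.
    A final quotient of bound+1 is allowed since [..., q] == [..., q-1, 1]."""
    for a in range(1, d):
        if gcd(a, d) != 1:
            continue
        q = cf_digits(a, d)
        if all(x <= bound for x in q[:-1]) and q[-1] <= bound + 1:
            return True
    return False

exceptions = [d for d in range(2, 60) if not is_representable(d)]
print(exceptions)  # spot-check against the list above
```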
Multi-Step Tool Chaining
When asked to "get the exceptions, then search arxiv for related recent work," the model chains tool calls naturally:
```
Step 1: get_zaremba_exceptions({})
        → "27 exceptions for A={1,2,3}, all ≤ 6234. Let me now search
           for recent arXiv papers on Zaremba's conjecture."
Step 2: search_arxiv({"query": "Zaremba conjecture 2025 2026", "max_results": 5})
        → Synthesizes both results into a unified response
```
Error Recovery
When a tool call returns an error (e.g., "Finding not found: nonexistent-finding-xyz"), the model recovers gracefully:
"That finding doesn't exist. Let me check if there's a similar one." (proceeds to call a search tool as fallback)
Student Guidance
Prompt: "I'm a math undergrad interested in contributing to bigcompute.science. I know some Python but no CUDA. Where should I start?"
Phase 1 — 2 weeks: CUDA Basics
- Start with the NVIDIA CUDA C Programming Guide, chapters 1–3. Key concepts: threads, blocks, grids, memory hierarchy.
- Work through the "Hello World" kernel: print thread indices, block dimensions, grid dimensions.
- Compile with `nvcc -arch=sm_XX hello.cu -o hello` (replace XX with your GPU's architecture)...
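The hello-world kernel described above might look like the following generic CUDA teaching example (not model output; the launch shape is arbitrary):

```cuda
#include <cstdio>

// Each thread prints its own coordinates in the launch grid.
__global__ void hello() {
    printf("block %d, thread %d (gridDim.x=%d, blockDim.x=%d)\n",
           blockIdx.x, threadIdx.x, gridDim.x, blockDim.x);
}

int main() {
    hello<<<2, 4>>>();        // 2 blocks of 4 threads each
    cudaDeviceSynchronize();  // wait for device printf output to flush
    return 0;
}
```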
Off-Topic Handling
Prompt: "What's the best recipe for chocolate cake?"
"I'm sorry, I can't answer that. I'm a research companion specializing in number theory and computational mathematics. I don't have expertise in baking or recipes."
Benchmarks
Custom Evaluation (97 questions, 20 categories)
| Category | Score | N | Description |
|---|---|---|---|
| mcp_decision | 90% | 2 | When to call tools vs. answer from knowledge |
| cuda_code_generation | 85% | 2 | Writing correct CUDA kernels |
| agentic_tool_use | 81% | 12 | Correct tool-call format and JSON |
| factual_recall | 81% | 10 | Exact computational findings from bigcompute.science |
| error_recovery | 80% | 3 | Graceful handling of tool failures |
| cross_domain | 80% | 2 | Connecting findings across mathematical domains |
| results_to_kernel | 80% | 6 | Interpreting findings and designing CUDA experiments |
| standard_math | 80% | 8 | BK theorem, Hausdorff dimension, Kronecker, Ramsey |
| conjecture_depth | 77% | 6 | Deep reasoning about unsolved problems |
| theoretical_frontier | 77% | 6 | Frontier knowledge of open conjectures |
| gpu_architecture | 73% | 3 | NVIDIA architecture knowledge (sm_86–sm_120) |
| identity | 73% | 5 | Self-identification and platform knowledge |
| experiment_suggestion | 68% | 5 | Proposing novel GPU experiments |
| multi_turn_react | 67% | 3 | Full ReAct loops with tool chaining |
| chain_of_thought | 67% | 3 | Multi-step mathematical reasoning |
| paper_comprehension | 67% | 6 | Understanding published papers (BK, Shkredov, etc.) |
| novel_synthesis | 63% | 6 | Synthesizing novel research directions from data |
| student_guidance | 60% | 2 | Actionable advice for new contributors |
| proof_strategy | — | 2 | Proof strategies and sketch generation |
| synthesis | — | 5 | Synthesizing research directions from data |
| Overall | 74% | 97 | Across all 20 categories |
Scores are from automated rubric evaluation. The model performs well on structured tasks (tool calling, CUDA, factual recall) and is designed to work within agentic ReAct loops with the bigcompute.science MCP server.
Standard Benchmarks (Alignment Tax)
| Benchmark | Base (Qwen2.5-7B-Instruct) | Convergent-7B | Delta |
|---|---|---|---|
| GSM8K (5-shot, 200) | 80.0% | 82.0% | +2.0% |
| MMLU Abstract Algebra | 55.0% | 55.0% | 0.0% |
| MMLU College Math | 45.0% | 45.0% | 0.0% |
| MMLU HS Math | 54.1% | 54.4% | +0.3% |
| ARC-Challenge (25-shot) | 65.5% | 59.5% | -6.0% |
Math capabilities were preserved or improved. General reasoning shows a 6-point drop on ARC-Challenge, an acceptable trade-off for a specialized research model.
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | QLoRA (4-bit NF4, double quantization) |
| LoRA rank | 128 |
| LoRA alpha | 256 |
| LoRA dropout | 0.05 |
| Target modules | q, k, v, o, gate, up, down projections |
| Epochs | 2 |
| Learning rate | 2e-4 (cosine schedule) |
| Batch size | 2 (× 4 gradient accumulation = effective 8) |
| Max sequence length | 4096 |
| Optimizer | AdamW 8-bit |
| NEFTune noise | alpha = 5 |
| Training entries | 5,729 |
| Hardware | NVIDIA RTX 5090 (32GB) |
Training Data Composition
- Curated domain blocks (~1,100 entries): 40 modular blocks covering identity, tool calling (23 real MCP tools), CUDA kernels, number theory, error recovery, student guidance
- Synthetic CoT (Qwen2.5-Math-72B) (~3,100 entries): deep mathematical reasoning generated on NVIDIA H200
- Synthetic reasoning (Gemma-4-26B) (~1,200 entries): creative synthesis and experiment design
- External (Hermes FC dataset) (300 entries): diverse tool-calling patterns from NousResearch
Full data source documentation: DATA_SOURCES.md
The Research Flywheel
Convergent is continuously updated as bigcompute.science produces new findings:
```
GPU Computation → Findings → Train into Model → Reason & Discuss → New Experiments
       ↑                                                                │
       └────────────────────────────────────────────────────────────────┘
```
The training toolkit is open-source: github.com/cahlen/convergent
Limitations
- Not a theorem prover: Can suggest proof strategies but cannot produce formal proofs
- May hallucinate specific numbers: Always verify computational claims against the MCP server
- CUDA code requires review: The model generates structurally correct CUDA kernels but may contain logical errors, incorrect mathematical implementations (e.g., inverted conjecture checks), compilation issues (host-only functions in device code), and race conditions. Treat generated kernels as scaffolding that requires expert validation before execution
- Specialized domain: Optimized for number theory and GPU computation, not general-purpose assistance
- Training data cutoff: Knowledge is current to the last training cycle
Citation
```bibtex
@misc{humphreys2026convergent,
  author = {Humphreys, Cahlen},
  title  = {Convergent: A QLoRA-tuned Research Companion for Computational Number Theory},
  year   = {2026},
  url    = {https://github.com/cahlen/convergent}
}
```
Links
- bigcompute.science — Conjecture-driven GPU research in computational mathematics
- MCP Server — Model Context Protocol server for experimental data and tools
- Training Toolkit — Full pipeline source code on GitHub
- Training Data — Complete training dataset on HuggingFace
- guerrillamathematics.com — Mathematical research blog
This project is maintained by a single person. If you run into issues, please file them on GitHub or HuggingFace and I will do my best to address them. I apologize in advance for any delays in response time.