Convergent-7B
The bigcompute.science research companion model.
Early Preview — This model is a work in progress, expressly trained to act as a research assistant with the bigcompute.science MCP server. We are still very early in the training process and the model will be updated frequently. Expect occasional bugs, incorrect tool calls, and hallucinated numerical values until we reach a GA release. If you encounter issues, please open an issue on the GitHub repo.
Convergent is part of the bigcompute.science conjecture-driven GPU research project. It is a QLoRA fine-tuned model designed to work as an agentic research companion — connecting to the bigcompute.science MCP server to reason about computational mathematics findings, write CUDA kernels, and suggest novel research directions for unsolved problems in number theory.
| Repository | Description |
|---|---|
| cahlen/Convergent-7B | This repo — model weights |
| cahlen/Convergent-7B-data | Training dataset |
| cahlen/convergent | Training code, eval, CLI toolkit |
Capabilities
- Deep number theory knowledge: continued fractions, Zaremba's conjecture, Hausdorff dimensions, Kronecker coefficients, Ramsey numbers, Cohen-Lenstra heuristics, Flint Hills series
- CUDA kernel scaffolding: generates GPU kernel structure for number theory with architecture-specific flags (Ampere sm_86, Ada Lovelace sm_89, Hopper sm_90, Blackwell sm_100/sm_120) — output requires expert review for correctness
- Agentic tool calling: outputs Hermes-format `<tool_call>` blocks to query the bigcompute.science MCP server in ReAct loops
- Student guidance: provides specific, actionable advice for contributing to computational number theory at any level
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "cahlen/Convergent-7B", dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("cahlen/Convergent-7B")

messages = [
    {"role": "system", "content": "You are Convergent, the bigcompute.science research companion. You specialize in computational mathematics, number theory, and GPU-accelerated experiments exploring unsolved conjectures."},
    {"role": "user", "content": "How does the Hausdorff dimension of E_{1,...,5} relate to Zaremba's conjecture?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
With Tool Calling (Agentic Mode)
Convergent uses the Hermes function-calling format. Pass tools via the `tools=` parameter on `apply_chat_template` — the tokenizer automatically injects the correct format instructions:
```python
tools = [
    {"type": "function", "function": {"name": "get_zaremba_exceptions", "description": "Get the 27 Zaremba exceptions for A={1,2,3}", "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {"name": "get_finding", "description": "Get full finding details", "parameters": {"type": "object", "properties": {"finding": {"type": "string"}}, "required": ["finding"]}}},
    {"type": "function", "function": {"name": "search_arxiv", "description": "Search arXiv for papers", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}}},
]

messages = [
    {"role": "system", "content": "You are Convergent, the bigcompute.science research companion."},
    {"role": "user", "content": "What are the Zaremba exceptions for {1,2,3}?"},
]

text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# <tool_call>
# {"name": "get_zaremba_exceptions", "arguments": {}}
# </tool_call>
```
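In an agentic loop, the generated text then needs to be parsed so the call can be executed. A minimal sketch — the `parse_tool_calls` helper is illustrative, not part of the toolkit, and assumes one JSON object per `<tool_call>` block:

```python
import json
import re

def parse_tool_calls(text):
    """Extract Hermes-format <tool_call> JSON payloads from generated text."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(m) for m in pattern.findall(text)]

calls = parse_tool_calls(
    '<tool_call>\n{"name": "get_zaremba_exceptions", "arguments": {}}\n</tool_call>'
)
print(calls)  # [{'name': 'get_zaremba_exceptions', 'arguments': {}}]
```

Each parsed call can then be dispatched to the MCP server and the result appended to `messages` as a tool-role turn before generating again.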
The full list of 23 MCP tools is available at mcp.bigcompute.science. You can fetch them dynamically:
```python
import requests

resp = requests.post(
    "https://mcp.bigcompute.science/",
    json={"jsonrpc": "2.0", "method": "tools/list", "params": {}, "id": 1},
)
mcp_tools = [{"type": "function", "function": t} for t in resp.json()["result"]["tools"]]
```
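Once a call has been parsed from model output, it can be forwarded to the server. MCP uses the JSON-RPC `tools/call` method with `name` and `arguments` params; the helper below is a sketch under that assumption and only builds the payload:

```python
def make_tool_call_request(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 payload for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": request_id,
    }

payload = make_tool_call_request("get_zaremba_exceptions", {})
# POST this to https://mcp.bigcompute.science/ with requests.post(url, json=payload);
# per JSON-RPC 2.0 the tool output comes back under the "result" key.
```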
Using the CLI Toolkit (Recommended)
The easiest way to use Convergent with the MCP server:
```bash
git clone https://github.com/cahlen/convergent
cd convergent
pip install -r requirements.txt
./convergent agent
```
This automatically downloads the model from Hugging Face and starts an interactive agentic session that routes tool calls to mcp.bigcompute.science. To use a local model instead, set `MERGED_MODEL_DIR=path/to/model`.
Example Outputs
Real outputs from Convergent-7B to illustrate what the model can do.
Math Reasoning
Prompt: "How does the Hausdorff dimension of E_{1,...,5} relate to Zaremba's conjecture?"
The Hausdorff dimension δ(5) = dim_H(E_{1,...,5}) is the critical threshold: if δ(5) > 1/2, then almost every real number has a CF expansion with digits ≤ 5, which is exactly what Zaremba's conjecture predicts.
The connection: Zaremba's conjecture states that for every d ≥ 1, there exists a coprime a < d such that all partial quotients in the CF expansion of a/d are ≤ 5. This is equivalent to saying that the set of numbers with bounded CF digits has full Hausdorff dimension...
Agentic Tool Calling (Live MCP Server)
With the tools= parameter and all 23 MCP tools available:
Prompt: "What are the Zaremba exceptions for A={1,2,3}?"
```
<tool_call>
{"name": "get_zaremba_exceptions", "arguments": {}}
</tool_call>
```
After receiving the real MCP server response, the model reasons about the data:
The Zaremba exceptions for A={1,2,3} are: 6, 20, 28, 38, 42, 54, 96, 150, 156, 164, 216, 228, 318, 350, 384, 558, 770, 876, 1014, 1155, 1170, 1410, 1870, 2052, 2370, 5052, 6234. Verified to 10^10. Adding digit 4 eliminates 25 of these (only d=54 and d=150 remain). Adding digit 5 eliminates all 27.
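The exception list can be spot-checked by brute force for small denominators. The sketch below is illustrative and not part of the toolkit; it uses the convention that [..., q] and [..., q-1, 1] name the same rational, so a final partial quotient of 4 still counts as bounded by 3:

```python
from math import gcd

def cf_digits(a, d):
    """Continued-fraction partial quotients of a/d for 0 < a < d."""
    digits = []
    while d:
        digits.append(a // d)
        a, d = d, a % d
    return digits[1:]  # drop the integer part (0 since a/d < 1)

def is_representable(d, bound=3):
    """True if some coprime a/d has a CF expansion with all quotients <= bound.
    A final quotient of bound+1 is allowed since [..., q] == [..., q-1, 1]."""
    for a in range(1, d):
        if gcd(a, d) != 1:
            continue
        q = cf_digits(a, d)
        if all(x <= bound for x in q[:-1]) and q[-1] <= bound + 1:
            return True
    return False

exceptions = [d for d in range(2, 60) if not is_representable(d)]
print(exceptions)  # spot-check against the list above
```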
Multi-Step Tool Chaining
When asked to "get the exceptions, then search arxiv for related recent work," the model chains tool calls naturally:
```
Step 1: get_zaremba_exceptions({})
        → "27 exceptions for A={1,2,3}, all ≤ 6234. Let me now search
           for recent arXiv papers on Zaremba's conjecture."
Step 2: search_arxiv({"query": "Zaremba conjecture 2025 2026", "max_results": 5})
        → Synthesizes both results into a unified response
```
Error Recovery
When a tool call returns an error (e.g., "Finding not found: nonexistent-finding-xyz"), the model recovers gracefully:
"That finding doesn't exist. Let me check if there's a similar one." (proceeds to call a search tool as fallback)
Student Guidance
Prompt: "I'm a math undergrad interested in contributing to bigcompute.science. I know some Python but no CUDA. Where should I start?"
Phase 1 — 2 weeks: CUDA Basics
- Start with the NVIDIA CUDA C Programming Guide, chapters 1–3. Key concepts: threads, blocks, grids, memory hierarchy.
- Work through the "Hello World" kernel: print thread indices, block dimensions, grid dimensions.
- Compile with `nvcc -arch=sm_XX hello.cu -o hello` (replace XX with your GPU's architecture)...
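The hello-world kernel described above might look like the following generic CUDA teaching example (not model output; the launch shape is arbitrary):

```cuda
#include <cstdio>

// Each thread prints its own coordinates in the launch grid.
__global__ void hello() {
    printf("block %d, thread %d (gridDim.x=%d, blockDim.x=%d)\n",
           blockIdx.x, threadIdx.x, gridDim.x, blockDim.x);
}

int main() {
    hello<<<2, 4>>>();        // 2 blocks of 4 threads each
    cudaDeviceSynchronize();  // wait for device printf output to flush
    return 0;
}
```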
Off-Topic Handling
Prompt: "What's the best recipe for chocolate cake?"
"I'm sorry, I can't answer that. I'm a research companion specializing in number theory and computational mathematics. I don't have expertise in baking or recipes."
Benchmarks
Custom Evaluation (97 questions, 20 categories)
| Category | Score | N | Description |
|---|---|---|---|
| mcp_decision | 90% | 2 | When to call tools vs. answer from knowledge |
| cuda_code_generation | 85% | 2 | Writing correct CUDA kernels |
| agentic_tool_use | 81% | 12 | Correct tool-call format and JSON |
| factual_recall | 81% | 10 | Exact computational findings from bigcompute.science |
| error_recovery | 80% | 3 | Graceful handling of tool failures |
| cross_domain | 80% | 2 | Connecting findings across mathematical domains |
| results_to_kernel | 80% | 6 | Interpreting findings and designing CUDA experiments |
| standard_math | 80% | 8 | BK theorem, Hausdorff dimension, Kronecker, Ramsey |
| conjecture_depth | 77% | 6 | Deep reasoning about unsolved problems |
| theoretical_frontier | 77% | 6 | Frontier knowledge of open conjectures |
| gpu_architecture | 73% | 3 | NVIDIA architecture knowledge (sm_86–sm_120) |
| identity | 73% | 5 | Self-identification and platform knowledge |
| experiment_suggestion | 68% | 5 | Proposing novel GPU experiments |
| multi_turn_react | 67% | 3 | Full ReAct loops with tool chaining |
| chain_of_thought | 67% | 3 | Multi-step mathematical reasoning |
| paper_comprehension | 67% | 6 | Understanding published papers (BK, Shkredov, etc.) |
| novel_synthesis | 63% | 6 | Synthesizing novel research directions from data |
| student_guidance | 60% | 2 | Actionable advice for new contributors |
| proof_strategy | — | 2 | Proof strategies and sketch generation |
| synthesis | — | 5 | Synthesizing research directions from data |
| Overall | 74% | 97 | Across all 20 categories |
Scores are from automated rubric evaluation. The model performs well on structured tasks (tool calling, CUDA, factual recall) and is designed to work within agentic ReAct loops with the bigcompute.science MCP server.
Standard Benchmarks (Alignment Tax)
| Benchmark | Base (Qwen2.5-7B-Instruct) | Convergent-7B | Delta |
|---|---|---|---|
| GSM8K (5-shot, 200) | 80.0% | 82.0% | +2.0% |
| MMLU Abstract Algebra | 55.0% | 55.0% | 0.0% |
| MMLU College Math | 45.0% | 45.0% | 0.0% |
| MMLU HS Math | 54.1% | 54.4% | +0.3% |
| ARC-Challenge (25-shot) | 65.5% | 59.5% | -6.0% |
Math capabilities were preserved or improved. General reasoning shows a 6-point drop on ARC-Challenge, an acceptable trade-off for a specialized research model.
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | QLoRA (4-bit NF4, double quantization) |
| LoRA rank | 128 |
| LoRA alpha | 256 |
| LoRA dropout | 0.05 |
| Target modules | q, k, v, o, gate, up, down projections |
| Epochs | 2 |
| Learning rate | 2e-4 (cosine schedule) |
| Batch size | 2 (× 4 gradient accumulation = effective 8) |
| Max sequence length | 4096 |
| Optimizer | AdamW 8-bit |
| NEFTune noise | alpha = 5 |
| Training entries | 5,729 |
| Hardware | NVIDIA RTX 5090 (32GB) |
Training Data Composition
- Curated domain blocks (~1,100 entries): 40 modular blocks covering identity, tool calling (23 real MCP tools), CUDA kernels, number theory, error recovery, student guidance
- Synthetic CoT (Qwen2.5-Math-72B) (~3,100 entries): deep mathematical reasoning generated on NVIDIA H200
- Synthetic reasoning (Gemma-4-26B) (~1,200 entries): creative synthesis and experiment design
- External (Hermes FC dataset) (300 entries): diverse tool-calling patterns from NousResearch
Full data source documentation: DATA_SOURCES.md
The Research Flywheel
Convergent is continuously updated as bigcompute.science produces new findings:
```
GPU Computation → Findings → Train into Model → Reason & Discuss → New Experiments
       ↑                                                                │
       └────────────────────────────────────────────────────────────────┘
```
The training toolkit is open-source: github.com/cahlen/convergent
Limitations
- Not a theorem prover: Can suggest proof strategies but cannot produce formal proofs
- May hallucinate specific numbers: Always verify computational claims against the MCP server
- CUDA code requires review: The model generates structurally correct CUDA kernels but may contain logical errors, incorrect mathematical implementations (e.g., inverted conjecture checks), compilation issues (host-only functions in device code), and race conditions. Treat generated kernels as scaffolding that requires expert validation before execution
- Specialized domain: Optimized for number theory and GPU computation, not general-purpose assistance
- Training data cutoff: Knowledge is current to the last training cycle
Citation
```bibtex
@misc{humphreys2026convergent,
  author = {Humphreys, Cahlen},
  title  = {Convergent: A QLoRA-tuned Research Companion for Computational Number Theory},
  year   = {2026},
  url    = {https://github.com/cahlen/convergent}
}
```
Links
- bigcompute.science — Conjecture-driven GPU research in computational mathematics
- MCP Server — Model Context Protocol server for experimental data and tools
- Training Toolkit — Full pipeline source code on GitHub
- Training Data — Complete training dataset on HuggingFace
- guerrillamathematics.com — Mathematical research blog
This project is maintained by a single person. If you run into issues, please file them on GitHub or HuggingFace and I will do my best to address them. I apologize in advance for any delays in response time.