Hamish Ivison
hamishivi
AI & ML interests
NLP :)
Recent Activity
updated a model about 21 hours ago
hamishivi/swerl_qwen35_27b_fp32lm_dppo_tmax__42_step300
published a model about 21 hours ago
hamishivi/swerl_qwen35_27b_fp32lm_dppo_tmax__42_step300
Organizations
RLVE
Models for "RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments" - https://arxiv.org/abs/2511.07317
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
Paper • 2511.07317 • Published • 18
hamishivi/OpenThinker3-1.5B-RLVE
Text Generation • 2B • Updated • 17 • 2
hamishivi/Nemotron-Research-Reasoning-Qwen-1.5B-v2-RLVE
Text Generation • 2B • Updated • 13 • 3
Tmax Extras
Auxiliary datasets for tmax.
models 335
hamishivi/swerl_qwen35_27b_fp32lm_dppo_tmax__42_step300
2.65M • Updated
hamishivi/swerl_qwen35_27b_fp32lm_dppo_tmax_step240
2.65M • Updated • 165
hamishivi/swerl_qwen35_27b_fp32lm_dppo_tmax_step200
2.65M • Updated • 103
hamishivi/swerl_qwen35_27b_fp32lm_dppo_tmax_step160
2.65M • Updated • 199
hamishivi/swerl_qwen35_4b_fp32lm_dppo
Updated • 464
hamishivi/swerl_qwen35_2b_fp32lm_dppo
Updated • 254
hamishivi/swerl_qwen35_9b_fp32lm_dppo_swesmith
Updated • 216
hamishivi/qwen35_9b_tmax_skill_tax_no_tool_call_sft
9B • Updated • 84
hamishivi/swerl_qwen35_27b_fp32lm_dppo_tmax_step100
2.65M • Updated • 125
hamishivi/Qwen3.5-2B
Image-Text-to-Text • 2B • Updated • 3.13k
datasets 227
hamishivi/tmax-sft-big
Viewer • Updated • 327k • 35
hamishivi/agent-task-endless-terminals
Viewer • Updated • 2.49k • 71
hamishivi/sft_ablations_scientific_minimax_v1_sanitized
Viewer • Updated • 2.23k • 50
hamishivi/sft_ablations_bc_only_v1_sanitized
Viewer • Updated • 5.59k • 41
hamishivi/agent-task-cli-gym
Viewer • Updated • 1.55k • 69
hamishivi/agent-task-swe-smith
Viewer • Updated • 59.1k • 53
hamishivi/agent-task-r2e-gym
Viewer • Updated • 4.58k • 57
hamishivi/agent-task-combined
Viewer • Updated • 27k • 24
hamishivi/sft_ablations_redsearcher_sft_sanitized
Viewer • Updated • 9.81k • 26
hamishivi/tmax-sft-skill-tax-20260505-2.2k-combined-balanced-qwen3.6-27b-thinking-no-tool-call
Viewer • Updated • 16.5k • 89