nvidia/Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1 Viewer • Updated 23 days ago • 97k • 882 • 14
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published Jan 16 • 30
PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary Paper • 2601.10201 • Published Jan 15 • 9
HanningZhang/deepseek_only_conjecture_claude_deepseek_train_data_max1_5e-7_bs32_decay1e-6_2ep_ep1 Text Generation • 7B • Updated Jan 12 • 1