SOTAVerified

Math

Papers

Showing 301-350 of 1596 papers

Title | Status | Hype
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | - | 0
1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training | - | 0
Gemma 3 Technical Report | - | 0
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling | - | 0
Activation Functions Considered Harmful: Recovering Neural Network Weights through Controlled Channels | - | 0
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Code | 7
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning | - | 0
Reasoning to Learn from Latent Thoughts | Code | 2
AgentRxiv: Towards Collaborative Autonomous Research | Code | 9
Long Is More Important Than Difficult for Training Reasoning Models | - | 0
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Code | 0
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection | - | 0
ChatBench: From Static Benchmarks to Human-AI Evaluation | Code | 0
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Code | 2
Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | - | 0
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | - | 0
BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | - | 0
Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach | - | 0
Pensez: Less Data, Better Reasoning -- Rethinking French LLM | - | 0
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Code | 7
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? | - | 0
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | - | 0
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Code | 4
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search | Code | 1
Conformal Prediction Sets for Deep Generative Models via Reduction to Conformal Regression | - | 0
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error | Code | 0
Understanding the Logical Capabilities of Large Language Models via Out-of-Context Representation Learning | - | 0
The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | - | 0
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1
From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | - | 0
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | - | 0
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Code | 5
Decoding the Black Box: Integrating Moral Imagination with Technical AI Governance | - | 0
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models | - | 0
Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | - | 0
Compositional Causal Reasoning Evaluation in Language Models | - | 0
HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks | - | 0
Benchmarking Reasoning Robustness in Large Language Models | - | 0
Better Process Supervision with Bi-directional Rewarding Signals | - | 0
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | - | 0
START: Self-taught Reasoner with Tools | - | 0
Performance Comparison of Large Language Models on Advanced Calculus Problems | - | 0
LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach | - | 0
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 | - | 0
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | - | 0
PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Code | 1
What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret | - | 0
When an LLM is apprehensive about its answers -- and when its uncertainty is justified | Code | 0
Page 7 of 32

No leaderboard results yet.