SOTAVerified

Math

Papers

Showing 501-550 of 1596 papers

Title | Status | Hype
--- | --- | ---
Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems | Code | 1
Graph-to-Tree Learning for Solving Math Word Problems | Code | 1
A Relation Spectrum Inheriting Taylor Series: Muscle Synergy and Coupling for Hand | Code | 1
SIPA: A Simple Framework for Efficient Networks | Code | 1
StereoSet: Measuring stereotypical bias in pretrained language models | Code | 1
Injecting Numerical Reasoning Skills into Language Models | Code | 1
Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem | Code | 1
ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images | Code | 1
Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations | Code | 1
A Tree-Structured Decoder for Image-to-Markup Generation | Code | 1
Template-based math word problem solvers with recursive neural networks | Code | 1
From GAN to WGAN | Code | 1
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks |  | 0
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation |  | 0
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training |  | 0
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Code | 0
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding |  | 0
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs |  | 0
Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model |  | 0
CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs |  | 0
Activation Steering for Chain-of-Thought Compression | Code | 0
Effects of structure on reasoning in instance-level Self-Discover | Code | 0
Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model |  | 0
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test |  | 0
Bridging Offline and Online Reinforcement Learning for LLMs |  | 0
AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control | Code | 0
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs |  | 0
Multi-lingual Functional Evaluation for Large Language Models |  | 0
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs |  | 0
Causal Decomposition Analysis with Synergistic Interventions: A Triply-Robust Machine Learning Approach to Addressing Multiple Dimensions of Social Disparities |  | 0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models |  | 0
Shrinking the Generation-Verification Gap with Weak Verifiers |  | 0
Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study |  | 0
No Free Lunch: Rethinking Internal Feedback for LLM Reasoning |  | 0
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0
SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks |  | 0
Utility-Driven Speculative Decoding for Mixture-of-Experts |  | 0
Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models |  | 0
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy |  | 0
Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks |  | 0
Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models |  | 0
VGR: Visual Grounded Reasoning |  | 0
Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards |  | 0
ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Code | 0
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Code | 0
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games |  | 0
Reinforce LLM Reasoning through Multi-Agent Reflection |  | 0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search |  | 0
LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs |  | 0
Learning to Reason Across Parallel Samples for LLM Reasoning |  | 0
Page 11 of 32

No leaderboard results yet.