SOTAVerified

Math

Papers

Showing 276-300 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1 |
| Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Code | 1 |
| The Jailbreak Tax: How Useful are Your Jailbreak Outputs? | Code | 1 |
| Efficient Process Reward Model Training via Active Learning | Code | 1 |
| M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Code | 1 |
| Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Code | 1 |
| MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Code | 1 |
| Large (Vision) Language Models are Unsupervised In-Context Learners | Code | 1 |
| BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Code | 1 |
| Entropy-Based Adaptive Weighting for Self-Training | Code | 1 |
| QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Code | 1 |
| ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models | Code | 1 |
| LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1 |
| EXAONE Deep: Reasoning Enhanced Language Models | Code | 1 |
| VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search | Code | 1 |
| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1 |
| PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Code | 1 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1 |
| Self-Training Elicits Concise Reasoning in Large Language Models | Code | 1 |
| Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Code | 1 |
| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1 |
| How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1 |
| Reasoning with Reinforced Functional Token Tuning | Code | 1 |
| Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Code | 1 |
| Thinking Preference Optimization | Code | 1 |