SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 15301-15350 of 474278 papers

Title | Status | Hype
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay | Code | 1
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games | Code | 1
FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation | Code | 1
Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models | Code | 1
Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning | Code | 1
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations | Code | 1
OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View | Code | 1
Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data | Code | 1
Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts | Code | 1
Progressive Tempering Sampler with Diffusion | Code | 1
MineInsight: A Multi-sensor Dataset for Humanitarian Demining Robotics in Off-Road Environments | Code | 1
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents | Code | 1
Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning | Code | 1
OSGNet @ Ego4D Episodic Memory Challenge 2025 | Code | 1
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting | Code | 1
El0ps: An Exact L0-regularized Problems Solver | Code | 1
TokAlign: Efficient Vocabulary Adaptation via Token Alignment | Code | 1
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Code | 1
Target Semantics Clustering via Text Representations for Robust Universal Domain Adaptation | Code | 1
VLMs Can Aggregate Scattered Training Patches | Code | 1
A Generic Branch-and-Bound Algorithm for ℓ0-Penalized Problems with Supplementary Material | Code | 1
Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation | Code | 1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1
TracLLM: A Generic Framework for Attributing Long Context LLMs | Code | 1
RewardAnything: Generalizable Principle-Following Reward Models | Code | 1
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models | Code | 1
Even Faster Hyperbolic Random Forests: A Beltrami-Klein Wrapper Approach | Code | 1
AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism | Code | 1
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation | Code | 1
POSS: Position Specialist Generates Better Draft for Speculative Decoding | Code | 1
Diffusion Domain Teacher: Diffusion Guided Domain Adaptive Object Detector | Code | 1
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices | Code | 1
Zero-Shot Temporal Interaction Localization for Egocentric Videos | Code | 1
ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions | Code | 1
Rethinking Machine Unlearning in Image Generation Models | Code | 1
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm | Code | 1
FlySearch: Exploring how vision-language models explore | Code | 1
Zero-Shot Tree Detection and Segmentation from Aerial Forest Imagery | Code | 1
UniSite: The First Cross-Structure Dataset and Learning Framework for End-to-End Ligand Binding Site Detection | Code | 1
SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios | Code | 1
ThinkTank: A Framework for Generalizing Domain-Specific AI Agent Systems into Universal Collaborative Intelligence Platforms | Code | 1
NetPress: Dynamically Generated LLM Benchmarks for Network Applications | Code | 1
Adversarial Attacks on Robotic Vision Language Action Models | Code | 1
GeneA-SLAM2: Dynamic SLAM with AutoEncoder-Preprocessed Genetic Keypoints Resampling and Depth Variance-Guided Dynamic Region Removal | Code | 1
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation | Code | 1
PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis | Code | 1
Dense Match Summarization for Faster Two-view Estimation | Code | 1
Simple, Good, Fast: Self-Supervised World Models Free of Baggage | Code | 1
Adaptive Differential Denoising for Respiratory Sounds Classification | Code | 1
Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning | Code | 1
Page 307 of 9486