SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 15,951–16,000 of 474,278 papers

Title | Status | Hype
M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection | Code | 1
Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing | Code | 1
Ranked Voting based Self-Consistency of Large Language Models | Code | 1
MOSAIK: Multi-Origin Spatial Transcriptomics Analysis and Integration Kit | Code | 1
The Future is Sparse: Embedding Compression for Scalable Retrieval in Recommender Systems | Code | 1
Flash Invariant Point Attention | Code | 1
PoE-World: Compositional World Modeling with Products of Programmatic Experts | Code | 1
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | Code | 1
DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios | Code | 1
One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework | Code | 1
Learning Dense Hand Contact Estimation from Imbalanced Data | Code | 1
Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge | Code | 1
BLEUBERI: BLEU is a surprisingly effective reward for instruction following | Code | 1
RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization | Code | 1
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs | Code | 1
EA-3DGS: Efficient and Adaptive 3D Gaussians with Highly Enhanced Quality for outdoor scenes | Code | 1
Physics-informed Temporal Alignment for Auto-regressive PDE Foundation Models | Code | 1
GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing | Code | 1
FP64 is All You Need: Rethinking Failure Modes in Physics-Informed Neural Networks | Code | 1
Accurate KV Cache Quantization with Outlier Tokens Tracing | Code | 1
Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory | Code | 1
PoseBench3D: A Cross-Dataset Analysis Framework for 3D Human Pose Estimation | Code | 1
MatTools: Benchmarking Large Language Models for Materials Science Tools | Code | 1
Reasoning on a Budget: Miniaturizing DeepSeek R1 with SFT-GRPO Alignment for Instruction-Tuned LLMs | Code | 1
Unifying Segment Anything in Microscopy with Multimodal Large Language Model | Code | 1
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models | Code | 1
Massive-STEPS: Massive Semantic Trajectories for Understanding POI Check-ins -- Dataset and Benchmarks | Code | 1
AutoRAN: Weak-to-Strong Jailbreaking of Large Reasoning Models | Code | 1
ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts | Code | 1
An Introduction to Discrete Variational Autoencoders | Code | 1
MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting | Code | 1
Multi-Token Prediction Needs Registers | Code | 1
ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization | Code | 1
LLM-Explorer: Towards Efficient and Affordable LLM-based Exploration for Mobile Apps | Code | 1
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests | Code | 1
Large Wireless Localization Model (LWLM): A Foundation Model for Positioning in 6G Networks | Code | 1
Learned Lightweight Smartphone ISP with Unpaired Data | Code | 1
PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization | Code | 1
Rethinking Repetition Problems of LLMs in Code Generation | Code | 1
Seasonal Forecasting of Pan-Arctic Sea Ice with State Space Model | Code | 1
SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and O(T) Complexity | Code | 1
MIPHEI-ViT: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models | Code | 1
HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation | Code | 1
Consistent Quantity-Quality Control across Scenes for Deployment-Aware Gaussian Splatting | Code | 1
From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making | Code | 1
MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning | Code | 1
A Hybrid Strategy for Aggregated Probabilistic Forecasting and Energy Trading in HEFTCom2024 | Code | 1
Rethinking Prompt Optimizers: From Prompt Merits to Optimization | Code | 1
Hierarchical Document Refinement for Long-context Retrieval-augmented Generation | Code | 1
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | Code | 1
Page 320 of 9,486