SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 11001–11050 of 661,570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Think, Then Verify: A Hypothesis-Verification Multi-Agent Framework for Long Video Understanding | Code | 0 |
| Tell2Adapt: A Unified Framework for Source Free Unsupervised Domain Adaptation via Vision Foundation Model | Code | 0 |
| Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure | Code | 0 |
| Mario: Multimodal Graph Reasoning with Large Language Models | Code | 0 |
| Embedded Inter-Subject Variability in Adversarial Learning for Inertial Sensor-Based Human Activity Recognition | Code | 0 |
| Planner Aware Path Learning in Diffusion Language Models Training | Code | 0 |
| SURE: Semi-dense Uncertainty-REfined Feature Matching | Code | 0 |
| Eka-Eval: An Evaluation Framework for Low-Resource Multilingual Large Language Models | Code | 0 |
| Progressive Residual Warmup for Language Model Pretraining | Code | 0 |
| ViRC: Enhancing Visual Interleaved Mathematical CoT with Reason Chunking | Code | 0 |
| Temporal Misalignment Attacks against Multimodal Perception in Autonomous Driving | Code | 0 |
| MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models | Code | 0 |
| Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation | Code | 0 |
| IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation | Code | 0 |
| Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning | Code | 0 |
| Causally Robust Reward Learning from Reason-Augmented Preference Feedback | Code | 0 |
| Free Lunch for Pass@k? Low Cost Diverse Sampling for Diffusion Language Models | Code | 0 |
| VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory | Code | 0 |
| FedBCD: Communication-Efficient Accelerated Block Coordinate Gradient Descent for Federated Learning | Code | 0 |
| Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity | Code | 0 |
| Logi-PAR: Logic-Infused Patient Activity Recognition via Differentiable Rule | Code | 0 |
| Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation | Code | 0 |
| VietJobs: A Vietnamese Job Advertisement Dataset | Code | 0 |
| SlideSparse: Fast and Flexible (2N-2):2N Structured Sparsity | Code | 0 |
| ORMOT: A Dataset and Framework for Omnidirectional Referring Multi-Object Tracking | Code | 0 |
| Judge Reliability Harness: Stress Testing the Reliability of LLM Judges | Code | 0 |
| SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis | Code | 0 |
| Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation | Code | 0 |
| Making Reconstruction FID Predictive of Diffusion Generation FID | Code | 0 |
| Any to Full: Prompting Depth Anything for Depth Completion in One Stage | Code | 0 |
| PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation | Code | 0 |
| BiEvLight: Bi-level Learning of Task-Aware Event Refinement for Low-Light Image Enhancement | Code | 0 |
| Pursuing Minimal Sufficiency in Spatial Reasoning | Code | 0 |
| Reclaiming Lost Text Layers for Source-Free Cross-Domain Few-Shot Learning | Code | 0 |
| Semantic Class Distribution Learning for Debiasing Semi-Supervised Medical Image Segmentation | Code | 0 |
| Interactive Benchmarks | Code | 0 |
| MotionStream: Real-Time Video Generation with Interactive Motion Controls | | 4 |
| OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs | | 2 |
| Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model | | 1 |
| -Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space | | 1 |
| BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning | | 1 |
| LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery | | 1 |
| RealWonder: Real-Time Physical Action-Conditioned Video Generation | | 2 |
| KLASS: KL-Guided Fast Inference in Masked Diffusion Models | | 1 |
| Hyperspherical Latents Improve Continuous-Token Autoregressive Generation | | 2 |
| Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels | | 2 |
| VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL | | 1 |
| Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline | | 1 |
| The Fragility Of Moral Judgment In Large Language Models | | 0 |
| Longitudinal Lesion Inpainting in Brain MRI via 3D Region Aware Diffusion | | 0 |