SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 7251–7275 of 474,278 papers

Title | Status | Hype
MIBench: A Comprehensive Framework for Benchmarking Model Inversion Attack and Defense | Code | 2
Ensured: Explanations for Decreasing the Epistemic Uncertainty in Predictions | Code | 2
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Code | 2
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Code | 2
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | Code | 2
Differential Transformer | Code | 2
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation | Code | 2
Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting | Code | 2
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | Code | 2
Video Prediction Transformers without Recurrence or Convolution | Code | 2
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention | Code | 2
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey | Code | 2
CAR: Controllable Autoregressive Modeling for Visual Generation | Code | 2
SecAlign: Defending Against Prompt Injection with Preference Optimization | Code | 2
Hammer: Robust Function-Calling for On-Device Language Models via Function Masking | Code | 2
DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion | Code | 2
LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | Code | 2
Generative Flows on Synthetic Pathway for Drug Design | Code | 2
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval | Code | 2
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights | Code | 2
Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis | Code | 2
dattri: A Library for Efficient Data Attribution | Code | 2
TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting | Code | 2
GenSim: A General Social Simulation Platform with Large Language Model based Agents | Code | 2
UniMuMo: Unified Text, Music and Motion Generation | Code | 2