SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7,251–7,300 of 661,570 papers

Title | Status | Hype
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation | Code | 2
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | Code | 2
Ensured: Explanations for Decreasing the Epistemic Uncertainty in Predictions | Code | 2
SecAlign: Defending Against Prompt Injection with Preference Optimization | Code | 2
A Simple Image Segmentation Framework via In-Context Examples | Code | 2
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention | Code | 2
Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration | Code | 2
Causal Context Adjustment Loss for Learned Image Compression | Code | 2
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | Code | 2
Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting | Code | 2
Towards Ultra-Low-Power Neuromorphic Speech Enhancement with Spiking-FullSubNet | Code | 2
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Code | 2
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Code | 2
Differential Transformer | Code | 2
Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis | Code | 2
Generative Flows on Synthetic Pathway for Drug Design | Code | 2
dattri: A Library for Efficient Data Attribution | Code | 2
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval | Code | 2
GenSim: A General Social Simulation Platform with Large Language Model based Agents | Code | 2
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights | Code | 2
LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | Code | 2
DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion | Code | 2
TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting | Code | 2
UniMuMo: Unified Text, Music and Motion Generation | Code | 2
Hammer: Robust Function-Calling for On-Device Language Models via Function Masking | Code | 2
Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement | Code | 2
Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution | Code | 2
A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models | Code | 2
An Electrocardiogram Foundation Model Built on over 10 Million Recordings with External Evaluation across Multiple Domains | Code | 2
DeFoG: Discrete Flow Matching for Graph Generation | Code | 2
SyllableLM: Learning Coarse Semantic Units for Speech Language Models | Code | 2
Learning Truncated Causal History Model for Video Restoration | Code | 2
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models | Code | 2
Oscillatory State-Space Models | Code | 2
Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering | Code | 2
Mamba in Vision: A Comprehensive Survey of Techniques and Applications | Code | 2
Multi-Robot Motion Planning with Diffusion Models | Code | 2
Dynamic Diffusion Transformer | Code | 2
Exploring the Benefit of Activation Sparsity in Pre-training | Code | 2
ToolGen: Unified Tool Retrieval and Calling via Generation | Code | 2
Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review | Code | 2
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | Code | 2
Scaling Large Motion Models with Million-Level Human Motions | Code | 2
Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models | Code | 2
Steering Large Language Models between Code Execution and Textual Reasoning | Code | 2
Autoregressive Action Sequence Learning for Robotic Manipulation | Code | 2
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task | Code | 2
Generative Artificial Intelligence for Navigating Synthesizable Chemical Space | Code | 2
GraphRouter: A Graph-based Router for LLM Selections | Code | 2
AutoPenBench: Benchmarking Generative Agents for Penetration Testing | Code | 2
Page 146 of 13,232