SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 16,101-16,150 of 474,278 papers

Title | Status | Hype
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Crosslingual Reasoning through Test-Time Scaling | Code | 1
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | Code | 1
EquiHGNN: Scalable Rotationally Equivariant Hypergraph Neural Networks | Code | 1
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics | Code | 1
Physics-Assisted and Topology-Informed Deep Learning for Weather Prediction | Code | 1
Augmented Deep Contexts for Spatially Embedded Video Coding | Code | 1
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP | Code | 1
PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models | Code | 1
Griffin: Towards a Graph-Centric Relational Database Foundation Model | Code | 1
Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration | Code | 1
scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction | Code | 1
The City that Never Settles: Simulation-based LiDAR Dataset for Long-Term Place Recognition Under Extreme Structural Changes | Code | 1
A Preliminary Study for GPT-4o on Image Restoration | Code | 1
A Simple Detector with Frame Dynamics is a Strong Tracker | Code | 1
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | Code | 1
ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior | Code | 1
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding | Code | 1
UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model | Code | 1
KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification | Code | 1
Scalable Chain of Thoughts via Elastic Reasoning | Code | 1
FilterTS: Comprehensive Frequency Filtering for Multivariate Time Series Forecasting | Code | 1
VideoPath-LLaVA: Pathology Diagnostic Reasoning Through Video Instruction Tuning | Code | 1
TS-Diff: Two-Stage Diffusion Model for Low-Light RAW Image Enhancement | Code | 1
Componential Prompt-Knowledge Alignment for Domain Incremental Learning | Code | 1
Histo-Miner: Deep Learning based Tissue Features Extraction Pipeline from H&E Whole Slide Images of Cutaneous Squamous Cell Carcinoma | Code | 1
TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution | Code | 1
WDMamba: When Wavelet Degradation Prior Meets Vision Mamba for Image Dehazing | Code | 1
RGB-Event Fusion with Self-Attention for Collision Prediction | Code | 1
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events | Code | 1
Lightweight RGB-D Salient Object Detection from a Speed-Accuracy Tradeoff Perspective | Code | 1
LLAMAPIE: Proactive In-Ear Conversation Assistants | Code | 1
Retrieval Augmented Time Series Forecasting | Code | 1
Registration of 3D Point Sets Using Exponential-based Similarity Matrix | Code | 1
Image Restoration via Multi-domain Learning | Code | 1
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via α-β-Divergence | Code | 1
Nature's Insight: A Novel Framework and Comprehensive Analysis of Agentic Reasoning Through the Lens of Neuroscience | Code | 1
Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards | Code | 1
Vision Graph Prompting via Semantic Low-Rank Decomposition | Code | 1
Benchmarking LLMs' Swarm intelligence | Code | 1
DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once | Code | 1
Object-Shot Enhanced Grounding Network for Egocentric Video | Code | 1
GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model | Code | 1
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | Code | 1
Token Communication-Driven Multimodal Large Models in Resource-Constrained Multiuser Networks | Code | 1
Learning-based Homothetic Tube MPC | Code | 1
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch | Code | 1
IndicSQuAD: A Comprehensive Multilingual Question Answering Dataset for Indic Languages | Code | 1
OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents | Code | 1
1st Place Solution of WWW 2025 EReL@MIR Workshop Multimodal CTR Prediction Challenge | Code | 1
Page 323 of 9486