SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8201-8250 of 661,570 papers

Title | Status | Hype
TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning | Code | 2
FIRST: Faster Improved Listwise Reranking with Single Token Decoding | Code | 2
RouteFinder: Towards Foundation Models for Vehicle Routing Problems | Code | 2
SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation | Code | 2
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Code | 2
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation | Code | 2
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | Code | 2
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark | Code | 2
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | Code | 2
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework | Code | 2
LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning | Code | 2
CodeRAG-Bench: Can Retrieval Augment Code Generation? | Code | 2
Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition | Code | 2
CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information | Code | 2
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms | Code | 2
HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting? | Code | 2
How far are today's time-series models from real-world weather forecasting applications? | Code | 2
MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading | Code | 2
TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models | Code | 2
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving | Code | 2
Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study | Code | 2
Adaptable Logical Control for Large Language Models | Code | 2
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words | Code | 2
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World | Code | 2
Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases | Code | 2
GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov Arnold Networks | Code | 2
A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations | Code | 2
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales | Code | 2
Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks | Code | 2
WATT: Weight Average Test-Time Adaptation of CLIP | Code | 2
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Code | 2
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design | Code | 2
Dissecting Adversarial Robustness of Multimodal LM Agents | Code | 2
Can Go AIs be adversarially robust? | Code | 2
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Code | 2
Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment | Code | 2
Universal Score-based Speech Enhancement with High Content Preservation | Code | 2
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | Code | 2
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Code | 2
Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Code | 2
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | Code | 2
AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention | Code | 2
Automated MRI Quality Assessment of Brain T1-weighted MRI in Clinical Data Warehouses: A Transfer Learning Approach Relying on Artefact Simulation | Code | 2
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Code | 2
Coding Speech through Vocal Tract Kinematics | Code | 2
AgentReview: Exploring Peer Review Dynamics with LLM Agents | Code | 2
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction | Code | 2
AEM: Attention Entropy Maximization for Multiple Instance Learning based Whole Slide Image Classification | Code | 2
ChangeViT: Unleashing Plain Vision Transformers for Change Detection | Code | 2
TroL: Traversal of Layers for Large Language and Vision Models | Code | 2