SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,984 papers | 248,105 code links | 4,818 tasks

Papers

Showing 3401-3450 of 659,984 papers

Title | Status | Hype
WebCanvas: Benchmarking Web Agents in Online Environments | Code | 3
Refusal in Language Models Is Mediated by a Single Direction | Code | 3
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model | Code | 3
Unveiling Encoder-Free Vision-Language Models | Code | 3
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement | Code | 3
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models | Code | 3
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning | Code | 3
An Imitative Reinforcement Learning Framework for Autonomous Dogfight | Code | 3
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding | Code | 3
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference | Code | 3
Step-level Value Preference Optimization for Mathematical Reasoning | Code | 3
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Code | 3
CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph | Code | 3
AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology | Code | 3
IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization | Code | 3
TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs | Code | 3
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning | Code | 3
CarLLaVA: Vision language models for camera-only closed-loop driving | Code | 3
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Code | 3
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Code | 3
Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation | Code | 3
DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks | Code | 3
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Code | 3
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | Code | 3
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Code | 3
RobustSAM: Segment Anything Robustly on Degraded Images | Code | 3
Is Value Learning Really the Main Bottleneck in Offline RL? | Code | 3
AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring | Code | 3
Multimodal Table Understanding | Code | 3
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | Code | 3
RVT-2: Learning Precise Manipulation from Few Demonstrations | Code | 3
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | Code | 3
Enhancing End-to-End Autonomous Driving with Latent World Model | Code | 3
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Code | 3
Image and Video Tokenization with Binary Spherical Quantization | Code | 3
Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey | Code | 3
An Image is Worth 32 Tokens for Reconstruction and Generation | Code | 3
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance | Code | 3
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation | Code | 3
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark | Code | 3
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis | Code | 3
GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation | Code | 3
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents | Code | 3
GraphStorm: all-in-one graph machine learning framework for industry applications | Code | 3
Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Code | 3
AutoSurvey: Large Language Models Can Automatically Write Surveys | Code | 3
Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation | Code | 3
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation | Code | 3
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning | Code | 3
A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | Code | 3