SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 3376–3400 of 661,570 papers

Title | Status | Hype
Vaporetto: Efficient Japanese Tokenization Based on Improved Pointwise Linear Classification | Code | 3
Adam-mini: Use Fewer Learning Rates To Gain More | Code | 3
GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization | Code | 3
HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis | Code | 3
AudioBench: A Universal Benchmark for Audio Large Language Models | Code | 3
Are Language Models Actually Useful for Time Series Forecasting? | Code | 3
Taming 3DGS: High-Quality Radiance Fields with Limited Resources | Code | 3
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | Code | 3
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials | Code | 3
Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines | Code | 3
Consistency Models Made Easy | Code | 3
LLM4CP: Adapting Large Language Models for Channel Prediction | Code | 3
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Code | 3
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | Code | 3
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents | Code | 3
SpatialBot: Precise Spatial Understanding with Vision Language Models | Code | 3
Detecting hallucinations in large language models using semantic entropy | Code | 3
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts | Code | 3
VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models | Code | 3
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? | Code | 3
Evaluating representation learning on the protein structure universe | Code | 3
DF40: Toward Next-Generation Deepfake Detection | Code | 3
TSI-Bench: Benchmarking Time Series Imputation | Code | 3
VoCo-LLaMA: Towards Vision Compression with Large Language Models | Code | 3
WebCanvas: Benchmarking Web Agents in Online Environments | Code | 3
Page 136 of 26463