SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 9926-9950 of 474278 papers

Title | Status | Hype
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) | Code | 2
RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model | Code | 2
TimeSeriesBench: An Industrial-Grade Benchmark for Time Series Anomaly Detection Models | Code | 2
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation | Code | 2
Linear Transformers with Learnable Kernel Functions are Better In-Context Models | Code | 2
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization | Code | 2
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent | Code | 2
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention | Code | 2
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator | Code | 2
X-maps: Direct Depth Lookup for Event-based Structured Light Systems | Code | 2
PAL: Proxy-Guided Black-Box Attack on Large Language Models | Code | 2
Detecting CSV File Dialects by Table Uniformity Measurement and Data Type Inference | Code | 2
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains | Code | 2
A StrongREJECT for Empty Jailbreaks | Code | 2
Chain-of-Thought Reasoning Without Prompting | Code | 2
ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback | Code | 2
Recovering the Pre-Fine-Tuning Weights of Generative Models | Code | 2
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference | Code | 2
MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models | Code | 2
LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset | Code | 2
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents | Code | 2
YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection | Code | 2
Less is More: Fewer Interpretable Region via Submodular Subset Selection | Code | 2
PC-NeRF: Parent-Child Neural Radiance Fields Using Sparse LiDAR Frames in Autonomous Driving Environments | Code | 2
Extreme Video Compression with Pre-trained Diffusion Models | Code | 2