SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 3251-3300 of 659,983 papers

Title | Status | Hype
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving | Code | 3
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities | Code | 3
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Code | 3
Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields | Code | 3
Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws | Code | 3
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine | Code | 3
Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection | Code | 3
Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2 | Code | 3
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Code | 3
multiGradICON: A Foundation Model for Multimodal Medical Image Registration | Code | 3
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names | Code | 3
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Code | 3
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving | Code | 3
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model | Code | 3
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Code | 3
Beat this! Accurate beat tracking without DBN postprocessing | Code | 3
Hyper-parameter tuning for text guided image editing | Code | 3
ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget | Code | 3
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models | Code | 3
Comgra: A Tool for Analyzing and Debugging Neural Networks | Code | 3
Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection | Code | 3
Diffusion Feedback Helps CLIP See Better | Code | 3
rLLM: Relational Table Learning with LLMs | Code | 3
RelBench: A Benchmark for Deep Learning on Relational Databases | Code | 3
Practical Video Object Detection via Feature Selection and Aggregation | Code | 3
Theia: Distilling Diverse Vision Foundation Models for Robot Learning | Code | 3
OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale | Code | 3
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | Code | 3
Keypoint Promptable Re-Identification | Code | 3
Harnessing Temporal Causality for Advanced Temporal Action Detection | Code | 3
LION: Linear Group RNN for 3D Object Detection in Point Clouds | Code | 3
EAFormer: Scene Text Segmentation with Edge-Aware Transformers | Code | 3
Sentiment Reasoning for Healthcare | Code | 3
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Code | 3
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model | Code | 3
3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities | Code | 3
AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking | Code | 3
Pareto Front Approximation for Multi-Objective Session-Based Recommender Systems | Code | 3
Reinforcement Learning Meets Visual Odometry | Code | 3
AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection | Code | 3
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models | Code | 3
Odyssey: Empowering Minecraft Agents with Open-World Skills | Code | 3
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Code | 3
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements | Code | 3
TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | Code | 3
LLMmap: Fingerprinting For Large Language Models | Code | 3
vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Code | 3
Local All-Pair Correspondence for Point Tracking | Code | 3
Compact Language Models via Pruning and Knowledge Distillation | Code | 3