SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 12901-12950 of 474,278 papers

Title | Status | Hype
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models | Code | 2
DsDm: Model-Aware Dataset Selection with Datamodels | Code | 2
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet | Code | 2
Benchmarking Laparoscopic Surgical Image Restoration and Beyond | Code | 2
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding | Code | 2
Monocular, One-stage, Regression of Multiple 3D People | Code | 2
Giraffe: Adventures in Expanding Context Lengths in LLMs | Code | 2
Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks | Code | 2
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition | Code | 2
What is a Goldilocks Face Verification Test Set? | Code | 2
DiffArtist: Towards Structure and Appearance Controllable Image Stylization | Code | 2
Structure-Aligned Protein Language Model | Code | 2
Detecting music deepfakes is easy but actually hard | Code | 2
Denoising Diffusion Bridge Models | Code | 2
Test-time Alignment of Diffusion Models without Reward Over-optimization | Code | 2
Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Model | Code | 2
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | Code | 2
Towards Collaborative Autonomous Driving: Simulation Platform and End-to-End System | Code | 2
ReservoirComputing.jl: An Efficient and Modular Library for Reservoir Computing Models | Code | 2
LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks | Code | 2
"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset | Code | 2
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning | Code | 2
Large Language Models on Graphs: A Comprehensive Survey | Code | 2
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model | Code | 2
Demonstration-Guided Reinforcement Learning with Efficient Exploration for Task Automation of Surgical Robot | Code | 2
Towards Better Dynamic Graph Learning: New Architecture and Unified Library | Code | 2
City3D: Large-Scale Building Reconstruction from Airborne LiDAR Point Clouds | Code | 2
Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation | Code | 2
GenRL: Multimodal-foundation world models for generalization in embodied agents | Code | 2
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs | Code | 2
Towards Lightweight Super-Resolution with Dual Regression Learning | Code | 2
Scale Decoupled Distillation | Code | 2
MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders | Code | 2
Explicit Visual Prompting for Low-Level Structure Segmentations | Code | 2
Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust | Code | 2
Omni Aggregation Networks for Lightweight Image Super-Resolution | Code | 2
You Only Look at Once for Real-time and Generic Multi-Task | Code | 2
Domino: Discovering Systematic Errors with Cross-Modal Embeddings | Code | 2
h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform | Code | 2
InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Code | 2
SystolicAttention: Fusing FlashAttention within a Single Systolic Array | Code | 2
TAB: Unified Benchmarking of Time Series Anomaly Detection Methods | Code | 2
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts | Code | 2
AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding | Code | 2
RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation | Code | 2
Exploring Diffusion Transformer Designs via Grafting | Code | 2
LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL | Code | 2
Wildfire Smoke Detection with Computer Vision | Code | 2
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM | Code | 2
Process Reward Model with Q-Value Rankings | Code | 2
Page 259 of 9486