SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 4151-4200 of 661,570 papers

Title | Status | Hype
Putting the Object Back into Video Object Segmentation | Code | 3
AgentTuning: Enabling Generalized Agent Abilities for LLMs | Code | 3
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews | Code | 3
Llemma: An Open Language Model For Mathematics | Code | 3
MotionDirector: Motion Customization of Text-to-Video Diffusion Models | Code | 3
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting | Code | 3
Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research | Code | 3
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration | Code | 3
CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving | Code | 3
MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents | Code | 3
Text Embeddings Reveal (Almost) As Much As Text | Code | 3
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis | Code | 3
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | Code | 3
Evaluating Hallucinations in Chinese Large Language Models | Code | 3
T^3Bench: Benchmarking Current Progress in Text-to-3D Generation | Code | 3
MagicDrive: Street View Generation with Diverse 3D Geometry Control | Code | 3
Conceptual Framework for Autonomous Cognitive Entities | Code | 3
OceanGPT: A Large Language Model for Ocean Science Tasks | Code | 3
UltraFeedback: Boosting Language Models with Scaled AI Feedback | Code | 3
AutoAgents: A Framework for Automatic Agent Generation | Code | 3
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Code | 3
Data Filtering Networks | Code | 3
SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation | Code | 3
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Code | 3
Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction | Code | 3
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition | Code | 3
Impact of architecture on robustness and interpretability of multispectral deep neural networks | Code | 3
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model | Code | 3
FreeU: Free Lunch in Diffusion U-Net | Code | 3
SlimPajama-DC: Understanding Data Combinations for LLM Training | Code | 3
Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning | Code | 3
Multimodal Foundation Models: From Specialists to General-Purpose Assistants | Code | 3
Sparse Autoencoders Find Highly Interpretable Features in Language Models | Code | 3
AudioSR: Versatile Audio Super-resolution at Scale | Code | 3
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation | Code | 3
HAT: Hybrid Attention Transformer for Image Restoration | Code | 3
Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection | Code | 3
Tracking Anything with Decoupled Video Segmentation | Code | 3
Matcha-TTS: A fast TTS architecture with conditional flow matching | Code | 3
nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources | Code | 3
Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering | Code | 3
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models | Code | 3
SAM-Med2D | Code | 3
Emergence of Segmentation with Minimalistic White-Box Transformers | Code | 3
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models | Code | 3
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding | Code | 3
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation | Code | 3
Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions | Code | 3
Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization | Code | 3
How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection | Code | 3