SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 751-800 of 177339 papers

Title | Status | Hype
Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems | Code | 5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Code | 5
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Code | 5
MMBench: Is Your Multi-modal Model an All-around Player? | Code | 5
TAPVid-3D: A Benchmark for Tracking Any Point in 3D | Code | 5
Retrieval-Augmented Generation for AI-Generated Content: A Survey | Code | 5
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models | Code | 5
Improved Distribution Matching Distillation for Fast Image Synthesis | Code | 5
Large Language Model based Multi-Agents: A Survey of Progress and Challenges | Code | 5
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | Code | 5
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Code | 5
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation | Code | 5
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models | Code | 5
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Code | 5
Diffusion for World Modeling: Visual Details Matter in Atari | Code | 5
Flashlight: Enabling Innovation in Tools for Machine Learning | Code | 5
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models | Code | 5
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | Code | 5
BootsTAP: Bootstrapped Training for Tracking-Any-Point | Code | 5
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset | Code | 5
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | Code | 5
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Code | 5
OffsetBias: Leveraging Debiased Data for Tuning Evaluators | Code | 5
Meta-World+: An Improved, Standardized, RL Benchmark | Code | 5
MONAI: An open-source framework for deep learning in healthcare | Code | 5
Secrets of RLHF in Large Language Models Part II: Reward Modeling | Code | 5
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion | Code | 5
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers | Code | 5
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit | Code | 5
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models | Code | 5
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | Code | 5
Free Process Rewards without Process Labels | Code | 5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Code | 5
Executable Code Actions Elicit Better LLM Agents | Code | 5
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation | Code | 5
PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation | Code | 5
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | Code | 5
Continuous Thought Machines | Code | 5
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Code | 5
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining | Code | 5
Efficient Streaming Language Models with Attention Sinks | Code | 5
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs | Code | 5
Group-in-Group Policy Optimization for LLM Agent Training | Code | 5
Sequencer: Deep LSTM for Image Classification | Code | 5
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement | Code | 5
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents | Code | 5
EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration | Code | 5
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models | Code | 5
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models | Code | 5
Matrix-Game: Interactive World Foundation Model | Code | 5
Page 16 of 3547