SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 1901-1950 of 177,339 papers

Title | Status | Hype
Aria Everyday Activities Dataset | Code | 4
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning | Code | 4
Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs | Code | 4
A-MEM: Agentic Memory for LLM Agents | Code | 4
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations | Code | 4
FILM: Frame Interpolation for Large Motion | Code | 4
WorldVLA: Towards Autoregressive Action World Model | Code | 4
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering | Code | 4
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | Code | 4
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling | Code | 4
Open Problems in Applied Deep Learning | Code | 4
ReAct: Synergizing Reasoning and Acting in Language Models | Code | 4
A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions | Code | 4
Diffusion Models for Medical Image Analysis: A Comprehensive Survey | Code | 4
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning | Code | 4
Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies | Code | 4
ChatGPT for Robotics: Design Principles and Model Abilities | Code | 4
An Entropy-based Text Watermarking Detection Method | Code | 4
RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation | Code | 4
MINIMA: Modality Invariant Image Matching | Code | 4
SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation | Code | 4
Tower: An Open Multilingual Large Language Model for Translation-Related Tasks | Code | 4
TrustLLM: Trustworthiness in Large Language Models | Code | 4
Null-text Inversion for Editing Real Images using Guided Diffusion Models | Code | 4
GriTS: Grid table similarity metric for table structure recognition | Code | 4
3D Scene Generation: A Survey | Code | 4
LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Code | 4
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance | Code | 4
AgentBench: Evaluating LLMs as Agents | Code | 4
Semantic-SAM: Segment and Recognize Anything at Any Granularity | Code | 4
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering | Code | 4
InstanceDiffusion: Instance-level Control for Image Generation | Code | 4
Depth Any Video with Scalable Synthetic Data | Code | 4
TabularARGN: A Flexible and Efficient Auto-Regressive Framework for Generating High-Fidelity Synthetic Data | Code | 4
Quality-aware Masked Diffusion Transformer for Enhanced Music Generation | Code | 4
LET-3D-AP: Longitudinal Error Tolerant 3D Average Precision for Camera-Only 3D Detection | Code | 4
Simple and Effective Masked Diffusion Language Models | Code | 4
Sample-Efficient Alignment for LLMs | Code | 4
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Code | 4
SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution | Code | 4
Sparse Tensor-based Point Cloud Attribute Compression | Code | 4
WavCraft: Audio Editing and Generation with Large Language Models | Code | 4
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound | Code | 4
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | Code | 4
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | Code | 4
Video Seal: Open and Efficient Video Watermarking | Code | 4
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization | Code | 4
TimeGPT-1 | Code | 4
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Code | 4
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models | Code | 4
Page 39 of 3547