SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 1651–1700 of 659,983 papers

Title | Status | Hype
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Code | 4
YuLan: An Open-source Large Language Model | Code | 4
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks | Code | 4
On Scaling Up 3D Gaussian Splatting Training | Code | 4
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Code | 4
Long Context Transfer from Language to Vision | Code | 4
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Code | 4
RaTEScore: A Metric for Radiology Report Generation | Code | 4
Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments | Code | 4
Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs | Code | 4
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions | Code | 4
Convolutional Kolmogorov-Arnold Networks | Code | 4
Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback | Code | 4
Nemotron-4 340B Technical Report | Code | 4
Graspness Discovery in Clutters for Fast and Accurate Grasp Detection | Code | 4
Diffusion Models in Low-Level Vision: A Survey | Code | 4
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens | Code | 4
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning | Code | 4
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery | Code | 4
Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center | Code | 4
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs | Code | 4
Gender Representation in TV and Radio: Automatic Information Extraction methods versus Manual Analyses | Code | 4
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations | Code | 4
HelpSteer2: Open-source dataset for training top-performing reward models | Code | 4
One-Step Effective Diffusion Network for Real-World Image Super-Resolution | Code | 4
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing | Code | 4
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling | Code | 4
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising | Code | 4
Simple and Effective Masked Diffusion Language Models | Code | 4
PufferLib: Making Reinforcement Learning Libraries and Environments Play Nice | Code | 4
Mamba YOLO: A Simple Baseline for Object Detection with State Space Model | Code | 4
MotionClone: Training-Free Motion Cloning for Controllable Video Generation | Code | 4
The CLRS-Text Algorithmic Reasoning Language Benchmark | Code | 4
Lean Workbook: A large-scale Lean problem set formalized from natural language math problems | Code | 4
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | Code | 4
Nomic Embed Vision: Expanding the Latent Space | Code | 4
Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving | Code | 4
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments | Code | 4
Scaling and evaluating sparse autoencoders | Code | 4
DenoDet: Attention as Deformable Multi-Subspace Feature Denoising for Target Detection in SAR Images | Code | 4
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | Code | 4
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation | Code | 4
Guiding a Diffusion Model with a Bad Version of Itself | Code | 4
RaDe-GS: Rasterizing Depth in Gaussian Splatting | Code | 4
UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Code | 4
Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry | Code | 4
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Code | 4
COS-Mix: Cosine Similarity and Distance Fusion for Improved Information Retrieval | Code | 4
End-to-End Hybrid Refractive-Diffractive Lens Design with Differentiable Ray-Wave Model | Code | 4
R^2-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction | Code | 4
Page 34 of 13,200