SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6801–6850 of 661,570 papers

Title | Status | Hype
ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Code | 2
Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI | Code | 2
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation | Code | 2
Open-Vocabulary Online Semantic Mapping for SLAM | Code | 2
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection | Code | 2
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | Code | 2
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality | Code | 2
AnyText2: Visual Text Generation and Editing With Customizable Attributes | Code | 2
Zero-Shot Coreset Selection: Efficient Pruning for Unlabeled Data | Code | 2
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models | Code | 2
Natural Language Reinforcement Learning | Code | 2
MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective | Code | 2
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models | Code | 2
CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs | Code | 2
EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild | Code | 2
FunctionChat-Bench: Comprehensive Evaluation of Language Models' Generative Capabilities in Korean Tool-use Dialogs | Code | 2
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Code | 2
Disentangling Memory and Reasoning Ability in Large Language Models | Code | 2
Find Any Part in 3D | Code | 2
Practical Compact Deep Compressed Sensing | Code | 2
DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Code | 2
RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation | Code | 2
Empower Structure-Based Molecule Optimization with Gradient Guided Bayesian Flow Networks | Code | 2
Quantized symbolic time series approximation | Code | 2
SimPhony: A Device-Circuit-Architecture Cross-Layer Modeling and Simulation Framework for Heterogeneous Electronic-Photonic AI System | Code | 2
SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search | Code | 2
Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Code | 2
Motif Channel Opened in a White-Box: Stereo Matching via Motif Correlation Graph | Code | 2
From Text to Pose to Image: Improving Diffusion Model Control and Quality | Code | 2
CV-Cities: Advancing Cross-View Geo-Localization in Global Cities | Code | 2
Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution | Code | 2
HyperGAN-CLIP: A Unified Framework for Domain Adaptation, Image Synthesis and Manipulation | Code | 2
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Code | 2
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction | Code | 2
Real-Time Fitness Exercise Classification and Counting from Video Frames | Code | 2
AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning | Code | 2
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Code | 2
Syllabus: Portable Curricula for Reinforcement Learning Agents | Code | 2
Enhancing LLM Reasoning with Reward-guided Tree Search | Code | 2
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos | Code | 2
CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters | Code | 2
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation | Code | 2
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering | Code | 2
MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Code | 2
Newclid: A User-Friendly Replacement for AlphaGeometry | Code | 2
BianCang: A Traditional Chinese Medicine Large Language Model | Code | 2
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing | Code | 2
VeGaS: Video Gaussian Splatting | Code | 2
AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers | Code | 2
RPN 2: On Interdependence Function Learning Towards Unifying and Advancing CNN, RNN, GNN, and Transformer | Code | 2
Page 137 of 13,232