SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8101–8150 of 661,570 papers

Title | Status | Hype
Learning Formal Mathematics From Intrinsic Motivation | Code | 2
Hyperparameter Optimization for Randomized Algorithms: A Case Study on Random Features | Code | 2
Diffusion Models and Representation Learning: A Survey | Code | 2
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation | Code | 2
Teola: Towards End-to-End Optimization of LLM-based Applications | Code | 2
PerAct2: Benchmarking and Learning for Robotic Bimanual Manipulation Tasks | Code | 2
UDC: A Unified Neural Divide-and-Conquer Framework for Large-Scale Combinatorial Optimization Problems | Code | 2
Diving Deeper Into Pedestrian Behavior Understanding: Intention Estimation, Action Prediction, and Event Risk Assessment | Code | 2
Efficient Large Multi-modal Models via Visual Context Compression | Code | 2
Text2Robot: Evolutionary Robot Design from Text Descriptions | Code | 2
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents | Code | 2
Multimodal Prototyping for cancer survival prediction | Code | 2
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management | Code | 2
PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration | Code | 2
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Code | 2
Odd-One-Out: Anomaly Detection by Comparing with Neighbors | Code | 2
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Code | 2
Efficient World Models with Context-Aware Tokenization | Code | 2
T-FREE: Subword Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings | Code | 2
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Code | 2
RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Code | 2
Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis | Code | 2
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions | Code | 2
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation | Code | 2
Taming Data and Transformers for Audio Generation | Code | 2
Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction | Code | 2
Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services | Code | 2
On Discrete Prompt Optimization for Diffusion Models | Code | 2
CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement | Code | 2
GenRL: Multimodal-foundation world models for generalization in embodied agents | Code | 2
A Closer Look into Mixture-of-Experts in Large Language Models | Code | 2
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | Code | 2
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Code | 2
RetroGFN: Diverse and Feasible Retrosynthesis using GFlowNets | Code | 2
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | Code | 2
MatchTime: Towards Automatic Soccer Game Commentary Generation | Code | 2
ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models | Code | 2
A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems | Code | 2
Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process | Code | 2
EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation | Code | 2
EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition | Code | 2
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference | Code | 2
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs | Code | 2
Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos | Code | 2
SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery | Code | 2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2
DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance | Code | 2
KAGNNs: Kolmogorov-Arnold Networks meet Graph Learning | Code | 2
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models | Code | 2
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration | Code | 2