SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 3,051–3,100 of 659,983 papers

Title | Status | Hype
DPLM-2: A Multimodal Diffusion Protein Language Model | Code | 3
Automatically Interpreting Millions of Features in Large Language Models | Code | 3
Movie Gen: A Cast of Media Foundation Models | Code | 3
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models | Code | 3
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | Code | 3
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models | Code | 3
Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception | Code | 3
3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation | Code | 3
PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | Code | 3
Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies | Code | 3
Latent Action Pretraining from Videos | Code | 3
GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation | Code | 3
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Code | 3
Predicting from Strings: Language Model Embeddings for Bayesian Optimization | Code | 3
LoLCATs: On Low-Rank Linearizing of Large Language Models | Code | 3
UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation | Code | 3
Large-Scale 3D Medical Image Pre-training with Geometric Context Priors | Code | 3
FlatQuant: Flatness Matters for LLM Quantization | Code | 3
MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection | Code | 3
C-Adapter: Adapting Deep Classifiers for Efficient Conformal Prediction Sets | Code | 3
CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation | Code | 3
SceneCraft: Layout-Guided 3D Scene Generation | Code | 3
Baichuan-Omni Technical Report | Code | 3
Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning | Code | 3
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis | Code | 3
Towards Next-Generation LLM-based Recommender Systems: A Survey and Beyond | Code | 3
Fast Feedforward 3D Gaussian Splatting Compression | Code | 3
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow | Code | 3
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3
TopoTune : A Framework for Generalized Combinatorial Complex Neural Networks | Code | 3
Rethinking the Evaluation of Visible and Infrared Image Fusion | Code | 3
AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation | Code | 3
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Code | 3
AgentSquare: Automatic LLM Agent Search in Modular Design Space | Code | 3
Residual Kolmogorov-Arnold Network for Enhanced Deep Learning | Code | 3
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Code | 3
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference | Code | 3
High-Speed Stereo Visual SLAM for Low-Powered Computing Devices | Code | 3
Accelerating Diffusion Transformers with Token-wise Feature Caching | Code | 3
Neuron-Level Sequential Editing for Large Language Models | Code | 3
MELODI: Exploring Memory Compression for Long Contexts | Code | 3
CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control | Code | 3
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Code | 3
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models | Code | 3
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly | Code | 3
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs | Code | 3
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph | Code | 3
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models | Code | 3
Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 | Code | 3
ControlAR: Controllable Image Generation with Autoregressive Models | Code | 3
Page 62 of 13,200