SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5601–5650 of 661,570 papers

Title | Status | Hype
Coercing LLMs to do and reveal (almost) anything | Code | 2
Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models | Code | 2
Learning-Rate-Free Learning by D-Adaptation | Code | 2
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Code | 2
X-Pose: Detecting Any Keypoints | Code | 2
A Survey on Detection of LLMs-Generated Content | Code | 2
Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting | Code | 2
COLMAP-Free 3D Gaussian Splatting | Code | 2
Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement | Code | 2
Super Monotonic Alignment Search | Code | 2
RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search | Code | 2
M^3CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought | Code | 2
Hello Again! LLM-powered Personalized Agent for Long-term Dialogue | Code | 2
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Code | 2
Diffusion Models and Representation Learning: A Survey | Code | 2
HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Code | 2
XMainframe: A Large Language Model for Mainframe Modernization | Code | 2
Learning Generative Interactive Environments By Trained Agent Exploration | Code | 2
Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration | Code | 2
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models | Code | 2
A Comprehensive Guide to Explainable AI: From Classical Models to LLMs | Code | 2
2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification | Code | 2
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners | Code | 2
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models | Code | 2
ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation | Code | 2
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | Code | 2
Saving 77% of the Parameters in Large Language Models Technical Report | Code | 2
Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds | Code | 2
RARE: Retrieval-Augmented Reasoning Modeling | Code | 2
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design | Code | 2
Data Science Education in Undergraduate Physics: Lessons Learned from a Community of Practice | Code | 2
Synthesize Diagnose and Optimize: Towards Fine-Grained Vision-Language Understanding | Code | 2
Adaptive Multi-Agent Reasoning via Automated Workflow Generation | Code | 2
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX | Code | 2
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Code | 2
LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments | Code | 2
LightGNN: Simple Graph Neural Network for Recommendation | Code | 2
Interactive and Explainable Region-guided Radiology Report Generation | Code | 2
LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Code | 2
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Code | 2
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models | Code | 2
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Code | 2
Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ | Code | 2
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | Code | 2
CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats | Code | 2
NeRF-RPN: A general framework for object detection in NeRFs | Code | 2
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models | Code | 2
Automatic Differentiation-based Full Waveform Inversion with Flexible Workflows | Code | 2
AirMorph: Topology-Preserving Deep Learning for Pulmonary Airway Analysis | Code | 2
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey | Code | 2
Page 113 of 13,232