SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 16501-16550 of 474278 papers

Title | Status | Hype
HyperCore: The Core Framework for Building Hyperbolic Foundation Models with Comprehensive Modules | Code | 1
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis | Code | 1
ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning | Code | 1
PhaseGen: A Diffusion-Based Approach for Complex-Valued MRI Data Generation | Code | 1
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Code | 1
Zero-Shot Cross-Domain Code Search without Fine-Tuning | Code | 1
ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method | Code | 1
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models | Code | 1
Harnessing Equivariance: Modeling Turbulence with Graph Neural Networks | Code | 1
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric | Code | 1
Pychop: Emulating Low-Precision Arithmetic in Numerical Methods and Neural Networks | Code | 1
Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | Code | 1
The KL3M Data Project: Copyright-Clean Training Resources for Large Language Models | Code | 1
Enhancing Time Series Forecasting via Multi-Level Text Alignment with LLMs | Code | 1
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness | Code | 1
CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy | Code | 1
MRD-RAG: Enhancing Medical Diagnosis with Multi-Round Retrieval-Augmented Generation | Code | 1
Heart Failure Prediction using Modal Decomposition and Masked Autoencoders for Scarce Echocardiography Databases | Code | 1
PIDSR: Complementary Polarized Image Demosaicing and Super-Resolution | Code | 1
Diffusion Transformers for Tabular Data Time Series Generation | Code | 1
LAPIS: A novel dataset for personalized image aesthetic assessment | Code | 1
ID-Booth: Identity-consistent Face Generation with Diffusion Models | Code | 1
STeP: A General and Scalable Framework for Solving Video Inverse Problems with Spatiotemporal Diffusion Priors | Code | 1
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation | Code | 1
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining | Code | 1
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery | Code | 1
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Code | 1
SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering | Code | 1
Inducing Programmatic Skills for Agentic Tasks | Code | 1
Uni-PrevPredMap: Extending PrevPredMap to a Unified Framework of Prior-Informed Modeling for Online Vectorized HD Map Construction | Code | 1
Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure | Code | 1
MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution | Code | 1
Evolutionary Generation of Random Surreal Numbers for Benchmarking | Code | 1
Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning | Code | 1
Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception | Code | 1
GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes | Code | 1
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? | Code | 1
Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting | Code | 1
Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization | Code | 1
A Digital Twin of an Electrical Distribution Grid: SoCal 28-Bus Dataset | Code | 1
A Unified Agentic Framework for Evaluating Conditional Image Generation | Code | 1
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding | Code | 1
DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation | Code | 1
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning | Code | 1
Are We Done with Object-Centric Learning? | Code | 1
FamilyTool: A Multi-hop Personalized Tool Use Benchmark | Code | 1
CAFE-AD: Cross-Scenario Adaptive Feature Enhancement for Trajectory Planning in Autonomous Driving | Code | 1
MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data | Code | 1
Leanabell-Prover: Posttraining Scaling in Formal Reasoning | Code | 1
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking | Code | 1
Page 331 of 9486