SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 3451-3500 of 659,983 papers

Title | Status | Hype
A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | Code | 3
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions | Code | 3
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | Code | 3
Probabilistic Weather Forecasting with Hierarchical Graph Neural Networks | Code | 3
CRAG -- Comprehensive RAG Benchmark | Code | 3
FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models | Code | 3
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs | Code | 3
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | Code | 3
VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography | Code | 3
Are We Done with MMLU? | Code | 3
MLVU: Benchmarking Multi-task Long Video Understanding | Code | 3
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion | Code | 3
VideoTetris: Towards Compositional Text-to-Video Generation | Code | 3
Vision-LSTM: xLSTM as Generic Vision Backbone | Code | 3
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Code | 3
Improving Alignment and Robustness with Circuit Breakers | Code | 3
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization | Code | 3
FusionBench: A Comprehensive Benchmark of Deep Model Fusion | Code | 3
Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models | Code | 3
Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Code | 3
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation | Code | 3
FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models | Code | 3
Description Boosting for Zero-Shot Entity and Relation Classification | Code | 3
Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials | Code | 3
DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors | Code | 3
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control | Code | 3
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark | Code | 3
Proxy Denoising for Source-Free Domain Adaptation | Code | 3
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation | Code | 3
Deciphering Oracle Bone Language with Diffusion Models | Code | 3
Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model | Code | 3
Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection | Code | 3
Automatic Instruction Evolving for Large Language Models | Code | 3
HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios | Code | 3
Neural Network Verification with Branch-and-Bound for General Nonlinearities | Code | 3
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models | Code | 3
Scalable Bayesian Learning with posteriors | Code | 3
GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning | Code | 3
MotionLLM: Understanding Human Behaviors from Human Motions and Videos | Code | 3
CV-VAE: A Compatible Video VAE for Latent Generative Video Models | Code | 3
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion | Code | 3
Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation | Code | 3
Descriptive Image Quality Assessment in the Wild | Code | 3
HLOB -- Information Persistence and Structure in Limit Order Books | Code | 3
Understanding and Minimising Outlier Features in Neural Network Training | Code | 3
Blind Image Restoration via Fast Diffusion Inversion | Code | 3
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback | Code | 3
Artificial Intelligence Index Report 2024 | Code | 3
Poseidon: Efficient Foundation Models for PDEs | Code | 3
ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling | Code | 3