SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 19651-19700 of 474278 papers

Title | Status | Hype
MetaKG: Meta-learning on Knowledge Graph for Cold-start Recommendation | Code | 1
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression | Code | 1
Evaluating GPT-4's Vision Capabilities on Brazilian University Admission Exams | Code | 1
GenTKG: Generative Forecasting on Temporal Knowledge Graph with Large Language Models | Code | 1
MonoUNI: A Unified Vehicle and Infrastructure-side Monocular 3D Object Detection Network with Sufficient Depth Clues | Code | 1
Deconstructing the Inductive Biases of Hamiltonian Neural Networks | Code | 1
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1
Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection | Code | 1
CoProNN: Concept-based Prototypical Nearest Neighbors for Explaining Vision Models | Code | 1
Stable and Safe Human-aligned Reinforcement Learning through Neural Ordinary Differential Equations | Code | 1
Counterfactual Token Generation in Large Language Models | Code | 1
Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed | Code | 1
Neuron Activation Coverage: Rethinking Out-of-distribution Detection and Generalization | Code | 1
DeFIX: Detecting and Fixing Failure Scenarios with Reinforcement Learning in Imitation Learning Based Autonomous Driving | Code | 1
GLObal Building heights for Urban Studies (UT-GLOBUS) for city- and street- scale urban simulations: Development and first applications | Code | 1
Compositional Exemplars for In-context Learning | Code | 1
Decentralized Social Navigation with Non-Cooperative Robots via Bi-Level Optimization | Code | 1
ML-Dev-Bench: Comparative Analysis of AI Agents on ML development workflows | Code | 1
The Sound of Water: Inferring Physical Properties from Pouring Liquids | Code | 1
Progressive End-to-End Object Detection in Crowded Scenes | Code | 1
Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection | Code | 1
eclingo: A solver for Epistemic Logic Programs | Code | 1
SimROD: A Simple Baseline for Raw Object Detection with Global and Local Enhancements | Code | 1
PandaSkill -- Player Performance and Skill Rating in Esports: Application to League of Legends | Code | 1
Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels | Code | 1
Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark | Code | 1
Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering | Code | 1
When to Learn What: Model-Adaptive Data Augmentation Curriculum | Code | 1
Robust Object Detection in Remote Sensing Imagery with Noisy and Sparse Geo-Annotations (Full Version) | Code | 1
Hyperbolic Random Forests | Code | 1
Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis | Code | 1
Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation | Code | 1
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Code | 1
Complementary Pseudo Multimodal Feature for Point Cloud Anomaly Detection | Code | 1
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias | Code | 1
A Bi-directional Transformer for Musical Chord Recognition | Code | 1
SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration | Code | 1
Fingerspelling recognition in the wild with iterative visual attention | Code | 1
An Unsupervised Framework for Comparing Graph Embeddings | Code | 1
Generating Diverse High-Fidelity Images with VQ-VAE-2 | Code | 1
U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration? | Code | 1
Point-DAE: Denoising Autoencoders for Self-supervised Point Cloud Learning | Code | 1
MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation | Code | 1
Enabling Mixed Effects Neural Networks for Diverse, Clustered Data Using Monte Carlo Methods | Code | 1
MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context | Code | 1
Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning | Code | 1
Multi-label Node Classification On Graph-Structured Data | Code | 1
WDC Products: A Multi-Dimensional Entity Matching Benchmark | Code | 1
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization | Code | 1
Learning Neural Volumetric Field for Point Cloud Geometry Compression | Code | 1
Page 394 of 9486