SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 20051–20100 of 474278 papers

Title | Status | Hype
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization | Code | 1
SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents | Code | 1
Fine-Grained and Multi-Dimensional Metrics for Document-Level Machine Translation | Code | 1
Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models | Code | 1
Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency | Code | 1
BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment | Code | 1
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Code | 1
Neuro-symbolic Learning Yielding Logical Constraints | Code | 1
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment | Code | 1
Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models | Code | 1
Toward Conditional Distribution Calibration in Survival Prediction | Code | 1
Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis | Code | 1
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Code | 1
FoldMark: Protecting Protein Generative Models with Watermarking | Code | 1
CloudCast -- Total Cloud Cover Nowcasting with Machine Learning | Code | 1
ProtSCAPE: Mapping the landscape of protein conformations in molecular dynamics | Code | 1
A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing | Code | 1
UTSRMorph: A Unified Transformer and Superresolution Network for Unsupervised Medical Image Registration | Code | 1
Unlocking Comics: The AI4VA Dataset for Visual Understanding | Code | 1
Depth Attention for Robust RGB Tracking | Code | 1
Automatic Estimation of Singing Voice Musical Dynamics | Code | 1
Referring Human Pose and Mask Estimation in the Wild | Code | 1
Symbotunes: unified hub for symbolic music generative models | Code | 1
MidiTok Visualizer: a tool for visualization and analysis of tokenized MIDI symbolic music | Code | 1
FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion | Code | 1
NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking | Code | 1
SPICEPilot: Navigating SPICE Code Generation and Simulation with AI Guidance | Code | 1
Vector Quantization Prompting for Continual Learning | Code | 1
Sebica: Lightweight Spatial and Efficient Bidirectional Channel Attention Super Resolution Network | Code | 1
TrajAgent: An Agent Framework for Unified Trajectory Modelling | Code | 1
Agentic Feedback Loop Modeling Improves Recommendation and User Simulation | Code | 1
LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Code | 1
ISDNN: A Deep Neural Network for Channel Estimation in Massive MIMO systems | Code | 1
Securing Healthcare with Deep Learning: A CNN-Based Model for medical IoT Threat Detection | Code | 1
Model Equality Testing: Which Model Is This API Serving? | Code | 1
Transferable Adversarial Attacks on SAM and Its Downstream Models | Code | 1
MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation | Code | 1
FedSSP: Federated Graph Learning with Spectral Knowledge and Personalized Preference | Code | 1
AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models | Code | 1
UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers | Code | 1
DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction | Code | 1
Multi-view biomedical foundation models for molecule-target and property prediction | Code | 1
GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing | Code | 1
Fusion-then-Distillation: Toward Cross-modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation | Code | 1
Improving the prediction of protein stability changes upon mutations by geometric learning and a pre-training strategy | Code | 1
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | Code | 1
Unified Cross-Modal Image Synthesis with Hierarchical Mixture of Product-of-Experts | Code | 1
Enhancing Battery Storage Energy Arbitrage with Deep Reinforcement Learning and Time-Series Forecasting | Code | 1
Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression | Code | 1
Context-Based Visual-Language Place Recognition | Code | 1