SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 17,051 to 17,100 of 474,278 papers

Title | Status | Hype
Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models | - | 0
Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding | - | 0
Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability | Code | 0
Addressing Correlated Latent Exogenous Variables in Debiased Recommender Systems | Code | 0
Improving Memory Efficiency for Training KANs via Meta Learning | Code | 0
Play to Generalize: Learning to Reason Through Game Play | Code | 2
HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains | Code | 0
APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs | - | 0
CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray | - | 0
Coordinating Search-Informed Reasoning and Reasoning-Guided Search in Claim Verification | - | 0
ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning | - | 0
Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models | Code | 0
From Static to Adaptive Defense: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks | Code | 0
Rethinking Crowd-Sourced Evaluation of Neuron Explanations | Code | 0
Dataset combining EEG, eye-tracking, and high-speed video for ocular activity analysis across BCI paradigms | Code | 0
Evaluating Large Language Models on the Frame and Symbol Grounding Problems: A Zero-shot Benchmark | Code | 0
R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation | - | 0
ProARD: progressive adversarial robustness distillation: provide wide range of robust students | Code | 0
MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification | - | 0
The Universality Lens: Why Even Highly Over-Parametrized Models Learn Well | - | 0
Learning Speaker-Invariant Visual Features for Lipreading | - | 0
Synesthesia of Machines (SoM)-Aided Online FDD Precoding via Heterogeneous Multi-Modal Sensing: A Vertical Federated Learning Approach | - | 0
PhysiInter: Integrating Physical Mapping for High-Fidelity Human Interaction Generation | - | 0
Federated In-Context Learning: Iterative Refinement for Improved Answer Quality | - | 0
AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking | - | 0
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence | Code | 1
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation | Code | 2
Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes | Code | 2
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Code | 1
Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces | Code | 1
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds | Code | 1
FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling | Code | 2
HuSc3D: Human Sculpture dataset for 3D object reconstruction | Code | 0
StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets | Code | 1
From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium | Code | 1
RADAR: Benchmarking Language Models on Imperfect Tabular Data | Code | 1
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving | - | 0
AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists | - | 0
Nearness of Neighbors Attention for Regression in Supervised Finetuning | Code | 0
Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim Evidence Reasoning | Code | 0
A Real-time 3D Desktop Display | Code | 0
Parameter-free approximate equivariance for tasks with finite group symmetry | Code | 0
ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity | - | 0
STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation | Code | 0
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation | Code | 2
ARGUS: Hallucination and Omission Evaluation in Video-LLMs | - | 0
Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework | - | 0
Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations | - | 0
A System for Accurate Tracking and Video Recordings of Rodent Eye Movements using Convolutional Neural Networks for Biomedical Image Segmentation | - | 0
CaliciBoost: Performance-Driven Evaluation of Molecular Representations for Caco-2 Permeability Prediction | - | 0
Page 342 of 9486