SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 14901-14950 of 474278 papers

Title | Status | Hype
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence | | 0
MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering | | 0
MSNeRV: Neural Video Representation with Multi-Scale Feature Fusion | | 0
PRISM-Loc: a Lightweight Long-range LiDAR Localization in Urban Environments with Topological Maps | | 0
Context-Aware Deep Lagrangian Networks for Model Predictive Control | | 0
Probabilistic Trajectory GOSPA: A Metric for Uncertainty-Aware Multi-Object Tracking Performance Evaluation | | 0
Diff-TONE: Timestep Optimization for iNstrument Editing in Text-to-Music Diffusion Models | | 0
Factorized RVQ-GAN For Disentangled Speech Tokenization | | 0
Uncovering Intention through LLM-Driven Code Snippet Description Generation | | 0
Code Rate Optimization via Neural Polar Decoders | | 0
One-shot Face Sketch Synthesis in the Wild via Generative Diffusion Prior and Instruction Tuning | Code | 0
ABC: Adaptive BayesNet Structure Learning for Computational Scalable Multi-task Image Compression | Code | 0
Fair Contracts in Principal-Agent Games with Heterogeneous Types | | 0
MAARTA:Multi-Agentic Adaptive Radiology Teaching Assistant | | 0
Centroid Approximation for Byzantine-Tolerant Federated Learning | | 0
RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments | Code | 0
All is Not Lost: LLM Recovery without Checkpoints | Code | 1
SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning | Code | 2
Evaluation Pipeline for systematically searching for Anomaly Detection Systems | | 0
Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study | | 0
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | | 0
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2
Optimizing Web-Based AI Query Retrieval with GPT Integration in LangChain A CoT-Enhanced Prompt Engineering Approach | | 0
Semantic and Feature Guided Uncertainty Quantification of Visual Localization for Autonomous Vehicles | | 0
deepSURF: Detecting Memory Safety Vulnerabilities in Rust Through Fuzzing LLM-Augmented Harnesses | | 0
PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning | | 0
Transit for All: Mapping Equitable Bike2Subway Connection using Region Representation Learning | | 0
Mapping Caregiver Needs to AI Chatbot Design: Strengths and Gaps in Mental Health Support for Alzheimer's and Dementia Caregivers | | 0
Accessible Gesture-Driven Augmented Reality Interaction System | | 0
An Empirical Study of Bugs in Data Visualization Libraries | | 0
Steering Your Diffusion Policy with Latent Space Reinforcement Learning | | 0
Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos | | 0
RaCalNet: Radar Calibration Network for Sparse-Supervised Metric Depth Estimation | | 0
Model Predictive Path-Following Control for a Quadrotor | | 0
MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System | | 0
Correspondence-Free Multiview Point Cloud Registration via Depth-Guided Joint Optimisation | | 0
HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models | | 0
An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW | | 0
PNCS:Power-Norm Cosine Similarity for Diverse Client Selection in Federated Learning | | 0
Veracity: An Open-Source AI Fact-Checking System | | 0
In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory | | 0
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree | Code | 2
I Know Which LLM Wrote Your Code Last Summer: LLM generated Code Stylometry for Authorship Attribution | | 0
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation | | 0
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding | | 0
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning | Code | 0
Show-o2: Improved Native Unified Multimodal Models | Code | 5
Mix-of-Language-Experts Architecture for Multilingual Programming | Code | 0
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges | Code | 2
Finance Language Model Evaluation (FLaME) | | 0
Page 299 of 9486