SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7451-7500 of 661,570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Adaptation of Weakly Supervised Localization in Histopathology by Debiasing Predictions | | 0 |
| Unleashing Video Language Models for Fine-grained HRCT Report Generation | | 0 |
| Marked Pedagogies: Examining Linguistic Biases in Personalized Automated Writing Feedback | | 0 |
| One-Step Flow Policy: Self-Distillation for Fast Visuomotor Policies | | 0 |
| CalliMaster: Mastering Page-level Chinese Calligraphy via Layout-guided Spatial Planning | | 0 |
| Generating Expressive and Customizable Evals for Timeseries Data Analysis Agents with AgentFuel | | 0 |
| Modal Logical Neural Networks for Financial AI | | 0 |
| RAW-Domain Degradation Models for Realistic Smartphone Super-Resolution | | 0 |
| EB-RANSAC: Random Sample Consensus based on Energy-Based Model | | 0 |
| Leveraging Phytolith Research using Artificial Intelligence | | 0 |
| Deep Learning-based Assessment of the Relation Between the Third Molar and Mandibular Canal on Panoramic Radiographs using Local, Centralized, and Federated Learning | | 0 |
| Orientability of Causal Relations in Time Series using Summary Causal Graphs and Faithful Distributions | | 0 |
| Trust Oriented Explainable AI for Fake News Detection | | 0 |
| Efficient Generative Modeling with Unitary Matrix Product States Using Riemannian Optimization | | 0 |
| Once4All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators | | 0 |
| Deployment-Oriented Session-wise Meta-Calibration for Landmark-Based Webcam Gaze Tracking | | 0 |
| Semi-Synthetic Parallel Data for Translation Quality Estimation: A Case Study of Dataset Building for an Under-Resourced Language Pair | | 0 |
| Resource-Efficient Iterative LLM-Based NAS with Feedback Memory | | 0 |
| TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition | | 0 |
| TURA: Tool-Augmented Unified Retrieval Agent for AI Search | | 0 |
| Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models | | 0 |
| CHiL(L)Grader: Calibrated Human-in-the-Loop Short-Answer Grading | | 0 |
| CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks | | 0 |
| IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse | | 2 |
| NeuralOS: Towards Simulating Operating Systems via Neural Generative Models | | 2 |
| Chemical Reaction Networks Learn Better than Spiking Neural Networks | | 0 |
| Can LLM Aid in Solving Constraints with Inductive Definitions? | | 0 |
| JOPP-3D: Joint Open Vocabulary Semantic Segmentation on Point Clouds and Panoramas | | 0 |
| Stage-Adaptive Reliability Modeling for Continuous Valence-Arousal Estimation | | 0 |
| The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection | | 0 |
| You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents | | 0 |
| Do LLMs Share Human-Like Biases? Causal Reasoning Under Prior Knowledge, Irrelevant Context, and Varying Compute Budgets | | 0 |
| The Perfection Paradox: From Architect to Curator in AI-Assisted API Design | | 0 |
| UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization | | 0 |
| Personalized Federated Learning via Gaussian Generative Modeling | | 0 |
| The Orthogonal Vulnerabilities of Generative AI Watermarks: A Comparative Empirical Benchmark of Spatial and Latent Provenance | | 0 |
| ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation | | 2 |
| Towards Universal Computational Aberration Correction in Photographic Cameras: A Comprehensive Benchmark Analysis | Code | 0 |
| How Does Fourier Analysis Network Work? A Mechanism Analysis and a New Dual-Activation Layer Proposal | | 0 |
| Survival Meets Classification: A Novel Framework for Early Risk Prediction Models of Chronic Diseases | | 0 |
| Single-View Rolling-Shutter SfM | | 0 |
| Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition | | 0 |
| From Next Token Prediction to (STRIPS) World Models | | 0 |
| RefTr: Recurrent Refinement of Confluent Trajectories for 3D Vascular Tree Centerlines | | 0 |
| When Models Fabricate Credentials: Measuring How Professional Identity Suppresses Honest Self-Representation | | 0 |
| Deep Eigenspace Network for Parametric Non-self-adjoint Eigenvalue Problems | | 0 |
| Do LLMs Judge Distantly Supervised Named Entity Labels Well? Constructing the JudgeWEL Dataset | | 0 |
| SENS-ASR: Semantic Embedding injection in Neural-transducer for Streaming Automatic Speech Recognition | | 0 |
| CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents | | 0 |
| Tiny Aya: Bridging Scale and Multilingual Depth | | 0 |
Page 150 of 13,232