SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 16201–16250 of 474,278 papers

Title | Status | Hype
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | Code | 1
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | Code | 1
SpectrumFM: A Foundation Model for Intelligent Spectrum Management | Code | 1
CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion | Code | 1
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action | Code | 1
SymPlanner: Deliberate Planning in Language Models with Symbolic Representation | Code | 1
Differentiable Nonlinear Model Predictive Control | Code | 1
OET: Optimization-based prompt injection Evaluation Toolkit | Code | 1
Adapting Precomputed Features for Efficient Graph Condensation | Code | 1
Multimodal Masked Autoencoder Pre-training for 3D MRI-Based Brain Tumor Analysis with Missing Modalities | Code | 1
Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction | Code | 1
TeLoGraF: Temporal Logic Planning via Graph-encoded Flow Matching | Code | 1
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | Code | 1
Fast and Low-Cost Genomic Foundation Models via Outlier Removal | Code | 1
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection | Code | 1
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing | Code | 1
Visual Test-time Scaling for GUI Agent Grounding | Code | 1
Towards Scalable Human-aligned Benchmark for Text-guided Image Editing | Code | 1
Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations | Code | 1
DeepCritic: Deliberate Critique with Large Language Models | Code | 1
Pinching-Antenna Systems (PASS): Power Radiation Model and Optimal Beamforming Design | Code | 1
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Code | 1
GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation | Code | 1
Real Time Semantic Segmentation of High Resolution Automotive LiDAR Scans | Code | 1
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation | Code | 1
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning | Code | 1
Is Intermediate Fusion All You Need for UAV-based Collaborative Perception? | Code | 1
MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework | Code | 1
A Survey on 3D Reconstruction Techniques in Plant Phenotyping: From Classical Methods to Neural Radiance Fields (NeRF), 3D Gaussian Splatting (3DGS), and Beyond | Code | 1
Recursive KL Divergence Optimization: A Dynamic Framework for Representation Learning | Code | 1
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception | Code | 1
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition | Code | 1
Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding | Code | 1
DRO: Doppler-Aware Direct Radar Odometry | Code | 1
OG-HFYOLO :Orientation gradient guidance and heterogeneous feature fusion for deformation table cell instance segmentation | Code | 1
ClusterLOB: Enhancing Trading Strategies by Clustering Orders in Limit Order Books | Code | 1
End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation | Code | 1
TrueFake: A Real World Case Dataset of Last Generation Fake Images also Shared on Social Networks | Code | 1
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Code | 1
EchoNet-Quality: Denoising Echocardiograms via Deep Generative Modeling of Ultrasound Noise | Code | 1
Automatic Legal Writing Evaluation of LLMs | Code | 1
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security | Code | 1
OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification | Code | 1
PhenoAssistant: A Conversational Multi-Agent AI System for Automated Plant Phenotyping | Code | 1
PhyloProfile v2 -- Exploring multi-layered phylogenetic profiles at scale | Code | 1
Mesh-Learner: Texturing Mesh with Spherical Harmonics | Code | 1
Taming the Titans: A Survey of Efficient LLM Inference Serving | Code | 1
TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering | Code | 1
UNet with Axial Transformer : A Neural Weather Model for Precipitation Nowcasting | Code | 1
DISCO: learning to DISCover an evolution Operator for multi-physics-agnostic prediction | Code | 1
Page 325 of 9486