SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 19,251–19,300 of 474,278 papers

| Title | Status | Hype |
| --- | --- | --- |
| Reversible molecular simulation for training classical and machine learning force fields | Code | 1 |
| MISR: Measuring Instrumental Self-Reasoning in Frontier Models | Code | 1 |
| Graph Neural Networks Need Cluster-Normalize-Activate Modules | Code | 1 |
| HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning | Code | 1 |
| M2PDE: Compositional Generative Multiphysics and Multi-component PDE Simulation | Code | 1 |
| Pre-train, Align, and Disentangle: Empowering Sequential Recommendation with Large Language Models | Code | 1 |
| MIND: Effective Incorrect Assignment Detection through a Multi-Modal Structure-Enhanced Language Model | Code | 1 |
| WinTSR: A Windowed Temporal Saliency Rescaling Method for Interpreting Time Series Deep Learning Models | Code | 1 |
| Retrieval-Augmented Machine Translation with Unstructured Knowledge | Code | 1 |
| Bench-CoE: a Framework for Collaboration of Experts from Benchmark | Code | 1 |
| Dual-Branch Subpixel-Guided Network for Hyperspectral Image Classification | Code | 1 |
| Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair | Code | 1 |
| TransAdapter: Vision Transformer for Feature-Centric Unsupervised Domain Adaptation | Code | 1 |
| Cross-Self KV Cache Pruning for Efficient Vision-Language Inference | Code | 1 |
| Hidden in the Noise: Two-Stage Robust Watermarking for Images | Code | 1 |
| Mind the Gap: Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning | Code | 1 |
| HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing | Code | 1 |
| Samudra: An AI Global Ocean Emulator for Climate | Code | 1 |
| MageBench: Bridging Large Multimodal Models to Agents | Code | 1 |
| TASR: Timestep-Aware Diffusion Model for Image Super-Resolution | Code | 1 |
| Is JPEG AI going to change image forensics? | Code | 1 |
| Chatting with Logs: An exploratory study on Finetuning LLMs for LogQL | Code | 1 |
| Point-GN: A Non-Parametric Network Using Gaussian Positional Encoding for Point Cloud Classification | Code | 1 |
| AI-Driven Day-to-Day Route Choice | Code | 1 |
| Robust Multi-bit Text Watermark with LLM-based Paraphrasers | Code | 1 |
| Measure Anything: Real-time, Multi-stage Vision-based Dimensional Measurement using Segment Anything | Code | 1 |
| Nonparametric Filtering, Estimation and Classification using Neural Jump ODEs | Code | 1 |
| Interpreting single-cell and spatial omics data using deep neural network training dynamics | Code | 1 |
| RFSR: Improving ISR Diffusion Models via Reward Feedback Learning | Code | 1 |
| gghic: A Versatile R Package for Exploring and Visualizing 3D Genome Organization | Code | 1 |
| U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs | Code | 1 |
| BIMCaP: BIM-based AI-supported LiDAR-Camera Pose Refinement | Code | 1 |
| PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation | Code | 1 |
| ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics | Code | 1 |
| Composed Image Retrieval for Training-Free Domain Conversion | Code | 1 |
| EchoONE: Segmenting Multiple echocardiography Planes in One Model | Code | 1 |
| Expanding Event Modality Applications through a Robust CLIP-Based Encoder | Code | 1 |
| Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Code | 1 |
| Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization | Code | 1 |
| Evaluating Language Models as Synthetic Data Generators | Code | 1 |
| MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities | Code | 1 |
| How Many Ratings per Item are Necessary for Reliable Significance Testing? | Code | 1 |
| Beyond [cls]: Exploring the true potential of Masked Image Modeling representations | Code | 1 |
| SGSST: Scaling Gaussian Splatting StyleTransfer | Code | 1 |
| Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension | Code | 1 |
| Testing Neural Network Verifiers: A Soundness Benchmark with Hidden Counterexamples | Code | 1 |
| ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression | Code | 1 |
| NeRF and Gaussian Splatting SLAM in the Wild | Code | 1 |
| Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection | Code | 1 |
| A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs | Code | 1 |
Page 386 of 9,486