SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 18001–18050 of 474278 papers

Title | Status | Hype
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Code | 1
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization | Code | 1
Active Task Disambiguation with LLMs | Code | 1
UltraIF: Advancing Instruction Following from the Wild | Code | 1
Robotouille: An Asynchronous Planning Benchmark for LLM Agents | Code | 1
HOG-Diff: Higher-Order Guided Diffusion for Graph Generation | Code | 1
MedGNN: Towards Multi-resolution Spatiotemporal Graph Learning for Medical Time Series Classification | Code | 1
Temporal Distribution Shift in Real-World Pharmaceutical Data: Implications for Uncertainty Quantification in QSAR Models | Code | 1
Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions | Code | 1
Large Language Models for Multi-Robot Systems: A Survey | Code | 1
Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency | Code | 1
Syntriever: How to Train Your Retriever with Synthetic Data from LLMs | Code | 1
Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging | Code | 1
ADIFF: Explaining audio difference using natural language | Code | 1
AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference | Code | 1
MRAMG-Bench: A Comprehensive Benchmark for Advancing Multimodal Retrieval-Augmented Multimodal Generation | Code | 1
TorchResist: Open-Source Differentiable Resist Simulator | Code | 1
STURM-Flood: a curated dataset for deep learning-based flood extent mapping leveraging Sentinel-1 and Sentinel-2 imagery | Code | 1
SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond | Code | 1
Understanding and Enhancing the Transferability of Jailbreaking Attacks | Code | 1
Intent Representation Learning with Large Language Model for Recommendation | Code | 1
SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models | Code | 1
SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels | Code | 1
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Code | 1
Do Large Language Model Benchmarks Test Reliability? | Code | 1
Fine-Tuning Strategies for Continual Online EEG Motor Imagery Decoding: Insights from a Large-Scale Longitudinal Study | Code | 1
PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design | Code | 1
A Mixture-Based Framework for Guiding Diffusion Models | Code | 1
Kozax: Flexible and Scalable Genetic Programming in JAX | Code | 1
Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics | Code | 1
Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control | Code | 1
A Multi-Task Learning Approach to Linear Multivariate Forecasting | Code | 1
All-in-One Image Compression and Restoration | Code | 1
CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally | Code | 1
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection | Code | 1
Interactive Symbolic Regression through Offline Reinforcement Learning: A Co-Design Framework | Code | 1
iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs | Code | 1
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models | Code | 1
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing | Code | 1
Mind the Gap: Evaluating Patch Embeddings from General-Purpose and Histopathology Foundation Models for Cell Segmentation and Classification | Code | 1
SurvHive: a package to consistently access multiple survival-analysis packages | Code | 1
Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes | Code | 1
DAMO: Data- and Model-aware Alignment of Multi-modal LLMs | Code | 1
Adaptive Self-improvement LLM Agentic System for ML Library Development | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
Accurate Pocket Identification for Binding-Site-Agnostic Docking | Code | 1
T-SCEND: Test-time Scalable MCTS-enhanced Diffusion Model | Code | 1
Unified Spatial-Temporal Edge-Enhanced Graph Networks for Pedestrian Trajectory Prediction | Code | 1
SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset | Code | 1
EFKAN: A KAN-Integrated Neural Operator For Efficient Magnetotelluric Forward Modeling | Code | 1
Page 361 of 9486