SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 4276–4300 of 661570 papers

Title | Status | Hype
InstructIE: A Bilingual Instruction-based Information Extraction Dataset | Code | 3
Quantifying the robustness of deep multispectral segmentation models against natural perturbations and data poisoning | Code | 3
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | Code | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Accelerating Transformer Inference for Translation via Parallel Decoding | Code | 3
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research | Code | 3
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification | Code | 3
NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist | Code | 3
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models | Code | 3
A Comprehensive Survey on Segment Anything Model for Vision and Beyond | Code | 3
WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset | Code | 3
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans | Code | 3
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages | Code | 3
Visual Causal Scene Refinement for Video Question Answering | Code | 3
PiML Toolbox for Interpretable Machine Learning Model Development and Diagnostics | Code | 3
Caption Anything: Interactive Image Description with Diverse Multimodal Controls | Code | 3
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision | Code | 3
Personalize Segment Anything Model with One Shot | Code | 3
Panda LLM: Training Data and Evaluation for Open-Sourced Chinese Instruction-Following Large Language Models | Code | 3
Unlimiformer: Long-Range Transformers with Unlimited Length Input | Code | 3
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | Code | 3
UCF: Uncovering Common Features for Generalizable Deepfake Detection | Code | 3
LibCity: A Unified Library Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction | Code | 3
TorchBench: Benchmarking PyTorch with High API Surface Coverage | Code | 3
Learning Neural PDE Solvers with Parameter-Guided Channel Attention | Code | 3