SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4001-4050 of 661,570 papers

Title | Status | Hype
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI | Code | 3
pix2gestalt: Amodal Segmentation by Synthesizing Wholes | Code | 3
Marabou 2.0: A Versatile Formal Analyzer of Neural Networks | Code | 3
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache | Code | 3
An Extensible Framework for Open Heterogeneous Collaborative Perception | Code | 3
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Code | 3
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks | Code | 3
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment | Code | 3
Benchmarking LLMs via Uncertainty Quantification | Code | 3
Lumiere: A Space-Time Diffusion Model for Video Generation | Code | 3
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text | Code | 3
In-Context Learning for Extreme Multi-Label Classification | Code | 3
A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation | Code | 3
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo | Code | 3
Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms | Code | 3
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer | Code | 3
The Manga Whisperer: Automatically Generating Transcriptions for Comics | Code | 3
RAP-SAM: Towards Real-Time All-Purpose Segment Anything | Code | 3
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Code | 3
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | Code | 3
GARField: Group Anything with Radiance Fields | Code | 3
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities | Code | 3
RoHM: Robust Human Motion Reconstruction via Diffusion | Code | 3
Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models | Code | 3
ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis | Code | 3
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception | Code | 3
A Survey of Resource-efficient LLM and Multimodal Foundation Models | Code | 3
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Code | 3
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Code | 3
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs | Code | 3
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning | Code | 3
GroundingGPT:Language Enhanced Multi-modal Grounding Model | Code | 3
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Code | 3
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning | Code | 3
Deep learning in motion deblurring: current status, benchmarks and future prospects | Code | 3
Evaluating Language Model Agency through Negotiations | Code | 3
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models | Code | 3
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Code | 3
Universal Time-Series Representation Learning: A Survey | Code | 3
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation | Code | 3
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts | Code | 3
Improved motif-scaffolding with SE(3) flow matching | Code | 3
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | Code | 3
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | Code | 3
Pheme: Efficient and Conversational Speech Generation | Code | 3
The Rise of Diffusion Models in Time-Series Forecasting | Code | 3
Denoising Vision Transformers | Code | 3
Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model | Code | 3
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket | Code | 3
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model | Code | 3
Page 81 of 13,232