SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 15901–15950 of 474278 papers

| Title | Status | Hype |
| --- | --- | --- |
| RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics | Code | 1 |
| DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model | Code | 1 |
| Relation Extraction or Pattern Matching? Unravelling the Generalisation Limits of Language Models for Biographical RE | Code | 1 |
| Video-GPT via Next Clip Diffusion | Code | 1 |
| MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks | Code | 1 |
| BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs | Code | 1 |
| Towards Reliable and Interpretable Traffic Crash Pattern Prediction and Safety Interventions Using Customized Large Language Models | Code | 1 |
| ProMi: An Efficient Prototype-Mixture Baseline for Few-Shot Segmentation with Bounding-Box Annotations | Code | 1 |
| Spectral-Spatial Self-Supervised Learning for Few-Shot Hyperspectral Image Classification | Code | 1 |
| RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction | Code | 1 |
| Is Artificial Intelligence Generated Image Detection a Solved Problem? | Code | 1 |
| Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts | Code | 1 |
| GATES: Cost-aware Dynamic Workflow Scheduling via Graph Attention Networks and Evolution Strategy | Code | 1 |
| LLM-DSE: Searching Accelerator Parameters with LLM Agents | Code | 1 |
| LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | Code | 1 |
| What are they talking about? Benchmarking Large Language Models for Knowledge-Grounded Discussion Summarization | Code | 1 |
| Visuospatial Cognitive Assistant | Code | 1 |
| Temporal-Spectral-Spatial Unified Remote Sensing Dense Prediction | Code | 1 |
| Efficient RL Training for Reasoning Models via Length-Aware Optimization | Code | 1 |
| Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning | Code | 1 |
| Always Clear Depth: Robust Monocular Depth Estimation under Adverse Weather | Code | 1 |
| MedVKAN: Efficient Feature Extraction with Mamba and KAN for Medical Image Segmentation | Code | 1 |
| Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model | Code | 1 |
| SepPrune: Structured Pruning for Efficient Deep Speech Separation | Code | 1 |
| Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | Code | 1 |
| ELITE: Embedding-Less retrieval with Iterative Text Exploration | Code | 1 |
| LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | Code | 1 |
| VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation | Code | 1 |
| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | Code | 1 |
| ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing | Code | 1 |
| FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge | Code | 1 |
| DC-Seg: Disentangled Contrastive Learning for Brain Tumor Segmentation with Missing Modalities | Code | 1 |
| VenusX: Unlocking Fine-Grained Functional Understanding of Proteins | Code | 1 |
| BINAQUAL: A Full-Reference Objective Localization Similarity Metric for Binaural Audio | Code | 1 |
| Neuro-Symbolic Query Compiler | Code | 1 |
| Multimodal Cancer Survival Analysis via Hypergraph Learning with Cross-Modality Rebalance | Code | 1 |
| InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction | Code | 1 |
| Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation | Code | 1 |
| IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation | Code | 1 |
| Sample Efficient Reinforcement Learning via Large Vision Language Model Distillation | Code | 1 |
| Accurate KV Cache Quantization with Outlier Tokens Tracing | Code | 1 |
| Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory | Code | 1 |
| Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining | Code | 1 |
| X2C: A Dataset Featuring Nuanced Facial Expressions for Realistic Humanoid Imitation | Code | 1 |
| msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML | Code | 1 |
| One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework | Code | 1 |
| Unifying Segment Anything in Microscopy with Multimodal Large Language Model | Code | 1 |
| MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | Code | 1 |
| DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios | Code | 1 |
| PoE-World: Compositional World Modeling with Products of Programmatic Experts | Code | 1 |
Page 319 of 9486