SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9126-9150 of 474,278 papers

Title | Status | Hype
HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models | Code | 0
ParamBench: A Graduate-Level Benchmark for Evaluating LLM Understanding on Indic Subjects | Code | 0
acia-workflows: Automated Single-cell Imaging Analysis for Scalable and Deep Learning-based Live-cell Imaging Analysis Workflows | Code | 0
Efficient Universal Models for Medical Image Segmentation via Weakly Supervised In-Context Learning | Code | 0
Adaptive Stain Normalization for Cross-Domain Medical Histology | Code | 0
SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation | Code | 0
MSITrack: A Challenging Benchmark for Multispectral Single Object Tracking | Code | 0
Enhancing Speech Emotion Recognition via Fine-Tuning Pre-Trained Models and Hyper-Parameter Optimisation | Code | 0
Latent Representation Learning in Heavy-Ion Collisions with MaskPoint Transformer | Code | 0
Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization | Code | 0
StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance | Code | 0
SID: Multi-LLM Debate Driven by Self Signals | Code | 0
PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing | Code | 0
Few-Shot Adaptation Benchmark for Remote Sensing Vision-Language Models | Code | 0
Label Semantics for Robust Hyperspectral Image Classification | Code | 0
MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models | Code | 0
Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data | Code | 0
Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization | Code | 0
FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models | Code | 0
Functional Matching of Logic Subgraphs: Beyond Structural Isomorphism | Code | 0
360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training | Code | 0
Injecting External Knowledge into the Reasoning Process Enhances Retrieval-Augmented Generation | Code | 0
When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity | Code | 0
A Rotation-Invariant Embedded Platform for (Neural) Cellular Automata | Code | 0
Distilling Lightweight Language Models for C/C++ Vulnerabilities | Code | 0