SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8901–8925 of 474,278 papers

Title | Status | Hype
STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models | Code | 0
Graph Your Own Prompt | Code | 0
Multi-Task Learning with Feature-Similarity Laplacian Graphs for Predicting Alzheimer's Disease Progression | Code | 0
Anchor-based Maximum Discrepancy for Relative Similarity Testing | Code | 0
A Simple and Better Baseline for Visual Grounding | Code | 0
Preserving LLM Capabilities through Calibration Data Curation: From Analysis to Optimization | Code | 0
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding | Code | 0
Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation | Code | 0
Are Language Models Consequentialist or Deontological Moral Reasoners? | Code | 0
Talk Less, Call Right: Enhancing Role-Play LLM Agents with Automatic Prompt Optimization and Role Prompting | Code | 0
Probing the Difficulty Perception Mechanism of Large Language Models | Code | 0
RECON: Reasoning with Condensation for Efficient Retrieval-Augmented Generation | Code | 0
Towards Self-Refinement of Vision-Language Models with Triangular Consistency | Code | 0
MSM-Seg: A Modality-and-Slice Memory Framework with Category-Agnostic Prompting for Multi-Modal Brain Tumor Segmentation | Code | 0
RePro: Training Language Models to Faithfully Recycle the Web for Pretraining | Code | 0
Fast and Interpretable Protein Substructure Alignment via Optimal Transport | Code | 0
RobotFleet: An Open-Source Framework for Centralized Multi-Robot Task Planning | Code | 0
RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering | Code | 0
EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection | Code | 0
Multi-Scale Diffusion Transformer for Jointly Simulating User Mobility and Mobile Traffic Pattern | Code | 0
VL Norm: Rethink Loss Aggregation in RLVR | Code | 0
Latent Reasoning via Sentence Embedding Prediction | - | 0
Language Surgery in Multilingual Large Language Models | - | 0
Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny | - | 0
Bridging Graph and State-Space Modeling for Intensive Care Unit Length of Stay Prediction | Code | 0
Page 357 of 18972