SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 19,901 to 19,950 of 474,278 papers

Title | Status | Hype
Distributionally Robust Wireless Semantic Communication with Large AI Models | - | 0
Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data | Code | 0
Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting | Code | 0
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning | Code | 3
Breaking the Cloak! Unveiling Chinese Cloaked Toxicity with Homophone Graph and Toxic Lexicon | - | 0
Autoregression-free video prediction using diffusion model for mitigating error propagation | Code | 0
Single Domain Generalization for Alzheimer's Detection from 3D MRIs with Pseudo-Morphological Augmentations and Contrastive Learning | Code | 0
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models | Code | 0
ChatCFD: an End-to-End CFD Agent with Domain-specific Structured Thinking | Code | 1
DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials | Code | 2
Evaluation Hallucination in Multi-Round Incomplete Information Lateral-Driven Reasoning Tasks | - | 0
CADRE: Customizable Assurance of Data Readiness in Privacy-Preserving Federated Learning | - | 0
Scalable, Symbiotic, AI and Non-AI Agent Based Parallel Discrete Event Simulations | - | 0
Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction | - | 0
Large Language Models Often Know When They Are Being Evaluated | - | 0
Directed Homophily-Aware Graph Neural Network | - | 0
ValueSim: Generating Backstories to Model Individual Value Systems | - | 0
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models | - | 0
ICH-Qwen: A Large Language Model Towards Chinese Intangible Cultural Heritage | - | 0
LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation | - | 0
Document Valuation in LLM Summaries: A Cluster Shapley Approach | - | 0
Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs | - | 0
SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context | - | 0
Measuring Sycophancy of Language Models in Multi-turn Dialogues | Code | 1
SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | Code | 0
Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems | - | 0
Exploring the Landscape of Text-to-SQL with Large Language Models: Progresses, Challenges and Opportunities | - | 0
Derailing Non-Answers via Logit Suppression at Output Subspace Boundaries in RLHF-Aligned Language Models | - | 0
Beyond Perception: Evaluating Abstract Visual Reasoning through Multi-Stage Task | Code | 0
Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation | Code | 0
VIRAL: Vision-grounded Integration for Reward design And Learning | Code | 0
RAGPPI: RAG Benchmark for Protein-Protein Interactions in Drug Discovery | Code | 0
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective | Code | 0
Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses | Code | 0
Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs | - | 0
Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning | - | 0
Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems | - | 0
Skywork Open Reasoner 1 Technical Report | Code | 4
Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking | Code | 1
Climate Finance Bench | Code | 0
Update Your Transformer to the Latest Release: Re-Basin of Task Vectors | Code | 1
Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference | Code | 0
Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation | Code | 0
Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting | Code | 0
Improving Brain-to-Image Reconstruction via Fine-Grained Text Bridging | - | 0
CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation | - | 0
The Meeseeks Mesh: Spatially Consistent 3D Adversarial Objects for BEV Detector | - | 0
RiverMamba: A State Space Model for Global River Discharge and Flood Forecasting | - | 0
LLM-ODDR: A Large Language Model Framework for Joint Order Dispatching and Driver Repositioning | - | 0
Calibrated Value-Aware Model Learning with Stochastic Environment Models | - | 0
Page 399 of 9,486