SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 17651-17700 of 474278 papers

Title | Status | Hype
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos | | 0
Exploring bidirectional bounds for minimax-training of Energy-based models | | 0
Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems | | 0
Parking, Perception, and Retail: Street-Level Determinants of Community Vitality in Harbin | | 0
Aligning Latent Spaces with Flow Priors | | 0
Design of intelligent proofreading system for English translation based on CNN and BERT | | 0
PUB: An LLM-Enhanced Personality-Driven User Behaviour Simulator for Recommender System Evaluation | | 0
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings | | 0
Context Is Not Comprehension | | 0
Static Word Embeddings for Sentence Semantic Representation | | 0
Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights | | 0
Accelerated Test-Time Scaling with Model-Free Speculative Sampling | | 0
SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat | | 0
Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection | | 0
CL-ISR: A Contrastive Learning and Implicit Stance Reasoning Framework for Misleading Text Detection on Social Media | | 0
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text | | 0
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning | | 0
Do Large Language Models Judge Error Severity Like Humans? | | 0
Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective | | 0
RELIC: Evaluating Compositional Instruction Following via Language Recognition | | 0
CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection | | 0
CHANCERY: Evaluating Corporate Governance Reasoning Capabilities in Language Models | | 0
Agents of Change: Self-Evolving LLM Agents for Strategic Planning | | 0
E-bike agents: Large Language Model-Driven E-Bike Accident Analysis and Severity Prediction | | 0
Empowering Economic Simulation for Massively Multiplayer Online Games through Generative Agent-Based Modeling | | 0
Safe Planning and Policy Optimization via World Model Learning | | 0
Was Residual Penalty and Neural Operators All We Needed for Solving Optimal Control Problems? | | 0
Fast-DataShapley: Neural Modeling for Training Data Valuation | | 0
Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction | | 0
The Oversmoothing Fallacy: A Misguided Narrative in GNN Research | | 0
Communication Efficient Adaptive Model-Driven Quantum Federated Learning | | 0
Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction | | 0
Noise-Resistant Label Reconstruction Feature Selection for Partial Multi-Label Learning | | 0
The cost of ensembling: is it always worth combining? | | 0
Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models | | 0
Log-Linear Attention | | 0
Can Artificial Intelligence Trade the Stock Market? | | 0
Aligning Multimodal Representations through an Information Bottleneck | | 0
Locality Preserving Markovian Transition for Instance Retrieval | | 0
FPTQuant: Function-Preserving Transforms for LLM Quantization | | 0
Semi-Implicit Variational Inference via Kernelized Path Gradient Descent | | 0
Learning Theory of Decentralized Robust Kernel-Based Learning Algorithm | | 0
Learning long range dependencies through time reversal symmetry breaking | | 0
How to Unlock Time Series Editing? Diffusion-Driven Approach with Multi-Grained Control | | 0
Generalizable, real-time neural decoding with hybrid state-space models | | 0
Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation | | 0
Ignoring Directionality Leads to Compromised Graph Neural Network Explanations | | 0
Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets | Code | 2
ECoRAG: Evidentiality-guided Compression for Long Context RAG | Code | 1
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification | Code | 1
Page 354 of 9486