SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 19601–19650 of 474278 papers

| Title | Status | Hype |
| --- | --- | --- |
| Engineering Serendipity through Recommendations of Items with Atypical Aspects | Code | 0 |
| Bayesian Neural Scaling Laws Extrapolation with Prior-Fitted Networks | Code | 0 |
| Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning | Code | 0 |
| ProDiff: Prototype-Guided Diffusion for Minimal Information Trajectory Imputation | Code | 1 |
| Score-based Generative Modeling for Conditional Independence Testing | Code | 0 |
| How does Transformer Learn Implicit Reasoning? | Code | 1 |
| Uncovering Visual-Semantic Psycholinguistic Properties from the Distributional Structure of Text Embedding Spac | Code | 0 |
| SNS-Bench-VL: Benchmarking Multimodal Large Language Models in Social Networking Services | Code | 0 |
| MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration | Code | 0 |
| GeNRe: A French Gender-Neutral Rewriting System Using Collective Nouns | Code | 0 |
| Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents | Code | 5 |
| SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents | Code | 1 |
| Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics | Code | 1 |
| Vision Language Models are Biased | Code | 2 |
| Estimation of Head Motion in Structural MRI and its Impact on Cortical Thickness Measurements in Retrospective Data | Code | 0 |
| ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | Code | 2 |
| D-AR: Diffusion via Autoregressive Models | Code | 2 |
| ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering |  | 0 |
| LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Trainin |  | 0 |
| From Token to Action: State Machine Reasoning to Mitigate Overthinking in Information Retrieval | Code | 0 |
| Understanding Refusal in Language Models with Sparse Autoencoders | Code | 0 |
| Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation | Code | 0 |
| Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | Code | 2 |
| SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training | Code | 0 |
| Automated Modeling Method for Pathloss Model Discovery | Code | 0 |
| Improving the Effective Receptive Field of Message-Passing Neural Networks | Code | 1 |
| Towards Reward Fairness in RLHF: From a Resource Allocation Perspective | Code | 0 |
| Learning Parametric Distributions from Samples and Preferences | Code | 0 |
| DiffER: Categorical Diffusion for Chemical Retrosynthesis | Code | 0 |
| LLM-based HSE Compliance Assessment: Benchmark, Performance, and Advancements | Code | 0 |
| The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text | Code | 0 |
| Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models | Code | 0 |
| Probability-Consistent Preference Optimization for Enhanced LLM Reasoning | Code | 0 |
| On the Validity of Head Motion Patterns as Generalisable Depression Biomarkers |  | 0 |
| Map&Make: Schema Guided Text to Table Generation |  | 0 |
| From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs |  | 0 |
| DeepChest: Dynamic Gradient-Free Task Weighting for Effective Multi-Task Learning in Chest X-ray Classification | Code | 0 |
| Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes | Code | 0 |
| Generating Diverse Training Samples for Relation Extraction with Large Language Models |  | 0 |
| How Does Response Length Affect Long-Form Factuality | Code | 0 |
| COBRA: Contextual Bandit Algorithm for Ensuring Truthful Strategic Agents |  | 0 |
| Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics | Code | 0 |
| A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation |  | 0 |
| CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring |  | 0 |
| SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods |  | 0 |
| On-Policy RL with Optimal Reward Baseline |  | 0 |
| DeepRTE: Pre-trained Attention-based Neural Network for Radiative Tranfer | Code | 0 |
| The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence |  | 0 |
| Generalizability vs. Counterfactual Explainability Trade-Off |  | 0 |
| SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models | Code | 0 |