SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 16401-16450 of 474278 papers

Title | Status | Hype
Scoop-and-Toss: Dynamic Object Collection for Quadrupedal Systems | - | 0
Bayesian Probabilistic Matrix Factorization | - | 0
Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling | - | 0
Gender Bias in English-to-Greek Machine Translation | Code | 0
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits | - | 0
ECAM: A Contrastive Learning Approach to Avoid Environmental Collision in Trajectory Forecasting | Code | 0
Efficient Part-level 3D Object Generation via Dual Volume Packing | Code | 4
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Code | 1
Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models | Code | 1
The Four Color Theorem for Cell Instance Segmentation | Code | 1
Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation | Code | 1
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Code | 2
Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation | Code | 1
LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization | Code | 0
Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition | Code | 0
TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning | Code | 0
Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy | Code | 0
Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME | - | 0
Can LLMs Generate Good Stories? Insights and Challenges from a Narrative Planning Perspective | - | 0
Survival Analysis as Imprecise Classification with Trainable Kernels | Code | 0
Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection | - | 0
Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment | Code | 0
Wasserstein Barycenter Soft Actor-Critic | - | 0
AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent | - | 0
Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms | - | 0
ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | - | 0
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games | - | 0
ToxSyn-PT: A Large-Scale Synthetic Dataset for Hate Speech Detection in Portuguese | - | 0
Probabilistic Variational Contrastive Learning | - | 0
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning | - | 0
A Comparative Study of Machine Learning Techniques for Early Prediction of Diabetes | - | 0
Unsupervised Deep Clustering of MNIST with Triplet-Enhanced Convolutional Autoencoders | - | 0
Physiological-Model-Based Neural Network for Heart Rate Estimation during Daily Physical Activities | - | 0
Balanced Hyperbolic Embeddings Are Natural Out-of-Distribution Detectors | - | 0
Optimizing Genetic Algorithms with Multilayer Perceptron Networks for Enhancing TinyFace Recognition | - | 0
Cross-Learning Between ECG and PCG: Exploring Common and Exclusive Characteristics of Bimodal Electromechanical Cardiac Waveforms | - | 0
Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction | - | 0
LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation | - | 0
A new type of federated clustering: A non-model-sharing approach | - | 0
Bridging the Gap Between Open-Source and Proprietary LLMs in Table QA | Code | 0
Noise Conditional Variational Score Distillation | Code | 1
When Large Language Models are Reliable for Judging Empathic Communication | Code | 0
CoRT: Code-integrated Reasoning within Thinking | Code | 2
Urban1960SatSeg: Unsupervised Semantic Segmentation of Mid-20th century Urban Landscapes with Satellite Imageries | Code | 2
Classifying Unreliable Narrators with Large Language Models | Code | 0
Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods | - | 0
When Meaning Stays the Same, but Models Drift: Evaluating Quality of Service under Token-Level Behavioral Instability in LLMs | Code | 0
Aspect-Based Opinion Summarization with Argumentation Schemes | - | 0
Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models | - | 0
Unsupervised Elicitation of Language Models | - | 0