SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 20,201–20,250 of 474,278 papers

Title | Status | Hype
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | Code | 1
Joint Point Cloud Upsampling and Cleaning with Octree-based CNNs | Code | 1
Non-myopic Generation of Language Models for Reasoning and Planning | Code | 1
Automated Spinal MRI Labelling from Reports Using a Large Language Model | Code | 1
ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage | Code | 1
TopoDiffusionNet: A Topology-aware Diffusion Model | Code | 1
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Code | 1
Fair Bilevel Neural Network (FairBiNN): On Balancing fairness and accuracy via Stackelberg Equilibrium | Code | 1
Residual vector quantization for KV cache compression in large language model | Code | 1
START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation | Code | 1
SeisLM: a Foundation Model for Seismic Waveforms | Code | 1
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning | Code | 1
PROMPTHEUS: A Human-Centered Pipeline to Streamline SLRs with LLMs | Code | 1
Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report | Code | 1
QuickBind: A Light-Weight And Interpretable Molecular Docking Model | Code | 1
Scalability of memorization-based machine unlearning | Code | 1
Elucidating the design space of language models for image generation | Code | 1
ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos | Code | 1
Bayesian scaling laws for in-context learning | Code | 1
LTBoost: Boosted Hybrids of Ensemble Linear and Gradient Algorithms for the Long-term Time Series Forecasting | Code | 1
PALMS: Plane-based Accessible Indoor Localization Using Mobile Smartphones | Code | 1
Can Knowledge Editing Really Correct Hallucinations? | Code | 1
TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight | Code | 1
Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation | Code | 1
A Realistic Threat Model for Large Language Model Jailbreaks | Code | 1
AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection | Code | 1
GATEAU: Selecting Influential Samples for Long Context Alignment | Code | 1
Comprehensive benchmarking of large language models for RNA secondary structure prediction | Code | 1
Reinforced Imitative Trajectory Planning for Urban Automated Driving | Code | 1
CausalGraph2LLM: Evaluating LLMs for Causal Queries | Code | 1
Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation? | Code | 1
Catastrophic Failure of LLM Unlearning via Quantization | Code | 1
Building A Coding Assistant via the Retrieval-Augmented Language Model | Code | 1
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Code | 1
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count | Code | 1
On conditional diffusion models for PDE simulations | Code | 1
Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning | Code | 1
Reflection-Bench: probing AI intelligence with reflection | Code | 1
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Code | 1
BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression | Code | 1
Explainability of Point Cloud Neural Networks Using SMILE: Statistical Model-Agnostic Interpretability with Local Explanations | Code | 1
Upsampling DINOv2 features for unsupervised vision tasks and weakly supervised materials segmentation | Code | 1
BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping | Code | 1
TrackMe: A Simple and Effective Multiple Object Tracking Annotation Tool | Code | 1
M-RewardBench: Evaluating Reward Models in Multilingual Settings | Code | 1
Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning | Code | 1
IPO: Interpretable Prompt Optimization for Vision-Language Models | Code | 1
Causality for Large Language Models | Code | 1
Scene Graph Generation with Role-Playing Large Language Models | Code | 1
A Comprehensive Evaluation of Cognitive Biases in LLMs | Code | 1