SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 20251–20300 of 474278 papers

Title | Status | Hype
SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution | Code | 2
CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature | Code | 0
Leveraging Large Language Models in Visual Speech Recognition: Model Scaling, Context-Aware Decoding, and Iterative Polishing |  | 0
CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge |  | 0
AbsoluteNet: A Deep Learning Neural Network to Classify Cerebral Hemodynamic Responses of Auditory Processing |  | 0
GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving | Code | 0
Moment kernels: a simple and scalable approach for equivariance to rotations and reflections in deep convolutional networks |  | 0
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding |  | 0
Sparsified State-Space Models are Efficient Highway Networks | Code | 0
Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility | Code | 1
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval | Code | 1
CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models | Code | 1
DLP: Dynamic Layerwise Pruning in Large Language Models | Code | 0
Emotion-aware Dual Cross-Attentive Neural Network with Label Fusion for Stance Detection in Misinformative Social Media Content | Code | 0
Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies |  | 0
MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation |  | 0
LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions |  | 0
multivariateGPT: a decoder-only transformer for multivariate categorical and numeric data |  | 0
Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing |  | 0
DP-RTFL: Differentially Private Resilient Temporal Federated Learning for Trustworthy AI in Regulated Industries | Code | 0
AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage | Code | 1
Born a Transformer -- Always a Transformer? | Code | 0
DenseLoRA: Dense Low-Rank Adaptation of Large Language Models | Code | 0
MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs | Code | 0
LLM-Driven E-Commerce Marketing Content Optimization: Balancing Creativity and Conversion |  | 0
MedOrchestra: A Hybrid Cloud-Local LLM Approach for Clinical Data Interpretation |  | 0
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment | Code | 2
ReassembleNet: Learnable Keypoints and Diffusion for 2D Fresco Reconstruction |  | 0
SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation | Code | 2
Minute-Long Videos with Dual Parallelisms | Code | 1
Attribute-Efficient PAC Learning of Sparse Halfspaces with Constant Malicious Noise Rate |  | 0
3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics Based Appearance-Medium Decoupling |  | 0
Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks |  | 0
Compositional Scene Understanding through Inverse Generative Modeling |  | 0
Hierarchical Instruction-aware Embodied Visual Tracking |  | 0
VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models |  | 0
Geometric Feature Prompting of Image Segmentation Models |  | 0
MT-Mol:Multi Agent System with Tool-based Reasoning for Molecular Optimization |  | 0
Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms |  | 0
DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models |  | 0
Unified Alignment Protocol: Making Sense of the Unlabeled Data in New Domains |  | 0
Is Hyperbolic Space All You Need for Medical Anomaly Detection? |  | 0
Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction |  | 0
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset | Code | 5
Rethinking Chunk Size For Long-Document Retrieval: A Multi-Dataset Analysis | Code | 0
Improving LLM-based Global Optimization with Search Space Partitioning | Code | 0
Automatic Transmission for LLM Tiers: Optimizing Cost and Accuracy in Large Language Models | Code | 0
CellCLAT: Preserving Topology and Trimming Redundancy in Self-Supervised Cellular Contrastive Learning | Code | 0
What is Adversarial Training for Diffusion Models? |  | 0
NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion |  | 0
Page 406 of 9486