SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 20501 to 20550 of 474,278 papers

Title | Status | Hype
Fundamental Limits of Game-Theoretic LLM Alignment: Smith Consistency and Preference Matching | - | 0
Learning Annotation Consensus for Continuous Emotion Recognition | - | 0
Leveraging GANs for citation intent classification and its impact on citation network analysis | Code | 0
MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems | Code | 1
NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation | Code | 3
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning | Code | 1
Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space | Code | 1
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing | Code | 2
SELF-PERCEPT: Introspection Improves Large Language Models' Detection of Multi-Person Mental Manipulation in Conversations | Code | 0
Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | Code | 0
Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration | - | 0
FeatInv: Spatially resolved mapping from feature space to input space using conditional diffusion models | Code | 0
MLMC-based Resource Adequacy Assessment with Active Learning Trained Surrogate Models | Code | 0
MoPFormer: Motion-Primitive Transformer for Wearable-Sensor Activity Recognition | - | 0
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech | - | 0
Complex System Diagnostics Using a Knowledge Graph-Informed and Large Language Model-Enhanced Framework | - | 0
HTMNet: A Hybrid Network with Transformer-Mamba Bottleneck Multimodal Fusion for Transparent and Reflective Objects Depth Completion | - | 0
Fully Spiking Neural Networks for Unified Frame-Event Object Tracking | - | 0
Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion | - | 0
Sci-Fi: Symmetric Constraint for Frame Inbetweening | - | 0
Supervised and self-supervised land-cover segmentation & classification of the Biesbosch wetlands | - | 0
Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models | - | 0
Dual-Polarization Stacked Intelligent Metasurfaces for Holographic MIMO | Code | 1
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration | Code | 1
RoBiS: Robust Binary Segmentation for High-Resolution Industrial Images | Code | 1
OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models | Code | 0
A domain adaptation neural network for digital twin-supported fault diagnosis | Code | 0
VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion | - | 0
Stereo Radargrammetry Using Deep Learning from Airborne SAR Images | - | 0
Counterfactual Multi-player Bandits for Explainable Recommendation Diversification | Code | 0
Stationary MMD Points for Cubature | Code | 0
DeSocial: Blockchain-based Decentralized Social Networks | Code | 1
Automated Privacy Information Annotation in Large Language Model Interactions | Code | 0
Stochastic Geometry-Based Performance Evaluation for LEO Satellite-Assisted Space Caching | - | 0
RefAV: Towards Planning-Centric Scenario Mining | Code | 1
The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project | Code | 2
LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs | - | 0
Probabilistic Spatial Interpolation of Sparse Data using Diffusion Models | - | 0
An Open-Source Python Framework and Synthetic ECG Image Datasets for Digitization, Lead and Lead Name Detection, and Overlapping Signal Segmentation | Code | 0
A Novel Shape-Aware Topological Representation for GPR Data with DNN Integration | - | 0
Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting | - | 0
VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration | Code | 0
Enhancing Contrastive Learning-based Electrocardiogram Pretrained Model with Patient Memory Queue | Code | 0
HAMburger: Accelerating LLM Inference via Token Smashing | - | 0
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | Code | 4
In-context Language Learning for Endangered Languages in Speech Recognition | - | 0
Electrolyzers-HSI: Close-Range Multi-Scene Hyperspectral Imaging Benchmark Dataset | Code | 0
MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | Code | 2
R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning | Code | 1
Detection of Suicidal Risk on Social Media: A Hybrid Model | - | 0
Page 411 of 9486