SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 13251-13300 of 474278 papers

Title | Status | Hype
Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models | - | 0
VeFIA: An Efficient Inference Auditing Framework for Vertical Federated Collaborative Software | - | 0
High-Order Deep Meta-Learning with Category-Theoretic Interpretation | - | 0
Automated Grading of Students' Handwritten Graphs: A Comparison of Meta-Learning and Vision-Large Language Models | - | 0
MC-INR: Efficient Encoding of Multivariate Scientific Simulation Data using Meta-Learning and Clustered Implicit Neural Representations | - | 0
Determination Of Structural Cracks Using Deep Learning Frameworks | - | 0
Evaluating Language Models For Threat Detection in IoT Security Logs | Code | 0
S2FGL: Spatial Spectral Federated Graph Learning | Code | 0
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection | Code | 1
Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection | Code | 1
Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs | Code | 0
Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks | Code | 2
GDC Cohort Copilot: An AI Copilot for Curating Cohorts from the Genomic Data Commons | Code | 0
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Code | 2
Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings | - | 0
Fluid Democracy in Federated Data Aggregation | - | 0
Weakly-supervised Contrastive Learning with Quantity Prompts for Moving Infrared Small Target Detection | Code | 0
MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks | - | 0
FMOcc: TPV-Driven Flow Matching for 3D Occupancy Prediction with Selective State Space Model | - | 0
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent | - | 0
De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks | - | 0
UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation | - | 0
Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning | - | 0
LMPNet for Weakly-supervised Keypoint Discovery | - | 0
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Code | 0
DeepGesture: A conversational gesture synthesis system based on emotions and semantics | Code | 0
SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment | Code | 2
Hita: Holistic Tokenizer for Autoregressive Image Generation | Code | 0
Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work | - | 0
Continual Gradient Low-Rank Projection Fine-Tuning for LLMs | Code | 0
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation | Code | 2
AnyI2V: Animating Any Conditional Image with Motion Control | - | 0
Fast and Simplex: 2-Simplicial Attention in Triton | - | 0
IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders | - | 0
RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes | Code | 0
Early Signs of Steganographic Capabilities in Frontier LLMs | Code | 0
No time to train! Training-Free Reference-Based Instance Segmentation | Code | 3
Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data | Code | 1
CyberRAG: An agentic RAG cyber attack classification and reporting tool | - | 0
MAC-Lookup: Multi-Axis Conditional Lookup Model for Underwater Image Enhancement | Code | 0
Understanding and Improving Length Generalization in Recurrent Models | - | 0
Detection of Rail Line Track and Human Beings Near the Track to Avoid Accidents | - | 0
CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | - | 0
Exploring Gender Bias Beyond Occupational Titles | Code | 0
Adversarial Manipulation of Reasoning Models using Internal Representations | Code | 0
WebSailor: Navigating Super-human Reasoning for Web Agent | Code | 11
Physics-informed Ground Reaction Dynamics from Human Motion Capture | Code | 0
Confidence and Stability of Global and Pairwise Scores in NLP Evaluation | Code | 0
Optimizing Methane Detection On Board Satellites: Speed, Accuracy, and Low-Power Solutions for Resource-Constrained Hardware | Code | 0
Just Noticeable Difference for Large Multimodal Models | Code | 0
Page 266 of 9486