SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 17501-17550 of 474278 papers

Title | Status | Hype
Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task | Code | 1
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Code | 1
Shifting the Paradigm: A Diffeomorphism Between Time Series Data Manifolds for Achieving Shift-Invariancy in Deep Learning | Code | 1
Your contrastive learning problem is secretly a distribution alignment problem | Code | 1
Complex LLM Planning via Automated Heuristics Discovery | Code | 1
HDEE: Heterogeneous Domain Expert Ensemble | Code | 1
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles | Code | 1
FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users | Code | 1
Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation | Code | 1
REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems | Code | 1
Sparklen: A Statistical Learning Toolkit for High-Dimensional Hawkes Processes in Python | Code | 1
Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning | Code | 1
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation | Code | 1
SE(3)-Equivariant Ternary Complex Prediction Towards Target Protein Degradation | Code | 1
Generalizable deep learning for photoplethysmography-based blood pressure estimation -- A Benchmarking Study | Code | 1
A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs | Code | 1
OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents | Code | 1
Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras | Code | 1
AKDT: Adaptive Kernel Dilation Transformer for Effective Image Denoising | Code | 1
TrajLLM: A Modular LLM-Enhanced Agent-Based Framework for Realistic Human Trajectory Simulation | Code | 1
Does 3D Gaussian Splatting Need Accurate Volumetric Rendering? | Code | 1
CAMEx: Curvature-aware Merging of Experts | Code | 1
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Code | 1
Poster: Long PHP webshell files detection based on sliding window attention | Code | 1
Evaluating Intelligence via Trial and Error | Code | 1
EndoMamba: An Efficient Foundation Model for Endoscopic Videos via Hierarchical Pre-training | Code | 1
Reward Shaping to Mitigate Reward Hacking in RLHF | Code | 1
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Code | 1
Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code | Code | 1
Starjob: Dataset for LLM-Driven Job Shop Scheduling | Code | 1
UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering | Code | 1
CryptoPulse: Short-Term Cryptocurrency Forecasting with Dual-Prediction and Cross-Correlated Market Indicators | Code | 1
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model | Code | 1
FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models | Code | 1
LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena | Code | 1
Escaping The Big Data Paradigm in Self-Supervised Representation Learning | Code | 1
Multi-Perspective Data Augmentation for Few-shot Object Detection | Code | 1
Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos | Code | 1
Transfer Learning Assisted Fast Design Migration Over Technology Nodes: A Study on Transformer Matching Network | Code | 1
Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation | Code | 1
Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning | Code | 1
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks | Code | 1
Inverse Materials Design by Large Language Model-Assisted Generative Framework | Code | 1
Can Multimodal LLMs Perform Time Series Anomaly Detection? | Code | 1
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs | Code | 1
Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought | Code | 1
Towards Enhanced Immersion and Agency for LLM-based Interactive Drama | Code | 1
MRBTP: Efficient Multi-Robot Behavior Tree Planning and Collaboration | Code | 1
Training Consistency Models with Variational Noise Coupling | Code | 1
Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints | Code | 1
Page 351 of 9486