SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 20451–20500 of 474,278 papers

Title | Status | Hype
Self-Assessed Generation: Trustworthy Label Generation for Optical Flow and Stereo Matching in Real-world | Code | 1
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning | Code | 1
LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content | Code | 1
Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection | Code | 1
MAIR: A Massive Benchmark for Evaluating Instructed Retrieval | Code | 1
PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion | Code | 1
Hard-Constrained Neural Networks with Universal Approximation Guarantees | Code | 1
MAFin: Motif Detection in Multiple Alignment Files | Code | 1
Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings | Code | 1
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models | Code | 1
GraFPrint: A GNN-Based Approach for Audio Identification | Code | 1
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Code | 1
Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs | Code | 1
Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling | Code | 1
TrajDiffuse: A Conditional Diffusion Model for Environment-Aware Trajectory Prediction | Code | 1
Differentiable Weightless Neural Networks | Code | 1
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models | Code | 1
Replay-and-Forget-Free Graph Class-Incremental Learning: A Task Profiling and Prompting Approach | Code | 1
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | Code | 1
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Code | 1
Taming Overconfidence in LLMs: Reward Calibration in RLHF | Code | 1
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains | Code | 1
STA-Unet: Rethink the semantic redundant for Medical Imaging Segmentation | Code | 1
Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation | Code | 1
ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression | Code | 1
Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense | Code | 1
TULIP: Token-length Upgraded CLIP | Code | 1
Combining Generative and Geometry Priors for Wide-Angle Portrait Correction | Code | 1
Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition | Code | 1
AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion Prior | Code | 1
EasyJudge: an Easy-to-use Tool for Comprehensive Response Evaluation of LLMs | Code | 1
Prompt Tuning for Audio Deepfake Detection: Computationally Efficient Test-time Domain Adaptation with Limited Target Dataset | Code | 1
Robust 3D Point Clouds Classification based on Declarative Defenders | Code | 1
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment | Code | 1
InterMask: 3D Human Interaction Generation via Collaborative Masked Modelling | Code | 1
Variational Diffusion Posterior Sampling with Midpoint Guidance | Code | 1
UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation | Code | 1
Exploring Behavior-Relevant and Disentangled Neural Dynamics with Generative Diffusion Models | Code | 1
Bridging Gaps: Federated Multi-View Clustering in Heterogeneous Hybrid Views | Code | 1
The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection | Code | 1
Generative Subgraph Retrieval for Knowledge Graph-Grounded Dialog Generation | Code | 1
FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models | Code | 1
SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression | Code | 1
SciEvo: A 2 Million, 30-Year Cross-disciplinary Dataset for Temporal Scientometric Analysis | Code | 1
Mamba4Cast: Efficient Zero-Shot Time Series Forecasting with State Space Models | Code | 1
Skipping Computations in Multimodal LLMs | Code | 1
Towards Multi-Modal Animal Pose Estimation: A Survey and In-Depth Analysis | Code | 1
Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering | Code | 1
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning | Code | 1
Rethinking Data Selection at Scale: Random Selection is Almost All You Need | Code | 1
Page 410 of 9486