SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8301-8350 of 661,570 papers

Title | Status | Hype
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Code | 2
Consistency-diversity-realism Pareto fronts of conditional image generative models | Code | 2
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection | Code | 2
ControlVAR: Exploring Controllable Visual Autoregressive Modeling | Code | 2
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models | Code | 2
DurLAR: A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications | Code | 2
Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation | Code | 2
EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Code | 2
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models | Code | 2
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation | Code | 2
QQQ: Quality Quattuor-Bit Quantization for Large Language Models | Code | 2
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting | Code | 2
An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records | Code | 2
Dynamic Asset Allocation with Asset-Specific Regime Forecasts | Code | 2
Interpreting the Weight Space of Customized Diffusion Models | Code | 2
Yo'LLaVA: Your Personalized Language and Vision Assistant | Code | 2
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Code | 2
Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors | Code | 2
Understanding Hallucinations in Diffusion Models through Mode Interpolation | Code | 2
Fredformer: Frequency Debiased Transformer for Time Series Forecasting | Code | 2
On Softmax Direct Preference Optimization for Recommendation | Code | 2
Classic GNNs are Strong Baselines: Reassessing GNNs for Node Classification | Code | 2
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs | Code | 2
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer | Code | 2
LRM-Zero: Training Large Reconstruction Models with Synthesized Data | Code | 2
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models | Code | 2
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery | Code | 2
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents | Code | 2
Towards Vision-Language Geo-Foundation Model: A Survey | Code | 2
BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection | Code | 2
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance | Code | 2
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios | Code | 2
S^3 -- Semantic Signal Separation | Code | 2
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making | Code | 2
Explore the Limits of Omni-modal Pretraining at Scale | Code | 2
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models | Code | 2
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval | Code | 2
Enhancing Diagnostic Accuracy in Rare and Common Fundus Diseases with a Knowledge-Rich Vision-Language Model | Code | 2
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions | Code | 2
CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models | Code | 2
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Code | 2
Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges | Code | 2
BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics | Code | 2
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning | Code | 2
Real-world Image Dehazing with Coherence-based Pseudo Labeling and Cooperative Unfolding Network | Code | 2
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models | Code | 2
LVBench: An Extreme Long Video Understanding Benchmark | Code | 2
DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer | Code | 2
Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis | Code | 2
Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio | Code | 2