SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8351-8400 of 661,570 papers

Title | Status | Hype
Discovering Preference Optimization Algorithms with and for Large Language Models | Code | 2
Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification | Code | 2
Real-world Image Dehazing with Coherence-based Pseudo Labeling and Cooperative Unfolding Network | Code | 2
LVBench: An Extreme Long Video Understanding Benchmark | Code | 2
Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis | Code | 2
KernelWarehouse: Rethinking the Design of Dynamic Convolution | Code | 2
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Code | 2
Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio | Code | 2
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices | Code | 2
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech | Code | 2
EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network | Code | 2
Autoregressive Pretraining with Mamba in Vision | Code | 2
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding | Code | 2
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation | Code | 2
RWKV-CLIP: A Robust Vision-Language Representation Learner | Code | 2
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning | Code | 2
Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring | Code | 2
Let Go of Your Labels with Unsupervised Transfer | Code | 2
Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models | Code | 2
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Code | 2
Improving Autoformalization using Type Checking | Code | 2
CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence | Code | 2
QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Code | 2
Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring | Code | 2
GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection | Code | 2
Needle In A Multimodal Haystack | Code | 2
Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees | Code | 2
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Code | 2
Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena | Code | 2
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Code | 2
A Synthetic Dataset for Personal Attribute Inference | Code | 2
Meent: Differentiable Electromagnetic Simulator for Machine Learning | Code | 2
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models | Code | 2
FRAG: Frequency Adapting Group for Diffusion Video Editing | Code | 2
ProcessPainter: Learn Painting Process from Sequence Data | Code | 2
Towards Lifelong Learning of Large Language Models: A Survey | Code | 2
EpiLearn: A Python Library for Machine Learning in Epidemic Modeling | Code | 2
Generalizable Human Gaussians from Single-View Image | Code | 2
Compositional Video Generation as Flow Equalization | Code | 2
UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor | Code | 2
Vript: A Video Is Worth Thousands of Words | Code | 2
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing | Code | 2
RepoQA: Evaluating Long Context Code Understanding | Code | 2
MVGamba: Unify 3D Content Generation as State Space Sequence Modeling | Code | 2
Safety Alignment Should Be Made More Than Just a Few Tokens Deep | Code | 2
STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics | Code | 2
Low-Rank Quantization-Aware Training for LLMs | Code | 2
Compute Better Spent: Replacing Dense Layers with Structured Matrices | Code | 2
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models | Code | 2
Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset | Code | 2
Page 168 of 13,232