SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 6001-6050 of 661,570 papers

Title | Status | Hype
Digital Player: Evaluating Large Language Models based Human-like Agent in Games | Code | 2
AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies | Code | 2
MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing | Code | 2
Neural Posterior Estimation for Cataloging Astronomical Images with Spatially Varying Backgrounds and Point Spread Functions | Code | 2
UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection | Code | 2
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference | Code | 2
Mobius: Text to Seamless Looping Video Generation via Latent Shift | Code | 2
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation | Code | 2
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction | Code | 2
Image Referenced Sketch Colorization Based on Animation Creation Workflow | Code | 2
High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model | Code | 2
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation | Code | 2
One-for-More: Continual Diffusion Model for Anomaly Detection | Code | 2
Sanity Checking Causal Representation Learning on a Simple Real-World System | Code | 2
InsTaG: Learning Personalized 3D Talking Head from Few-Second Video | Code | 2
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion | Code | 2
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR | Code | 2
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think | Code | 2
ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting | Code | 2
FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting | Code | 2
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2
BIG-Bench Extra Hard | Code | 2
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens | Code | 2
Medical Hallucinations in Foundation Models and Their Impact on Healthcare | Code | 2
AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Code | 2
OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models | Code | 2
NeoBERT: A Next-Generation BERT | Code | 2
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems | Code | 2
Rank1: Test-Time Compute for Reranking in Information Retrieval | Code | 2
SPECTRE: An FFT-Based Efficient Drop-In Replacement to Self-Attention for Long Contexts | Code | 2
LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers | Code | 2
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support | Code | 2
RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts | Code | 2
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference | Code | 2
WebGames: Challenging General-Purpose Web-Browsing AI Agents | Code | 2
MegaLoc: One Retrieval to Place Them All | Code | 2
Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions | Code | 2
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts | Code | 2
Delta Decompression for MoE-based LLMs Compression | Code | 2
PointSea: Point Cloud Completion via Self-structure Augmentation | Code | 2
Introducing Visual Perception Token into Multimodal Large Language Model | Code | 2
The GigaMIDI Dataset with Features for Expressive Music Performance Detection | Code | 2
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Code | 2
Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems | Code | 2
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Code | 2
LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | Code | 2
Audio-FLAN: A Preliminary Release | Code | 2
FreeTumor: Large-Scale Generative Tumor Synthesis in Computed Tomography Images for Improving Tumor Recognition | Code | 2
A Survey on Industrial Anomalies Synthesis | Code | 2
SalM2: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention | Code | 2
Page 121 of 13,232