SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 3351–3400 of 659,983 papers

Title | Status | Hype
MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | Code | 3
Searching for Best Practices in Retrieval-Augmented Generation | Code | 3
BERGEN: A Benchmarking Library for Retrieval-Augmented Generation | Code | 3
Evaluation of Text-to-Video Generation Models: A Dynamics Perspective | Code | 3
xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart | Code | 3
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents | Code | 3
Retrieval-augmented generation in multilingual settings | Code | 3
StyleShot: A Snapshot on Any Style | Code | 3
Tree Search for Language Model Agents | Code | 3
Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation | Code | 3
Deep Frequency Derivative Learning for Non-stationary Time Series Forecasting | Code | 3
SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting | Code | 3
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Code | 3
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Code | 3
Segment Anything without Supervision | Code | 3
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | Code | 3
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Code | 3
A Survey on Mixture of Experts | Code | 3
Diffusion Model-Based Video Editing: A Survey | Code | 3
A Review of Large Language Models and Autonomous Agents in Chemistry | Code | 3
AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors | Code | 3
Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text | Code | 3
Point-SAM: Promptable 3D Segmentation Model for Point Clouds | Code | 3
Vaporetto: Efficient Japanese Tokenization Based on Improved Pointwise Linear Classification | Code | 3
Adam-mini: Use Fewer Learning Rates To Gain More | Code | 3
Panza: Design and Analysis of a Fully-Local Personalized Text Writing Assistant | Code | 3
Lossless data compression by large models | Code | 3
GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization | Code | 3
HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis | Code | 3
AudioBench: A Universal Benchmark for Audio Large Language Models | Code | 3
Are Language Models Actually Useful for Time Series Forecasting? | Code | 3
Taming 3DGS: High-Quality Radiance Fields with Limited Resources | Code | 3
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | Code | 3
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials | Code | 3
Consistency Models Made Easy | Code | 3
Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines | Code | 3
LLM4CP: Adapting Large Language Models for Channel Prediction | Code | 3
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents | Code | 3
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | Code | 3
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Code | 3
VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models | Code | 3
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts | Code | 3
SpatialBot: Precise Spatial Understanding with Vision Language Models | Code | 3
Detecting hallucinations in large language models using semantic entropy | Code | 3
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? | Code | 3
Evaluating representation learning on the protein structure universe | Code | 3
DF40: Toward Next-Generation Deepfake Detection | Code | 3
TSI-Bench: Benchmarking Time Series Imputation | Code | 3
VoCo-LLaMA: Towards Vision Compression with Large Language Models | Code | 3
Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech | Code | 3
Page 68 of 13,200