SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,984 papers · 248,104 code links · 4,818 tasks

Papers

Showing 2601–2650 of 177340 papers

Title | Status | Hype
Universal Language Model Fine-tuning for Text Classification | Code | 3
pfl-research: simulation framework for accelerating research in Private Federated Learning | Code | 3
8-bit Optimizers via Block-wise Quantization | Code | 3
YuLan-Mini: An Open Data-efficient Language Model | Code | 3
Diffusion-LM Improves Controllable Text Generation | Code | 3
HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach | Code | 3
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | Code | 3
Declarative generation of RDF-star graphs from heterogeneous data | Code | 3
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Code | 3
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model | Code | 3
SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence | Code | 3
Model-Free Opponent Shaping | Code | 3
Robust Latent Matters: Boosting Image Generation with Sampling Error | Code | 3
Emergence of Segmentation with Minimalistic White-Box Transformers | Code | 3
EfficientNetV2: Smaller Models and Faster Training | Code | 3
TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs | Code | 3
Composer: Creative and Controllable Image Synthesis with Composable Conditions | Code | 3
AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games | Code | 3
FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields | Code | 3
LLM4Drive: A Survey of Large Language Models for Autonomous Driving | Code | 3
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting | Code | 3
Zero-shot Entity Linking with Less Data | Code | 3
Paint by Example: Exemplar-based Image Editing with Diffusion Models | Code | 3
From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery | Code | 3
Open Source Vizier: Distributed Infrastructure and API for Reliable and Flexible Blackbox Optimization | Code | 3
BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment | Code | 3
RSMamba: Remote Sensing Image Classification with State Space Model | Code | 3
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models | Code | 3
Proteina: Scaling Flow-based Protein Structure Generative Models | Code | 3
A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | Code | 3
AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking | Code | 3
Self-rewarding correction for mathematical reasoning | Code | 3
Moving Object Segmentation: All You Need Is SAM (and Flow) | Code | 3
MDCrow: Automating Molecular Dynamics Workflows with Large Language Models | Code | 3
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | Code | 3
Prompt-to-Leaderboard | Code | 3
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation | Code | 3
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | Code | 3
PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts | Code | 3
BigGait: Learning Gait Representation You Want by Large Vision Models | Code | 3
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents | Code | 3
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers | Code | 3
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Code | 3
Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey | Code | 3
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition | Code | 3
OrionBench: A Benchmark for Chart and Human-Recognizable Object Detection in Infographics | Code | 3
nnInteractive: Redefining 3D Promptable Segmentation | Code | 3
3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation | Code | 3
Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework | Code | 3
Ai2 Scholar QA: Organized Literature Synthesis with Attribution | Code | 3