SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 2101-2150 of 659,983 papers

Title | Status | Hype
Images Speak in Images: A Generalist Painter for In-Context Visual Learning | Code | 4
DreamGen: Unlocking Generalization in Robot Learning through Video World Models | Code | 4
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | Code | 4
Cognitive Architectures for Language Agents | Code | 4
AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data | Code | 4
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition | Code | 4
Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers | Code | 4
Mamba YOLO: A Simple Baseline for Object Detection with State Space Model | Code | 4
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements | Code | 4
Compressible-composable NeRF via Rank-residual Decomposition | Code | 4
Structured Pruning for Deep Convolutional Neural Networks: A survey | Code | 4
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | Code | 4
AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks | Code | 4
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | Code | 4
Orb: A Fast, Scalable Neural Network Potential | Code | 4
Spirit LM: Interleaved Spoken and Written Language Model | Code | 4
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments | Code | 4
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Code | 4
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench | Code | 4
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens | Code | 4
Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later | Code | 4
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection | Code | 4
TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling | Code | 4
INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation | Code | 4
SegGPT: Segmenting Everything In Context | Code | 4
TinyLLaVA: A Framework of Small-scale Large Multimodal Models | Code | 4
Building reliable sim driving agents by scaling self-play | Code | 4
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | Code | 4
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | Code | 4
SkyReels-A2: Compose Anything in Video Diffusion Transformers | Code | 4
Croissant: A Metadata Format for ML-Ready Datasets | Code | 4
DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning | Code | 4
LLMMapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources | Code | 4
KISS-Matcher: Fast and Robust Point Cloud Registration Revisited | Code | 4
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control | Code | 4
Prototypical Verbalizer for Prompt-based Few-shot Tuning | Code | 4
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning | Code | 4
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis | Code | 4
Autoregressive Video Generation without Vector Quantization | Code | 4
Best-of-N Jailbreaking | Code | 4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Code | 4
Continual Learning of Large Language Models: A Comprehensive Survey | Code | 4
KTO: Model Alignment as Prospect Theoretic Optimization | Code | 4
Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Code | 4
Text2SQL is Not Enough: Unifying AI and Databases with TAG | Code | 4
Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss | Code | 4
Convolutional Differentiable Logic Gate Networks | Code | 4
Billion-scale similarity search with GPUs | Code | 4
Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers | Code | 4
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level | Code | 4
Page 43 of 13200