SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6901-6950 of 661,570 papers

Title | Status | Hype
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Code | 2
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Code | 2
DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions | Code | 2
LLM-PySC2: Starcraft II learning environment for Large Language Models | Code | 2
AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment | Code | 2
Lightning IR: Straightforward Fine-tuning and Inference of Transformer-based Language Models for Information Retrieval | Code | 2
Scaling Laws for Precision | Code | 2
Dialectal Coverage And Generalization in Arabic Speech Recognition | Code | 2
Improved Multi-Task Brain Tumour Segmentation with Synthetic Data Augmentation | Code | 2
PhoneLM:an Efficient and Capable Small Language Model Family through Principled Pre-training | Code | 2
Brain Tumour Removing and Missing Modality Generation using 3D WDM | Code | 2
Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives | Code | 2
HourVideo: 1-Hour Video-Language Understanding | Code | 2
AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making | Code | 2
Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis | Code | 2
3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | Code | 2
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | Code | 2
Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? | Code | 2
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding | Code | 2
VQA^2: Visual Question Answering for Video Quality Assessment | Code | 2
GIS Copilot: Towards an Autonomous GIS Agent for Spatial Analysis | Code | 2
FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models | Code | 2
Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping | Code | 2
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization | Code | 2
Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis | Code | 2
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments | Code | 2
Attacking Vision-Language Computer Agents via Pop-ups | Code | 2
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Code | 2
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector | Code | 2
RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | Code | 2
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Code | 2
Adaptive Length Image Tokenization via Recurrent Allocation | Code | 2
Combining Induction and Transduction for Abstract Reasoning | Code | 2
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark | Code | 2
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey | Code | 2
Training on test proteins improves fitness, structure, and function prediction | Code | 2
Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation | Code | 2
Real-Time Polygonal Semantic Mapping for Humanoid Robot Stair Climbing | Code | 2
Mapping Global Floods with 10 Years of Satellite Radar Data | Code | 2
GarmentLab: A Unified Simulation and Benchmark for Garment Manipulation | Code | 2
Unlocking the Archives: Using Large Language Models to Transcribe Handwritten Historical Documents | Code | 2
X-Drive: Cross-modality consistent multi-sensor data synthesis for driving scenarios | Code | 2
Toward Automated Algorithm Design: A Survey and Practical Guide to Meta-Black-Box-Optimization | Code | 2
On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR | Code | 2
A Survey of Financial AI: Architectures, Advances and Open Challenges | Code | 2
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models | Code | 2
Communication Learning in Multi-Agent Systems from Graph Modeling Perspective | Code | 2
APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs | Code | 2
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | Code | 2
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection | Code | 2
Page 139 of 13,232