SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 6251-6300 of 661,570 papers

Title | Status | Hype
On the Feasibility of Using LLMs to Autonomously Execute Multi-host Network Attacks | Code | 2
TopoNets: High Performing Vision and Language Models with Brain-Like Topography | Code | 2
MM-Retinal V2: Transfer an Elite Knowledge Spark into Fundus Vision-Language Pretraining | Code | 2
LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutoring System | Code | 2
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity | Code | 2
LUCY: Linguistic Understanding and Control Yielding Early Stage of Her | Code | 2
Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution | Code | 2
Universal Image Restoration Pre-training via Degradation Classification | Code | 2
Baichuan-Omni-1.5 Technical Report | Code | 2
TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding | Code | 2
iFormer: Integrating ConvNet and Transformer for Mobile Application | Code | 2
GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting | Code | 2
Visual Generation Without Guidance | Code | 2
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | Code | 2
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Code | 2
Uni-Sign: Toward Unified Sign Language Understanding at Scale | Code | 2
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking | Code | 2
Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding | Code | 2
STAMP: Scalable Task And Model-agnostic Collaborative Perception | Code | 2
Deeply Optimizing the SAT Solver for the IC3 Algorithm | Code | 2
Advancing MRI Reconstruction: A Systematic Review of Deep Learning and Compressed Sensing Integration | Code | 2
Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph | Code | 2
Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video | Code | 2
Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement | Code | 2
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting | Code | 2
An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks | Code | 2
NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild | Code | 2
Spurious Forgetting in Continual Learning of Language Models | Code | 2
PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection | Code | 2
YOLO11-JDE: Fast and Accurate Multi-Object Tracking with Self-Supervised Re-ID | Code | 2
Parameter-Efficient Fine-Tuning for Foundation Models | Code | 2
Querying Databases with Function Calling | Code | 2
GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting | Code | 2
Tensor-Var: Variational Data Assimilation in Tensor Product Feature Space | Code | 2
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing | Code | 2
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge | Code | 2
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback | Code | 2
TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting | Code | 2
A Survey on Multimodal Recommender Systems: Recent Advances and Future Directions | Code | 2
GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting | Code | 2
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning | Code | 2
Towards Robust Multi-tab Website Fingerprinting | Code | 2
Distillation Quantification for Large Language Models | Code | 2
MedS^3: Towards Medical Small Language Models with Self-Evolved Slow Thinking | Code | 2
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding | Code | 2
Supervised Learning for Analog and RF Circuit Design: Benchmarks and Comparative Insights | Code | 2
EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | Code | 2
Automating High Quality RT Planning at Scale | Code | 2
Exploring Temporally-Aware Features for Point Tracking | Code | 2
Episodic Memories Generation and Evaluation Benchmark for Large Language Models | Code | 2
Page 126 of 13232