SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8151-8200 of 661,570 papers

Title | Status | Hype
ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models | Code | 2
LumberChunker: Long-Form Narrative Document Segmentation | Code | 2
Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection | Code | 2
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Code | 2
European Space Agency Benchmark for Anomaly Detection in Satellite Telemetry | Code | 2
Revitalizing Convolutional Network for Image Restoration | Code | 2
Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows | Code | 2
SUM: Saliency Unification through Mamba for Visual Attention Modeling | Code | 2
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers | Code | 2
Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning | Code | 2
The Balanced-Pairwise-Affinities Feature Transform | Code | 2
Dual-Space Knowledge Distillation for Large Language Models | Code | 2
FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model | Code | 2
Disentangled Motion Modeling for Video Frame Interpolation | Code | 2
Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP | Code | 2
DiffusionPDE: Generative PDE-Solving Under Partial Observation | Code | 2
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA | Code | 2
Finding Transformer Circuits with Edge Pruning | Code | 2
One Thousand and One Pairs: A "novel" challenge for long-context language models | Code | 2
Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation? | Code | 2
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Code | 2
LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments | Code | 2
FaceScore: Benchmarking and Enhancing Face Quality in Human Generation | Code | 2
From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking | Code | 2
Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization | Code | 2
Alpha^2: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning | Code | 2
OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far? | Code | 2
GC4NC: A Benchmark Framework for Graph Condensation on Node Classification with New Insights | Code | 2
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | Code | 2
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models | Code | 2
SegNet4D: Efficient Instance-Aware 4D Semantic Segmentation for LiDAR Point Cloud | Code | 2
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models | Code | 2
CausalFormer: An Interpretable Transformer for Temporal Causal Discovery | Code | 2
DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation | Code | 2
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking | Code | 2
LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction | Code | 2
Efficient Evolutionary Search Over Chemical Space with Large Language Models | Code | 2
PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud | Code | 2
Soft Masked Mamba Diffusion Model for CT to MRI Conversion | Code | 2
Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level | Code | 2
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs | Code | 2
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting | Code | 2
What Matters in Transformers? Not All Attention is Needed | Code | 2
RouteFinder: Towards Foundation Models for Vehicle Routing Problems | Code | 2
DExter: Learning and Controlling Performance Expression with Diffusion Models | Code | 2
SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation | Code | 2
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | Code | 2
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Code | 2
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation | Code | 2
Page 164 of 13,232