SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 10401–10450 of 661570 papers

Title | Status | Hype
Sparse Crosscoders for diffing MoEs and Dense models | | 0
MoE Lens -- An Expert Is All You Need | | 0
EventGeM: Global-to-Local Feature Matching for Event-Based Visual Place Recognition | | 0
Margin and Consistency Supervision for Calibrated and Robust Vision Models | | 0
HART: Data-Driven Hallucination Attribution and Evidence-Based Tracing for Large Language Models | | 0
Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics | | 0
Architectural Unification for Polarimetric Imaging Across Multiple Degradations | | 0
Evaluating LLM Alignment With Human Trust Models | | 0
Remote Sensing Image Classification Using Deep Ensemble Learning | | 0
Cog2Gen3D: Sculpturing 3D Semantic-Geometric Cognition for 3D Generation | | 0
Stochastic Event Prediction via Temporal Motif Transitions | | 0
CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis | | 0
InnoAds-Composer: Efficient Condition Composition for E-Commerce Poster Generation | | 0
Mitigating Bias in Concept Bottleneck Models for Fair and Interpretable Image Classification | | 0
Beyond Geometry: Artistic Disparity Synthesis for Immersive 2D-to-3D | | 0
Pano3DComposer: Feed-Forward Compositional 3D Scene Generation from Single Panoramic Image | | 0
InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning | | 0
The World Won't Stay Still: Programmable Evolution for Agent Benchmarks | | 0
CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning | | 0
DeepFact: Co-Evolving Benchmarks and Agents for Deep Research Factuality | | 0
Stock Market Prediction Using Node Transformer Architecture Integrated with BERT Sentiment Analysis | | 0
Towards Driver Behavior Understanding: Weakly-Supervised Risk Perception in Driving Scenes | | 0
Addressing the Ecological Fallacy in Larger LMs with Human Context | | 0
Activation-Space Personality Steering: Hybrid Layer Selection for Stable Trait Control in LLMs | | 0
RouteGoT: Node-Adaptive Routing for Cost-Efficient Graph of Thoughts Reasoning | | 0
Evolving Medical Imaging Agents via Experience-driven Self-skill Discovery | | 0
Improving the accuracy of physics-informed neural networks via last-layer retraining | | 0
Training-free Latent Inter-Frame Pruning with Attention Recovery | | 0
ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning | Code | 0
Oral to Web: Digitizing 'Zero Resource' Languages of Bangladesh | | 0
Building an Ensemble LLM Semantic Tagger for UN Security Council Resolutions | | 0
PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models | | 0
Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation | | 0
Shifting Adaptation from Weight Space to Memory Space: A Memory-Augmented Agent for Medical Image Segmentation | | 0
MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer | | 0
Design Experiments to Compare Multi-armed Bandit Algorithms | | 0
RAC: Rectified Flow Auto Coder | | 0
VS3R: Robust Full-frame Video Stabilization via Deep 3D Reconstruction | | 0
VerChol -- Grammar-First Tokenization for Agglutinative Languages | | 0
Learning Next Action Predictors from Human-Computer Interaction | | 0
Laser interferometry as a robust neuromorphic platform for machine learning | | 0
Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec | Code | 0
C^2Prompt: Class-aware Client Knowledge Interaction for Federated Continual Learning | Code | 0
Cross-Scale Pansharpening via ScaleFormer and the PanScale Benchmark | Code | 0
CollabOD: Collaborative Multi-Backbone with Cross-scale Vision for UAV Small Object Detection | Code | 0
BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation | Code | 0
Weak-SIGReg: Covariance Regularization for Stable Deep Learning | Code | 0
mlx-vis: GPU-Accelerated Dimensionality Reduction and Visualization on Apple Silicon | Code | 0
Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations | Code | 0
Token Bottleneck: One Token to Remember Dynamics | Code | 0
Page 209 of 13232