SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 13301–13350 of 474278 papers

| Title | Status | Hype |
| --- | --- | --- |
| DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues | | 0 |
| ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning | Code | 0 |
| AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness | Code | 0 |
| PanTS: The Pancreatic Tumor Segmentation Dataset | | 0 |
| DIY-MKG: An LLM-Based Polyglot Language Learning System | | 0 |
| Evaluating Robustness of Monocular Depth Estimation with Procedural Scene Perturbations | Code | 0 |
| TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents | Code | 0 |
| Adapting Rule Representation With Four-Parameter Beta Distribution for Learning Classifier Systems | Code | 0 |
| MassTool: A Multi-Task Search-Based Tool Retrieval Framework for Large Language Models | Code | 0 |
| CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning | Code | 0 |
| HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision | Code | 0 |
| MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices | Code | 0 |
| CI-VID: A Coherent Interleaved Text-Video Dataset | Code | 0 |
| MARVIS: Modality Adaptive Reasoning over VISualizations | Code | 0 |
| Hierarchical Patch Compression for ColPali: Efficient Multi-Vector Document Retrieval with Dynamic Pruning and Quantization | Code | 0 |
| Classification based deep learning models for lung cancer and disease using medical images | Code | 0 |
| Structure and Smoothness Constrained Dual Networks for MR Bias Field Correction | Code | 0 |
| Medical-Knowledge Driven Multiple Instance Learning for Classifying Severe Abdominal Anomalies on Prenatal Ultrasound | Code | 0 |
| Active Control Points-based 6DoF Pose Tracking for Industrial Metal Objects | Code | 0 |
| Is External Information Useful for Stance Detection with LLMs? | Code | 0 |
| Depth Anything at Any Condition | Code | 0 |
| AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration | Code | 0 |
| Non-exchangeable Conformal Prediction for Temporal Graph Neural Networks | Code | 0 |
| CLUES: Collaborative High-Quality Data Selection for LLMs via Training Dynamics | Code | 0 |
| OpenTable-R1: A Reinforcement Learning Augmented Tool Agent for Open-Domain Table Question Answering | Code | 0 |
| Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions | | 0 |
| LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs | | 0 |
| Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis | | 0 |
| Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation | | 0 |
| Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training | | 0 |
| DARTS: A Dual-View Attack Framework for Targeted Manipulation in Federated Sequential Recommendation | | 0 |
| ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks | | 0 |
| A Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning | | 0 |
| Advancing Magnetic Materials Discovery -- A structure-based machine learning approach for magnetic ordering and magnetic moment prediction | | 0 |
| Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges | | 0 |
| Mamba Guided Boundary Prior Matters: A New Perspective for Generalized Polyp Segmentation | Code | 0 |
| Crop Pest Classification Using Deep Learning Techniques: A Review | | 0 |
| First Steps Towards Voice Anonymization for Code-Switching Speech | | 0 |
| SketchColour: Channel Concat Guided DiT-based Sketch-to-Colour Pipeline for 2D Animation | | 0 |
| NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation | Code | 0 |
| RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather | | 0 |
| Underwater Monocular Metric Depth Estimation: Real-World Benchmarks and Synthetic Fine-Tuning | | 0 |
| Autoadaptive Medical Segment Anything Model | Code | 0 |
| Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence | Code | 0 |
| DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy | Code | 1 |
| The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems | | 0 |
| 3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation | Code | 0 |
| Energy-Based Transformers are Scalable Learners and Thinkers | Verified | 5 |
| Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer | Code | 1 |
| LLM-based Realistic Safety-Critical Driving Video Generation | | 0 |
Page 267 of 9486