SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 21,051–21,100 of 474,278 papers

Title | Status | Hype
M^2PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning | Code | 1
CAD: Memory Efficient Convolutional Adapter for Segment Anything | Code | 1
CloudTrack: Scalable UAV Tracking with Cloud Semantics | Code | 1
Neuromorphic Drone Detection: an Event-RGB Multimodal Approach | Code | 1
Looped Transformers for Length Generalization | Code | 1
Exploring Hint Generation Approaches in Open-Domain Question Answering | Code | 1
AIM 2024 Challenge on UHD Blind Photo Quality Assessment | Code | 1
ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Code | 1
LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation | Code | 1
Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs | Code | 1
CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data | Code | 1
Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting | Code | 1
CDChat: A Large Multimodal Model for Remote Sensing Change Description | Code | 1
VideoPatchCore: An Effective Method to Memorize Normality for Video Anomaly Detection | Code | 1
VascX Models: Model Ensembles for Retinal Vascular Analysis from Color Fundus Images | Code | 1
In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models | Code | 1
From Commands to Prompts: LLM-based Semantic File System for AIOS | Code | 1
FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera | Code | 1
DecoupleNet: A Lightweight Backbone Network With Efficient Feature Decoupling for Remote Sensing Visual Tasks | Code | 1
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Code | 1
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models | Code | 1
GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth | Code | 1
DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis | Code | 1
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method | Code | 1
RAMBO: Enhancing RAG-based Repository-Level Method Body Completion | Code | 1
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
A new baseline for edge detection: Make Encoder-Decoder great again | Code | 1
Neural Differential Appearance Equations | Code | 1
MICSim: A Modular Simulator for Mixed-signal Compute-in-Memory based AI Accelerator | Code | 1
For Overall Nighttime Visibility: Integrate Irregular Glow Removal With Glow-Aware Enhancement | Code | 1
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Code | 1
M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images | Code | 1
SpaGBOL: Spatial-Graph-Based Orientated Localisation | Code | 1
Matérn Kernels for Tunable Implicit Surface Reconstruction | Code | 1
FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale | Code | 1
The BRAVO Semantic Segmentation Challenge Results in UNCV2024 | Code | 1
VLEU: a Method for Automatic Evaluation for Generalizability of Text-to-Image Models | Code | 1
ControlEdit: A MultiModal Local Clothing Image Editing Method | Code | 1
MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification | Code | 1
RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code | Code | 1
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections | Code | 1
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs | Code | 1
ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback | Code | 1
Steward: Natural Language Web Automation | Code | 1
CUTE: Measuring LLMs' Understanding of Their Tokens | Code | 1
UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework | Code | 1
AIM 2024 Challenge on Video Saliency Prediction: Methods and Results | Code | 1
TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features | Code | 1
Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits | Code | 1
MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators | Code | 1
Page 422 of 9,486