SOTAVerified

Benchmarking

Papers

Showing 576–600 of 5548 papers

Title | Status | Hype
Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT | Code | 1
DataRec: A Python Library for Standardized and Reproducible Data Management in Recommender Systems | Code | 1
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling | Code | 1
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks | Code | 1
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning | Code | 1
CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version) | Code | 1
Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Code | 1
Anabranch Network for Camouflaged Object Segmentation | Code | 1
Benchmarking Graph Neural Networks for FMRI analysis | Code | 1
Benchmarking Language Models for Code Syntax Understanding | Code | 1
CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer | Code | 1
Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception | Code | 1
Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking | Code | 1
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations? | Code | 1
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Code | 1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Code | 1
A multi-schematic classifier-independent oversampling approach for imbalanced datasets | Code | 1
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Code | 1
Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT | Code | 1
AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling | Code | 1
A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models | Code | 1
M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified