SOTAVerified

Benchmarking

Papers

Showing 1601–1625 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing | Code | 1 |
| Benchmarking Domain Generalization Algorithms in Computational Pathology | Code | 0 |
| Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices | | 0 |
| SEN12-WATER: A New Dataset for Hydrological Applications and its Benchmarking | | 0 |
| GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization | Code | 2 |
| Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation | Code | 0 |
| Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data | Code | 0 |
| Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework | Code | 0 |
| Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling | | 0 |
| HLB: Benchmarking LLMs' Humanlikeness in Language Use | | 0 |
| Small Language Models: Survey, Measurements, and Insights | Code | 2 |
| Building a continuous benchmarking ecosystem in bioinformatics | | 0 |
| Benchmarking Edge AI Platforms for High-Performance ML Inference | | 0 |
| Boosting Healthcare LLMs Through Retrieved Context | Code | 1 |
| Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images | Code | 0 |
| Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking | Code | 0 |
| RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code | Code | 1 |
| AlphaZip: Neural Network-Enhanced Lossless Text Compression | Code | 0 |
| Margin-bounded Confidence Scores for Out-of-Distribution Detection | Code | 0 |
| Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra | | 0 |
| Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Code | 0 |
| The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests | | 0 |
| A Survey on Multimodal Benchmarks: In the Era of Large AI Models | Code | 2 |
| CONGRA: Benchmarking Automatic Conflict Resolution | Code | 0 |
| @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |