Benchmarking

Papers


Showing 551–600 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	Benchmarking, Image to text	Code Available	1	5
Deep Learning-Based Synchronization for Uplink NB-IoT	May 22, 2022	Benchmarking, Deep Learning	Code Available	1	5
Benchmarking Large Language Models for News Summarization	Jan 31, 2023	Benchmarking, News Summarization	Code Available	1	5
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs	Mar 10, 2020	Benchmarking, Entity Alignment	Code Available	1	5
Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models	May 19, 2025	Benchmarking, Chatbot	Code Available	1	5
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	Benchmarking, General Knowledge	Code Available	1	5
A Dataset for Answering Time-Sensitive Questions	Aug 13, 2021	Benchmarking	Code Available	1	5
dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing	Apr 27, 2021	Benchmarking, Retrieval	Code Available	1	5
Benchmarking Large Language Models for Automated Verilog RTL Code Generation	Dec 13, 2022	Benchmarking, Code Generation	Code Available	1	5
Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets	Apr 11, 2022	Action Triplet Recognition, Benchmarking	Code Available	1	5
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions	Jan 1, 2024	Benchmarking, Instruction Following	Code Available	1	5
DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation	Oct 11, 2022	6D Pose Estimation, 6D Pose Estimation using RGB	Code Available	1	5
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory	Jul 20, 2023	Benchmarking, Decision Making	Code Available	1	5
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4	Mar 20, 2023	Benchmarking, De-identification	Code Available	1	5
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation	Feb 18, 2024	Benchmarking, Language Modeling	Code Available	1	5
Data-Driven Denoising of Stationary Accelerometer Signals	Jun 13, 2022	Benchmarking, Denoising	Code Available	1	5
D2S: Document-to-Slide Generation Via Query-Based Text Summarization	May 8, 2021	Benchmarking, Long Form Question Answering	Code Available	1	5
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models	Jan 2, 2025	Benchmarking, Computer Security	Code Available	1	5
DACBench: A Benchmark Library for Dynamic Algorithm Configuration	May 18, 2021	Benchmarking	Code Available	1	5
Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data	Apr 16, 2021	Benchmarking, Causal Discovery	Code Available	1	5
Benchmarking Image Retrieval for Visual Localization	Nov 24, 2020	Autonomous Driving, Benchmarking	Code Available	1	5
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM	Mar 28, 2024	Benchmarking	Code Available	1	5
Curious Hierarchical Actor-Critic Reinforcement Learning	May 7, 2020	Benchmarking, Hierarchical Reinforcement Learning	Code Available	1	5
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets	Dec 10, 2021	Benchmarking	Code Available	1	5
Benchmarking Language Model Creativity: A Case Study on Code Generation	Jul 12, 2024	Benchmarking, Code Generation	Code Available	1	5
Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT	Jun 13, 2024	Benchmarking, LLM-generated Text Detection	Code Available	1	5
DataRec: A Python Library for Standardized and Reproducible Data Management in Recommender Systems	Oct 30, 2024	Benchmarking, Management	Code Available	1	5
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation	Oct 11, 2024	Benchmarking, Image Segmentation	Code Available	1	5
Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling	May 23, 2024	Benchmarking	Code Available	1	5
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks	Oct 23, 2023	Benchmarking	Code Available	1	5
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning	Feb 22, 2024	Benchmarking	Code Available	1	5
CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version)	Nov 19, 2022	Benchmarking, C++ code	Code Available	1	5
Benchmarking Graph Neural Networks on Dynamic Link Prediction	Sep 29, 2021	Benchmarking, Dynamic Link Prediction	Code Available	1	5
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning	Dec 11, 2024	Attribute, Benchmarking	Code Available	1	5
Anabranch Network for Camouflaged Object Segmentation	May 20, 2021	Benchmarking, Camouflaged Object Segmentation	Code Available	1	5
Benchmarking Graph Neural Networks for FMRI analysis	Nov 16, 2022	Benchmarking	Code Available	1	5
Benchmarking Language Models for Code Syntax Understanding	Oct 26, 2022	Benchmarking	Code Available	1	5
CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer	Dec 2, 2021	Benchmarking, Ordinal Classification	Code Available	1	5
Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception	Jan 24, 2024	Benchmarking	Code Available	1	5
Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking	Feb 19, 2021	Benchmarking, OpenAI Gym	Code Available	1	5
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?	Apr 29, 2024	Answer Generation, Benchmarking	Code Available	1	5
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models?	Aug 14, 2023	Benchmarking, Drug Design	Code Available	1	5
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization	Nov 15, 2023	Benchmarking, Instruction Following	Code Available	1	5
A multi-schematic classifier-independent oversampling approach for imbalanced datasets	Jul 15, 2021	Benchmarking	Code Available	1	5
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models	Nov 27, 2024	Benchmarking, Earth Observation	Code Available	1	5
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks	Feb 4, 2023	Adversarial Attack, Adversarial Robustness	Code Available	1	5
Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT	Jul 9, 2021	Benchmarking, Document Classification	Code Available	1	5
AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling	Nov 1, 2021	Benchmarking, object-detection	Code Available	1	5
A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models	Aug 2, 2022	Benchmarking, Synthetic Data Generation	Code Available	1	5
M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection	May 16, 2025	Benchmarking, object-detection	Code Available	1	5



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified