Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1251–1275 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Automatic Detection of Generated Text is Easiest when Humans are Fooled	Nov 2, 2019	BenchmarkingLanguage Modelling	CodeCode Available	1	5
Benchmarking Language Model Creativity: A Case Study on Code Generation	Jul 12, 2024	BenchmarkingCode Generation	CodeCode Available	1	5
Benchmarking the Generation of Fact Checking Explanations	Aug 29, 2023	Abstractive Text SummarizationArticles	CodeCode Available	1	5
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records	Nov 22, 2021	Benchmarking	CodeCode Available	1	5
How to Train Neural Field Representations: A Comprehensive Study and Benchmark	Dec 16, 2023	Benchmarking	CodeCode Available	1	5
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	CodeCode Available	1	5
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs	Sep 18, 2021	BenchmarkingComplex Query Answering	CodeCode Available	1	5
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark	Sep 16, 2020	BenchmarkingKnowledge Graph Completion	CodeCode Available	1	5
AIPerf: Automated machine learning as an AI-HPC benchmark	Aug 17, 2020	AutoMLBenchmarking	CodeCode Available	1	5
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	CodeCode Available	1	5
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges	Oct 21, 2022	BenchmarkingCommunity Detection	CodeCode Available	1	5
New Protocols and Negative Results for Textual Entailment Data Collection	Apr 24, 2020	BenchmarkingDiversity	CodeCode Available	1	5
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation	Dec 12, 2023	Anomaly DetectionAutonomous Driving	CodeCode Available	1	5
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?	Sep 29, 2023	BenchmarkingKnowledge Graph Completion	CodeCode Available	1	5
Benchmarking of DL Libraries and Models on Mobile Devices	Feb 14, 2022	BenchmarkingGPU	CodeCode Available	1	5
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation	Feb 18, 2024	BenchmarkingLanguage Modeling	CodeCode Available	1	5
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification	Jun 18, 2023	BenchmarkingRetrieval	CodeCode Available	1	5
Benchmarking Quantized Neural Networks on FPGAs with FINN	Feb 2, 2021	BenchmarkingQuantization	CodeCode Available	1	5
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA	Dec 29, 2023	AnatomyBenchmarking	CodeCode Available	1	5
HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive Media	Oct 14, 2021	3D Pose EstimationBenchmarking	CodeCode Available	1	5
How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models	Aug 29, 2024	BenchmarkingGeneral Knowledge	CodeCode Available	1	5
Benchmarking structure-based three-dimensional molecular generative models using GenBench3D: ligand conformation quality matters	Jul 5, 2024	Benchmarkingvalid	CodeCode Available	1	5
A framework for benchmarking clustering algorithms	Sep 20, 2022	BenchmarkingClustering	CodeCode Available	1	5
"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations	Sep 28, 2021	BenchmarkingDialogue State Tracking	CodeCode Available	1	5
Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency	Jun 14, 2024	Benchmarking	CodeCode Available	1	5

Show:10 25 50

← PrevPage 51 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified