Benchmarking

Papers

Showing 4226–4250 of 5548 papers

Title	Date	Tasks	Status
CodeS: Towards Code Model Generalization Under Distribution Shift	Jun 11, 2022	Benchmarking, Code Classification	Code Available
SAIBench: Benchmarking AI for Science	Jun 11, 2022	Benchmarking, Friction	Unverified
Functional Code Building Genetic Programming	Jun 9, 2022	Benchmarking, Program Synthesis	Unverified
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization	Jun 8, 2022	Benchmarking, Federated Learning	Unverified
Benchmarking Bayesian neural networks and evaluation metrics for regression tasks	Jun 8, 2022	Benchmarking, Open-Ended Question Answering	Unverified
Scaling laws in global corporations as a benchmarking approach to assess environmental performance	Jun 7, 2022	Benchmarking, Open-Ended Question Answering	Unverified
MorisienMT: A Dataset for Mauritian Creole Machine Translation	Jun 6, 2022	Benchmarking, Machine Translation	Unverified
Which models are innately best at uncertainty estimation?	Jun 5, 2022	Benchmarking, Out-of-Distribution Detection	Unverified
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates	Jun 2, 2022	Benchmarking	Code Available
Evaluation of Three Welsh Language POS Taggers	Jun 1, 2022	Benchmarking, POS	Unverified
Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts	Jun 1, 2022	Benchmarking, Binary Classification	Unverified
Deep One-Class Hate Speech Detection Model	Jun 1, 2022	Benchmarking, Binary Classification	Unverified
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French	Jun 1, 2022	Benchmarking, Low Resource Neural Machine Translation	Unverified
A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking	Jun 1, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction	Jun 1, 2022	16k, Benchmarking	Unverified
MTLens: Machine Translation Output Debugging	Jun 1, 2022	Benchmarking, Machine Translation	Unverified
Hide and Seek: on the Stealthiness of Attacks against Deep Learning Systems	May 31, 2022	Benchmarking	Unverified
NEWTS: A Corpus for News Topic-Focused Summarization	May 31, 2022	Benchmarking, Text Summarization	Unverified
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition	May 30, 2022	Benchmarking	Code Available
AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite	May 30, 2022	Benchmarking, BIG-bench Machine Learning	Code Available
Benchmarking Unsupervised Anomaly Detection and Localization	May 30, 2022	Anomaly Detection, Benchmarking	Unverified
A Framework for Generating Informative Benchmark Instances	May 29, 2022	Benchmarking	Code Available
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation	May 27, 2022	Benchmarking, Dataset Generation	Code Available
Benchmarking of Deep Learning models on 2D Laminar Flow behind Cylinder	May 26, 2022	Benchmarking, Deep Learning	Unverified
Large Language Models are Few-Shot Clinical Information Extractors	May 25, 2022	Benchmarking, coreference-resolution	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified