Benchmarking

Papers

Showing 3826–3850 of 5548 papers

Title	Date	Tasks	Status	Hype
EmProx: Neural Network Performance Estimation For Neural Architecture Search	Jun 13, 2022	Benchmarking, Decoder	Code Available	0
BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents	Jun 13, 2022	Benchmarking	Unverified	0
Data-Driven Denoising of Stationary Accelerometer Signals	Jun 13, 2022	Benchmarking, Denoising	Code Available	1
CodeS: Towards Code Model Generalization Under Distribution Shift	Jun 11, 2022	Benchmarking, Code Classification	Code Available	0
SAIBench: Benchmarking AI for Science	Jun 11, 2022	Benchmarking, Friction	Unverified	0
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations	Jun 9, 2022	Benchmarking, continuous-control	Code Available	2
SwinCheX: Multi-label classification on chest X-ray images with transformers	Jun 9, 2022	Benchmarking, Multi-Label Classification	Code Available	1
Functional Code Building Genetic Programming	Jun 9, 2022	Benchmarking, Program Synthesis	Unverified	0
Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark	Jun 8, 2022	Benchmarking, Explainable Artificial Intelligence (XAI)	Code Available	1
Benchmarking Bayesian neural networks and evaluation metrics for regression tasks	Jun 8, 2022	Benchmarking, Open-Ended Question Answering	Unverified	0
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization	Jun 8, 2022	Benchmarking, Federated Learning	Unverified	0
Scaling laws in global corporations as a benchmarking approach to assess environmental performance	Jun 7, 2022	Benchmarking, Open-Ended Question Answering	Unverified	0
Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering	Jun 6, 2022	Benchmarking, Clustering	Code Available	1
MorisienMT: A Dataset for Mauritian Creole Machine Translation	Jun 6, 2022	Benchmarking, Machine Translation	Unverified	0
Which models are innately best at uncertainty estimation?	Jun 5, 2022	Benchmarking, Out-of-Distribution Detection	Unverified	0
Revisiting the "Video" in Video-Language Understanding	Jun 3, 2022	Benchmarking, Question Answering	Code Available	1
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates	Jun 2, 2022	Benchmarking	Code Available	0
Evaluation of Three Welsh Language POS Taggers	Jun 1, 2022	Benchmarking, POS	Unverified	0
Deep One-Class Hate Speech Detection Model	Jun 1, 2022	Benchmarking, Binary Classification	Unverified	0
Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction	Jun 1, 2022	16k, Benchmarking	Unverified	0
Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts	Jun 1, 2022	Benchmarking, Binary Classification	Unverified	0
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French	Jun 1, 2022	Benchmarking, Low Resource Neural Machine Translation	Unverified	0
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain	Jun 1, 2022	Benchmarking, Emotion Recognition	Code Available	1
Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking	Jun 1, 2022	Benchmarking, Sentence	Code Available	1
MTLens: Machine Translation Output Debugging	Jun 1, 2022	Benchmarking, Machine Translation	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified