Benchmarking

Papers

Showing 4201–4250 of 5548 papers

Title	Date	Tasks	Status
Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking	Jul 1, 2022	Benchmarking, Natural Language Understanding	Unverified
HATE-ITA: New Baselines for Hate Speech Detection in Italian	Jul 1, 2022	Benchmarking, Hate Speech Detection	Code Available
Benchmarking Intersectional Biases in NLP	Jul 1, 2022	Benchmarking, BIG-bench Machine Learning	Code Available
SentSpace: Large-Scale Benchmarking and Evaluation of Text using Cognitively Motivated Lexical, Syntactic, and Semantic Features	Jul 1, 2022	Benchmarking, Sentence	Unverified
Local manifold learning and its link to domain-based physics knowledge	Jul 1, 2022	Benchmarking, Dimensionality Reduction	Code Available
Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations	Jul 1, 2022	Benchmarking, Combinatorial Optimization	Unverified
Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms	Jul 1, 2022	Benchmarking, Classification	Code Available
Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding	Jul 1, 2022	Benchmarking	Unverified
Computer-aided diagnosis and prediction in brain disorders	Jun 29, 2022	Benchmarking, Decision Making	Unverified
An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations	Jun 29, 2022	Benchmarking	Code Available
Toward an ImageNet Library of Functions for Global Optimization Benchmarking	Jun 27, 2022	Benchmarking, global-optimization	Unverified
VRKitchen2.0-IndoorKit: A Tutorial for Augmented Indoor Scene Building in Omniverse	Jun 23, 2022	Benchmarking, Indoor Scene Synthesis	Code Available
Beyond Uniform Lipschitz Condition in Differentially Private Optimization	Jun 21, 2022	Benchmarking, regression	Unverified
BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs	Jun 21, 2022	Anomaly Detection, Benchmarking	Code Available
ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets	Jun 20, 2022	Benchmarking, Fraud Detection	Code Available
Design of Supervision-Scalable Learning Systems: Methodology and Performance Benchmarking	Jun 18, 2022	Benchmarking, image-classification	Unverified
Motley: Benchmarking Heterogeneity and Personalization in Federated Learning	Jun 18, 2022	Benchmarking, Fairness	Code Available
Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration	Jun 17, 2022	Benchmarking, Depth Estimation	Unverified
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning	Jun 16, 2022	Benchmarking, Clustering	Code Available
Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case	Jun 16, 2022	Benchmarking, Density Estimation	Unverified
SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks	Jun 16, 2022	Benchmarking, Dynamic neural networks	Code Available
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability	Jun 16, 2022	Benchmarking, Feature Importance	Unverified
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models	Jun 16, 2022	Benchmarking, Language Modeling	Unverified
BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents	Jun 13, 2022	Benchmarking	Unverified
EmProx: Neural Network Performance Estimation For Neural Architecture Search	Jun 13, 2022	Benchmarking, Decoder	Code Available
CodeS: Towards Code Model Generalization Under Distribution Shift	Jun 11, 2022	Benchmarking, Code Classification	Code Available
SAIBench: Benchmarking AI for Science	Jun 11, 2022	Benchmarking, Friction	Unverified
Functional Code Building Genetic Programming	Jun 9, 2022	Benchmarking, Program Synthesis	Unverified
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization	Jun 8, 2022	Benchmarking, Federated Learning	Unverified
Benchmarking Bayesian neural networks and evaluation metrics for regression tasks	Jun 8, 2022	Benchmarking, Open-Ended Question Answering	Unverified
Scaling laws in global corporations as a benchmarking approach to assess environmental performance	Jun 7, 2022	Benchmarking, Open-Ended Question Answering	Unverified
MorisienMT: A Dataset for Mauritian Creole Machine Translation	Jun 6, 2022	Benchmarking, Machine Translation	Unverified
Which models are innately best at uncertainty estimation?	Jun 5, 2022	Benchmarking, Out-of-Distribution Detection	Unverified
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates	Jun 2, 2022	Benchmarking	Code Available
Evaluation of Three Welsh Language POS Taggers	Jun 1, 2022	Benchmarking, POS	Unverified
Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts	Jun 1, 2022	Benchmarking, Binary Classification	Unverified
Deep One-Class Hate Speech Detection Model	Jun 1, 2022	Benchmarking, Binary Classification	Unverified
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French	Jun 1, 2022	Benchmarking, Low Resource Neural Machine Translation	Unverified
A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking	Jun 1, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction	Jun 1, 2022	16k, Benchmarking	Unverified
MTLens: Machine Translation Output Debugging	Jun 1, 2022	Benchmarking, Machine Translation	Unverified
Hide and Seek: on the Stealthiness of Attacks against Deep Learning Systems	May 31, 2022	Benchmarking	Unverified
NEWTS: A Corpus for News Topic-Focused Summarization	May 31, 2022	Benchmarking, Text Summarization	Unverified
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition	May 30, 2022	Benchmarking	Code Available
AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite	May 30, 2022	Benchmarking, BIG-bench Machine Learning	Code Available
Benchmarking Unsupervised Anomaly Detection and Localization	May 30, 2022	Anomaly Detection, Benchmarking	Unverified
A Framework for Generating Informative Benchmark Instances	May 29, 2022	Benchmarking	Code Available
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation	May 27, 2022	Benchmarking, Dataset Generation	Code Available
Benchmarking of Deep Learning models on 2D Laminar Flow behind Cylinder	May 26, 2022	Benchmarking, Deep Learning	Unverified
Large Language Models are Few-Shot Clinical Information Extractors	May 25, 2022	Benchmarking, coreference-resolution	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified