Benchmarking

Papers

Showing 3801–3850 of 5548 papers

Title	Date	Tasks	Status	Hype
VRKitchen2.0-IndoorKit: A Tutorial for Augmented Indoor Scene Building in Omniverse	Jun 23, 2022	Benchmarking, Indoor Scene Synthesis	Code Available	0
The ArtBench Dataset: Benchmarking Generative Models with Artworks	Jun 22, 2022	Benchmarking, Conditional Image Generation	Code Available	2
DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation	Jun 22, 2022	Benchmarking, Recommendation Systems	Code Available	2
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code	Jun 22, 2022	Benchmarking, Text Generation	Code Available	1
OpenXAI: Towards a Transparent Evaluation of Model Explanations	Jun 22, 2022	Benchmarking, Explainable Artificial Intelligence (XAI)	Code Available	1
Beyond Uniform Lipschitz Condition in Differentially Private Optimization	Jun 21, 2022	Benchmarking, regression	Unverified	0
BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs	Jun 21, 2022	Anomaly Detection, Benchmarking	Code Available	0
Benchmarking Constraint Inference in Inverse Reinforcement Learning	Jun 20, 2022	Autonomous Driving, Benchmarking	Code Available	1
ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets	Jun 20, 2022	Benchmarking, Fraud Detection	Code Available	0
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs	Jun 19, 2022	Benchmarking, Image Captioning	Code Available	1
Design of Supervision-Scalable Learning Systems: Methodology and Performance Benchmarking	Jun 18, 2022	Benchmarking, image-classification	Unverified	0
NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search	Jun 18, 2022	Benchmarking, Graph Neural Network	Code Available	1
Motley: Benchmarking Heterogeneity and Personalization in Federated Learning	Jun 18, 2022	Benchmarking, Fairness	Code Available	0
SMPL: Simulated Industrial Manufacturing and Process Control Learning Environments	Jun 17, 2022	Benchmarking, Deep Reinforcement Learning	Code Available	1
Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration	Jun 17, 2022	Benchmarking, Depth Estimation	Unverified	0
Long Range Graph Benchmark	Jun 16, 2022	Benchmarking, Graph Classification	Code Available	1
SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks	Jun 16, 2022	Benchmarking, Dynamic neural networks	Code Available	0
Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case	Jun 16, 2022	Benchmarking, Density Estimation	Unverified	0
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability	Jun 16, 2022	Benchmarking, Feature Importance	Unverified	0
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models	Jun 16, 2022	Benchmarking, Language Modeling	Unverified	0
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning	Jun 16, 2022	Benchmarking, Clustering	Code Available	0
Taxonomy of Benchmarks in Graph Representation Learning	Jun 15, 2022	Benchmarking, Graph Representation Learning	Code Available	1
RecBole 2.0: Towards a More Up-to-Date Recommendation Library	Jun 15, 2022	Benchmarking, Data Augmentation	Code Available	4
ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset	Jun 14, 2022	Benchmarking, Ischemic Stroke Lesion Segmentation	Code Available	1
Evaluating histopathology transfer learning with ChampKit	Jun 14, 2022	Benchmarking, Cell Detection	Code Available	1
EmProx: Neural Network Performance Estimation For Neural Architecture Search	Jun 13, 2022	Benchmarking, Decoder	Code Available	0
BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents	Jun 13, 2022	Benchmarking	Unverified	0
Data-Driven Denoising of Stationary Accelerometer Signals	Jun 13, 2022	Benchmarking, Denoising	Code Available	1
CodeS: Towards Code Model Generalization Under Distribution Shift	Jun 11, 2022	Benchmarking, Code Classification	Code Available	0
SAIBench: Benchmarking AI for Science	Jun 11, 2022	Benchmarking, Friction	Unverified	0
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations	Jun 9, 2022	Benchmarking, continuous-control	Code Available	2
SwinCheX: Multi-label classification on chest X-ray images with transformers	Jun 9, 2022	Benchmarking, Multi-Label Classification	Code Available	1
Functional Code Building Genetic Programming	Jun 9, 2022	Benchmarking, Program Synthesis	Unverified	0
Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark	Jun 8, 2022	Benchmarking, Explainable Artificial Intelligence (XAI)	Code Available	1
Benchmarking Bayesian neural networks and evaluation metrics for regression tasks	Jun 8, 2022	Benchmarking, Open-Ended Question Answering	Unverified	0
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization	Jun 8, 2022	Benchmarking, Federated Learning	Unverified	0
Scaling laws in global corporations as a benchmarking approach to assess environmental performance	Jun 7, 2022	Benchmarking, Open-Ended Question Answering	Unverified	0
Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering	Jun 6, 2022	Benchmarking, Clustering	Code Available	1
MorisienMT: A Dataset for Mauritian Creole Machine Translation	Jun 6, 2022	Benchmarking, Machine Translation	Unverified	0
Which models are innately best at uncertainty estimation?	Jun 5, 2022	Benchmarking, Out-of-Distribution Detection	Unverified	0
Revisiting the "Video" in Video-Language Understanding	Jun 3, 2022	Benchmarking, Question Answering	Code Available	1
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates	Jun 2, 2022	Benchmarking	Code Available	0
Evaluation of Three Welsh Language POS Taggers	Jun 1, 2022	Benchmarking, POS	Unverified	0
Deep One-Class Hate Speech Detection Model	Jun 1, 2022	Benchmarking, Binary Classification	Unverified	0
Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction	Jun 1, 2022	16k, Benchmarking	Unverified	0
Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts	Jun 1, 2022	Benchmarking, Binary Classification	Unverified	0
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French	Jun 1, 2022	Benchmarking, Low Resource Neural Machine Translation	Unverified	0
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain	Jun 1, 2022	Benchmarking, Emotion Recognition	Code Available	1
Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking	Jun 1, 2022	Benchmarking, Sentence	Code Available	1
MTLens: Machine Translation Output Debugging	Jun 1, 2022	Benchmarking, Machine Translation	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified