Benchmarking

Papers


Showing 3801–3850 of 5548 papers

Title	Date	Tasks	Status
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration	Jun 9, 2023	Benchmarking, Time Series	Unverified
Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework	Jun 8, 2023	Benchmarking	Code Available
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs	Jun 8, 2023	Benchmarking, Federated Learning	Code Available
DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems	Jun 8, 2023	Benchmarking, Descriptive	Code Available
FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems	Jun 8, 2023	Benchmarking, Edge-computing	Unverified
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models	Jun 8, 2023	Benchmarking, Fairness	Code Available
RD-Suite: A Benchmark for Ranking Distillation	Jun 7, 2023	Benchmarking	Unverified
Self-Adjusting Weighted Expected Improvement for Bayesian Optimization	Jun 7, 2023	Bayesian Optimization, Benchmarking	Code Available
Benchmarking Foundation Models with Language-Model-as-an-Examiner	Jun 7, 2023	Benchmarking, Language Modeling	Unverified
ICON²: Reliably Benchmarking Predictive Inequity in Object Detection	Jun 7, 2023	Attribute, Autonomous Driving	Unverified
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals	Jun 7, 2023	Benchmarking, Machine Reading Comprehension	Code Available
Improved statistical benchmarking of digital pathology models using pairwise frames evaluation	Jun 7, 2023	Benchmarking, Classification	Unverified
Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities	Jun 6, 2023	Benchmarking, Depth Completion	Unverified
Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models	Jun 6, 2023	Benchmarking, Ethics	Unverified
Explainable AI using expressive Boolean formulas	Jun 6, 2023	Benchmarking, Explainable Artificial Intelligence (XAI)	Unverified
Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging	Jun 6, 2023	Benchmarking, Sentence	Unverified
Benchmarking Middle-Trained Language Models for Neural Search	Jun 5, 2023	Benchmarking, Language Modeling	Unverified
N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition	Jun 5, 2023	Arabic Speech Recognition, Benchmarking	Unverified
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning	Jun 4, 2023	Benchmarking, Contrastive Learning	Unverified
EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection	Jun 4, 2023	Benchmarking, Face Detection	Unverified
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models	Jun 3, 2023	Benchmarking	Unverified
ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation	Jun 3, 2023	Benchmarking	Unverified
Break a Lag: Triple Exponential Moving Average for Enhanced Optimization	Jun 2, 2023	Benchmarking, image-classification	Unverified
Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study	Jun 1, 2023	Articles, Benchmarking	Unverified
The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI	Jun 1, 2023	Benchmarking, Brain Tumor Segmentation	Unverified
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment	Jun 1, 2023	Benchmarking, Hate Speech Detection	Code Available
Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?	Jun 1, 2023	Benchmarking, Decoder	Code Available
HySpecNet-11k: A Large-Scale Hyperspectral Dataset for Benchmarking Learning-Based Hyperspectral Image Compression Methods	Jun 1, 2023	Benchmarking, Hyperspectral image analysis	Unverified
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects	Jun 1, 2023	Benchmarking, Object	Unverified
Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces	May 31, 2023	Benchmarking, Recommendation Systems	Code Available
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning	May 30, 2023	Benchmarking, In-Context Learning	Code Available
ShuffleMix: Improving Representations via Channel-Wise Shuffle of Interpolated Hidden States	May 30, 2023	Benchmarking, Data Augmentation	Code Available
Design and implementation of intelligent packet filtering in IoT microcontroller-based devices	May 30, 2023	Benchmarking	Code Available
Large-scale Ridesharing DARP Instances Based on Real Travel Demand	May 30, 2023	Benchmarking	Code Available
Human Body Shape Classification Based on a Single Image	May 29, 2023	Benchmarking, Classification	Unverified
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion	May 28, 2023	Benchmarking, Decision Making	Code Available
Exploring the Practicality of Generative Retrieval on Dynamic Corpora	May 27, 2023	Benchmarking, Information Retrieval	Unverified
BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring	May 27, 2023	Benchmarking, Deblurring	Code Available
Benchmarking Diverse-Modal Entity Linking with Generative Models	May 27, 2023	Benchmarking, Decoder	Unverified
Learning from Integral Losses in Physics Informed Neural Networks	May 27, 2023	Benchmarking	Code Available
Benchmarking state-of-the-art gradient boosting algorithms for classification	May 26, 2023	Bayesian Optimization, Benchmarking	Unverified
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset	May 25, 2023	Benchmarking, Text to SQL	Code Available
Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts	May 25, 2023	Benchmarking, object-detection	Code Available
Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite	May 24, 2023	Benchmarking	Unverified
GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking	May 24, 2023	Benchmarking, Graph Mining	Code Available
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer	May 24, 2023	Benchmarking, Cross-Lingual Transfer	Unverified
LAraBench: Benchmarking Arabic AI with Large Language Models	May 24, 2023	Benchmarking, Few-Shot Learning	Unverified
Barkour: Benchmarking Animal-level Agility with Quadruped Robots	May 24, 2023	Benchmarking, Navigate	Unverified
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests	May 23, 2023	Benchmarking, Language Modeling	Unverified
When the Music Stops: Tip-of-the-Tongue Retrieval for Music	May 23, 2023	Benchmarking, Language Modeling	Code Available


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified