Benchmarking

Papers

Showing 3001–3050 of 5548 papers

Title	Date	Tasks	Status	Hype
Are SNNs Truly Energy-efficient? - A Hardware Perspective	Sep 6, 2023	Benchmarking	Unverified	0
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models	Sep 5, 2023	Benchmarking, Zero-Shot Learning	Unverified	0
A skeletonization algorithm for gradient-based optimization	Sep 5, 2023	Benchmarking, Deep Learning	Code Available	1
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking	Sep 5, 2023	Benchmarking, Knowledge Distillation	Unverified	0
Transfer Learning between Motor Imagery Datasets using Deep Learning -- Validation of Framework and Comparison of Datasets	Sep 4, 2023	Benchmarking, Motor Imagery	Code Available	0
Benchmarking Large Language Models in Retrieval-Augmented Generation	Sep 4, 2023	Benchmarking, counterfactual	Code Available	2
Hybrid data driven/thermal simulation model for comfort assessment	Sep 4, 2023	Benchmarking	Unverified	0
Benchmarking Autoregressive Conditional Diffusion Models for Turbulent Flow Simulation	Sep 4, 2023	Benchmarking	Code Available	1
Orientation-Independent Chinese Text Recognition in Scene Images	Sep 3, 2023	Benchmarking, Image Reconstruction	Code Available	2
FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees	Sep 3, 2023	Benchmarking, Instance Segmentation	Unverified	0
Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction	Sep 3, 2023	Benchmarking, Exposure Correction	Unverified	0
NeMig -- A Bilingual News Collection and Knowledge Graph about Migration	Sep 1, 2023	Articles, Benchmarking	Code Available	0
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning	Sep 1, 2023	Benchmarking, Federated Learning	Unverified	0
Can humans help BERT gain "confidence"?	Aug 31, 2023	Benchmarking, EEG	Unverified	0
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering	Aug 31, 2023	Benchmarking, Dataset Generation	Code Available	1
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO	Aug 30, 2023	Benchmarking, Reinforcement Learning (RL)	Unverified	0
Benchmarking Multilabel Topic Classification in the Kyrgyz Language	Aug 30, 2023	Benchmarking, Classification	Code Available	0
Benchmarking the Generation of Fact Checking Explanations	Aug 29, 2023	Abstractive Text Summarization, Articles	Code Available	1
Towards quantitative precision for ECG analysis: Leveraging state space models, self-supervision and patient metadata	Aug 29, 2023	Benchmarking, Diagnostic	Code Available	1
Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions	Aug 28, 2023	Benchmarking, Formation Energy	Code Available	3
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads	Aug 28, 2023	Benchmarking, Self-Supervised Learning	Unverified	0
MLLM-DataEngine: An Iterative Refinement Approach for MLLM	Aug 25, 2023	Benchmarking	Code Available	1
Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models	Aug 24, 2023	Action Localization, Benchmarking	Unverified	0
Beyond Document Page Classification: Design, Datasets, and Challenges	Aug 24, 2023	Benchmarking, Classification	Code Available	0
Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations	Aug 23, 2023	Benchmarking, Decoder	Code Available	2
Benchmarking Causal Study to Interpret Large Language Models for Source Code	Aug 23, 2023	Benchmarking, Causal Inference	Unverified	0
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0	Aug 23, 2023	Benchmarking, regression	Code Available	0
LLMRec: Benchmarking Large Language Models on Recommendation Task	Aug 23, 2023	Benchmarking, Explanation Generation	Code Available	1
Efficient Benchmarking of Language Models	Aug 22, 2023	Benchmarking, GPU	Unverified	0
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection	Aug 22, 2023	Benchmarking, Out-of-Distribution Detection	Code Available	0
Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process	Aug 22, 2023	Benchmarking, Domain Adaptation	Code Available	0
Beyond MD17: the reactive xxMD dataset	Aug 22, 2023	Benchmarking, Computational chemistry	Code Available	0
Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models	Aug 21, 2023	Adversarial Robustness, Benchmarking	Unverified	0
UGSL: A Unified Framework for Benchmarking Graph Structure Learning	Aug 21, 2023	Benchmarking, Graph structure learning	Unverified	0
VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations	Aug 19, 2023	6D Pose Estimation using RGB, Benchmarking	Code Available	1
Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks	Aug 17, 2023	Benchmarking, EEG	Code Available	0
Benchmarking Neural Network Generalization for Grammar Induction	Aug 16, 2023	Benchmarking	Code Available	1
Benchmarking Adversarial Robustness of Compressed Deep Learning Models	Aug 16, 2023	Adversarial Robustness, Benchmarking	Unverified	0
IoT Data Trust Evaluation via Machine Learning	Aug 15, 2023	Benchmarking, Time Series	Code Available	0
Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions	Aug 15, 2023	Benchmarking, Computational Efficiency	Unverified	0
A Survey on Model Compression for Large Language Models	Aug 15, 2023	Benchmarking, Knowledge Distillation	Unverified	0
Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation	Aug 15, 2023	Benchmarking, Medical Image Analysis	Code Available	0
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models?	Aug 14, 2023	Benchmarking, Drug Design	Code Available	1
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents	Aug 11, 2023	Benchmarking, Decision Making	Code Available	2
Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields	Aug 11, 2023	Benchmarking	Unverified	0
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity	Aug 11, 2023	Benchmarking, Diversity	Code Available	1
A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization	Aug 10, 2023	Benchmarking, Decision Making	Code Available	1
Spintronics for image recognition: performance benchmarking via ultrafast data-driven simulations	Aug 10, 2023	Benchmarking, Classification	Unverified	0
Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation	Aug 10, 2023	Attribute, Benchmarking	Unverified	0
Enhancing Architecture Frameworks by Including Modern Stakeholders and their Views/Viewpoints	Aug 9, 2023	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified