Benchmarking

Papers

Showing 3651–3700 of 5548 papers

Title	Date	Tasks	Status
Towards Effective Disambiguation for Machine Translation with Large Language Models	Sep 20, 2023	Benchmarking, In-Context Learning	Unverified
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder	Sep 20, 2023	Benchmarking, Clustering	Code Available
SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction	Sep 19, 2023	3D Reconstruction, Benchmarking	Unverified
Training neural mapping schemes for satellite altimetry with simulation data	Sep 19, 2023	Benchmarking	Unverified
The Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design	Sep 18, 2023	Benchmarking	Unverified
Exploration of TPUs for AI Applications	Sep 16, 2023	Benchmarking, Edge-computing	Unverified
Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool	Sep 16, 2023	Benchmarking, Image Super-Resolution	Unverified
Anchor Points: Benchmarking Models with Much Fewer Examples	Sep 14, 2023	Benchmarking, Language Modeling	Code Available
M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations	Sep 14, 2023	Benchmarking, Computed Tomography (CT)	Code Available
Benchmarking machine learning models for quantum state classification	Sep 14, 2023	Benchmarking, Classification	Unverified
Leveraging Contextual Information for Effective Entity Salience Detection	Sep 14, 2023	Articles, Benchmarking	Unverified
So you think you can track?	Sep 13, 2023	Benchmarking, Object	Unverified
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish	Sep 13, 2023	Benchmarking, Translation	Code Available
Unveiling the potential of large language models in generating semantic and cross-language clones	Sep 12, 2023	Benchmarking, Code Generation	Unverified
AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving	Sep 12, 2023	Autonomous Driving, Benchmarking	Unverified
Navigating Out-of-Distribution Electricity Load Forecasting during COVID-19: Benchmarking energy load forecasting models without and with continual learning	Sep 8, 2023	Benchmarking, Continual Learning	Code Available
DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient Estimation	Sep 7, 2023	Benchmarking, Neural Architecture Search	Unverified
Better Practices for Domain Adaptation	Sep 7, 2023	Benchmarking, Domain Adaptation	Unverified
Using representation balancing to learn conditional-average dose responses from clustered data	Sep 7, 2023	Benchmarking, Causal Inference	Code Available
Are SNNs Truly Energy-efficient? - A Hardware Perspective	Sep 6, 2023	Benchmarking	Unverified
Neural Networks for Fast Optimisation in Model Predictive Control: A Review	Sep 6, 2023	Benchmarking, Model Predictive Control	Unverified
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models	Sep 5, 2023	Benchmarking, Zero-Shot Learning	Unverified
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking	Sep 5, 2023	Benchmarking, Knowledge Distillation	Unverified
Hybrid data driven/thermal simulation model for comfort assessment	Sep 4, 2023	Benchmarking	Unverified
Transfer Learning between Motor Imagery Datasets using Deep Learning -- Validation of Framework and Comparison of Datasets	Sep 4, 2023	Benchmarking, Motor Imagery	Code Available
FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees	Sep 3, 2023	Benchmarking, Instance Segmentation	Unverified
Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction	Sep 3, 2023	Benchmarking, Exposure Correction	Unverified
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning	Sep 1, 2023	Benchmarking, Federated Learning	Unverified
NeMig -- A Bilingual News Collection and Knowledge Graph about Migration	Sep 1, 2023	Articles, Benchmarking	Code Available
Can humans help BERT gain "confidence"?	Aug 31, 2023	Benchmarking, EEG	Unverified
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO	Aug 30, 2023	Benchmarking, Reinforcement Learning (RL)	Unverified
Benchmarking Multilabel Topic Classification in the Kyrgyz Language	Aug 30, 2023	Benchmarking, Classification	Code Available
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads	Aug 28, 2023	Benchmarking, Self-Supervised Learning	Unverified
Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models	Aug 24, 2023	Action Localization, Benchmarking	Unverified
Beyond Document Page Classification: Design, Datasets, and Challenges	Aug 24, 2023	Benchmarking, Classification	Code Available
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0	Aug 23, 2023	Benchmarking, regression	Code Available
Benchmarking Causal Study to Interpret Large Language Models for Source Code	Aug 23, 2023	Benchmarking, Causal Inference	Unverified
Efficient Benchmarking of Language Models	Aug 22, 2023	Benchmarking, GPU	Unverified
Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process	Aug 22, 2023	Benchmarking, Domain Adaptation	Code Available
Beyond MD17: the reactive xxMD dataset	Aug 22, 2023	Benchmarking, Computational chemistry	Code Available
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection	Aug 22, 2023	Benchmarking, Out-of-Distribution Detection	Code Available
UGSL: A Unified Framework for Benchmarking Graph Structure Learning	Aug 21, 2023	Benchmarking, Graph structure learning	Unverified
Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models	Aug 21, 2023	Adversarial Robustness, Benchmarking	Unverified
Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks	Aug 17, 2023	Benchmarking, EEG	Code Available
Benchmarking Adversarial Robustness of Compressed Deep Learning Models	Aug 16, 2023	Adversarial Robustness, Benchmarking	Unverified
A Survey on Model Compression for Large Language Models	Aug 15, 2023	Benchmarking, Knowledge Distillation	Unverified
IoT Data Trust Evaluation via Machine Learning	Aug 15, 2023	Benchmarking, Time Series	Code Available
Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation	Aug 15, 2023	Benchmarking, Medical Image Analysis	Code Available
Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions	Aug 15, 2023	Benchmarking, Computational Efficiency	Unverified
Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields	Aug 11, 2023	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified