Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 3626–3650 of 5548 papers

Title	Date	Tasks	Status
TRAM: Benchmarking Temporal Reasoning for Large Language Models	Oct 2, 2023	BenchmarkingFew-Shot Learning	—Unverified
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation	Oct 2, 2023	BenchmarkingContinual Learning	CodeCode Available
The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks	Sep 30, 2023	Benchmarking	—Unverified
Adaptive Control of an Inverted Pendulum by a Reinforcement Learning-based LQR Method	Sep 30, 2023	BenchmarkingReinforcement Learning (RL)	—Unverified
Benchmarking Collaborative Learning Methods Cost-Effectiveness for Prostate Segmentation	Sep 29, 2023	BenchmarkingFederated Learning	—Unverified
A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater	Sep 29, 2023	Benchmarking	—Unverified
Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts	Sep 29, 2023	BenchmarkingDecision Making	—Unverified
Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection	Sep 29, 2023	BenchmarkingDiversity	—Unverified
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors	Sep 29, 2023	BenchmarkingComputational Efficiency	—Unverified
Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym	Sep 29, 2023	Bayesian OptimizationBenchmarking	—Unverified
Language Models as a Service: Overview of a New Paradigm and its Challenges	Sep 28, 2023	Benchmarking	—Unverified
Demographic Parity: Mitigating Biases in Real-World Data	Sep 27, 2023	Benchmarking	—Unverified
On quantifying and improving realism of images generated with diffusion	Sep 26, 2023	AttributeBenchmarking	—Unverified
Advancing The Rate-Distortion-Computation Frontier For Neural Image Compression	Sep 26, 2023	BenchmarkingImage Compression	—Unverified
Thalamic nuclei segmentation from T_1-weighted MRI: unifying and benchmarking state-of-the-art methods with young and old cohorts	Sep 26, 2023	BenchmarkingSegmentation	—Unverified
Optimization Techniques for a Physical Model of Human Vocalisation	Sep 26, 2023	Benchmarking	—Unverified
Efficient Pauli channel estimation with logarithmic quantum memory	Sep 25, 2023	Benchmarking	—Unverified
VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph	Sep 24, 2023	BenchmarkingKnowledge Graphs	—Unverified
Categorization and analysis of 14 computational methods for estimating cell potency from single-cell RNA-seq data	Sep 24, 2023	Benchmarking	—Unverified
Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence	Sep 24, 2023	BenchmarkingChange Detection	CodeCode Available
Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data	Sep 23, 2023	BenchmarkingSuper-Resolution	—Unverified
Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts	Sep 22, 2023	ArticlesBenchmarking	—Unverified
Multimodal Deep Learning for Scientific Imaging Interpretation	Sep 21, 2023	ArticlesBenchmarking	—Unverified
Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam	Sep 21, 2023	BenchmarkingComputational Efficiency	—Unverified
On the relationship between Benchmarking, Standards and Certification in Robotics and AI	Sep 21, 2023	Benchmarking	—Unverified

Show:10 25 50

← PrevPage 146 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified