Benchmarking

Papers

Showing 3601–3650 of 5548 papers

Title	Date	Tasks	Status
Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data	Oct 9, 2023	Benchmarking, Language Modeling	Code Available
Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus	Oct 8, 2023	Benchmarking, Machine Translation	Code Available
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems	Oct 8, 2023	Benchmarking	Code Available
Simple GNNs with Low Rank Non-parametric Aggregators	Oct 8, 2023	Benchmarking, Node Classification	Code Available
Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction	Oct 8, 2023	Benchmarking, Decoder	Unverified
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets	Oct 7, 2023	Benchmarking, named-entity-recognition	Unverified
Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data	Oct 7, 2023	Benchmarking	Unverified
Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods	Oct 6, 2023	Benchmarking, Experimental Design	Unverified
CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis	Oct 6, 2023	Benchmarking, Domain Generalization	Unverified
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation	Oct 6, 2023	Benchmarking, Mathematical Reasoning	Unverified
AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards	Oct 6, 2023	Benchmarking	Code Available
Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms	Oct 6, 2023	AutoML, Benchmarking	Unverified
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning	Oct 6, 2023	Benchmarking, Federated Learning	Unverified
Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report	Oct 5, 2023	Benchmarking	Unverified
A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling	Oct 5, 2023	Benchmarking, Deep Reinforcement Learning	Unverified
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference	Oct 4, 2023	Benchmarking, GPU	Unverified
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma	Oct 4, 2023	Benchmarking, Segmentation	Code Available
Deep Reinforcement Learning Algorithms for Hybrid V2X Communication: A Benchmarking Study	Oct 4, 2023	Autonomous Vehicles, Benchmarking	Unverified
On the Performance of Multimodal Language Models	Oct 4, 2023	Benchmarking, Binary Classification	Unverified
EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations	Oct 3, 2023	Atomic Forces, Benchmarking	Unverified
Learning Quantum Processes with Quantum Statistical Queries	Oct 3, 2023	Benchmarking, Cryptanalysis	Code Available
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods	Oct 3, 2023	Benchmarking, text-guided-image-editing	Unverified
Benchmarking and Improving Generator-Validator Consistency of Language Models	Oct 3, 2023	Benchmarking, Instruction Following	Unverified
CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems	Oct 2, 2023	Benchmarking, Computational Efficiency	Unverified
A New Real-World Video Dataset for the Comparison of Defogging Algorithms	Oct 2, 2023	Benchmarking, Deblurring	Unverified
TRAM: Benchmarking Temporal Reasoning for Large Language Models	Oct 2, 2023	Benchmarking, Few-Shot Learning	Unverified
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation	Oct 2, 2023	Benchmarking, Continual Learning	Code Available
The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks	Sep 30, 2023	Benchmarking	Unverified
Adaptive Control of an Inverted Pendulum by a Reinforcement Learning-based LQR Method	Sep 30, 2023	Benchmarking, Reinforcement Learning (RL)	Unverified
Benchmarking Collaborative Learning Methods Cost-Effectiveness for Prostate Segmentation	Sep 29, 2023	Benchmarking, Federated Learning	Unverified
A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater	Sep 29, 2023	Benchmarking	Unverified
Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts	Sep 29, 2023	Benchmarking, Decision Making	Unverified
Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection	Sep 29, 2023	Benchmarking, Diversity	Unverified
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors	Sep 29, 2023	Benchmarking, Computational Efficiency	Unverified
Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym	Sep 29, 2023	Bayesian Optimization, Benchmarking	Unverified
Language Models as a Service: Overview of a New Paradigm and its Challenges	Sep 28, 2023	Benchmarking	Unverified
Demographic Parity: Mitigating Biases in Real-World Data	Sep 27, 2023	Benchmarking	Unverified
On quantifying and improving realism of images generated with diffusion	Sep 26, 2023	Attribute, Benchmarking	Unverified
Advancing The Rate-Distortion-Computation Frontier For Neural Image Compression	Sep 26, 2023	Benchmarking, Image Compression	Unverified
Thalamic nuclei segmentation from T_1-weighted MRI: unifying and benchmarking state-of-the-art methods with young and old cohorts	Sep 26, 2023	Benchmarking, Segmentation	Unverified
Optimization Techniques for a Physical Model of Human Vocalisation	Sep 26, 2023	Benchmarking	Unverified
Efficient Pauli channel estimation with logarithmic quantum memory	Sep 25, 2023	Benchmarking	Unverified
VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph	Sep 24, 2023	Benchmarking, Knowledge Graphs	Unverified
Categorization and analysis of 14 computational methods for estimating cell potency from single-cell RNA-seq data	Sep 24, 2023	Benchmarking	Unverified
Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence	Sep 24, 2023	Benchmarking, Change Detection	Code Available
Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data	Sep 23, 2023	Benchmarking, Super-Resolution	Unverified
Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts	Sep 22, 2023	Articles, Benchmarking	Unverified
Multimodal Deep Learning for Scientific Imaging Interpretation	Sep 21, 2023	Articles, Benchmarking	Unverified
Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam	Sep 21, 2023	Benchmarking, Computational Efficiency	Unverified
On the relationship between Benchmarking, Standards and Certification in Robotics and AI	Sep 21, 2023	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified