Benchmarking

Papers

Showing 3201–3250 of 5548 papers

Title	Date	Tasks	Status
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future	Aug 5, 2024	Benchmarking, Code Generation	Unverified
From Precision to Perception: User-Centred Evaluation of Keyword Extraction Algorithms for Internet-Scale Contextual Advertising	Apr 30, 2025	Benchmarking, Computational Efficiency	Unverified
From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification	Mar 28, 2023	Benchmarking, Privacy Preserving	Unverified
From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution	Apr 9, 2024	Benchmarking	Unverified
From Sound Representation to Model Robustness	Jul 27, 2020	Adversarial Attack, Adversarial Robustness	Unverified
From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems	Jun 5, 2025	Benchmarking, RAG	Unverified
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference	Oct 4, 2023	Benchmarking, GPU	Unverified
FSD-10: A Dataset for Competitive Sports Content Analysis	Feb 9, 2020	Action Recognition, Benchmarking	Unverified
Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods	Oct 6, 2023	Benchmarking, Experimental Design	Unverified
Full-stack evaluation of Machine Learning inference workloads for RISC-V systems	May 24, 2024	Benchmarking, Deep Learning	Unverified
FunBench: Benchmarking Fundus Reading Skills of MLLMs	Mar 2, 2025	Anatomy, Benchmarking	Unverified
Functional Code Building Genetic Programming	Jun 9, 2022	Benchmarking, Program Synthesis	Unverified
Efficient Pauli channel estimation with logarithmic quantum memory	Sep 25, 2023	Benchmarking	Unverified
FuzzWiz -- Fuzzing Framework for Efficient Hardware Coverage	Oct 23, 2024	Benchmarking	Unverified
Fuzzy Knowledge Distillation from High-Order TSK to Low-Order TSK	Feb 16, 2023	Benchmarking, Knowledge Distillation	Unverified
Genetic algorithm for feature selection of EEG heterogeneous data	Mar 12, 2021	Benchmarking, EEG	Unverified
Galvatron: An Automatic Distributed System for Efficient Foundation Model Training	Apr 30, 2025	Benchmarking	Unverified
GAN-based disentanglement learning for chest X-ray rib suppression	Oct 18, 2021	Benchmarking, Computed Tomography (CT)	Unverified
GANmut: Generating and Modifying Facial Expressions	Jun 16, 2024	Benchmarking, Diversity	Unverified
GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR	Apr 15, 2025	Benchmarking	Unverified
GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics	Mar 27, 2025	Benchmarking, Natural Language Queries	Unverified
Gauss-Ramanujan Functions: Constructions, Properties, and Applications in Communications and Signal Processing	May 27, 2025	Benchmarking	Unverified
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing	Jun 30, 2024	Benchmarking, counterfactual	Unverified
GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases	May 25, 2024	Benchmarking, Hallucination	Unverified
Generalised Gaussian Process Latent Variable Models (GPLVM) with Stochastic Variational Inference	Feb 25, 2022	Benchmarking, Dimensionality Reduction	Unverified
Generalization Bias in Large Language Model Summarization of Scientific Research	Mar 28, 2025	Benchmarking, Language Modeling	Unverified
Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization	May 23, 2022	Benchmarking, Deep Reinforcement Learning	Unverified
Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow	Feb 14, 2025	Benchmarking	Unverified
Generalized Conflict-directed Search for Optimal Ordering Problems	Mar 31, 2021	Benchmarking, Scheduling	Unverified
Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey	Jun 23, 2025	Benchmarking, Survey	Unverified
General Scales Unlock AI Evaluation with Explanatory and Predictive Power	Mar 9, 2025	Benchmarking, Specificity	Unverified
Generating Artificial Outliers in the Absence of Genuine Ones -- a Survey	Jun 5, 2020	Benchmarking, Experimental Design	Unverified
Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems	Jun 4, 2025	Benchmarking, Code Generation	Unverified
Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems	Nov 27, 2024	AutoML, Benchmarking	Unverified
Hierarchical Data Generator based on Tree-Structured Stick Breaking Process for Benchmarking Clustering Methods	Jun 17, 2016	Benchmarking, Clustering	Unverified
Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking	Nov 6, 2024	Benchmarking	Unverified
Generation of Large District Heating System Models Using Open-Source Data and Tools: An Exemplary Workflow	Dec 18, 2024	Benchmarking	Unverified
Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?	May 27, 2020	Benchmarking, Fraud Detection	Unverified
Generative Adversarial Networks with Limited Data: A Survey and Benchmarking	Apr 7, 2025	Benchmarking, Image Generation	Unverified
Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors	Jun 29, 2023	Benchmarking	Unverified
Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges	Jun 27, 2024	Benchmarking, Clinical Knowledge	Unverified
Learning Dynamic Feature Selection for Fast Sequential Prediction	May 22, 2015	Benchmarking, Dependency Parsing	Unverified
Learning Environment Models with Continuous Stochastic Dynamics	Jun 29, 2023	Acrobot, Benchmarking	Unverified
Learning Graphs for Knowledge Transfer With Limited Labels	Jun 19, 2021	Action Recognition, Benchmarking	Unverified
Learning Hidden Physics and System Parameters with Deep Operator Networks	Dec 6, 2024	Benchmarking, Uncertainty Quantification	Unverified
Learning Multimorbidity Patterns from Electronic Health Records Using Non-negative Matrix Factorisation	Jul 19, 2019	Benchmarking	Unverified
Benchmarking Augmentation Methods for Learning Robust Navigation Agents: the Winning Entry of the 2021 iGibson Challenge	Sep 22, 2021	Benchmarking, Data Augmentation	Unverified
Learning to Adapt to Online Streams with Distribution Shifts	Mar 2, 2023	Benchmarking, Meta-Learning	Unverified
Realistic Large-Scale Fine-Depth Dehazing Dataset from 3D Videos	Apr 18, 2020	Autonomous Driving, Benchmarking	Unverified
Learning to Disambiguate by Asking Discriminative Questions	Aug 9, 2017	Benchmarking, Image Captioning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified