Benchmarking

Papers

Showing 5251–5300 of 5548 papers

Title	Date	Tasks	Status
End-to-End Neural Ranking for eCommerce Product Search: an application of task models and textual embeddings	Jun 19, 2018	Benchmarking	Unverified
Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption	Feb 17, 2025	Benchmarking, Code Summarization	Unverified
Energy & Force Regression on DFT Trajectories is Not Enough for Universal Machine Learning Interatomic Potentials	Feb 5, 2025	Benchmarking	Unverified
Energy Management in Storage-Augmented, Grid-Connected Prosumer Buildings and Neighbourhoods Using a Modified Simulated Annealing Optimization	Mar 28, 2015	Benchmarking, energy management	Unverified
T^2K^2: The Twitter Top-K Keywords Benchmark	Sep 14, 2017	Benchmarking, Information Retrieval	Unverified
Enhanced Multiobjective Evolutionary Algorithm based on Decomposition for Solving the Unit Commitment Problem	Oct 16, 2014	Benchmarking	Unverified
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery	Apr 5, 2022	Benchmarking, object-detection	Unverified
Alibaba’s Submission for the WMT 2020 APE Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERT	Nov 1, 2020	Automatic Post-Editing, Benchmarking	Unverified
Characterizing the adversarial vulnerability of speech self-supervised learning	Nov 8, 2021	Adversarial Robustness, Benchmarking	Unverified
Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration	Jun 19, 2024	Benchmarking, Distractor Generation	Unverified
Enhancing Explainability and Reliable Decision-Making in Particle Swarm Optimization through Communication Topologies	Apr 17, 2025	Benchmarking, Decision Making	Unverified
Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias via Frame-Invariant Similarity Measures	Mar 14, 2025	Benchmarking, Gesture Recognition	Unverified
TabKAN: Advancing Tabular Data Analysis using Kolmogorov-Arnold Network	Apr 9, 2025	Benchmarking, Deep Learning	Unverified
Enhancing Image Matting in Real-World Scenes with Mask-Guided Iterative Refinement	Feb 24, 2025	Benchmarking, feature selection	Unverified
Characterizing Missing Information in Deep Networks Using Backpropagated Gradients	Jan 1, 2020	Anomaly Detection, Attribute	Unverified
Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages	Mar 24, 2025	Benchmarking, Decision Making	Unverified
Enhancing Navigation Benchmarking and Perception Data Generation for Row-based Crops in Simulation	Jun 27, 2023	Autonomous Navigation, Benchmarking	Unverified
Enhancing Post-Hoc Explanation Benchmark Reliability for Image Classification	Nov 29, 2023	Benchmarking, Decision Making	Unverified
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG	Sep 12, 2024	Benchmarking, Question Answering	Unverified
Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries	Nov 7, 2024	Benchmarking	Unverified
Characterization of Multiple 3D LiDARs for Localization and Mapping using Normal Distributions Transform	Apr 3, 2020	Benchmarking	Unverified
Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations	Apr 22, 2025	Benchmarking, Few-Shot Learning	Unverified
Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model	Jan 20, 2025	Benchmarking	Unverified
Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs	Jun 4, 2024	Benchmarking, Fairness	Unverified
TabTreeFormer: Tabular Data Generation Using Hybrid Tree-Transformer	Jan 2, 2025	Benchmarking, Quantization	Unverified
ALdataset: a benchmark for pool-based active learning	Oct 16, 2020	Active Learning, Benchmarking	Unverified
Characterization of Constrained Continuous Multiobjective Optimization Problems: A Performance Space Perspective	Feb 4, 2023	Benchmarking, Multiobjective Optimization	Unverified
EnronQA: Towards Personalized RAG over Private Documents	May 1, 2025	Benchmarking, Memorization	Unverified
Ensemble random forest filter: An alternative to the ensemble Kalman filter for inverse modeling	Jul 8, 2022	Benchmarking	Unverified
Characterization of Constrained Continuous Multiobjective Optimization Problems: A Feature Space Perspective	Sep 9, 2021	Benchmarking, Multiobjective Optimization	Unverified
TabularQGAN: A Quantum Generative Model for Tabular Data	May 28, 2025	Benchmarking, Generative Adversarial Network	Unverified
Entity Alignment For Knowledge Graphs: Progress, Challenges, and Empirical Studies	May 18, 2022	Benchmarking, Entity Alignment	Unverified
Entity Personalized Talent Search Models with Tree Interaction Features	Feb 25, 2019	Benchmarking	Unverified
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models	Jun 16, 2022	Benchmarking, Language Modeling	Unverified
A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration	Jun 1, 2013	Benchmarking	Unverified
Entropic one-class classifiers	Jul 28, 2014	Anomaly Detection, Benchmarking	Unverified
EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models	May 18, 2024	Benchmarking, Specificity	Unverified
Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution	Jun 2, 2020	Benchmarking, Depth Map Super-Resolution	Unverified
Environment-aware UAV Communications: CKM Construction and Predictive Beamforming	Apr 18, 2024	Benchmarking	Unverified
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity	Jun 28, 2023	Benchmarking, Image Captioning	Unverified
EnvSDD: Benchmarking Environmental Sound Deepfake Detection	May 25, 2025	Audio Deepfake Detection, Audio Generation	Unverified
EnzChemRED, a rich enzyme chemistry relation extraction dataset	Apr 22, 2024	Benchmarking, named-entity-recognition	Unverified
Challenges in Benchmarking Stream Learning Algorithms with Real-world Data	Apr 30, 2020	Benchmarking	Unverified
EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking	Feb 18, 2025	Benchmarking, Binary Classification	Unverified
Challenges and Pitfalls of Machine Learning Evaluation and Benchmarking	Apr 29, 2019	Benchmarking, BIG-bench Machine Learning	Unverified
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection	Oct 6, 2024	Benchmarking, Mathematical Reasoning	Unverified
Challenges and perspectives in computational deconvolution of genomics data	Nov 21, 2022	Benchmarking	Unverified
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit	Apr 10, 2023	Benchmarking, Simultaneous Speech-to-Text Translation	Unverified
Tackling the Story Ending Biases in The Story Cloze Test	Jul 1, 2018	Benchmarking, Cloze Test	Unverified
Establishing Reliability Metrics for Reward Models in Large Language Models	Apr 21, 2025	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified