Benchmarking

Papers


Showing 2751–2800 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Pathology Feature Extractors for Whole Slide Image Classification	Nov 20, 2023	Benchmarking, image-classification	Code Available	1
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions	Nov 19, 2023	Bayesian Optimization, Benchmarking	Code Available	0
Benchmarking Machine Learning Models for Quantum Error Correction	Nov 18, 2023	Benchmarking	Unverified	0
Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization	Nov 18, 2023	Benchmarking, Deep Reinforcement Learning	Unverified	0
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach	Nov 17, 2023	Benchmarking, Collision Avoidance	Unverified	0
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction	Nov 16, 2023	Benchmarking, Event Extraction	Code Available	1
Exponentially Faster Language Modelling	Nov 15, 2023	Benchmarking, CPU	Code Available	2
Domain Aligned CLIP for Few-shot Classification	Nov 15, 2023	Benchmarking, Classification	Unverified	0
Social Bias Probing: Fairness Benchmarking for Language Models	Nov 15, 2023	Benchmarking, Fairness	Unverified	0
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph	Nov 15, 2023	Benchmarking	Code Available	1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization	Nov 15, 2023	Benchmarking, Instruction Following	Code Available	1
Model Agnostic Explainable Selective Regression via Uncertainty Estimation	Nov 15, 2023	Benchmarking, model	Unverified	0
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks	Nov 15, 2023	Benchmarking, Network Pruning	Code Available	0
On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation	Nov 14, 2023	Benchmarking, Machine Translation	Code Available	0
Benchmarking Individual Tree Mapping with Sub-meter Imagery	Nov 14, 2023	Benchmarking, Segmentation	Unverified	0
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration	Nov 14, 2023	Benchmarking, Language Modeling	Code Available	1
Combinatorial Optimization with Policy Adaptation using Latent Space Search	Nov 13, 2023	Benchmarking, Combinatorial Optimization	Code Available	1
Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime	Nov 13, 2023	Benchmarking, Combinatorial Optimization	Code Available	1
Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images	Nov 13, 2023	Benchmarking, Classification	Code Available	0
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks	Nov 13, 2023	Benchmarking	Unverified	0
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data	Nov 13, 2023	Benchmarking, Feature Importance	Unverified	0
The Disagreement Problem in Faithfulness Metrics	Nov 13, 2023	Benchmarking, Explainable artificial intelligence	Unverified	0
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models	Nov 13, 2023	Benchmarking, Instruction Following	Code Available	1
Flames: Benchmarking Value Alignment of LLMs in Chinese	Nov 12, 2023	Benchmarking, Fairness	Code Available	1
Identification of vortex in unstructured mesh with graph neural networks	Nov 11, 2023	Benchmarking, Graph Generation	Unverified	0
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation	Nov 10, 2023	Benchmarking, Cloud Computing	Code Available	1
MultiIoT: Benchmarking Machine Learning for the Internet of Things	Nov 10, 2023	Benchmarking, Representation Learning	Code Available	1
SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification	Nov 9, 2023	Benchmarking, Instance Segmentation	Unverified	0
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs	Nov 9, 2023	Benchmarking, Question Answering	Code Available	1
An efficiency analysis of Spanish airports	Nov 8, 2023	Benchmarking	Unverified	0
The voraus-AD Dataset for Anomaly Detection in Robot Applications	Nov 8, 2023	Anomaly Detection, Benchmarking	Code Available	1
Prompt Sketching for Large Language Models	Nov 8, 2023	Arithmetic Reasoning, Benchmarking	Unverified	0
The PetShop Dataset -- Finding Causes of Performance Issues across Microservices	Nov 8, 2023	Benchmarking	Code Available	1
A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction	Nov 8, 2023	Benchmarking, Click-Through Rate Prediction	Code Available	0
Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts	Nov 7, 2023	Benchmarking, Machine Translation	Code Available	1
DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding	Nov 7, 2023	3D Reconstruction, Benchmarking	Code Available	0
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089	Nov 6, 2023	Benchmarking, Knowledge Base Question Answering	Code Available	1
Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding	Nov 6, 2023	Benchmarking, Data Compression	Code Available	1
Benchmarking Deep Facial Expression Recognition: An Extensive Protocol with Balanced Dataset in the Wild	Nov 6, 2023	Benchmarking, Facial Expression Recognition	Unverified	0
Benchmarking Differential Evolution on a Quantum Simulator	Nov 6, 2023	Benchmarking, Evolutionary Algorithms	Unverified	0
Exploitation-Guided Exploration for Semantic Embodied Navigation	Nov 6, 2023	Benchmarking	Unverified	0
Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones	Nov 5, 2023	Benchmarking	Code Available	1
JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds	Nov 5, 2023	Autonomous Navigation, Autonomous Vehicles	Code Available	1
Benchmarking a Benchmark: How Reliable is MS-COCO?	Nov 5, 2023	Benchmarking, image-classification	Unverified	0
Learning Disentangled Speech Representations	Nov 4, 2023	Benchmarking, Disentanglement	Unverified	0
NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep Learning Applications	Nov 4, 2023	Benchmarking, Deep Learning	Code Available	1
LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion	Nov 4, 2023	Benchmarking, Imitation Learning	Code Available	3
FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation	Nov 4, 2023	Benchmarking, Drug Discovery	Code Available	1
Use of Deep Neural Networks for Uncertain Stress Functions with Extensions to Impact Mechanics	Nov 3, 2023	Benchmarking, quantile regression	Unverified	0
Investigating Deep-Learning NLP for Automating the Extraction of Oncology Efficacy Endpoints from Scientific Literature	Nov 3, 2023	Benchmarking	Unverified	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified