Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2526–2550 of 5548 papers

Title	Date	Tasks	Status
Reassessing Layer Pruning in LLMs: New Insights and Methods	Nov 23, 2024	BenchmarkingGPU	CodeCode Available
AdamZ: An Enhanced Optimisation Method for Neural Network Training	Nov 22, 2024	Benchmarking	CodeCode Available
Benchmarking the Robustness of Optical Flow Estimation to Corruptions	Nov 22, 2024	Autonomous DrivingBenchmarking	CodeCode Available
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains	Nov 22, 2024	BenchmarkingCaption Generation	—Unverified
Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels	Nov 21, 2024	BenchmarkingMachine Translation	CodeCode Available
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series	Nov 21, 2024	Anomaly DetectionBenchmarking	CodeCode Available
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling	Nov 21, 2024	ArticlesBenchmarking	CodeCode Available
Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver	Nov 20, 2024	Benchmarking	—Unverified
BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation	Nov 20, 2024	BenchmarkingPoint Cloud Segmentation	—Unverified
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking	Nov 20, 2024	BenchmarkingLanguage Modeling	—Unverified
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	Nov 20, 2024	BenchmarkingNetHack	—Unverified
Delta-Influence: Unlearning Poisons via Influence Functions	Nov 20, 2024	AttributeBenchmarking	CodeCode Available
Integrating Dynamic Correlation Shifts and Weighted Benchmarking in Extreme Value Analysis	Nov 19, 2024	Benchmarking	—Unverified
Benchmarking Positional Encodings for GNNs and Graph Transformers	Nov 19, 2024	Benchmarking	CodeCode Available
The Moral Mind(s) of Large Language Models	Nov 19, 2024	BenchmarkingDecision Making	—Unverified
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts	Nov 18, 2024	BenchmarkingMultimodal Large Language Model	CodeCode Available
Benchmarking pre-trained text embedding models in aligning built asset information	Nov 18, 2024	Asset ManagementBenchmarking	CodeCode Available
Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies	Nov 17, 2024	Benchmarking	—Unverified
FastDraft: How to Train Your Draft	Nov 17, 2024	BenchmarkingCode Completion	—Unverified
Reinforcing Competitive Multi-Agents for Playing So Long Sucker	Nov 17, 2024	BenchmarkingDeep Reinforcement Learning	—Unverified
Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML	Nov 17, 2024	BenchmarkingFairness	—Unverified
Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections	Nov 16, 2024	BenchmarkingDiagnostic	CodeCode Available
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level	Nov 15, 2024	Benchmarkingcounterfactual	—Unverified
The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering	Nov 15, 2024	BenchmarkingClustering	—Unverified
Automated Coding of Communications in Collaborative Problem-solving Tasks Using ChatGPT	Nov 15, 2024	Benchmarking	—Unverified

Show:10 25 50

← PrevPage 102 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified