Benchmarking

Papers


Showing 1876–1900 of 5548 papers

Title	Date	Tasks	Status
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation	May 8, 2025	Benchmarking, Federated Learning	Unverified
Advancing and Benchmarking Personalized Tool Invocation for LLMs	May 7, 2025	Benchmarking, World Knowledge	Code Available
False Promises in Medical Imaging AI? Assessing Validity of Outperformance Claims	May 7, 2025	Benchmarking	Code Available
Alpha Excel Benchmark	May 7, 2025	Benchmarking	Unverified
Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers	May 7, 2025	Benchmarking, Fault Detection	Code Available
Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions?	May 7, 2025	Benchmarking, Semantic Segmentation	Code Available
Call for Action: towards the next generation of symbolic regression benchmark	May 6, 2025	Benchmarking, Diversity	Unverified
Multimodal Benchmarking and Recommendation of Text-to-Image Generation Models	May 6, 2025	Benchmarking, Image Generation	Code Available
Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach	May 6, 2025	Benchmarking, Earth Observation	Code Available
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks	May 6, 2025	Benchmarking, Multiple-choice	Code Available
Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning	May 5, 2025	Benchmarking	Unverified
NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities	May 5, 2025	Benchmarking, Quantization	Code Available
Completing Spatial Transcriptomics Data for Gene Expression Prediction Benchmarking	May 5, 2025	Benchmarking, Prediction	Unverified
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation	May 4, 2025	Benchmarking, Feature Upsampling	Code Available
Meta-Black-Box-Optimization through Offline Q-function Learning	May 4, 2025	Benchmarking, Mamba	Code Available
Representation Learning of Limit Order Book: A Comprehensive Study and Benchmarking	May 4, 2025	Benchmarking, Representation Learning	Code Available
NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks	May 4, 2025	Benchmarking, Representation Learning	Code Available
Not Every Tree Is a Forest: Benchmarking Forest Types from Satellite Remote Sensing	May 3, 2025	Benchmarking, Image Segmentation	Unverified
CMAWRNet: Multiple Adverse Weather Removal via a Unified Quaternion Neural Architecture	May 3, 2025	Autonomous Driving, Benchmarking	Unverified
BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models	May 3, 2025	Benchmarking, Hyperparameter Optimization	Unverified
PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach	May 3, 2025	Benchmarking, Image-to-Image Translation	Unverified
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey	May 3, 2025	Autonomous Driving, Benchmarking	Unverified
Interpretable graph-based models on multimodal biomedical data integration: A technical review and benchmarking	May 3, 2025	Benchmarking, Data Integration	Unverified
Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models	May 2, 2025	Benchmarking	Code Available
Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging	May 2, 2025	Benchmarking, Computational Efficiency	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified