Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1951–1975 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation	Jun 25, 2024	Action DetectionBenchmarking	CodeCode Available	0
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA	Jun 25, 2024	BenchmarkingLong-Context Understanding	CodeCode Available	2
Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language	Jun 25, 2024	Benchmarking	—Unverified	0
MatText: Do Language Models Need More than Text & Scale for Materials Modeling?	Jun 25, 2024	Benchmarking	CodeCode Available	1
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation	Jun 25, 2024	ARCBenchmarking	CodeCode Available	0
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems	Jun 25, 2024	BenchmarkingCollaborative Filtering	CodeCode Available	0
Towards Efficient and Scalable Training of Differentially Private Deep Learning	Jun 25, 2024	BenchmarkingDeep Learning	CodeCode Available	0
NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods	Jun 25, 2024	3DGSBenchmarking	—Unverified	0
Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models	Jun 25, 2024	Benchmarking	—Unverified	0
MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models	Jun 24, 2024	Benchmarking	—Unverified	0
CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization	Jun 24, 2024	Bayesian OptimizationBenchmarking	—Unverified	0
FaceScore: Benchmarking and Enhancing Face Quality in Human Generation	Jun 24, 2024	BenchmarkingDenoising	CodeCode Available	2
A Closer Look at Mortality Risk Prediction from Electrocardiograms	Jun 24, 2024	BenchmarkingPrediction	CodeCode Available	1
From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking	Jun 24, 2024	BenchmarkingNeRF	CodeCode Available	2
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models	Jun 24, 2024	BenchmarkingData Augmentation	CodeCode Available	1
Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments	Jun 24, 2024	Benchmarking	CodeCode Available	4
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation	Jun 24, 2024	BenchmarkingImage Generation	CodeCode Available	2
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track	Jun 24, 2024	BenchmarkingRAG	CodeCode Available	1
General Binding Affinity Guidance for Diffusion Models in Structure-Based Drug Design	Jun 24, 2024	BenchmarkingDrug Design	CodeCode Available	1
PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs	Jun 24, 2024	BenchmarkingMachine Unlearning	—Unverified	0
GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets	Jun 23, 2024	Benchmarking	—Unverified	0
HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis	Jun 23, 2024	BenchmarkingRepresentation Learning	CodeCode Available	3
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking	Jun 23, 2024	Benchmarking	CodeCode Available	2
Position: Benchmarking is Limited in Reinforcement Learning Research	Jun 23, 2024	BenchmarkingPosition	—Unverified	0
MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication	Jun 22, 2024	BenchmarkingMeta-Learning	CodeCode Available	0

Show:10 25 50

← PrevPage 79 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified