Benchmarking

Papers

Showing 901–925 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Cognitive Biases in Large Language Models as Evaluators	Sep 29, 2023	Benchmarking, In-Context Learning	Code Available	1
MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data	Sep 29, 2023	Benchmarking, Contrastive Learning	Code Available	1
G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks	Sep 29, 2023	Benchmarking	Code Available	1
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things	Sep 29, 2023	Benchmarking, Federated Learning	Code Available	1
Revisiting Neural Program Smoothing for Fuzzing	Sep 28, 2023	Benchmarking, CPU	Code Available	1
FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding	Sep 28, 2023	Benchmarking, Image Retrieval	Code Available	1
The Trickle-down Impact of Reward (In-)consistency on RLHF	Sep 28, 2023	Benchmarking	Code Available	1
LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite	Sep 28, 2023	Benchmarking	Code Available	1
NLPBench: Evaluating Large Language Models on Solving NLP Problems	Sep 27, 2023	Benchmarking, Math	Code Available	1
OceanBench: The Sea Surface Height Edition	Sep 27, 2023	Benchmarking, Sensor Fusion	Code Available	1
Node-Aligned Graph-to-Graph (NAG2G): Elevating Template-Free Deep Learning Approaches in Single-Step Retrosynthesis	Sep 27, 2023	Benchmarking, Graph Generation	Code Available	1
Unified Long-Term Time-Series Forecasting Benchmark	Sep 27, 2023	Benchmarking, Time Series	Code Available	1
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition	Sep 25, 2023	Autonomous Driving, Benchmarking	Code Available	1
Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction	Sep 24, 2023	3D Shape Reconstruction, Anatomy	Code Available	1
Grad DFT: a software library for machine learning enhanced density functional theory	Sep 23, 2023	Benchmarking	Code Available	1
Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation	Sep 21, 2023	Benchmarking, Classification	Code Available	1
An Image Dataset for Benchmarking Recommender Systems with Raw Pixels	Sep 13, 2023	Benchmarking, Recommendation Systems	Code Available	1
Formalizing Multimedia Recommendation through Multimodal Deep Learning	Sep 11, 2023	Benchmarking, Deep Learning	Code Available	1
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions	Sep 10, 2023	3D Human Pose Estimation, 3D Pose Estimation	Code Available	1
RecAD: Towards A Unified Library for Recommender Attack and Defense	Sep 9, 2023	Benchmarking, Recommendation Systems	Code Available	1
Evaluation of large language models for discovery of gene set function	Sep 7, 2023	Benchmarking, Language Modelling	Code Available	1
A skeletonization algorithm for gradient-based optimization	Sep 5, 2023	Benchmarking, Deep Learning	Code Available	1
Benchmarking Autoregressive Conditional Diffusion Models for Turbulent Flow Simulation	Sep 4, 2023	Benchmarking	Code Available	1
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering	Aug 31, 2023	Benchmarking, Dataset Generation	Code Available	1
Benchmarking the Generation of Fact Checking Explanations	Aug 29, 2023	Abstractive Text Summarization, Articles	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified