Benchmarking

Papers


Showing 226–250 of 5548 papers

Title	Date	Tasks	Status	Hype
SustainDC: Benchmarking for Sustainable Data Center Control	Aug 14, 2024	Benchmarking, Management	Code Available	2
COALA: A Practical and Vision-Centric Federated Learning Platform	Jul 23, 2024	Benchmarking, Continual Learning	Code Available	2
MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning	Jul 23, 2024	Benchmarking, Decision Making	Code Available	2
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models	Jul 17, 2024	Benchmarking, Red Teaming	Code Available	2
GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection	Jul 16, 2024	Benchmarking, Loop Closure Detection	Code Available	2
WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving	Jul 11, 2024	Autonomous Driving, Benchmarking	Code Available	2
InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior	Jul 10, 2024	Benchmarking, Decoder	Code Available	2
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance	Jul 9, 2024	Benchmarking, Conditional Image Generation	Code Available	2
SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry	Jul 5, 2024	Benchmarking, object-detection	Code Available	2
Benchmarking Complex Instruction-Following with Multiple Constraints Composition	Jul 4, 2024	Benchmarking, Instruction Following	Code Available	2
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments	Jul 4, 2024	Benchmarking, Minecraft	Code Available	2
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models	Jul 3, 2024	Benchmarking, Code Search	Code Available	2
FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models	Jul 1, 2024	Benchmarking, Fairness	Code Available	2
Benchmarking Predictive Coding Networks -- Made Simple	Jul 1, 2024	Benchmarking	Code Available	2
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations	Jul 1, 2024	Benchmarking, document understanding	Code Available	2
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models	Jun 27, 2024	Attribute, Benchmarking	Code Available	2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data	Jun 26, 2024	Benchmarking, Math	Code Available	2
GenRL: Multimodal-foundation world models for generalization in embodied agents	Jun 26, 2024	Benchmarking, Reinforcement Learning (RL)	Code Available	2
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA	Jun 25, 2024	Benchmarking, Long-Context Understanding	Code Available	2
From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking	Jun 24, 2024	Benchmarking, NeRF	Code Available	2
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation	Jun 24, 2024	Benchmarking, Image Generation	Code Available	2
FaceScore: Benchmarking and Enhancing Face Quality in Human Generation	Jun 24, 2024	Benchmarking, Denoising	Code Available	2
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking	Jun 23, 2024	Benchmarking	Code Available	2
GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis	Jun 21, 2024	AI Agent, AutoML	Code Available	2
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph	Jun 21, 2024	Benchmarking, Text Generation	Code Available	2


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified