Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1926–1950 of 5548 papers

Title	Date	Tasks	Status	Hype
FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models	Jul 1, 2024	BenchmarkingFairness	CodeCode Available	2
EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting	Jul 1, 2024	3D ReconstructionBenchmarking	—Unverified	0
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations	Jul 1, 2024	Benchmarkingdocument understanding	CodeCode Available	2
FineSurE: Fine-grained Summarization Evaluation using LLMs	Jul 1, 2024	BenchmarkingHallucination	CodeCode Available	1
Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration	Jul 1, 2024	Benchmarking	CodeCode Available	0
Benchmarking Predictive Coding Networks -- Made Simple	Jul 1, 2024	Benchmarking	CodeCode Available	2
AI Agents That Matter	Jul 1, 2024	Benchmarking	CodeCode Available	1
Overcoming Common Flaws in the Evaluation of Selective Classification Systems	Jul 1, 2024	BenchmarkingClassification	CodeCode Available	1
Commute Graph Neural Networks	Jun 30, 2024	Benchmarking	—Unverified	0
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing	Jun 30, 2024	Benchmarkingcounterfactual	—Unverified	0
PerSEval: Assessing Personalization in Text Summarizers	Jun 29, 2024	BenchmarkingHuman Judgment Correlation	—Unverified	0
GraphArena: Benchmarking Large Language Models on Graph Computational Problems	Jun 29, 2024	BenchmarkingHallucination	CodeCode Available	1
iAMPCN: a deep-learning approach for identifying antimicrobial peptides and their functional activities	Jun 27, 2024	Benchmarking	CodeCode Available	1
Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges	Jun 27, 2024	BenchmarkingClinical Knowledge	—Unverified	0
Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives	Jun 27, 2024	Benchmarking	—Unverified	0
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models	Jun 27, 2024	AttributeBenchmarking	CodeCode Available	2
Quantum-tunnelling deep neural network for optical illusion recognition	Jun 26, 2024	Autonomous VehiclesBenchmarking	—Unverified	0
Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI	Jun 26, 2024	BenchmarkingCrop Type Mapping	—Unverified	0
XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis	Jun 26, 2024	Autonomous DrivingBenchmarking	—Unverified	0
GenRL: Multimodal-foundation world models for generalization in embodied agents	Jun 26, 2024	BenchmarkingReinforcement Learning (RL)	CodeCode Available	2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data	Jun 26, 2024	BenchmarkingMath	CodeCode Available	2
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems	Jun 25, 2024	BenchmarkingRAG	—Unverified	0
Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making	Jun 25, 2024	BenchmarkingDecision Making	—Unverified	0
Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection	Jun 25, 2024	BenchmarkingPrompt Learning	CodeCode Available	1
SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It)	Jun 25, 2024	BenchmarkingExperimental Design	CodeCode Available	1

Show:10 25 50

← PrevPage 78 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified