SOTAVerified

Benchmarking

Papers

Showing 781–790 of 5548 papers

Title	Date	Tasks	Status	Hype
Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology	Mar 11, 2024	Benchmarking, Content-Based Image Retrieval	Code Available	1
Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark	Mar 9, 2024	Benchmarking, Fairness	Code Available	1
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications	Mar 8, 2024	Action Recognition, Benchmarking	Code Available	1
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents	Mar 8, 2024	Benchmarking, Decision Making	Code Available	1
R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations	Mar 7, 2024	Benchmarking	Code Available	1
Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation	Mar 7, 2024	Benchmarking, Multimodal Recommendation	Code Available	1
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing	Mar 2, 2024	Attribute, Benchmarking	Code Available	1
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs	Mar 1, 2024	Benchmarking	Code Available	1
Efficient Lifelong Model Evaluation in an Era of Rapid Progress	Feb 29, 2024	Benchmarking, GPU	Code Available	1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions	Feb 28, 2024	Benchmarking, Multiple-choice	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified