SOTAVerified

Benchmarking

Papers

Showing 781-790 of 5548 papers

Title | Status | Hype
Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology | Code | 1
Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark | Code | 1
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Code | 1
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents | Code | 1
R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations | Code | 1
Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation | Code | 1
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Code | 1
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs | Code | 1
Efficient Lifelong Model Evaluation in an Era of Rapid Progress | Code | 1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified