SOTAVerified

Benchmarking

Papers

Showing 1926-1950 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models | Code | 2 |
| EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting | | 0 |
| MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | Code | 2 |
| FineSurE: Fine-grained Summarization Evaluation using LLMs | Code | 1 |
| Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration | Code | 0 |
| Benchmarking Predictive Coding Networks -- Made Simple | Code | 2 |
| AI Agents That Matter | Code | 1 |
| Overcoming Common Flaws in the Evaluation of Selective Classification Systems | Code | 1 |
| Commute Graph Neural Networks | | 0 |
| GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | | 0 |
| PerSEval: Assessing Personalization in Text Summarizers | | 0 |
| GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Code | 1 |
| iAMPCN: a deep-learning approach for identifying antimicrobial peptides and their functional activities | Code | 1 |
| Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges | | 0 |
| Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives | | 0 |
| UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Code | 2 |
| Quantum-tunnelling deep neural network for optical illusion recognition | | 0 |
| Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI | | 0 |
| XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis | | 0 |
| GenRL: Multimodal-foundation world models for generalization in embodied agents | Code | 2 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2 |
| RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems | | 0 |
| Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making | | 0 |
| Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection | Code | 1 |
| SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It) | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |