SOTAVerified

Benchmarking

Papers

Showing 1051–1075 of 5548 papers

Title | Status | Hype
FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation | Code | 1
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | Code | 1
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions | Code | 1
ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Code | 1
Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph | Code | 1
3D Common Corruptions and Data Augmentation | Code | 1
Continual Learning with Foundation Models: An Empirical Study of Latent Replay | Code | 1
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Code | 1
Benchmarking Quantized Neural Networks on FPGAs with FINN | Code | 1
Foundation Model of Electronic Medical Records for Adaptive Risk Estimation | Code | 1
fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms | Code | 1
Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform | Code | 1
Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous graph neural networks | Code | 1
From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation | Code | 1
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios | Code | 1
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records | Code | 1
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs | Code | 1
Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering | Code | 1
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis | Code | 1
3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding | Code | 1
Benchmarking Reinforcement Learning Techniques for Autonomous Navigation | Code | 1
Formalizing Multimedia Recommendation through Multimodal Deep Learning | Code | 1
FTNet: Feature Transverse Network for Thermal Image Semantic Segmentation | Code | 1
Flames: Benchmarking Value Alignment of LLMs in Chinese | Code | 1
FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified