SOTAVerified

Benchmarking

Papers

Showing 2501–2525 of 5548 papers

| Title | Status | Hype |
| Personalized Multimodal Large Language Models: A Survey | | 0 |
| OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations | | 0 |
| Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods | Code | 0 |
| BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts | Code | 0 |
| Benchmarking symbolic regression constant optimization schemes | | 0 |
| VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | | 0 |
| AI Benchmarks and Datasets for LLM Evaluation | | 0 |
| Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | | 0 |
| Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024) | Code | 0 |
| Understanding the World's Museums through Vision-Language Reasoning | Code | 0 |
| TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences | Code | 0 |
| Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | | 0 |
| One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering | | 0 |
| Consolidating and Developing Benchmarking Datasets for the Nepali Natural Language Understanding Tasks | | 0 |
| HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | | 0 |
| λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics | | 0 |
| Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems | | 0 |
| Benchmarking Agility and Reconfigurability in Satellite Systems for Tropical Cyclone Monitoring | | 0 |
| Evaluating Generative AI-Enhanced Content: A Conceptual Framework Using Qualitative, Quantitative, and Mixed-Methods Approaches | | 0 |
| Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals | | 0 |
| Abnormality-Driven Representation Learning for Radiology Imaging | | 0 |
| A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation | | 0 |
| Performance Benchmarking of Psychomotor Skills Using Wearable Devices: An Application in Sport | | 0 |
| Benchmarking Active Learning for NILM | | 0 |
| ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain | Code | 0 |
Page 101 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |