| Personalized Multimodal Large Language Models: A Survey | Dec 3, 2024 | Benchmarking, Survey | Unverified | 0 |
| OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations | Dec 3, 2024 | Benchmarking, Face Recognition | Unverified | 0 |
| Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods | Dec 3, 2024 | Benchmarking | Code Available | 0 |
| BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts | Dec 3, 2024 | Age And Gender Classification, Age and Gender Estimation | Code Available | 0 |
| Benchmarking symbolic regression constant optimization schemes | Dec 3, 2024 | Benchmarking, regression | Unverified | 0 |
| VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | Dec 3, 2024 | Benchmarking, Visual Reasoning | Unverified | 0 |
| AI Benchmarks and Datasets for LLM Evaluation | Dec 2, 2024 | Benchmarking, Distributed Computing | Unverified | 0 |
| Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | Dec 2, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024) | Dec 2, 2024 | Benchmarking, High-Level Synthesis | Code Available | 0 |
| Understanding the World's Museums through Vision-Language Reasoning | Dec 2, 2024 | Benchmarking, Question Answering | Code Available | 0 |
| TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences | Nov 30, 2024 | Benchmarking, Classification | Code Available | 0 |
| Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Nov 29, 2024 | Benchmarking, Grounded Video Question Answering | Unverified | 0 |
| One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering | Nov 29, 2024 | Benchmarking, Object | Unverified | 0 |
| Consolidating and Developing Benchmarking Datasets for the Nepali Natural Language Understanding Tasks | Nov 28, 2024 | Benchmarking, Natural Language Inference | Unverified | 0 |
| HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | Nov 28, 2024 | Benchmarking, Object Tracking | Unverified | 0 |
| λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics | Nov 28, 2024 | Benchmarking, Diversity | Unverified | 0 |
| Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems | Nov 27, 2024 | AutoML, Benchmarking | Unverified | 0 |
| Benchmarking Agility and Reconfigurability in Satellite Systems for Tropical Cyclone Monitoring | Nov 27, 2024 | Benchmarking, Earth Observation | Unverified | 0 |
| Evaluating Generative AI-Enhanced Content: A Conceptual Framework Using Qualitative, Quantitative, and Mixed-Methods Approaches | Nov 26, 2024 | Benchmarking | Unverified | 0 |
| Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals | Nov 26, 2024 | Benchmarking, Retrieval | Unverified | 0 |
| Abnormality-Driven Representation Learning for Radiology Imaging | Nov 25, 2024 | Benchmarking, Contrastive Learning | Unverified | 0 |
| A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation | Nov 25, 2024 | Active Learning, Bayesian Inference | Unverified | 0 |
| Performance Benchmarking of Psychomotor Skills Using Wearable Devices: An Application in Sport | Nov 25, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Active Learning for NILM | Nov 24, 2024 | Active Learning, Benchmarking | Unverified | 0 |
| ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain | Nov 23, 2024 | Benchmarking, Diversity | Code Available | 0 |