| Magnetic Resonance Imaging Feature-Based Subtyping and Model Ensemble for Enhanced Brain Tumor Segmentation | Dec 5, 2024 | BenchmarkingBrain Tumor Segmentation | CodeCode Available | 0 |
| Benchmarking Harmonized Tariff Schedule Classification Models | Dec 4, 2024 | BenchmarkingClassification | —Unverified | 0 |
| Video Quality Assessment: A Comprehensive Survey | Dec 4, 2024 | BenchmarkingSurvey | CodeCode Available | 2 |
| AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations? | Dec 4, 2024 | BenchmarkingVisual Question Answering (VQA) | —Unverified | 0 |
| Benchmarking terminology building capabilities of ChatGPT on an English-Russian Fashion Corpus | Dec 4, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking Attention Mechanisms and Consistency Regularization Semi-Supervised Learning for Post-Flood Building Damage Assessment in Satellite Images | Dec 4, 2024 | BenchmarkingBuilding Damage Assessment | —Unverified | 0 |
| Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy | Dec 4, 2024 | AnatomyBenchmarking | —Unverified | 0 |
| Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data | Dec 3, 2024 | Benchmarking | —Unverified | 0 |
| Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications | Dec 3, 2024 | BenchmarkingDisaster Response | CodeCode Available | 3 |
| BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts | Dec 3, 2024 | Age And Gender ClassificationAge and Gender Estimation | CodeCode Available | 0 |
| Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods | Dec 3, 2024 | Benchmarking | CodeCode Available | 0 |
| Personalized Multimodal Large Language Models: A Survey | Dec 3, 2024 | BenchmarkingSurvey | —Unverified | 0 |
| Benchmarking symbolic regression constant optimization schemes | Dec 3, 2024 | Benchmarkingregression | —Unverified | 0 |
| VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | Dec 3, 2024 | BenchmarkingVisual Reasoning | —Unverified | 0 |
| OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations | Dec 3, 2024 | BenchmarkingFace Recognition | —Unverified | 0 |
| Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | Dec 2, 2024 | BenchmarkingDecision Making | —Unverified | 0 |
| Understanding the World's Museums through Vision-Language Reasoning | Dec 2, 2024 | BenchmarkingQuestion Answering | CodeCode Available | 0 |
| Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs" | Dec 2, 2024 | BenchmarkingRepresentation Learning | CodeCode Available | 1 |
| Commit0: Library Generation from Scratch | Dec 2, 2024 | BenchmarkingCode Generation | CodeCode Available | 2 |
| AI Benchmarks and Datasets for LLM Evaluation | Dec 2, 2024 | BenchmarkingDistributed Computing | —Unverified | 0 |
| Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024) | Dec 2, 2024 | BenchmarkingHigh-Level Synthesis | CodeCode Available | 0 |
| TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences | Nov 30, 2024 | BenchmarkingClassification | CodeCode Available | 0 |
| One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering | Nov 29, 2024 | BenchmarkingObject | —Unverified | 0 |
| Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Nov 29, 2024 | BenchmarkingDeepFake Detection | CodeCode Available | 1 |
| Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Nov 29, 2024 | BenchmarkingGrounded Video Question Answering | —Unverified | 0 |