| Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods | Dec 3, 2024 | Benchmarking | Code Available | 0 |
| VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | Dec 3, 2024 | Benchmarking, Visual Reasoning | Unverified | 0 |
| Benchmarking symbolic regression constant optimization schemes | Dec 3, 2024 | Benchmarking, regression | Unverified | 0 |
| Personalized Multimodal Large Language Models: A Survey | Dec 3, 2024 | Benchmarking, Survey | Unverified | 0 |
| OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations | Dec 3, 2024 | Benchmarking, Face Recognition | Unverified | 0 |
| Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs" | Dec 2, 2024 | Benchmarking, Representation Learning | Code Available | 1 |
| Commit0: Library Generation from Scratch | Dec 2, 2024 | Benchmarking, Code Generation | Code Available | 2 |
| Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | Dec 2, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| AI Benchmarks and Datasets for LLM Evaluation | Dec 2, 2024 | Benchmarking, Distributed Computing | Unverified | 0 |
| Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024) | Dec 2, 2024 | Benchmarking, High-Level Synthesis | Code Available | 0 |