| Segmenting Maxillofacial Structures in CBCT Volumes | Jan 1, 2025 | Anatomy, Benchmarking | —Unverified | 0 |
| Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback | Jan 1, 2025 | Benchmarking | —Unverified | 0 |
| InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation | Jan 1, 2025 | Benchmarking, Human-Object Interaction Detection | —Unverified | 0 |
| SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation | Jan 1, 2025 | Benchmarking, Diagnostic | —Unverified | 0 |
| On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds | Jan 1, 2025 | Benchmarking | —Unverified | 0 |
| Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries | Dec 31, 2024 | Benchmarking, Out-of-Distribution Generalization | —Unverified | 0 |
| A review of faithfulness metrics for hallucination assessment in Large Language Models | Dec 31, 2024 | Benchmarking, Hallucination | —Unverified | 0 |
| AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects | Dec 31, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 |
| Measuring Large Language Models Capacity to Annotate Journalistic Sourcing | Dec 30, 2024 | Benchmarking, Ethics | —Unverified | 0 |
| SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | Dec 30, 2024 | Benchmarking, Code Generation | —Unverified | 0 |