| Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices | Jan 7, 2025 | Benchmarking, Clustering | Unverified | 0 |
| Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding | Jan 7, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models | Jan 6, 2025 | Benchmarking, Feature Compression | Unverified | 0 |
| The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input | Jan 6, 2025 | Benchmarking, Form | Unverified | 0 |
| Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks | Jan 5, 2025 | Adversarial Robustness, Benchmarking | Code Available | 0 |
| ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation | Jan 3, 2025 | Benchmarking, Crowd Counting | Code Available | 0 |
| PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | Jan 3, 2025 | Benchmarking | Unverified | 0 |
| AI-Powered Cow Detection in Complex Farm Environments | Jan 3, 2025 | Benchmarking | Unverified | 0 |
| QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | Jan 3, 2025 | Benchmarking, Question Answering | Unverified | 0 |
| TabTreeFormer: Tabular Data Generation Using Hybrid Tree-Transformer | Jan 2, 2025 | Benchmarking, Quantization | Unverified | 0 |