| Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos | Mar 17, 2025 | Benchmarking, Question Answering | Code Available | 1 |
| MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Mar 17, 2025 | Articles, Benchmarking | Code Available | 1 |
| VeriContaminated: Assessing LLM-Driven Verilog Coding for Data Contamination | Mar 17, 2025 | Benchmarking, Code Generation | —Unverified | 0 |
| CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era | Mar 16, 2025 | Benchmarking, Image Captioning | —Unverified | 0 |
| Advancing Human-Machine Teaming: Concepts, Challenges, and Applications | Mar 16, 2025 | Benchmarking, Decision Making | —Unverified | 0 |
| Genicious: Contextual Few-shot Prompting for Insights Discovery | Mar 15, 2025 | Benchmarking, Decision Making | —Unverified | 0 |
| Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms | Mar 15, 2025 | Benchmarking, Brain Morphometry | —Unverified | 0 |
| Language Models for Automated Classification of Brain MRI Reports and Growth Chart Generation | Mar 15, 2025 | Benchmarking | —Unverified | 0 |
| Challenges and Advancements in Modeling Shock Fronts with Physics-Informed Neural Networks: A Review and Benchmarking Study | Mar 14, 2025 | Benchmarking | —Unverified | 0 |
| LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama | Mar 14, 2025 | Benchmarking, MMLU | —Unverified | 0 |