| Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Dec 20, 2024 | BenchmarkingOptical Character Recognition | CodeCode Available | 0 |
| AI-generated Image Quality Assessment in Visual Communication | Dec 20, 2024 | BenchmarkingImage Quality Assessment | CodeCode Available | 0 |
| Generative CKM Construction using Partially Observed Data with Diffusion Model | Dec 19, 2024 | Benchmarking | CodeCode Available | 1 |
| TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation | Dec 19, 2024 | BenchmarkingDescription-guided molecule generation | CodeCode Available | 1 |
| Pitfalls of topology-aware image segmentation | Dec 19, 2024 | BenchmarkingImage Segmentation | —Unverified | 0 |
| AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Dec 19, 2024 | Autonomous DrivingBenchmarking | CodeCode Available | 2 |
| Autonomous Microscopy Experiments through Large Language Model Agents | Dec 18, 2024 | BenchmarkingExperimental Design | CodeCode Available | 1 |
| TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | Dec 18, 2024 | Benchmarking | CodeCode Available | 1 |
| AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | Dec 18, 2024 | BenchmarkingWorld Knowledge | CodeCode Available | 0 |
| Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning | Dec 18, 2024 | BenchmarkingPosition | —Unverified | 0 |