| SEED-Bench: Benchmarking Multimodal Large Language Models | Jan 1, 2024 | Benchmarking, Image Generation | Code Available | 3 |
| A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark | Jan 1, 2024 | Age Estimation, Benchmarking | Code Available | 2 |
| FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures | Jan 1, 2024 | Benchmarking, Instance Segmentation | Unverified | 0 |
| Sheared Backpropagation for Fine-tuning Foundation Models | Jan 1, 2024 | Benchmarking | Unverified | 0 |
| FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models | Jan 1, 2024 | Benchmarking | Code Available | 1 |
| Temporal Validity Change Prediction | Jan 1, 2024 | Benchmarking, Prediction | Unverified | 0 |
| Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Jan 1, 2024 | Benchmarking, Instruction Following | Code Available | 1 |
| Benchmarking Hebbian learning rules for associative memory | Dec 30, 2023 | Benchmarking | Unverified | 0 |
| Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models | Dec 30, 2023 | Benchmarking, image-classification | Unverified | 0 |
| Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA | Dec 29, 2023 | Anatomy, Benchmarking | Code Available | 1 |