| GCondenser: Benchmarking Graph Condensation | May 23, 2024 | Benchmarking, Graph Representation Learning | Code Available | 1 |
| Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding | May 21, 2024 | Benchmarking, Keypoint Detection | Code Available | 1 |
| DocuMint: Docstring Generation for Python using Small Language Models | May 16, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | May 14, 2024 | Benchmarking, Multiple-choice | Code Available | 1 |
| Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | May 10, 2024 | Benchmarking, Point Cloud Registration | Code Available | 1 |
| AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets | May 7, 2024 | Benchmarking, Cancer Classification | Code Available | 1 |
| Position: Quo Vadis, Unsupervised Time Series Anomaly Detection? | May 4, 2024 | Anomaly Detection, Benchmarking | Code Available | 1 |
| ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging | Apr 30, 2024 | Benchmarking, Image Reconstruction | Code Available | 1 |
| Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations? | Apr 29, 2024 | Answer Generation, Benchmarking | Code Available | 1 |
| 4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs | Apr 28, 2024 | Benchmarking | Code Available | 1 |
| Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments | Apr 27, 2024 | Autonomous Vehicles, Benchmarking | Code Available | 1 |
| Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection | Apr 25, 2024 | Benchmarking, object-detection | Code Available | 1 |
| ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Apr 24, 2024 | Attribute, Attribute Value Extraction | Code Available | 1 |
| SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data | Apr 24, 2024 | Benchmarking, Fairness | Code Available | 1 |
| TAVGBench: Benchmarking Text to Audible-Video Generation | Apr 22, 2024 | Benchmarking, Contrastive Learning | Code Available | 1 |
| Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging | Apr 22, 2024 | Benchmarking | Code Available | 1 |
| A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models | Apr 22, 2024 | Benchmarking, World Knowledge | Code Available | 1 |
| REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking | Apr 19, 2024 | Benchmarking, coreference-resolution | Code Available | 1 |
| How to Benchmark Vision Foundation Models for Semantic Segmentation? | Apr 18, 2024 | Benchmarking, Decoder | Code Available | 1 |
| Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data | Apr 16, 2024 | Benchmarking, Face Recognition | Code Available | 1 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Apr 15, 2024 | Benchmarking, Bias Detection | Code Available | 1 |
| A Review and Efficient Implementation of Scene Graph Generation Metrics | Apr 15, 2024 | Benchmarking, Graph Generation | Code Available | 1 |
| MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems | Apr 15, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation | Apr 15, 2024 | Benchmarking, Image Segmentation | Code Available | 1 |
| RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion | Apr 14, 2024 | Benchmarking, Data Augmentation | Code Available | 1 |