| Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging | Apr 22, 2024 | Benchmarking | Code Available | 1 |
| A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models | Apr 22, 2024 | Benchmarking, World Knowledge | Code Available | 1 |
| REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking | Apr 19, 2024 | Benchmarking, coreference-resolution | Code Available | 1 |
| How to Benchmark Vision Foundation Models for Semantic Segmentation? | Apr 18, 2024 | Benchmarking, Decoder | Code Available | 1 |
| Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data | Apr 16, 2024 | Benchmarking, Face Recognition | Code Available | 1 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Apr 15, 2024 | Benchmarking, Bias Detection | Code Available | 1 |
| A Review and Efficient Implementation of Scene Graph Generation Metrics | Apr 15, 2024 | Benchmarking, Graph Generation | Code Available | 1 |
| MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems | Apr 15, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation | Apr 15, 2024 | Benchmarking, Image Segmentation | Code Available | 1 |
| RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion | Apr 14, 2024 | Benchmarking, Data Augmentation | Code Available | 1 |