| Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection | Apr 25, 2024 | Benchmarking, object-detection | Code Available | 1 |
| SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Apr 25, 2024 | Benchmarking, Multiple-choice | Code Available | 3 |
| Benchmarking Mobile Device Control Agents across Diverse Configurations | Apr 25, 2024 | Benchmarking, Imitation Learning | Unverified | 0 |
| ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees | Apr 24, 2024 | Benchmarking, Molecular Property Prediction | Code Available | 0 |
| Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler | Apr 24, 2024 | Benchmarking | Unverified | 0 |
| SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data | Apr 24, 2024 | Benchmarking, Fairness | Code Available | 1 |
| ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Apr 24, 2024 | Attribute, Attribute Value Extraction | Code Available | 1 |
| DPO: A Differential and Pointwise Control Approach to Reinforcement Learning | Apr 24, 2024 | Benchmarking, reinforcement-learning | Unverified | 0 |
| Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Apr 23, 2024 | Benchmarking, Hyperspectral Image Classification | Code Available | 0 |
| Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging | Apr 22, 2024 | Benchmarking | Code Available | 1 |
| Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches | Apr 22, 2024 | Benchmarking, Diversity | Unverified | 0 |
| The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking | Apr 22, 2024 | Benchmarking, Misinformation | Unverified | 0 |
| A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models | Apr 22, 2024 | Benchmarking, World Knowledge | Code Available | 1 |
| EnzChemRED, a rich enzyme chemistry relation extraction dataset | Apr 22, 2024 | Benchmarking, named-entity-recognition | Unverified | 0 |
| Open Datasets for Satellite Radio Resource Control | Apr 22, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| TAVGBench: Benchmarking Text to Audible-Video Generation | Apr 22, 2024 | Benchmarking, Contrastive Learning | Code Available | 1 |
| TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos | Apr 22, 2024 | Benchmarking, Multi-Object Tracking | Unverified | 0 |
| In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review | Apr 21, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News | Apr 21, 2024 | Benchmarking, Emotion Recognition | Code Available | 0 |
| Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization | Apr 20, 2024 | Benchmarking | Unverified | 0 |
| DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection | Apr 19, 2024 | Benchmarking, DeepFake Detection | Code Available | 3 |
| STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases | Apr 19, 2024 | Benchmarking, Retrieval | Code Available | 3 |
| Integrated Sensing and Communication enabled Multiple Base Stations Cooperative UAV Detection | Apr 19, 2024 | Benchmarking, Integrated sensing and communication | Unverified | 0 |
| REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking | Apr 19, 2024 | Benchmarking, coreference-resolution | Code Available | 1 |
| Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | Apr 19, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| Environment-aware UAV Communications: CKM Construction and Predictive Beamforming | Apr 18, 2024 | Benchmarking | Unverified | 0 |
| How to Benchmark Vision Foundation Models for Semantic Segmentation? | Apr 18, 2024 | Benchmarking, Decoder | Code Available | 1 |
| LongEmbed: Extending Embedding Models for Long Context Retrieval | Apr 18, 2024 | 4k, 8k | Code Available | 2 |
| Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems | Apr 17, 2024 | Benchmarking, Quantization | Unverified | 0 |
| Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions | Apr 17, 2024 | Benchmarking | Unverified | 0 |
| VBR: A Vision Benchmark in Rome | Apr 17, 2024 | Autonomous Vehicles, Benchmarking | Code Available | 2 |
| Benchmarking changepoint detection algorithms on cardiac time series | Apr 16, 2024 | Benchmarking, Change Point Detection | Unverified | 0 |
| White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs | Apr 16, 2024 | Benchmarking, Language Modelling | Unverified | 0 |
| Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset | Apr 16, 2024 | Benchmarking, Management | Unverified | 0 |
| Iterated Invariant Extended Kalman Filter (IterIEKF) | Apr 16, 2024 | Benchmarking | Unverified | 0 |
| Neuromorphic Vision-based Motion Segmentation with Graph Transformer Neural Network | Apr 16, 2024 | Benchmarking, Motion Segmentation | Unverified | 0 |
| Revealing data leakage in protein interaction benchmarks | Apr 16, 2024 | Benchmarking | Code Available | 2 |
| Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data | Apr 16, 2024 | Benchmarking, Face Recognition | Code Available | 1 |
| LLM Evaluators Recognize and Favor Their Own Generations | Apr 15, 2024 | Benchmarking | Unverified | 0 |
| Feature selection in linear SVMs via a hard cardinality constraint: a scalable SDP decomposition approach | Apr 15, 2024 | Benchmarking, feature selection | Unverified | 0 |
| A Universal Protocol to Benchmark Camera Calibration for Sports | Apr 15, 2024 | Benchmarking, Camera Calibration | Unverified | 0 |
| A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Apr 15, 2024 | Benchmarking | Code Available | 0 |
| nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation | Apr 15, 2024 | Benchmarking, Image Segmentation | Code Available | 1 |
| A Large-Scale Evaluation of Speech Foundation Models | Apr 15, 2024 | Benchmarking | Unverified | 0 |
| MMInA: Benchmarking Multihop Multimodal Internet Agents | Apr 15, 2024 | Benchmarking | Unverified | 0 |
| MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems | Apr 15, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Apr 15, 2024 | Benchmarking, Bias Detection | Code Available | 1 |
| A Review and Efficient Implementation of Scene Graph Generation Metrics | Apr 15, 2024 | Benchmarking, Graph Generation | Code Available | 1 |
| AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides | Apr 15, 2024 | Benchmarking, Protein Language Model | Code Available | 0 |
| RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion | Apr 14, 2024 | Benchmarking, Data Augmentation | Code Available | 1 |