| Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling | Nov 21, 2024 | Articles, Benchmarking | Code Available | 0 |
| PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series | Nov 21, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Multi-Agent Environments for Vehicle Routing Problems | Nov 21, 2024 | Benchmarking, reinforcement-learning | Code Available | 1 |
| Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking | Nov 20, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver | Nov 20, 2024 | Benchmarking | Unverified | 0 |
| Delta-Influence: Unlearning Poisons via Influence Functions | Nov 20, 2024 | Attribute, Benchmarking | Code Available | 0 |
| VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Nov 20, 2024 | Benchmarking, Image Generation | Code Available | 5 |
| BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | Nov 20, 2024 | Benchmarking, Point Cloud Segmentation | Unverified | 0 |
| BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Nov 20, 2024 | Benchmarking, NetHack | Unverified | 0 |
| The Moral Mind(s) of Large Language Models | Nov 19, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Integrating Dynamic Correlation Shifts and Weighted Benchmarking in Extreme Value Analysis | Nov 19, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Positional Encodings for GNNs and Graph Transformers | Nov 19, 2024 | Benchmarking | Code Available | 0 |
| DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models | Nov 19, 2024 | Benchmarking, Deep Learning | Code Available | 1 |
| Introducing Milabench: Benchmarking Accelerators for AI | Nov 18, 2024 | Benchmarking, Deep Learning | Code Available | 1 |
| Benchmarking pre-trained text embedding models in aligning built asset information | Nov 18, 2024 | Asset Management, Benchmarking | Code Available | 0 |
| Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Nov 18, 2024 | Benchmarking, Multimodal Large Language Model | Code Available | 0 |
| Reinforcing Competitive Multi-Agents for Playing So Long Sucker | Nov 17, 2024 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies | Nov 17, 2024 | Benchmarking | Unverified | 0 |
| Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML | Nov 17, 2024 | Benchmarking, Fairness | Unverified | 0 |
| FastDraft: How to Train Your Draft | Nov 17, 2024 | Benchmarking, Code Completion | Unverified | 0 |
| Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections | Nov 16, 2024 | Benchmarking, Diagnostic | Code Available | 0 |
| The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods | Nov 15, 2024 | 3D Reconstruction, Benchmarking | Unverified | 0 |
| The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering | Nov 15, 2024 | Benchmarking, Clustering | Unverified | 0 |
| Automated Coding of Communications in Collaborative Problem-solving Tasks Using ChatGPT | Nov 15, 2024 | Benchmarking | Unverified | 0 |
| Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Nov 15, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking | Nov 14, 2024 | Benchmarking, Drug Discovery | Unverified | 0 |
| A survey of probabilistic generative frameworks for molecular simulations | Nov 14, 2024 | Benchmarking, Denoising | Code Available | 0 |
| Caravan MultiMet: Extending Caravan with Multiple Weather Nowcasts and Forecasts | Nov 14, 2024 | Benchmarking | Code Available | 3 |
| BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation | Nov 14, 2024 | Adversarial Attack, Adversarial Robustness | Code Available | 0 |
| Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset | Nov 13, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| A Survey on Vision Autoregressive Model | Nov 13, 2024 | 3D Generation, Benchmarking | Unverified | 0 |
| HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere | Nov 13, 2024 | Benchmarking, Dataset Generation | Unverified | 0 |
| FM-TS: Flow Matching for Time Series Generation | Nov 12, 2024 | Benchmarking, Imputation | Code Available | 1 |
| Evaluating the Generation of Spatial Relations in Text and Image Generative Models | Nov 12, 2024 | Benchmarking, Image Generation | Unverified | 0 |
| Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | Nov 11, 2024 | 16k, Benchmarking | Code Available | 0 |
| BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes | Nov 11, 2024 | Benchmarking, Multi-Object Tracking | Unverified | 0 |
| General Geospatial Inference with a Population Dynamics Foundation Model | Nov 11, 2024 | Benchmarking, Graph Neural Network | Code Available | 3 |
| Benchmarking LLMs' Judgments with No Gold Standard | Nov 11, 2024 | Benchmarking, Machine Translation | Code Available | 0 |
| Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification | Nov 11, 2024 | Benchmarking, Image Segmentation | Code Available | 1 |
| MolMiner: Towards Controllable, 3D-Aware, Fragment-Based Molecular Design | Nov 10, 2024 | 3D geometry, Benchmarking | Unverified | 0 |
| Low Dynamic Range for RIS-aided Bistatic Integrated Sensing and Communication | Nov 9, 2024 | Benchmarking, Integrated sensing and communication | Unverified | 0 |
| Benchmarking 3D multi-coil NC-PDNet MRI reconstruction | Nov 8, 2024 | 3D Reconstruction, Benchmarking | Unverified | 0 |
| FactLens: Benchmarking Fine-Grained Fact Verification | Nov 8, 2024 | Benchmarking, Fact Verification | Unverified | 0 |
| Open-set object detection: towards unified problem formulation and benchmarking | Nov 8, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Benchmarking Distributional Alignment of Large Language Models | Nov 8, 2024 | Benchmarking | Code Available | 0 |
| A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics | Nov 8, 2024 | Benchmarking | Unverified | 0 |
| ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | Nov 7, 2024 | Benchmarking, Multiple-choice | Unverified | 0 |
| Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale | Nov 7, 2024 | Active Learning, Benchmarking | Unverified | 0 |
| Deep Learning Models for UAV-Assisted Bridge Inspection: A YOLO Benchmark Analysis | Nov 7, 2024 | Benchmarking, Model Selection | Unverified | 0 |
| HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images | Nov 7, 2024 | Anatomy, Benchmarking | Unverified | 0 |