| Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling | Nov 21, 2024 | ArticlesBenchmarking | CodeCode Available | 0 |
| PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series | Nov 21, 2024 | Anomaly DetectionBenchmarking | CodeCode Available | 0 |
| Multi-Agent Environments for Vehicle Routing Problems | Nov 21, 2024 | Benchmarkingreinforcement-learning | CodeCode Available | 1 |
| Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking | Nov 20, 2024 | BenchmarkingLanguage Modeling | —Unverified | 0 |
| Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver | Nov 20, 2024 | Benchmarking | —Unverified | 0 |
| Delta-Influence: Unlearning Poisons via Influence Functions | Nov 20, 2024 | AttributeBenchmarking | CodeCode Available | 0 |
| VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Nov 20, 2024 | BenchmarkingImage Generation | CodeCode Available | 5 |
| BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | Nov 20, 2024 | BenchmarkingPoint Cloud Segmentation | —Unverified | 0 |
| BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Nov 20, 2024 | BenchmarkingNetHack | —Unverified | 0 |
| The Moral Mind(s) of Large Language Models | Nov 19, 2024 | BenchmarkingDecision Making | —Unverified | 0 |
| Integrating Dynamic Correlation Shifts and Weighted Benchmarking in Extreme Value Analysis | Nov 19, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking Positional Encodings for GNNs and Graph Transformers | Nov 19, 2024 | Benchmarking | CodeCode Available | 0 |
| DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models | Nov 19, 2024 | BenchmarkingDeep Learning | CodeCode Available | 1 |
| Introducing Milabench: Benchmarking Accelerators for AI | Nov 18, 2024 | BenchmarkingDeep Learning | CodeCode Available | 1 |
| Benchmarking pre-trained text embedding models in aligning built asset information | Nov 18, 2024 | Asset ManagementBenchmarking | CodeCode Available | 0 |
| Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Nov 18, 2024 | BenchmarkingMultimodal Large Language Model | CodeCode Available | 0 |
| Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies | Nov 17, 2024 | Benchmarking | —Unverified | 0 |
| FastDraft: How to Train Your Draft | Nov 17, 2024 | BenchmarkingCode Completion | —Unverified | 0 |
| Reinforcing Competitive Multi-Agents for Playing So Long Sucker | Nov 17, 2024 | BenchmarkingDeep Reinforcement Learning | —Unverified | 0 |
| Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML | Nov 17, 2024 | BenchmarkingFairness | —Unverified | 0 |
| Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections | Nov 16, 2024 | BenchmarkingDiagnostic | CodeCode Available | 0 |
| The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods | Nov 15, 2024 | 3D ReconstructionBenchmarking | —Unverified | 0 |
| The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering | Nov 15, 2024 | BenchmarkingClustering | —Unverified | 0 |
| Automated Coding of Communications in Collaborative Problem-solving Tasks Using ChatGPT | Nov 15, 2024 | Benchmarking | —Unverified | 0 |
| Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Nov 15, 2024 | Benchmarkingcounterfactual | —Unverified | 0 |