| The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods | Nov 15, 2024 | 3D Reconstruction, Benchmarking | Unverified | 0 |
| WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking | Nov 14, 2024 | Benchmarking, Drug Discovery | Unverified | 0 |
| BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation | Nov 14, 2024 | Adversarial Attack, Adversarial Robustness | Code Available | 0 |
| A survey of probabilistic generative frameworks for molecular simulations | Nov 14, 2024 | Benchmarking, Denoising | Code Available | 0 |
| Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset | Nov 13, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere | Nov 13, 2024 | Benchmarking, Dataset Generation | Unverified | 0 |
| A Survey on Vision Autoregressive Model | Nov 13, 2024 | 3D Generation, Benchmarking | Unverified | 0 |
| Evaluating the Generation of Spatial Relations in Text and Image Generative Models | Nov 12, 2024 | Benchmarking, Image Generation | Unverified | 0 |
| BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes | Nov 11, 2024 | Benchmarking, Multi-Object Tracking | Unverified | 0 |
| Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | Nov 11, 2024 | 16k, Benchmarking | Code Available | 0 |
| Benchmarking LLMs' Judgments with No Gold Standard | Nov 11, 2024 | Benchmarking, Machine Translation | Code Available | 0 |
| MolMiner: Towards Controllable, 3D-Aware, Fragment-Based Molecular Design | Nov 10, 2024 | 3D geometry, Benchmarking | Unverified | 0 |
| Low Dynamic Range for RIS-aided Bistatic Integrated Sensing and Communication | Nov 9, 2024 | Benchmarking, Integrated sensing and communication | Unverified | 0 |
| Benchmarking Distributional Alignment of Large Language Models | Nov 8, 2024 | Benchmarking | Code Available | 0 |
| Benchmarking 3D multi-coil NC-PDNet MRI reconstruction | Nov 8, 2024 | 3D Reconstruction, Benchmarking | Unverified | 0 |
| FactLens: Benchmarking Fine-Grained Fact Verification | Nov 8, 2024 | Benchmarking, Fact Verification | Unverified | 0 |
| A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics | Nov 8, 2024 | Benchmarking | Unverified | 0 |
| Open-set object detection: towards unified problem formulation and benchmarking | Nov 8, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Deep Learning Models for UAV-Assisted Bridge Inspection: A YOLO Benchmark Analysis | Nov 7, 2024 | Benchmarking, Model Selection | Unverified | 0 |
| ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | Nov 7, 2024 | Benchmarking, Multiple-choice | Unverified | 0 |
| Perspective on recent developments and challenges in regulatory and systems genomics | Nov 7, 2024 | Benchmarking | Unverified | 0 |
| HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images | Nov 7, 2024 | Anatomy, Benchmarking | Unverified | 0 |
| Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | Nov 7, 2024 | Benchmarking | Unverified | 0 |
| Learn to Solve Vehicle Routing Problems ASAP: A Neural Optimization Approach for Time-Constrained Vehicle Routing Problems with Finite Vehicle Fleet | Nov 7, 2024 | Benchmarking, Combinatorial Optimization | Unverified | 0 |
| Benchmarking Large Language Models with Integer Sequence Generation Tasks | Nov 7, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale | Nov 7, 2024 | Active Learning, Benchmarking | Unverified | 0 |
| Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking | Nov 6, 2024 | Benchmarking | Unverified | 0 |
| Beemo: Benchmark of Expert-edited Machine-generated Outputs | Nov 6, 2024 | Benchmarking | Code Available | 0 |
| SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration | Nov 5, 2024 | Benchmarking, regression | Unverified | 0 |
| On the Loss of Context-awareness in General Instruction Fine-tuning | Nov 5, 2024 | Benchmarking, Instruction Following | Code Available | 0 |
| TDDBench: A Benchmark for Training data detection | Nov 5, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level | Nov 5, 2024 | Bayesian Optimisation, Benchmarking | Unverified | 0 |
| Imagining and building wise machines: The centrality of AI metacognition | Nov 4, 2024 | Benchmarking, Navigate | Unverified | 0 |
| Benchmarking XAI Explanations with Human-Aligned Evaluations | Nov 4, 2024 | Benchmarking | Unverified | 0 |
| SinaTools: Open Source Toolkit for Arabic Natural Language Processing | Nov 3, 2024 | Benchmarking, Lemmatization | Unverified | 0 |
| Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models | Nov 2, 2024 | Benchmarking | Unverified | 0 |
| FEET: A Framework for Evaluating Embedding Techniques | Nov 2, 2024 | Benchmarking, Representation Learning | Code Available | 0 |
| Artificial Intelligence for Microbiology and Microbiome Research | Nov 2, 2024 | Benchmarking, Deep Learning | Unverified | 0 |
| Modern, Efficient, and Differentiable Transport Equation Models using JAX: Applications to Population Balance Equations | Nov 1, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Benchmarking Bias in Large Language Models during Role-Playing | Nov 1, 2024 | Benchmarking, Fairness | Unverified | 0 |
| Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing | Nov 1, 2024 | Benchmarking, Semantic Segmentation | Code Available | 0 |
| Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model | Nov 1, 2024 | Benchmarking, Cross-Domain Named Entity Recognition | Unverified | 0 |
| A Review of Reinforcement Learning in Financial Applications | Nov 1, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Oct 31, 2024 | Benchmarking, scientific discovery | Code Available | 0 |
| Benchmark Data Repositories for Better Benchmarking | Oct 31, 2024 | Benchmarking | Unverified | 0 |
| NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation | Oct 30, 2024 | Benchmarking, Continual Learning | Code Available | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| Evaluating Cultural and Social Awareness of LLM Web Agents | Oct 30, 2024 | Benchmarking, Navigate | Unverified | 0 |
| Low-Density 3D Point Cloud Classification | Oct 30, 2024 | 3D Point Cloud Classification, Autonomous Driving | Unverified | 0 |
| DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes | Oct 30, 2024 | Benchmarking | Unverified | 0 |