| HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images | Nov 7, 2024 | Anatomy, Benchmarking | Unverified | 0 |
| Perspective on recent developments and challenges in regulatory and systems genomics | Nov 7, 2024 | Benchmarking | Unverified | 0 |
| HourVideo: 1-Hour Video-Language Understanding | Nov 7, 2024 | Benchmarking, counterfactual | Code Available | 2 |
| Learn to Solve Vehicle Routing Problems ASAP: A Neural Optimization Approach for Time-Constrained Vehicle Routing Problems with Finite Vehicle Fleet | Nov 7, 2024 | Benchmarking, Combinatorial Optimization | Unverified | 0 |
| Benchmarking Large Language Models with Integer Sequence Generation Tasks | Nov 7, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking | Nov 6, 2024 | Benchmarking | Unverified | 0 |
| Beemo: Benchmark of Expert-edited Machine-generated Outputs | Nov 6, 2024 | Benchmarking | Code Available | 0 |
| SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration | Nov 5, 2024 | Benchmarking, regression | Unverified | 0 |
| TDDBench: A Benchmark for Training data detection | Nov 5, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Nov 5, 2024 | Benchmarking, Language Modeling | Code Available | 1 |
| Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level | Nov 5, 2024 | Bayesian Optimisation, Benchmarking | Unverified | 0 |
| On the Loss of Context-awareness in General Instruction Fine-tuning | Nov 5, 2024 | Benchmarking, Instruction Following | Code Available | 0 |
| Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Nov 5, 2024 | Benchmarking, Hallucination | Code Available | 3 |
| Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping | Nov 5, 2024 | Benchmarking, Code Generation | Code Available | 2 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Nov 4, 2024 | Action Generation, Benchmarking | Code Available | 1 |
| Imagining and building wise machines: The centrality of AI metacognition | Nov 4, 2024 | Benchmarking, Navigate | Unverified | 0 |
| Benchmarking XAI Explanations with Human-Aligned Evaluations | Nov 4, 2024 | Benchmarking | Unverified | 0 |
| LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation | Nov 4, 2024 | Benchmarking, Graph Generation | Code Available | 1 |
| TableGPT2: A Large Multimodal Model with Tabular Data Integration | Nov 4, 2024 | Benchmarking, Data Integration | Code Available | 4 |
| ROAD-Waymo: Action Awareness at Scale for Autonomous Driving | Nov 3, 2024 | Autonomous Driving, Benchmarking | Code Available | 1 |
| SinaTools: Open Source Toolkit for Arabic Natural Language Processing | Nov 3, 2024 | Benchmarking, Lemmatization | Unverified | 0 |
| FEET: A Framework for Evaluating Embedding Techniques | Nov 2, 2024 | Benchmarking, Representation Learning | Code Available | 0 |
| Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models | Nov 2, 2024 | Benchmarking | Unverified | 0 |
| Artificial Intelligence for Microbiology and Microbiome Research | Nov 2, 2024 | Benchmarking, Deep Learning | Unverified | 0 |
| A Review of Reinforcement Learning in Financial Applications | Nov 1, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Modern, Efficient, and Differentiable Transport Equation Models using JAX: Applications to Population Balance Equations | Nov 1, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model | Nov 1, 2024 | Benchmarking, Cross-Domain Named Entity Recognition | Unverified | 0 |
| MIRFLEX: Music Information Retrieval Feature Library for Extraction | Nov 1, 2024 | Benchmarking, Information Retrieval | Code Available | 1 |
| Benchmarking Bias in Large Language Models during Role-Playing | Nov 1, 2024 | Benchmarking, Fairness | Unverified | 0 |
| Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing | Nov 1, 2024 | Benchmarking, Semantic Segmentation | Code Available | 0 |
| LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Nov 1, 2024 | Benchmarking, Mixture-of-Experts | Code Available | 1 |
| LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators | Oct 31, 2024 | Benchmarking, Text Generation | Code Available | 2 |
| IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Oct 31, 2024 | Benchmarking, scientific discovery | Code Available | 0 |
| LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction | Oct 31, 2024 | Benchmarking, Prediction | Code Available | 1 |
| Pedestrian Trajectory Prediction with Missing Data: Datasets, Imputation, and Benchmarking | Oct 31, 2024 | Benchmarking, Imputation | Code Available | 1 |
| EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Oct 31, 2024 | Benchmarking, Electromyography (EMG) | Code Available | 1 |
| Benchmark Data Repositories for Better Benchmarking | Oct 31, 2024 | Benchmarking | Unverified | 0 |
| XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM | Oct 31, 2024 | 3DGS, Benchmarking | Code Available | 3 |
| AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | Oct 31, 2024 | Benchmarking | Code Available | 3 |
| DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Oct 31, 2024 | Benchmarking, LLM-generated Text Detection | Code Available | 1 |
| AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Oct 31, 2024 | Benchmarking, Cloud Removal | Code Available | 1 |
| CALE: Continuous Arcade Learning Environment | Oct 31, 2024 | Atari Games, Benchmarking | Code Available | 7 |
| Low-Density 3D Point Cloud Classification | Oct 30, 2024 | 3D Point Cloud Classification, Autonomous Driving | Unverified | 0 |
| Survey of Cultural Awareness in Language Models: Text and Beyond | Oct 30, 2024 | Benchmarking | Code Available | 1 |
| NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation | Oct 30, 2024 | Benchmarking, Continual Learning | Code Available | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes | Oct 30, 2024 | Benchmarking | Unverified | 0 |
| InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models | Oct 30, 2024 | Benchmarking | Code Available | 2 |
| CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | Oct 30, 2024 | Benchmarking, Passage Retrieval | Code Available | 2 |
| Evaluating Cultural and Social Awareness of LLM Web Agents | Oct 30, 2024 | Benchmarking, Navigate | Unverified | 0 |