| TDDBench: A Benchmark for Training data detection | Nov 5, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping | Nov 5, 2024 | Benchmarking, Code Generation | Code Available | 2 |
| Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Nov 5, 2024 | Benchmarking, Hallucination | Code Available | 3 |
| On the Loss of Context-awareness in General Instruction Fine-tuning | Nov 5, 2024 | Benchmarking, Instruction Following | Code Available | 0 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Nov 4, 2024 | Action Generation, Benchmarking | Code Available | 1 |
| Benchmarking XAI Explanations with Human-Aligned Evaluations | Nov 4, 2024 | Benchmarking | Unverified | 0 |
| Imagining and building wise machines: The centrality of AI metacognition | Nov 4, 2024 | Benchmarking, Navigate | Unverified | 0 |
| LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation | Nov 4, 2024 | Benchmarking, Graph Generation | Code Available | 1 |
| TableGPT2: A Large Multimodal Model with Tabular Data Integration | Nov 4, 2024 | Benchmarking, Data Integration | Code Available | 4 |
| SinaTools: Open Source Toolkit for Arabic Natural Language Processing | Nov 3, 2024 | Benchmarking, Lemmatization | Unverified | 0 |