| Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Nov 15, 2023 | Benchmarking, Instruction Following | Code Available | 1 |
| MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration | Nov 14, 2023 | Benchmarking, Language Modeling | Code Available | 1 |
| WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models | Nov 13, 2023 | Benchmarking, Instruction Following | Code Available | 1 |
| Combinatorial Optimization with Policy Adaptation using Latent Space Search | Nov 13, 2023 | Benchmarking, Combinatorial Optimization | Code Available | 1 |
| Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime | Nov 13, 2023 | Benchmarking, Combinatorial Optimization | Code Available | 1 |
| Flames: Benchmarking Value Alignment of LLMs in Chinese | Nov 12, 2023 | Benchmarking, Fairness | Code Available | 1 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | Benchmarking, Cloud Computing | Code Available | 1 |
| MultiIoT: Benchmarking Machine Learning for the Internet of Things | Nov 10, 2023 | Benchmarking, Representation Learning | Code Available | 1 |
| TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs | Nov 9, 2023 | Benchmarking, Question Answering | Code Available | 1 |
| The voraus-AD Dataset for Anomaly Detection in Robot Applications | Nov 8, 2023 | Anomaly Detection, Benchmarking | Code Available | 1 |
| The PetShop Dataset -- Finding Causes of Performance Issues across Microservices | Nov 8, 2023 | Benchmarking | Code Available | 1 |
| Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts | Nov 7, 2023 | Benchmarking, Machine Translation | Code Available | 1 |
| Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Nov 6, 2023 | Benchmarking, Knowledge Base Question Answering | Code Available | 1 |
| Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding | Nov 6, 2023 | Benchmarking, Data Compression | Code Available | 1 |
| Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones | Nov 5, 2023 | Benchmarking | Code Available | 1 |
| JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds | Nov 5, 2023 | Autonomous Navigation, Autonomous Vehicles | Code Available | 1 |
| FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation | Nov 4, 2023 | Benchmarking, Drug Discovery | Code Available | 1 |
| NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep Learning Applications | Nov 4, 2023 | Benchmarking, Deep Learning | Code Available | 1 |
| Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO | Nov 2, 2023 | Benchmarking, Edge-computing | Code Available | 1 |
| EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence | Nov 1, 2023 | Benchmarking, Cryogenic Electron Microscopy (cryo-EM) | Code Available | 1 |
| In Search of Lost Online Test-time Adaptation: A Survey | Oct 31, 2023 | Benchmarking, GPU | Code Available | 1 |
| Re-evaluating Retrosynthesis Algorithms with Syntheseus | Oct 30, 2023 | Benchmarking, Multi-step retrosynthesis | Code Available | 1 |
| MLFMF: Data Sets for Machine Learning for Mathematical Formalization | Oct 24, 2023 | Benchmarking, Recommendation Systems | Code Available | 1 |
| CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks | Oct 23, 2023 | Benchmarking | Code Available | 1 |
| MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark | Oct 20, 2023 | Benchmarking, de-en | Code Available | 1 |