| In Search of Lost Online Test-time Adaptation: A Survey | Oct 31, 2023 | Benchmarking, GPU | Code Available | 1 |
| Re-evaluating Retrosynthesis Algorithms with Syntheseus | Oct 30, 2023 | Benchmarking, Multi-step retrosynthesis | Code Available | 1 |
| MLFMF: Data Sets for Machine Learning for Mathematical Formalization | Oct 24, 2023 | Benchmarking, Recommendation Systems | Code Available | 1 |
| CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks | Oct 23, 2023 | Benchmarking | Code Available | 1 |
| Fast hyperboloid decision tree algorithms | Oct 20, 2023 | Benchmarking, Riemannian optimization | Code Available | 1 |
| MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark | Oct 20, 2023 | Benchmarking, de-en | Code Available | 1 |
| OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift | Oct 19, 2023 | Adversarial Robustness, Benchmarking | Code Available | 1 |
| Object-aware Inversion and Reassembly for Image Editing | Oct 18, 2023 | Benchmarking, Denoising | Code Available | 1 |
| To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now | Oct 18, 2023 | Adversarial Robustness | Code Available | 1 |
| FactCHD: Benchmarking Fact-Conflicting Hallucination Detection | Oct 18, 2023 | Benchmarking, Hallucination | Code Available | 1 |