| My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks | Jun 24, 2023 | Benchmarking, Hate Speech Detection | Unverified | 0 |
| MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Jun 23, 2023 | Benchmarking, Language Modeling | Code Available | 2 |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Jun 22, 2023 | Arithmetic Reasoning, Benchmarking | Code Available | 1 |
| OptIForest: Optimal Isolation Forest for Anomaly Detection | Jun 22, 2023 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Jun 21, 2023 | 3D-Aware Image Synthesis, Benchmarking | Code Available | 1 |
| GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection | Jun 21, 2023 | Anomaly Detection, Benchmarking | Code Available | 1 |
| On-orbit model training for satellite imagery with label proportions | Jun 21, 2023 | Benchmarking, Earth Observation | Code Available | 0 |
| On Evaluation of Document Classification using RVL-CDIP | Jun 21, 2023 | Benchmarking, Classification | Unverified | 0 |
| VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution | Jun 21, 2023 | Benchmarking, Retrieval | Code Available | 1 |
| Challenges and Opportunities in Improving Worst-Group Generalization in Presence of Spurious Features | Jun 21, 2023 | Benchmarking, Model Selection | Code Available | 1 |