| Benchmarking LLMs for Political Science: A United Nations Perspective | Feb 19, 2025 | Benchmarking, Decision Making | Code Available | 1 |
| Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope? | Feb 18, 2025 | Benchmarking, Blocking | Code Available | 1 |
| ILIAS: Instance-Level Image retrieval At Scale | Feb 17, 2025 | Benchmarking, Image Retrieval | Code Available | 1 |
| Positional Encoding in Transformer-Based Time Series Models: A Survey | Feb 17, 2025 | Anomaly Detection, Benchmarking | Code Available | 1 |
| HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims | Feb 17, 2025 | Benchmarking, Fact Checking | Code Available | 1 |
| Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Feb 13, 2025 | Benchmarking, Retrieval | Code Available | 1 |
| LOB-Bench: Benchmarking Generative AI for Finance -- an Application to Limit Order Book Data | Feb 13, 2025 | Benchmarking, State Space Models | Code Available | 1 |
| Foundation Model of Electronic Medical Records for Adaptive Risk Estimation | Feb 10, 2025 | Benchmarking | Code Available | 1 |
| Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments | Feb 10, 2025 | Benchmarking, Optical Character Recognition | Code Available | 1 |
| ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Feb 8, 2025 | Benchmarking, Self-Supervised Learning | Code Available | 1 |