| Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing | Nov 1, 2024 | Benchmarking, Semantic Segmentation | Code Available | 0 |
| Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model | Nov 1, 2024 | Benchmarking, Cross-Domain Named Entity Recognition | Unverified | 0 |
| A Review of Reinforcement Learning in Financial Applications | Nov 1, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Oct 31, 2024 | Benchmarking, scientific discovery | Code Available | 0 |
| Benchmark Data Repositories for Better Benchmarking | Oct 31, 2024 | Benchmarking | Unverified | 0 |
| NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation | Oct 30, 2024 | Benchmarking, Continual Learning | Code Available | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| Evaluating Cultural and Social Awareness of LLM Web Agents | Oct 30, 2024 | Benchmarking, Navigate | Unverified | 0 |
| Low-Density 3D Point Cloud Classification | Oct 30, 2024 | 3D Point Cloud Classification, Autonomous Driving | Unverified | 0 |
| DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes | Oct 30, 2024 | Benchmarking | Unverified | 0 |