| Multi-Behavior Recommendation with Personalized Directed Acyclic Behavior Graphs | Dec 9, 2024 | BenchmarkingComputational Efficiency | CodeCode Available | 1 |
| Does your model understand genes? A benchmark of gene properties for biological and text models | Dec 5, 2024 | BenchmarkingMulti-class Classification | CodeCode Available | 1 |
| Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Dec 5, 2024 | AttributeBenchmarking | CodeCode Available | 1 |
| Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs" | Dec 2, 2024 | BenchmarkingRepresentation Learning | CodeCode Available | 1 |
| Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis | Nov 29, 2024 | BenchmarkingClaim Verification | CodeCode Available | 1 |
| Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Nov 29, 2024 | BenchmarkingDeepFake Detection | CodeCode Available | 1 |
| CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Nov 27, 2024 | BenchmarkingEarth Observation | CodeCode Available | 1 |
| AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Nov 26, 2024 | BenchmarkingText-to-Video Generation | CodeCode Available | 1 |
| VidHal: Benchmarking Temporal Hallucinations in Vision LLMs | Nov 25, 2024 | BenchmarkingHallucination | CodeCode Available | 1 |
| Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Nov 25, 2024 | Benchmarkingobject-detection | CodeCode Available | 1 |