| GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation | Apr 3, 2025 | Image Generation, World Knowledge | Code Available | 3 |
| GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views | Apr 2, 2024 | 3DGS, Novel View Synthesis | Code Available | 3 |
| AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning | Jun 16, 2025 | Action Generation, Autonomous Driving | Code Available | 3 |
| DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Jul 6, 2025 | Image Generation, Multimodal Reasoning | Code Available | 3 |
| Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation | Apr 15, 2024 | Contrastive Learning, Descriptive | Code Available | 3 |
| LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Jun 28, 2024 | Vision-Language-Action, World Knowledge | Code Available | 3 |
| Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap | Jan 3, 2025 | Recommendation Systems, World Knowledge | Code Available | 3 |
| Are We on the Right Way for Evaluating Large Vision-Language Models? | Mar 29, 2024 | World Knowledge | Code Available | 3 |
| GreaseLM: Graph REASoning Enhanced Language Models for Question Answering | Jan 21, 2022 | Knowledge Graphs, Medical Question Answering | Code Available | 2 |
| Grasp-Anything: Large-scale Grasp Dataset from Foundation Models | Sep 18, 2023 | Diversity, Robotic Grasping | Code Available | 2 |