| Flamingo: a Visual Language Model for Few-Shot Learning | Apr 29, 2022 | Few-Shot Learning, Generative Visual Question Answering | Code Available | 4 |
| Can Machines Help Us Answering Question 16 in Datasheets, and In Turn Reflecting on Inappropriate Content? | Feb 14, 2022 | Language Modeling, Language Modelling | Code Available | 4 |
| ControlVAE: Tuning, Analytical Properties, and Performance Analysis | Oct 31, 2020 | Disentanglement, Image Generation | Code Available | 4 |
| ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation | Jun 22, 2025 | GPU, Image Generation | Code Available | 3 |
| FlexRAG: A Flexible and Comprehensive Framework for Retrieval-Augmented Generation | Jun 14, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning | Jun 3, 2025 | Decision Making, Diagnostic | Code Available | 3 |
| VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation | May 26, 2025 | Decoder, Language Modeling | Code Available | 3 |
| LaViDa: A Large Diffusion Language Model for Multimodal Understanding | May 22, 2025 | Instruction Following, Language Modeling | Code Available | 3 |
| A Comprehensive Survey on Long Context Language Modeling | Mar 20, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks | Mar 19, 2025 | Language Modeling, Language Modelling | Code Available | 3 |