| Tag2Text: Guiding Vision-Language Model via Image Tagging | Mar 10, 2023 | Language Modeling, Language Modelling | Code Available | 4 | 5 |
| In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss | Feb 16, 2024 | RAG | Code Available | 4 | 5 |
| ImgEdit: A Unified Image Editing Dataset and Benchmark | May 26, 2025 | Image Editing | Code Available | 4 | 5 |
| Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling | Nov 1, 2023 | Hallucination, Knowledge Distillation | Code Available | 4 | 5 |
| Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Feb 12, 2024 | Hallucination, Object Localization | Code Available | 4 | 5 |
| Image Fusion via Vision-Language Model | Feb 3, 2024 | Decoder, Language Modeling | Code Available | 4 | 5 |
| Looking Backward: Streaming Video-to-Video Translation with Feature Banks | May 24, 2024 | GPU, Translation | Code Available | 4 | 5 |
| Restructuring Vector Quantization with the Rotation Trick | Oct 8, 2024 | Quantization | Code Available | 4 | 5 |
| ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents | Feb 25, 2025 | Question Answering, RAG | Code Available | 4 | 5 |
| SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis | May 22, 2025 | Diversity, Information Retrieval | Code Available | 4 | 5 |
| JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | Nov 14, 2024 | Image Animation, Motion Generation | Code Available | 4 | 5 |
| TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models | May 18, 2023 | Natural Language Inference, Synthetic Data Generation | Code Available | 4 | 5 |
| Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey | Feb 3, 2024 | parameter-efficient fine-tuning, Transfer Learning | Code Available | 4 | 5 |
| OpenAgents: An Open Platform for Language Agents in the Wild | Oct 16, 2023 | 2D Object Detection | Code Available | 4 | 5 |
| Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis | Jun 1, 2023 | Audio Synthesis, Computational Efficiency | Code Available | 4 | 5 |
| A Survey on Diffusion Models for Time Series and Spatio-Temporal Data | Apr 29, 2024 | Anomaly Detection, Imputation | Code Available | 4 | 5 |
| OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM | Feb 14, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 4 | 5 |
| Factorio Learning Environment | Mar 6, 2025 | Program Synthesis, Spatial Reasoning | Code Available | 4 | 5 |
| GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | May 26, 2025 | Question Answering, Synthetic Data Generation | Code Available | 4 | 5 |
| SimPO: Simple Preference Optimization with a Reference-Free Reward | May 23, 2024 | Chatbot, Instruction Following | Code Available | 4 | 5 |
| FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training | Mar 3, 2023 | Federated Learning, GPU | Code Available | 4 | 5 |
| Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation | Apr 21, 2025 | Video Generation | Code Available | 4 | 5 |
| ParkingE2E: Camera-based End-to-end Parking Network, from Images to Planning | Aug 4, 2024 | Decoder, Imitation Learning | Code Available | 4 | 5 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Jan 4, 2025 | Fairness, Hallucination | Code Available | 4 | 5 |
| TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities | Dec 13, 2022 | Decoder | Code Available | 4 | 5 |