| A Time Series is Worth 64 Words: Long-term Forecasting with Transformers | Nov 27, 2022 | Multivariate Time Series Forecasting, Representation Learning | Code Available | 5 |
| Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization | Apr 15, 2024 | Audio Generation | Code Available | 5 |
| GLEAN: Generative Latent Bank for Image Super-Resolution and Beyond | Jul 29, 2022 | Colorization, Decoder | Code Available | 5 |
| LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | Aug 13, 2024 | | Code Available | 5 |
| Latte: Latent Diffusion Transformer for Video Generation | Jan 5, 2024 | Text-to-Video Generation, Video Generation | Code Available | 5 |
| StableAnimator: High-Quality Identity-Preserving Human Image Animation | Nov 26, 2024 | Denoising, Face Reenactment | Code Available | 5 |
| ImageBind-LLM: Multi-modality Instruction Tuning | Sep 7, 2023 | Instruction Following, Text Generation | Code Available | 5 |
| DanceGRPO: Unleashing GRPO on Visual Generation | May 12, 2025 | Denoising, reinforcement-learning | Code Available | 5 |
| Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models | May 24, 2025 | Position | Code Available | 5 |
| BERTopic: Neural topic modeling with a class-based TF-IDF procedure | Mar 11, 2022 | Clustering, Document Embedding | Code Available | 5 |
| Structure-Aware Sparse-View X-ray 3D Reconstruction | Nov 18, 2023 | 3D Reconstruction, CT Reconstruction | Code Available | 5 |
| Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model | Jun 27, 2024 | Mamba, Segmentation | Code Available | 5 |
| Do "English" Named Entity Recognizers Work Well on Global Englishes? | Apr 20, 2024 | named-entity-recognition, Named Entity Recognition | Code Available | 5 |
| Matching Anything by Segmenting Anything | Jun 6, 2024 | Domain Generalization, Multiple Object Tracking | Code Available | 5 |
| DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning | May 20, 2025 | Hallucination, Mathematical Reasoning | Code Available | 5 |
| Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks | Dec 20, 2024 | All, RAG | Code Available | 5 |
| Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Jul 5, 2024 | 16k, 8k | Code Available | 5 |
| FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion | Jun 11, 2024 | GPU | Code Available | 5 |
| RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | Jan 31, 2024 | Question Answering, Retrieval | Code Available | 5 |
| PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | Mar 7, 2024 | 4k, Image Captioning | Code Available | 5 |
| DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models | Nov 2, 2022 | Image Generation, Text to Image Generation | Code Available | 5 |
| LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders | Apr 9, 2024 | Contrastive Learning, Decoder | Code Available | 5 |
| EvTexture: Event-driven Texture Enhancement for Video Super-Resolution | Jun 19, 2024 | Event-based vision, Super-Resolution | Code Available | 5 |
| AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding | May 6, 2024 | Metric Learning, Self-Supervised Learning | Code Available | 5 |
| Wings: Learning Multimodal LLMs without Text-only Forgetting | Jun 5, 2024 | Question Answering, Visual Question Answering | Code Available | 5 |
| OminiControl: Minimal and Universal Control for Diffusion Transformer | Nov 22, 2024 | | Code Available | 5 |
| Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI | Jul 9, 2024 | Survey | Code Available | 5 |
| VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | Jan 22, 2025 | Philosophy, Video Question Answering | Code Available | 5 |
| StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text | Mar 21, 2024 | Text-to-Video Generation, Video Generation | Code Available | 5 |
| GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond | Mar 28, 2024 | 3DGS, Novel View Synthesis | Code Available | 5 |
| Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Mar 9, 2023 | Decoder, Object Detection | Code Available | 5 |
| Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise | Jan 14, 2025 | Optical Flow Estimation | Code Available | 5 |
| TrustRAG: An Information Assistant with Retrieval Augmented Generation | Feb 19, 2025 | Answer Generation, Chunking | Code Available | 5 |
| ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving | Apr 25, 2024 | Diversity | Code Available | 5 |
| Parrot: Multilingual Visual Instruction Tuning | Jun 4, 2024 | Mixture-of-Experts | Code Available | 5 |
| Improved Differentially Private Regression via Gradient Boosting | Mar 6, 2023 | regression | Code Available | 5 |
| AIDE: AI-Driven Exploration in the Space of Code | Feb 18, 2025 | | Code Available | 5 |
| WizardLM: Empowering Large Language Models to Follow Complex Instructions | Apr 24, 2023 | Instruction Following | Code Available | 5 |
| Ovis: Structural Embedding Alignment for Multimodal Large Language Model | May 31, 2024 | Language Modeling, Multimodal Large Language Model | Code Available | 5 |
| DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation | Oct 24, 2024 | Image Restoration, Prompt Learning | Code Available | 5 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Aug 24, 2023 | Chart Question Answering, FS-MEVQA | Code Available | 5 |
| MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning | Oct 24, 2023 | | Code Available | 5 |
| Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models | May 2, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| Assessing Language Model Deployment with Risk Cards | Mar 31, 2023 | Language Modeling, Language Modelling | Code Available | 5 |
| UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | May 9, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 5 |
| SantaCoder: don't reach for the stars! | Jan 9, 2023 | Code Generation, PII Redaction | Code Available | 5 |
| Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | May 18, 2024 | Mixture-of-Experts, Visual Question Answering | Code Available | 5 |
| Evolutionary Optimization of Model Merging Recipes | Mar 19, 2024 | Evolutionary Algorithms, Math | Code Available | 5 |
| MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation | Sep 19, 2022 | Decoder, Image Generation | Code Available | 5 |
| Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator | Mar 13, 2024 | | Code Available | 5 |