| Efficient Diffusion Model for Image Restoration by Residual Shifting | Mar 12, 2024 | Blind Face Restoration, Image Inpainting | Code Available | 5 |
| Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head | Mar 11, 2024 | Object Detection, Open-vocabulary object detection | Code Available | 5 |
| VideoMamba: State Space Model for Efficient Video Understanding | Mar 11, 2024 | Action Classification, Mamba | Code Available | 5 |
| BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion | Mar 11, 2024 | Image Inpainting | Code Available | 5 |
| CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion | Mar 8, 2024 | Computational Efficiency, Image Generation | Code Available | 5 |
| ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment | Mar 8, 2024 | Denoising, Image Generation | Code Available | 5 |
| TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Mar 7, 2024 | document understanding, Key Information Extraction | Code Available | 5 |
| Controllable Generation with Text-to-Image Diffusion Models: A Survey | Mar 7, 2024 | Denoising | Code Available | 5 |
| Common 7B Language Models Already Possess Strong Math Capabilities | Mar 7, 2024 | GSM8K, Math | Code Available | 5 |
| PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | Mar 7, 2024 | 4k, Image Captioning | Code Available | 5 |
| GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | Mar 6, 2024 | | Code Available | 5 |
| 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations | Mar 6, 2024 | Imitation Learning, Robot Manipulation | Code Available | 5 |
| Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral | Mar 4, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| APISR: Anime Production Inspired Real-World Anime Super-Resolution | Mar 3, 2024 | Super-Resolution | Code Available | 5 |
| LAB: Large-Scale Alignment for ChatBots | Mar 2, 2024 | Instruction Following, Language Modeling | Code Available | 5 |
| FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning | Feb 29, 2024 | GPU, Language Modeling | Code Available | 5 |
| Retrieval-Augmented Generation for AI-Generated Content: A Survey | Feb 29, 2024 | Information Retrieval, Large Language Model | Code Available | 5 |
| Deep Confident Steps to New Pockets: Strategies for Docking Generalization | Feb 28, 2024 | Blind Docking | Code Available | 5 |
| Datasets for Large Language Models: A Comprehensive Survey | Feb 28, 2024 | Language Modelling, Large Language Model | Code Available | 5 |
| Information Flow Routes: Automatically Interpreting Language Models at Scale | Feb 27, 2024 | | Code Available | 5 |
| Language Agents as Optimizable Graphs | Feb 26, 2024 | Prompt Engineering | Code Available | 5 |
| MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | Feb 23, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| MambaIR: A Simple Baseline for Image Restoration with State-Space Model | Feb 23, 2024 | Image Restoration, Image Super-Resolution | Code Available | 5 |
| Repetition Improves Language Model Embeddings | Feb 23, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | Feb 22, 2024 | Code Generation, HumanEval | Code Available | 5 |
| MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases | Feb 22, 2024 | | Code Available | 5 |
| How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey | Feb 20, 2024 | 3DGS, Simultaneous Localization and Mapping | Code Available | 5 |
| VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning | Feb 20, 2024 | Autonomous Driving, NavSim | Code Available | 5 |
| A Survey on Knowledge Distillation of Large Language Models | Feb 20, 2024 | Data Augmentation, Knowledge Distillation | Code Available | 5 |
| Efficient Multimodal Learning from Data-centric Perspective | Feb 18, 2024 | Image Classification, Referring Expression Comprehension | Code Available | 5 |
| Trust Regions for Explanations via Black-Box Probabilistic Certification | Feb 17, 2024 | | Code Available | 5 |
| BlackJAX: Composable Bayesian inference in JAX | Feb 16, 2024 | Bayesian Inference, Probabilistic Programming | Code Available | 5 |
| DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows | Feb 16, 2024 | Synthetic Data Generation | Code Available | 5 |
| GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting | Feb 15, 2024 | 3D Object Reconstruction, Neural Rendering | Code Available | 5 |
| OS-Copilot: Towards Generalist Computer Agents with Self-Improvement | Feb 12, 2024 | | Code Available | 5 |
| Online Iterative Reinforcement Learning from Human Feedback with General Preference Model | Feb 11, 2024 | | Code Available | 5 |
| WebLINX: Real-World Website Navigation with Multi-Turn Dialogue | Feb 8, 2024 | Conversational Web Navigation, Text Generation | Code Available | 5 |
| MobileVLM V2: Faster and Stronger Baseline for Vision Language Model | Feb 6, 2024 | AutoML, Language Modeling | Code Available | 5 |
| EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models | Feb 5, 2024 | | Code Available | 5 |
| Unified Training of Universal Time Series Forecasting Transformers | Feb 4, 2024 | Time Series, Time Series Forecasting | Code Available | 5 |
| Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | Feb 3, 2024 | Code Completion | Code Available | 5 |
| Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | Feb 2, 2024 | Acoustic Scene Classification, Audio captioning | Code Available | 5 |
| Executable Code Actions Elicit Better LLM Agents | Feb 1, 2024 | Language Modelling, Large Language Model | Code Available | 5 |
| BootsTAP: Bootstrapped Training for Tracking-Any-Point | Feb 1, 2024 | Point Tracking | Code Available | 5 |
| SymbolicAI: A framework for logic-based approaches combining generative models and solvers | Feb 1, 2024 | Few-Shot Learning, In-Context Learning | Code Available | 5 |
| MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments | Feb 1, 2024 | Embodied Question Answering, Language Modeling | Code Available | 5 |
| RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | Jan 31, 2024 | Question Answering, Retrieval | Code Available | 5 |
| Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research | Jan 31, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Jan 29, 2024 | Decoder, Mixture-of-Experts | Code Available | 5 |
| Off-Policy Primal-Dual Safe Reinforcement Learning | Jan 26, 2024 | reinforcement-learning, Reinforcement Learning | Code Available | 5 |