| Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | May 14, 2024 | Image Generation, Language Modeling | Code Available | 7 |
| MambaOut: Do We Really Need Mamba for Vision? | May 13, 2024 | image-classification, Image Classification | Code Available | 7 |
| AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents | May 11, 2024 | | Code Available | 7 |
| Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers | May 9, 2024 | | Code Available | 7 |
| Mirage: A Multi-Level Superoptimizer for Tensor Programs | May 9, 2024 | GPU, Navigate | Code Available | 7 |
| xLSTM: Extended Long Short-Term Memory | May 7, 2024 | Language Modeling, Language Modelling | Code Available | 7 |
| Labeling supervised fine-tuning data with the scaling law | May 5, 2024 | coreference-resolution, Coreference Resolution | Code Available | 7 |
| PuLID: Pure and Lightning ID Customization via Contrastive Alignment | Apr 24, 2024 | Image Generation, Text to Image Generation | Code Available | 7 |
| Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration | Apr 24, 2024 | Management, Prompt Engineering | Code Available | 7 |
| Better Synthetic Data by Retrieving and Transforming Existing Datasets | Apr 22, 2024 | Dataset Generation, Diversity | Code Available | 7 |
| CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models | Apr 19, 2024 | | Code Available | 7 |
| MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents | Apr 16, 2024 | Fact Checking, Retrieval-augmented Generation | Code Available | 7 |
| Long-form music generation with latent diffusion | Apr 16, 2024 | Audio Generation, Form | Code Available | 7 |
| OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | Apr 11, 2024 | Benchmarking | Code Available | 7 |
| Interactive Prompt Debugging with Sequence Salience | Apr 11, 2024 | Sentence, text-classification | Code Available | 7 |
| InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models | Apr 10, 2024 | Image to 3D | Code Available | 7 |
| AutoCodeRover: Autonomous Program Improvement | Apr 8, 2024 | Bug fixing, Code Search | Code Available | 7 |
| LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | Apr 8, 2024 | | Code Available | 7 |
| Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach | Apr 7, 2024 | Efficient Exploration, Hyperparameter Optimization | Code Available | 7 |
| InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation | Apr 3, 2024 | Image Generation, Text to Image Generation | Code Available | 7 |
| Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Mar 27, 2024 | Image Classification, Image Comprehension | Code Available | 7 |
| 2D Gaussian Splatting for Geometrically Accurate Radiance Fields | Mar 26, 2024 | 3DGS, Novel View Synthesis | Code Available | 7 |
| Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | Mar 22, 2024 | Depth Estimation, Surface Normal Estimation | Code Available | 7 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification, Action Recognition | Code Available | 7 |
| T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Mar 21, 2024 | Contrastive Learning, Descriptive | Code Available | 7 |
| Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | Mar 21, 2024 | Animated GIF Generation, Image Animation | Code Available | 7 |
| Foundation Models for Time Series Analysis: A Tutorial and Survey | Mar 21, 2024 | Survey, Time Series | Code Available | 7 |
| One-Step Image Translation with Text-to-Image Models | Mar 18, 2024 | Denoising, Translation | Code Available | 7 |
| DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers | Mar 15, 2024 | Text Generation, Video Generation | Code Available | 7 |
| CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | Mar 14, 2024 | HumanEval | Code Available | 7 |
| GenAD: Generalized Predictive Model for Autonomous Driving | Mar 14, 2024 | Autonomous Driving, model | Code Available | 7 |
| DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation | Mar 13, 2024 | Image Generation, Prompt Engineering | Code Available | 7 |
| DragAnything: Motion Control for Anything using Entity Representation | Mar 12, 2024 | Object, Video Generation | Code Available | 7 |
| Chronos: Learning the Language of Time Series | Mar 12, 2024 | Gaussian Processes, Language Modeling | Code Available | 7 |
| Better than classical? The subtle art of benchmarking quantum machine learning models | Mar 11, 2024 | Benchmarking, Binary Classification | Code Available | 7 |
| DeepSeek-VL: Towards Real-World Vision-Language Understanding | Mar 8, 2024 | Chatbot, Language Modelling | Code Available | 7 |
| Improving Diffusion Models for Authentic Virtual Try-on in the Wild | Mar 8, 2024 | Virtual Try-on | Code Available | 7 |
| Symmetry Considerations for Learning Task Symmetric Robot Policies | Mar 7, 2024 | Data Augmentation, Deep Reinforcement Learning | Code Available | 7 |
| Cradle: Empowering Foundation Agents Towards General Computer Control | Mar 5, 2024 | Efficient Exploration | Code Available | 7 |
| Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation | Mar 1, 2024 | | Code Available | 7 |
| SoftTiger: A Clinical Foundation Model for Healthcare Workflows | Mar 1, 2024 | Language Modelling, Large Language Model | Code Available | 7 |
| TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables | Feb 29, 2024 | Time Series, Time Series Forecasting | Code Available | 7 |
| StarCoder 2 and The Stack v2: The Next Generation | Feb 29, 2024 | Code Completion, Code Generation | Code Available | 7 |
| Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models | Feb 29, 2024 | Language Modelling, Mamba | Code Available | 7 |
| Transparent Image Layer Diffusion using Latent Transparency | Feb 27, 2024 | | Code Available | 7 |
| Dynamic Evaluation of Large Language Models by Meta Probing Agents | Feb 21, 2024 | Data Augmentation | Code Available | 7 |
| Revisiting Feature Prediction for Learning Visual Representations from Video | Feb 15, 2024 | Prediction | Code Available | 7 |
| On the Vulnerability of LLM/VLM-Controlled Robotics | Feb 15, 2024 | Language Modelling, Robot Manipulation | Code Available | 7 |
| SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | Feb 8, 2024 | Benchmarking, Diversity | Code Available | 7 |
| Fast Timing-Conditioned Latent Audio Diffusion | Feb 7, 2024 | Audio Generation, GPU | Code Available | 7 |