| Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | Oct 17, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators | Feb 22, 2022 | Weather Forecasting | Code Available | 2 |
| Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Oct 11, 2024 | GSM8K, Language Modeling | Code Available | 2 |
| Every Painting Awakened: A Training-free Framework for Painting-to-Animation Generation | Mar 31, 2025 | Image to Video Generation | Code Available | 2 |
| VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | May 25, 2025 | Multimodal Reasoning, Question Answering | Code Available | 2 |
| MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations | Oct 26, 2023 | Imitation Learning | Code Available | 2 |
| Where am I? Cross-View Geo-localization with Natural Language Descriptions | Dec 22, 2024 | geo-localization, Image Retrieval | Code Available | 2 |
| VideoComposer: Compositional Video Synthesis with Motion Controllability | Jun 3, 2023 | Image Generation, Text-to-Video Generation | Code Available | 2 |
| Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search | Aug 20, 2024 | Decision Making, Dialogue Generation | Code Available | 2 |
| Excess Mass Estimates and Tests for Multimodality | Sep 1, 1991 | | Code Available | 2 |
| Recommender Systems with Generative Retrieval | May 8, 2023 | Recommendation Systems, Retrieval | Code Available | 2 |
| BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning | Apr 4, 2022 | image-classification, Image Classification | Code Available | 2 |
| CausalVAE: Structured Causal Disentanglement in Variational Autoencoder | Apr 18, 2020 | counterfactual, Disentanglement | Code Available | 2 |
| Euclidean, Projective, Conformal: Choosing a Geometric Algebra for Equivariant Transformers | Nov 8, 2023 | | Code Available | 2 |
| Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss | Dec 27, 2023 | | Code Available | 2 |
| Depth Field Networks for Generalizable Multi-view Scene Representation | Jul 28, 2022 | Data Augmentation, Depth Estimation | Code Available | 2 |
| Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior | Apr 10, 2024 | 3D Generation, Model Optimization | Code Available | 2 |
| Diffsound: Discrete Diffusion Model for Text-to-sound Generation | Jul 20, 2022 | Audio Generation, Decoder | Code Available | 2 |
| BitNet: Scaling 1-bit Transformers for Large Language Models | Oct 17, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Utilizing Image Transforms and Diffusion Models for Generative Modeling of Short and Long Time Series | Oct 25, 2024 | State Space Models, Time Series | Code Available | 2 |
| StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Jun 19, 2024 | Object Recognition, Scene Understanding | Code Available | 2 |
| PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection | Oct 10, 2024 | object-detection, Object Detection | Code Available | 2 |
| Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation | Mar 25, 2024 | Denoising, Image Generation | Code Available | 2 |
| ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement | Apr 2, 2025 | Decoder, Image Generation | Code Available | 2 |
| Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs | Mar 31, 2025 | Large Language Model, Video Chaptering | Code Available | 2 |
| eRST: A Signaled Graph Theory of Discourse Relations and Organization | Mar 20, 2024 | | Code Available | 2 |
| Self-Prompting Analogical Reasoning for UAV Object Detection | Apr 11, 2025 | graph construction, object-detection | Code Available | 2 |
| SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations | May 4, 2025 | Data Augmentation | Code Available | 2 |
| Explainable AI in Spatial Analysis | May 1, 2025 | Bias Detection, Explainable artificial intelligence | Code Available | 2 |
| AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | Aug 2, 2022 | Causal Language Modeling, Common Sense Reasoning | Code Available | 2 |
| Meta-Design Matters: A Self-Design Multi-Agent System | May 21, 2025 | Math, Problem Decomposition | Code Available | 2 |
| One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | May 29, 2025 | Contrastive Learning, Text Retrieval | Code Available | 2 |
| GSPMD: General and Scalable Parallelization for ML Computation Graphs | May 10, 2021 | Playing the Game of 2048 | Code Available | 2 |
| The More You See in 2D, the More You Perceive in 3D | Apr 4, 2024 | 3D Reconstruction, Image to 3D | Code Available | 2 |
| SpreadsheetLLM: Encoding Spreadsheets for Large Language Models | Jul 12, 2024 | In-Context Learning, Table Detection | Code Available | 2 |
| Multi-Grained Angle Representation for Remote Sensing Object Detection | Sep 7, 2022 | Object, object-detection | Code Available | 2 |
| What Makes a Good Diffusion Planner for Decision Making? | Mar 1, 2025 | Action Generation, Decision Making | Code Available | 2 |
| Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information | Jun 11, 2025 | | Code Available | 2 |
| 4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Mar 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| MVDream: Multi-view Diffusion for 3D Generation | Aug 31, 2023 | 3D Generation, Prompt Learning | Code Available | 2 |
| Evolving Self-Assembling Neural Networks: From Spontaneous Activity to Experience-Dependent Learning | Jun 14, 2024 | | Code Available | 2 |
| Scaling Down Text Encoders of Text-to-Image Diffusion Models | Mar 25, 2025 | GPU, Image Generation | Code Available | 2 |
| Fully Geometric Panoramic Localization | Mar 29, 2024 | Indoor Localization, Visual Localization | Code Available | 2 |
| Find Any Part in 3D | Nov 20, 2024 | 3D Part Segmentation, Diversity | Code Available | 2 |
| GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting | May 13, 2024 | 3D scene Editing, Virtual Try-on | Code Available | 2 |
| AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control | Apr 5, 2021 | Imitation Learning, Reinforcement Learning (RL) | Code Available | 2 |
| PaLM-E: An Embodied Multimodal Language Model | Mar 6, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations | Sep 22, 2016 | GPU | Code Available | 2 |
| Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration | Jul 7, 2025 | Optical Character Recognition (OCR) | Code Available | 2 |
| PRAM: Place Recognition Anywhere Model for Efficient Visual Localization | Apr 11, 2024 | Autonomous Driving, Landmark Recognition | Code Available | 2 |