| Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation | Mar 25, 2024 | Denoising, Image Generation | Code Available | 2 |
| ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement | Apr 2, 2025 | Decoder, Image Generation | Code Available | 2 |
| Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs | Mar 31, 2025 | Large Language Model, Video Chaptering | Code Available | 2 |
| eRST: A Signaled Graph Theory of Discourse Relations and Organization | Mar 20, 2024 | | Code Available | 2 |
| Self-Prompting Analogical Reasoning for UAV Object Detection | Apr 11, 2025 | graph construction, object-detection | Code Available | 2 |
| SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations | May 4, 2025 | Data Augmentation | Code Available | 2 |
| Explainable AI in Spatial Analysis | May 1, 2025 | Bias Detection, Explainable artificial intelligence | Code Available | 2 |
| AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | Aug 2, 2022 | Causal Language Modeling, Common Sense Reasoning | Code Available | 2 |
| Meta-Design Matters: A Self-Design Multi-Agent System | May 21, 2025 | Math, Problem Decomposition | Code Available | 2 |
| One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | May 29, 2025 | Contrastive Learning, Text Retrieval | Code Available | 2 |
| GSPMD: General and Scalable Parallelization for ML Computation Graphs | May 10, 2021 | Playing the Game of 2048 | Code Available | 2 |
| The More You See in 2D, the More You Perceive in 3D | Apr 4, 2024 | 3D Reconstruction, Image to 3D | Code Available | 2 |
| SpreadsheetLLM: Encoding Spreadsheets for Large Language Models | Jul 12, 2024 | In-Context Learning, Table Detection | Code Available | 2 |
| Multi-Grained Angle Representation for Remote Sensing Object Detection | Sep 7, 2022 | Object, object-detection | Code Available | 2 |
| What Makes a Good Diffusion Planner for Decision Making? | Mar 1, 2025 | Action Generation, Decision Making | Code Available | 2 |
| Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information | Jun 11, 2025 | | Code Available | 2 |
| 4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Mar 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| MVDream: Multi-view Diffusion for 3D Generation | Aug 31, 2023 | 3D Generation, Prompt Learning | Code Available | 2 |
| Evolving Self-Assembling Neural Networks: From Spontaneous Activity to Experience-Dependent Learning | Jun 14, 2024 | | Code Available | 2 |
| Scaling Down Text Encoders of Text-to-Image Diffusion Models | Mar 25, 2025 | GPU, Image Generation | Code Available | 2 |
| Fully Geometric Panoramic Localization | Mar 29, 2024 | Indoor Localization, Visual Localization | Code Available | 2 |
| Find Any Part in 3D | Nov 20, 2024 | 3D Part Segmentation, Diversity | Code Available | 2 |
| GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting | May 13, 2024 | 3D scene Editing, Virtual Try-on | Code Available | 2 |
| AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control | Apr 5, 2021 | Imitation Learning, Reinforcement Learning (RL) | Code Available | 2 |
| PaLM-E: An Embodied Multimodal Language Model | Mar 6, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations | Sep 22, 2016 | GPU | Code Available | 2 |
| Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration | Jul 7, 2025 | Optical Character Recognition (OCR) | Code Available | 2 |
| PRAM: Place Recognition Anywhere Model for Efficient Visual Localization | Apr 11, 2024 | Autonomous Driving, Landmark Recognition | Code Available | 2 |
| Learning to Predict Without Looking Ahead: World Models Without Forward Prediction | Oct 29, 2019 | Model-based Reinforcement Learning, reinforcement-learning | Code Available | 2 |
| P2Object: Single Point Supervised Object Detection and Instance Segmentation | Apr 10, 2025 | Instance Segmentation, Multiple Instance Learning | Code Available | 2 |
| The Revolution of Multimodal Large Language Models: A Survey | Feb 19, 2024 | Image Generation, Instruction Following | Code Available | 2 |
| SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views | Jun 12, 2022 | Neural Rendering, Surface Reconstruction | Code Available | 2 |
| RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework | Sep 18, 2024 | 3D Multi-Object Tracking, 3D Object Detection | Code Available | 2 |
| CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs | Nov 21, 2024 | Clone Detection, Code Search | Code Available | 2 |
| Imagine while Reasoning in Space: Multimodal Visualization-of-Thought | Jan 13, 2025 | Spatial Reasoning | Code Available | 2 |
| Vikhr: Constructing a State-of-the-art Bilingual Open-Source Instruction-Following Large Language Model for Russian | May 22, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| Uncertainty Quantification in Scientific Machine Learning: Methods, Metrics, and Comparisons | Jan 19, 2022 | BIG-bench Machine Learning, Uncertainty Quantification | Code Available | 2 |
| Learning to Act from Actionless Videos through Dense Correspondences | Oct 12, 2023 | | Code Available | 2 |
| Effective Long-Context Scaling of Foundation Models | Sep 27, 2023 | Continual Pretraining, Language Modeling | Code Available | 2 |
| DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer | Jun 12, 2024 | Image Dehazing, Nonhomogeneous Image Dehazing | Code Available | 2 |
| What Matters in Training a GPT4-Style Language Model with Multimodal Inputs? | Jul 5, 2023 | Instruction Following, Language Modeling | Code Available | 2 |
| Palette: Image-to-Image Diffusion Models | Nov 10, 2021 | Colorization, Denoising | Code Available | 2 |
| EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks | Jan 31, 2024 | Audio Generation, Speech Synthesis | Code Available | 2 |
| PaLM: Scaling Language Modeling with Pathways | Apr 5, 2022 | Auto Debugging, Code Generation | Code Available | 2 |
| RPN 2: On Interdependence Function Learning Towards Unifying and Advancing CNN, RNN, GNN, and Transformer | Nov 17, 2024 | | Code Available | 2 |
| TIPS: Text-Image Pretraining with Spatial Awareness | Oct 21, 2024 | Depth Estimation, Image Captioning | Code Available | 2 |
| Equivariance and partial observations in Koopman operator theory for partial differential equations | Jul 28, 2023 | | Code Available | 2 |
| Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Mar 21, 2025 | Benchmarking, Video Generation | Code Available | 2 |
| cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning | May 28, 2025 | CAD Reconstruction, Large Language Model | Code Available | 2 |
| Fast protein backbone generation with SE(3) flow matching | Oct 8, 2023 | Protein Design | Code Available | 2 |