| One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | May 29, 2025 | Contrastive Learning, Text Retrieval | Code Available | 2 |
| GSPMD: General and Scalable Parallelization for ML Computation Graphs | May 10, 2021 | Playing the Game of 2048 | Code Available | 2 |
| The More You See in 2D, the More You Perceive in 3D | Apr 4, 2024 | 3D Reconstruction, Image to 3D | Code Available | 2 |
| SpreadsheetLLM: Encoding Spreadsheets for Large Language Models | Jul 12, 2024 | In-Context Learning, Table Detection | Code Available | 2 |
| Multi-Grained Angle Representation for Remote Sensing Object Detection | Sep 7, 2022 | Object, Object Detection | Code Available | 2 |
| What Makes a Good Diffusion Planner for Decision Making? | Mar 1, 2025 | Action Generation, Decision Making | Code Available | 2 |
| Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information | Jun 11, 2025 | | Code Available | 2 |
| 4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Mar 29, 2022 | Automatic Speech Recognition (ASR) | Code Available | 2 |
| MVDream: Multi-view Diffusion for 3D Generation | Aug 31, 2023 | 3D Generation, Prompt Learning | Code Available | 2 |
| Evolving Self-Assembling Neural Networks: From Spontaneous Activity to Experience-Dependent Learning | Jun 14, 2024 | | Code Available | 2 |
| Scaling Down Text Encoders of Text-to-Image Diffusion Models | Mar 25, 2025 | GPU, Image Generation | Code Available | 2 |
| Fully Geometric Panoramic Localization | Mar 29, 2024 | Indoor Localization, Visual Localization | Code Available | 2 |
| Find Any Part in 3D | Nov 20, 2024 | 3D Part Segmentation, Diversity | Code Available | 2 |
| GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting | May 13, 2024 | 3D Scene Editing, Virtual Try-on | Code Available | 2 |
| AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control | Apr 5, 2021 | Imitation Learning, Reinforcement Learning (RL) | Code Available | 2 |
| PaLM-E: An Embodied Multimodal Language Model | Mar 6, 2023 | Language Modeling | Code Available | 2 |
| Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations | Sep 22, 2016 | GPU | Code Available | 2 |
| Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration | Jul 7, 2025 | Optical Character Recognition (OCR) | Code Available | 2 |
| PRAM: Place Recognition Anywhere Model for Efficient Visual Localization | Apr 11, 2024 | Autonomous Driving, Landmark Recognition | Code Available | 2 |
| Learning to Predict Without Looking Ahead: World Models Without Forward Prediction | Oct 29, 2019 | Model-based Reinforcement Learning, Reinforcement Learning | Code Available | 2 |
| P2Object: Single Point Supervised Object Detection and Instance Segmentation | Apr 10, 2025 | Instance Segmentation, Multiple Instance Learning | Code Available | 2 |
| The Revolution of Multimodal Large Language Models: A Survey | Feb 19, 2024 | Image Generation, Instruction Following | Code Available | 2 |
| SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views | Jun 12, 2022 | Neural Rendering, Surface Reconstruction | Code Available | 2 |
| RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework | Sep 18, 2024 | 3D Multi-Object Tracking, 3D Object Detection | Code Available | 2 |
| CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs | Nov 21, 2024 | Clone Detection, Code Search | Code Available | 2 |
| Imagine while Reasoning in Space: Multimodal Visualization-of-Thought | Jan 13, 2025 | Spatial Reasoning | Code Available | 2 |
| Vikhr: Constructing a State-of-the-art Bilingual Open-Source Instruction-Following Large Language Model for Russian | May 22, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| Uncertainty Quantification in Scientific Machine Learning: Methods, Metrics, and Comparisons | Jan 19, 2022 | BIG-bench Machine Learning, Uncertainty Quantification | Code Available | 2 |
| Learning to Act from Actionless Videos through Dense Correspondences | Oct 12, 2023 | | Code Available | 2 |
| Effective Long-Context Scaling of Foundation Models | Sep 27, 2023 | Continual Pretraining, Language Modeling | Code Available | 2 |
| DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer | Jun 12, 2024 | Image Dehazing, Nonhomogeneous Image Dehazing | Code Available | 2 |
| What Matters in Training a GPT4-Style Language Model with Multimodal Inputs? | Jul 5, 2023 | Instruction Following, Language Modeling | Code Available | 2 |
| Palette: Image-to-Image Diffusion Models | Nov 10, 2021 | Colorization, Denoising | Code Available | 2 |
| EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks | Jan 31, 2024 | Audio Generation, Speech Synthesis | Code Available | 2 |
| PaLM: Scaling Language Modeling with Pathways | Apr 5, 2022 | Auto Debugging, Code Generation | Code Available | 2 |
| RPN 2: On Interdependence Function Learning Towards Unifying and Advancing CNN, RNN, GNN, and Transformer | Nov 17, 2024 | | Code Available | 2 |
| TIPS: Text-Image Pretraining with Spatial Awareness | Oct 21, 2024 | Depth Estimation, Image Captioning | Code Available | 2 |
| Equivariance and partial observations in Koopman operator theory for partial differential equations | Jul 28, 2023 | | Code Available | 2 |
| Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Mar 21, 2025 | Benchmarking, Video Generation | Code Available | 2 |
| cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning | May 28, 2025 | CAD Reconstruction, Large Language Model | Code Available | 2 |
| Fast protein backbone generation with SE(3) flow matching | Oct 8, 2023 | Protein Design | Code Available | 2 |
| DeepMol: An Automated Machine and Deep Learning Framework for Computational Chemistry | Jun 1, 2024 | Activity Prediction, AutoML | Code Available | 2 |
| Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment | Jun 18, 2024 | Denoising | Code Available | 2 |
| SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | Oct 16, 2024 | Denoising, Video Generation | Code Available | 2 |
| Remasking Discrete Diffusion Models with Inference-Time Scaling | Mar 1, 2025 | | Code Available | 2 |
| SCoralDet: Efficient real-time underwater soft coral detection with YOLO | Dec 16, 2024 | 2D Object Detection, Object Detection | Code Available | 2 |
| Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models | Oct 14, 2024 | | Code Available | 2 |
| GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents | Mar 26, 2023 | Contrastive Learning, Gesture Generation | Code Available | 2 |
| JourneyDB: A Benchmark for Generative Image Understanding | Jul 3, 2023 | Image Captioning, Image Comprehension | Code Available | 2 |
| X-maps: Direct Depth Lookup for Event-based Structured Light Systems | Feb 15, 2024 | Depth Estimation, Disparity Estimation | Code Available | 2 |