| TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Oct 7, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation | Oct 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Phenaki: Variable Length Video Generation From Open Domain Textual Description | Oct 5, 2022 | Decoder, Video Generation | Code Available | 2 |
| 1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022) | Jun 23, 2022 | Data Augmentation, Vision and Language Navigation | Code Available | 2 |
| Training-Free Consistent Text-to-Image Generation | Feb 5, 2024 | Diversity, Image Generation | Code Available | 2 |
| Efficient Differentiable Simulation of Articulated Bodies | Sep 16, 2021 | | Code Available | 2 |
| DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding | Nov 21, 2022 | Drug Discovery | Code Available | 2 |
| Multi-Interest Network with Dynamic Routing for Recommendation at Tmall | Apr 17, 2019 | Clustering, Information Retrieval | Code Available | 2 |
| Deconstructing Denoising Diffusion Models for Self-Supervised Learning | Jan 25, 2024 | Denoising, Image Generation | Code Available | 2 |
| UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy | Mar 2, 2023 | Motion Planning | Code Available | 2 |
| Fine-Grained Stochastic Architecture Search | Jun 17, 2020 | Neural Architecture Search, object-detection | Code Available | 2 |
| ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes | Aug 22, 2023 | 3D Semantic Segmentation, Novel View Synthesis | Code Available | 2 |
| Foundational Challenges in Assuring Alignment and Safety of Large Language Models | Apr 15, 2024 | | Code Available | 2 |
| Is Weakly-supervised Action Segmentation Ready For Human-Robot Interaction? No, Let's Improve It With Action-union Learning | Oct 22, 2023 | Action Recognition, Action Segmentation | Code Available | 2 |
| Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA | May 9, 2025 | | Code Available | 2 |
| Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis | Mar 14, 2023 | 3D Point Cloud Classification, All | Code Available | 2 |
| Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Mar 31, 2025 | Math, Spatial Reasoning | Code Available | 2 |
| VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? | Apr 9, 2024 | Optical Character Recognition (OCR) | Code Available | 2 |
| Reducing Transformer Key-Value Cache Size with Cross-Layer Attention | May 21, 2024 | | Code Available | 2 |
| InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs | May 18, 2020 | Attribute, Face Generation | Code Available | 2 |
| Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery | Apr 3, 2025 | Field Boundary Delineation, Instance Segmentation | Code Available | 2 |
| Flexible Isosurface Extraction for Gradient-Based Mesh Optimization | Aug 10, 2023 | | Code Available | 2 |
| Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction | Mar 7, 2021 | Depth Estimation, Depth Prediction | Code Available | 2 |
| TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials | Apr 17, 2025 | Articles | Code Available | 2 |
| Unified Structure Generation for Universal Information Extraction | Mar 23, 2022 | Aspect-Based Sentiment Analysis (ABSA), UIE | Code Available | 2 |
| Customization Assistant for Text-to-image Generation | Dec 5, 2023 | Descriptive, Image Generation | Code Available | 2 |
| IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis | May 31, 2022 | 3D-Aware Image Synthesis, Image Generation | Code Available | 2 |
| PiEEG-16 to Measure 16 EEG Channels with Raspberry Pi for Brain-Computer Interfaces and EEG devices | Sep 13, 2024 | Brain Computer Interface, EEG | Code Available | 2 |
| GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks | Apr 24, 2025 | Atomic Forces, Computational Efficiency | Code Available | 2 |
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Jan 23, 2017 | Computational Efficiency, GPU | Code Available | 2 |
| Voice Separation with an Unknown Number of Multiple Speakers | Feb 29, 2020 | Speech Separation | Code Available | 2 |
| Differentiable Augmentation for Data-Efficient GAN Training | Jun 18, 2020 | Image Generation | Code Available | 2 |
| MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | Sep 26, 2024 | Large Language Model, Model Compression | Code Available | 2 |
| ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback | May 23, 2025 | | Code Available | 2 |
| MPNet: Masked and Permuted Pre-training for Language Understanding | Apr 20, 2020 | Language Modeling, Language Modelling | Code Available | 2 |
| Task-Customized Mixture of Adapters for General Image Fusion | Mar 19, 2024 | Mixture-of-Experts | Code Available | 2 |
| Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting | Jun 6, 2024 | Computational Efficiency, Data Integration | Code Available | 2 |
| PyGAD: An Intuitive Genetic Algorithm Python Library | Jun 11, 2021 | | Code Available | 2 |
| Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning | May 23, 2023 | Image Generation, Optical Flow Estimation | Code Available | 2 |
| Modular Primitives for High-Performance Differentiable Rendering | Nov 6, 2020 | Attribute, Inverse Rendering | Code Available | 2 |
| Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review | Jul 21, 2020 | Deep Learning | Code Available | 2 |
| LidarDM: Generative LiDAR Simulation in a Generated World | Apr 3, 2024 | Autonomous Driving, Point Cloud Generation | Code Available | 2 |
| ClipCap: CLIP Prefix for Image Captioning | Nov 18, 2021 | Image Captioning, Language Modeling | Code Available | 2 |
| End to End Learning for Self-Driving Cars | Apr 25, 2016 | Lane Detection, Self-Driving Cars | Code Available | 2 |
| Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking | Sep 8, 2021 | Benchmarking, Diversity | Code Available | 2 |
| L4acados: Learning-based models for acados, applied to Gaussian process-based predictive control | Nov 28, 2024 | Computational Efficiency, Gaussian Processes | Code Available | 2 |
| MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting | Oct 10, 2024 | 3D Reconstruction, Dynamic Reconstruction | Code Available | 2 |
| Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Neural Speech Synthesis with Transformer Network | Sep 19, 2018 | Decoder, Machine Translation | Code Available | 2 |
| End-To-End Memory Networks | Mar 31, 2015 | Language Modeling, Language Modelling | Code Available | 2 |