| GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | Jun 3, 2024 | 3D Object Detection, Image-to-Image Translation | Code Available | 2 | 5 |
| Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Jun 14, 2024 | Memorization | Code Available | 2 | 5 |
| H3WB: Human3.6M 3D WholeBody Dataset and Benchmark | Nov 28, 2022 | 3D Facial Landmark Localization, 3D Hand Pose Estimation | Code Available | 2 | 5 |
| PointPillars: Fast Encoders for Object Detection from Point Clouds | Dec 14, 2018 | 3D Object Detection, Autonomous Driving | Code Available | 2 | 5 |
| VTimeLLM: Empower LLM to Grasp Video Moments | Nov 30, 2023 | Dense Video Captioning, Temporal Relation Extraction | Code Available | 2 | 5 |
| SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Mar 11, 2025 | Decision Making, Interactive Segmentation | Code Available | 2 | 5 |
| Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | Nov 15, 2023 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | Mar 24, 2022 | Image Generation, Semantic Segmentation | Code Available | 2 | 5 |
| GRES: Generalized Referring Expression Segmentation | Jun 1, 2023 | Generalized Referring Expression Segmentation, Referring Expression | Code Available | 2 | 5 |
| SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Jun 18, 2024 | Landmark-based Lipreading, Lipreading | Code Available | 2 | 5 |
| ConvLoRA and AdaBN based Domain Adaptation via Self-Training | Feb 7, 2024 | Domain Adaptation, Multi-target Domain Adaptation | Code Available | 2 | 5 |
| A Scalable Communication Protocol for Networks of Large Language Models | Oct 14, 2024 | | Code Available | 2 | 5 |
| Recurrent Diffusion for Large-Scale Parameter Generation | Jan 20, 2025 | GPU | Code Available | 2 | 5 |
| Universal Image Restoration Pre-training via Degradation Classification | Jan 26, 2025 | 5-Degradation Blind All-in-One Image Restoration, Image Restoration | Code Available | 2 | 5 |
| DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis | Mar 19, 2025 | | Code Available | 2 | 5 |
| MultiBooth: Towards Generating All Your Concepts in an Image from Text | Apr 22, 2024 | All, Computational Efficiency | Code Available | 2 | 5 |
| Causal Reasoning and Large Language Models: Opening a New Frontier for Causality | Apr 28, 2023 | Causal Discovery, Common Sense Reasoning | Code Available | 2 | 5 |
| AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss | May 14, 2019 | Style Transfer, Voice Conversion | Code Available | 2 | 5 |
| Graph-of-Thought: Utilizing Large Language Models to Solve Complex and Dynamic Business Problems | Jan 10, 2024 | Decision Making | Code Available | 2 | 5 |
| A Large-Scale Chinese Short-Text Conversation Dataset | Aug 10, 2020 | Dialogue Generation, Short-Text Conversation | Code Available | 2 | 5 |
| Advbox: a toolbox to generate adversarial examples that fool neural networks | Jan 13, 2020 | BIG-bench Machine Learning, Face Recognition | Code Available | 2 | 5 |
| Enhancing Fine-grained Sentiment Classification Exploiting Local Context Embedding | Oct 2, 2020 | Aspect-Based Sentiment Analysis (ABSA), Classification | Code Available | 2 | 5 |
| Good practices for evaluation of machine learning systems | Dec 4, 2024 | | Code Available | 2 | 5 |
| Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment | May 26, 2025 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning | Feb 9, 2024 | Active Learning, Video Classification | Code Available | 2 | 5 |