| MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer | Mar 5, 2024 | | Code Available | 2 | 5 |
| MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition | Mar 6, 2024 | Data Augmentation, Deep Learning | Code Available | 2 | 5 |
| VastTrack: Vast Category Visual Object Tracking | Mar 6, 2024 | Object, Object Tracking | Code Available | 2 | 5 |
| Mamba4Rec: Towards Efficient Sequential Recommendation with Selective State Space Models | Mar 6, 2024 | Mamba, Recommendation Systems | Code Available | 2 | 5 |
| Delving into the Trajectory Long-tail Distribution for Muti-object Tracking | Mar 7, 2024 | Data Augmentation, Multi-Object Tracking | Code Available | 2 | 5 |
| Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs | May 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| JAX-SPH: A Differentiable Smoothed Particle Hydrodynamics Framework | Mar 7, 2024 | Dataset Generation | Code Available | 2 | 5 |
| DocDiff: Document Enhancement via Residual Diffusion Models | May 6, 2023 | Deblurring, Denoising | Code Available | 2 | 5 |
| OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents | Jun 21, 2023 | MMR total | Code Available | 2 | 5 |
| StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models | Mar 8, 2024 | Image Generation | Code Available | 2 | 5 |
| MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology | Mar 11, 2024 | Mamba, Multiple Instance Learning | Code Available | 2 | 5 |
| Caltech Aerial RGB-Thermal Dataset in the Wild | Mar 13, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 | 5 |
| NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning | Mar 12, 2024 | Navigate, Vision and Language Navigation | Code Available | 2 | 5 |
| Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews | Mar 11, 2024 | Language Modelling, Large Language Model | Code Available | 2 | 5 |
| CleanAgent: Automating Data Standardization with LLM-based Agents | Mar 13, 2024 | Code Generation, Natural Language Understanding | Code Available | 2 | 5 |
| Change Point Detection with Copula Entropy based Two-Sample Test | Feb 3, 2024 | Change Point Detection, Time Series | Code Available | 2 | 5 |
| Single Domain Generalization for Crowd Counting | Mar 14, 2024 | Crowd Counting, Domain Generalization | Code Available | 2 | 5 |
| RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception | Mar 15, 2024 | 3D Object Detection, 3D Object Tracking | Code Available | 2 | 5 |
| Robust Shape Fitting for 3D Scene Abstraction | Mar 15, 2024 | Depth Estimation, Scene Parsing | Code Available | 2 | 5 |
| Rethinking Features-Fused-Pyramid-Neck for Object Detection | May 19, 2025 | object-detection, Object Detection | Code Available | 2 | 5 |
| CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations | Mar 17, 2024 | Object, object-detection | Code Available | 2 | 5 |
| Neural Markov Random Field for Stereo Matching | Mar 17, 2024 | Domain Generalization, Inductive Bias | Code Available | 2 | 5 |
| Fed3DGS: Scalable 3D Gaussian Splatting with Federated Learning | Mar 18, 2024 | 3DGS, 3D Reconstruction | Code Available | 2 | 5 |
| Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail | Mar 18, 2024 | Lifelike 3D Human Generation | Code Available | 2 | 5 |
| Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning | Mar 18, 2024 | class-incremental learning, Class Incremental Learning | Code Available | 2 | 5 |
| ViTGaze: Gaze Following with Interaction Features in Vision Transformers | Mar 19, 2024 | Gaze Target Estimation | Code Available | 2 | 5 |
| ThermoNeRF: Joint RGB and Thermal Novel View Synthesis for Building Facades using Multimodal Neural Radiance Fields | Mar 18, 2024 | 3D geometry, Image Generation | Code Available | 2 | 5 |
| Fairness Evaluation for Uplift Modeling in the Absence of Ground Truth | Feb 12, 2024 | counterfactual, Decision Making | Code Available | 2 | 5 |
| SoftPatch: Unsupervised Anomaly Detection with Noisy Data | Mar 21, 2024 | Anomaly Detection, Unsupervised Anomaly Detection | Code Available | 2 | 5 |
| VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality | May 15, 2025 | 3DGS, GPU | Code Available | 2 | 5 |
| Visually Guided Generative Text-Layout Pre-training for Document Intelligence | Mar 25, 2024 | Document Classification, document understanding | Code Available | 2 | 5 |
| Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance | Mar 25, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Is Your LiDAR Placement Optimized for 3D Scene Understanding? | Mar 25, 2024 | 3D Object Detection, LIDAR Semantic Segmentation | Code Available | 2 | 5 |
| Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation | Mar 20, 2025 | | Code Available | 2 | 5 |
| GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM | Mar 28, 2024 | Simultaneous Localization and Mapping | Code Available | 2 | 5 |
| Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis | Mar 28, 2024 | Change Detection, Language Modelling | Code Available | 2 | 5 |
| SuperGaussians: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors | Nov 28, 2024 | Novel View Synthesis | Code Available | 2 | 5 |
| StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation | Mar 29, 2024 | Image-to-Image Translation, Translation | Code Available | 2 | 5 |
| Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting | Mar 29, 2024 | Denoising, Image Inpainting | Code Available | 2 | 5 |
| A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) | Mar 31, 2024 | Collaborative Filtering, Recommendation Systems | Code Available | 2 | 5 |
| EGTR: Extracting Graph from Transformer for Scene Graph Generation | Apr 2, 2024 | Graph Generation, Multi-Task Learning | Code Available | 2 | 5 |
| Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space | Mar 31, 2025 | Cloud Removal, Denoising | Code Available | 2 | 5 |
| Test-Time Model Adaptation with Only Forward Passes | Apr 2, 2024 | model, Test-time Adaptation | Code Available | 2 | 5 |
| AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution | Apr 4, 2024 | Image Super-Resolution, Quantization | Code Available | 2 | 5 |
| Learning Transferable Negative Prompts for Out-of-Distribution Detection | Apr 4, 2024 | Out-of-Distribution Detection, Out of Distribution (OOD) Detection | Code Available | 2 | 5 |
| ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing | Apr 5, 2024 | Image Manipulation | Code Available | 2 | 5 |
| Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer | Apr 7, 2024 | 3D Human Reconstruction, 3D Object Reconstruction | Code Available | 2 | 5 |
| LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion | Mar 30, 2024 | Diversity, Image Generation | Code Available | 2 | 5 |
| Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes | Apr 6, 2024 | Point Cloud Registration | Code Available | 2 | 5 |
| Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation | Apr 9, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 2 | 5 |