| Referring Video Object Segmentation via Language-aligned Track Selection | Dec 2, 2024 | Object, Object Tracking | Code Available | 1 |
| Explainable fault and severity classification for rolling element bearings using Kolmogorov-Arnold networks | Dec 2, 2024 | Classification, Explainable Models | Code Available | 1 |
| How Much Can Time-related Features Enhance Time Series Forecasting? | Dec 2, 2024 | Computational Efficiency, Time Series | Code Available | 1 |
| MambaU-Lite: A Lightweight Model based on Mamba and Integrated Channel-Spatial Attention for Skin Lesion Segmentation | Dec 2, 2024 | Diagnostic, Lesion Segmentation | Code Available | 1 |
| COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training | Dec 2, 2024 | Self-Supervised Learning, Semantic Segmentation | Code Available | 1 |
| SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages | Dec 2, 2024 | Multiple-choice | Code Available | 1 |
| Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model | Dec 2, 2024 | cross-modal alignment, Knowledge Distillation | Code Available | 1 |
| Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond | Dec 2, 2024 | Image Enhancement, Image Restoration | Code Available | 1 |
| Dual-Branch Graph Transformer Network for 3D Human Mesh Reconstruction from Video | Dec 2, 2024 | | Code Available | 1 |
| Hiding Faces in Plain Sight: Defending DeepFakes by Disrupting Face Detection | Dec 2, 2024 | Adversarial Attack, Face Detection | Code Available | 1 |
| Improving Detail in Pluralistic Image Inpainting with Feature Dequantization | Dec 2, 2024 | Image Inpainting, Quantization | Code Available | 1 |
| PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos | Dec 2, 2024 | Question Answering, Video Understanding | Code Available | 1 |
| Multi-Granularity Video Object Segmentation | Dec 2, 2024 | Object, Segmentation | Code Available | 1 |
| Toward Real-Time Edge AI: Model-Agnostic Task-Oriented Communication with Visual Feature Alignment | Dec 1, 2024 | | Code Available | 1 |
| Token Cropr: Faster ViTs for Quite a Few Tasks | Dec 1, 2024 | image-classification, Image Classification | Code Available | 1 |
| VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information | Dec 1, 2024 | Multiple-choice | Code Available | 1 |
| SEED4D: A Synthetic Ego--Exo Dynamic 4D Data Generator, Driving Dataset and Benchmark | Dec 1, 2024 | 2k, 4D reconstruction | Code Available | 1 |
| DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation | Dec 1, 2024 | Quantization | Code Available | 1 |
| Towards Unified Molecule-Enhanced Pathology Image Representation Learning via Integrating Spatial Transcriptomics | Dec 1, 2024 | Data Integration, Representation Learning | Code Available | 1 |
| Free and Customizable Code Documentation with LLMs: A Fine-Tuning Approach | Dec 1, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Particle-based 6D Object Pose Estimation from Point Clouds using Diffusion Models | Dec 1, 2024 | 6D Pose Estimation using RGB, Object | Code Available | 1 |
| EDTformer: An Efficient Decoder Transformer for Visual Place Recognition | Dec 1, 2024 | Decoder, Re-Ranking | Code Available | 1 |
| SyncVIS: Synchronized Video Instance Segmentation | Dec 1, 2024 | Instance Segmentation, Segmentation | Code Available | 1 |
| DMFourLLIE: Dual-Stage and Multi-Branch Fourier Network for Low-Light Image Enhancement | Dec 1, 2024 | Image Enhancement, Image Reconstruction | Code Available | 1 |
| Visual Modality Prompt for Adapting Vision-Language Object Detectors | Dec 1, 2024 | Decoder, Translation | Code Available | 1 |
| Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild | Dec 1, 2024 | Moment Retrieval, Retrieval | Code Available | 1 |
| Motion-Aware Optical Camera Communication with Event Cameras | Dec 1, 2024 | | Code Available | 1 |
| MambaNUT: Nighttime UAV Tracking via Mamba-based Adaptive Curriculum Learning | Dec 1, 2024 | Domain Adaptation, Image Enhancement | Code Available | 1 |
| Oracle-guided Dynamic User Preference Modeling for Sequential Recommendation | Dec 1, 2024 | Sequential Recommendation | Code Available | 1 |
| Unified Parameter-Efficient Unlearning for LLMs | Nov 30, 2024 | parameter-efficient fine-tuning | Code Available | 1 |
| DroidCall: A Dataset for LLM-powered Android Intent Invocation | Nov 30, 2024 | Natural Language Understanding | Code Available | 1 |
| LineGS : 3D Line Segment Representation on 3D Gaussian Splatting | Nov 30, 2024 | 3D Reconstruction, Surface Reconstruction | Code Available | 1 |
| DogLayout: Denoising Diffusion GAN for Discrete and Continuous Layout Generation | Nov 30, 2024 | Denoising, Layout Generation | Code Available | 1 |
| TAROT: Targeted Data Selection via Optimal Transport | Nov 30, 2024 | motion prediction, Semantic Segmentation | Code Available | 1 |
| AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models | Nov 30, 2024 | | Code Available | 1 |
| Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | Nov 30, 2024 | | Code Available | 1 |
| Fine Tuning Large Language Models to Deliver CBT for Depression | Nov 29, 2024 | | Code Available | 1 |
| PerLA: Perceptive 3D Language Assistant | Nov 29, 2024 | Dense Captioning, Graph Neural Network | Code Available | 1 |
| T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Nov 29, 2024 | Data Augmentation, Diversity | Code Available | 1 |
| GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting | Nov 29, 2024 | 3DGS, Decoder | Code Available | 1 |
| Multigraph Message Passing with Bi-Directional Multi-Edge Aggregations | Nov 29, 2024 | Graph Learning | Code Available | 1 |
| V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow | Nov 29, 2024 | Decoder | Code Available | 1 |
| Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings | Nov 29, 2024 | Multimodal Reasoning | Code Available | 1 |
| SDR-GNN: Spectral Domain Reconstruction Graph Neural Network for Incomplete Multimodal Learning in Conversational Emotion Recognition | Nov 29, 2024 | Emotion Recognition, Graph Neural Network | Code Available | 1 |
| On the Performance Analysis of Momentum Method: A Frequency Domain Perspective | Nov 29, 2024 | Image Classification | Code Available | 1 |
| Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook | Nov 29, 2024 | DeepFake Detection, Face Swapping | Code Available | 1 |
| Diffusion Model Guided Sampling with Pixel-Wise Aleatoric Uncertainty Estimation | Nov 29, 2024 | Denoising | Code Available | 1 |
| Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning | Nov 29, 2024 | Pose Estimation | Code Available | 1 |
| DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation | Nov 29, 2024 | Dataset Distillation, Diversity | Code Available | 1 |
| Another look at inference after prediction | Nov 29, 2024 | Prediction | Code Available | 1 |