| Large Language Models Can Self-Improve in Long-context Reasoning | Nov 12, 2024 | | Code Available | 2 | 5 |
| StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces | Mar 10, 2023 | Attribute, Super-Resolution | Code Available | 2 | 5 |
| LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On | May 22, 2023 | Virtual Try-on | Code Available | 2 | 5 |
| CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding | Apr 22, 2024 | Attribute | Code Available | 2 | 5 |
| A Simple Framework for Contrastive Learning of Visual Representations | Feb 13, 2020 | Contrastive Learning, Image Classification | Code Available | 2 | 5 |
| LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences | Dec 2, 2024 | Embodied Question Answering, Question Answering | Code Available | 2 | 5 |
| ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer | Mar 8, 2022 | Image Classification, object-detection | Code Available | 2 | 5 |
| MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection | Aug 16, 2024 | Event Detection, Sound Event Detection | Code Available | 2 | 5 |
| The 1st-place Solution for ECCV 2022 Multiple People Tracking in Group Dance Challenge | Oct 27, 2022 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 2 | 5 |
| KST-GCN: A Knowledge-Driven Spatial-Temporal Graph Convolutional Network for Traffic Forecasting | Nov 26, 2020 | Knowledge Graphs, Representation Learning | Code Available | 2 | 5 |
| MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation | Nov 24, 2023 | 3D Generation, Image Generation | Code Available | 2 | 5 |
| UniRGB-IR: A Unified Framework for RGB-Infrared Semantic Tasks via Adapter Tuning | Apr 26, 2024 | Multispectral Object Detection, Pedestrian Detection | Code Available | 2 | 5 |
| PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis | May 24, 2024 | Art Analysis, Computational Efficiency | Code Available | 2 | 5 |
| LumberChunker: Long-Form Narrative Document Segmentation | Jun 25, 2024 | Chunking, Form | Code Available | 2 | 5 |
| EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation | Sep 26, 2024 | Image Segmentation, Mamba | Code Available | 2 | 5 |
| PokerBench: Training Large Language Models to become Professional Poker Players | Jan 14, 2025 | | Code Available | 2 | 5 |
| LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Jan 14, 2025 | Feature Compression, Language Modeling | Code Available | 2 | 5 |
| Geodesic Diffusion Models for Medical Image-to-Image Generation | Mar 2, 2025 | Denoising, Image Denoising | Code Available | 2 | 5 |
| Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark | Mar 12, 2025 | Image Retrieval, Retrieval | Code Available | 2 | 5 |
| Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation | Mar 16, 2023 | Diversity, Gesture Generation | Code Available | 2 | 5 |
| Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses | Feb 3, 2021 | Decoder, Speech Denoising | Code Available | 2 | 5 |
| Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data | Feb 8, 2024 | Action Recognition, Mamba | Code Available | 2 | 5 |
| rPPG-Toolbox: Deep Remote PPG Toolbox | Oct 3, 2022 | Benchmarking, Data Augmentation | Code Available | 2 | 5 |
| R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | Jan 18, 2024 | Benchmarking | Code Available | 2 | 5 |
| Explaining Explanations: Axiomatic Feature Interactions for Deep Networks | Feb 10, 2020 | | Code Available | 2 | 5 |