| Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object Tracking | Mar 20, 2024 | 3D Multi-Object Tracking, CPU | Code Available | 2 |
| Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation | Mar 20, 2024 | Semantic Segmentation, Weakly supervised Semantic Segmentation | Code Available | 2 |
| H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation | Mar 20, 2024 | Image Segmentation, Lesion Segmentation | Code Available | 2 |
| Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment | Mar 20, 2024 | Action Quality Assessment, Action Quality Assessment Report Generation | Code Available | 2 |
| Tuning-Free Image Customization with Image and Text Guidance | Mar 19, 2024 | Decoder, Denoising | Code Available | 2 |
| Lifting Multi-View Detection and Tracking to the Bird's Eye View | Mar 19, 2024 | 3D Object Recognition, Multi-Object Tracking | Code Available | 2 |
| JaxUED: A simple and useable UED library in Jax | Mar 19, 2024 | CPU | Code Available | 2 |
| Pretraining Codomain Attention Neural Operators for Solving Multiphysics PDEs | Mar 19, 2024 | Few-Shot Learning, Self-Supervised Learning | Code Available | 2 |
| VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning | Mar 19, 2024 | Benchmarking, Image Captioning | Code Available | 2 |
| Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization | Mar 19, 2024 | Quantization | Code Available | 2 |
| Task-Customized Mixture of Adapters for General Image Fusion | Mar 19, 2024 | Mixture-of-Experts | Code Available | 2 |
| Advancing Time Series Classification with Multimodal Language Modeling | Mar 19, 2024 | Classification, Language Modeling | Code Available | 2 |
| Cross-Domain Pre-training with Language Models for Transferable Time Series Representations | Mar 19, 2024 | Language Modelling, Time Series | Code Available | 2 |
| Better Call SAL: Towards Learning to Segment Anything in Lidar | Mar 19, 2024 | Panoptic Segmentation, Segmentation | Code Available | 2 |
| Embodied LLM Agents Learn to Cooperate in Organized Teams | Mar 19, 2024 | Decision Making, Human Agent Collaboration | Code Available | 2 |
| Optimal Flow Matching: Learning Straight Trajectories in Just One Step | Mar 19, 2024 | | Code Available | 2 |
| You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs | Mar 19, 2024 | Denoising, Image Generation | Code Available | 2 |
| Generative Enhancement for 3D Medical Images | Mar 19, 2024 | counterfactual, Image Generation | Code Available | 2 |
| FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis | Mar 19, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning | Mar 19, 2024 | Inductive Bias, Reinforcement Learning (RL) | Code Available | 2 |
| Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models | Mar 19, 2024 | Instruction Following, visual instruction following | Code Available | 2 |
| Discover and Mitigate Multiple Biased Subgroups in Image Classifiers | Mar 19, 2024 | Dimensionality Reduction, Subgroup Discovery | Code Available | 2 |
| ViTGaze: Gaze Following with Interaction Features in Vision Transformers | Mar 19, 2024 | Gaze Target Estimation | Code Available | 2 |
| Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery | Mar 18, 2024 | Instance Segmentation, NeRF | Code Available | 2 |
| Continual Forgetting for Pre-trained Vision Models | Mar 18, 2024 | Continual Forgetting, Face Recognition | Code Available | 2 |
| Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt | Mar 18, 2024 | Attribute, Decoder | Code Available | 2 |
| RouterBench: A Benchmark for Multi-LLM Routing System | Mar 18, 2024 | | Code Available | 2 |
| LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models | Mar 18, 2024 | | Code Available | 2 |
| BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation | Mar 18, 2024 | Decision Making, Scene Segmentation | Code Available | 2 |
| Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems | Mar 18, 2024 | Machine Translation, Translation | Code Available | 2 |
| CRS-Diff: Controllable Remote Sensing Image Generation with Diffusion Model | Mar 18, 2024 | Image Generation | Code Available | 2 |
| LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Mar 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Graph Neural Networks for Learning Equivariant Representations of Neural Networks | Mar 18, 2024 | | Code Available | 2 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Mar 18, 2024 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 2 |
| MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control | Mar 18, 2024 | Instruction Following, Minecraft | Code Available | 2 |
| ReGenNet: Towards Human Action-Reaction Synthesis | Mar 18, 2024 | Decoder | Code Available | 2 |
| How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | Mar 18, 2024 | Decision Making | Code Available | 2 |
| Counting-Stars: A Multi-evidence, Position-aware, and Scalable Benchmark for Evaluating Long-Context Large Language Models | Mar 18, 2024 | 4k, Position | Code Available | 2 |
| SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction | Mar 18, 2024 | Autonomous Vehicles, motion prediction | Code Available | 2 |
| GaussNav: Gaussian Splatting for Visual Navigation | Mar 18, 2024 | 3DGS, Visual Navigation | Code Available | 2 |
| HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation | Mar 18, 2024 | Scene Graph Generation | Code Available | 2 |
| DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation | Mar 18, 2024 | Feature Engineering, Image Manipulation | Code Available | 2 |
| ThermoNeRF: Joint RGB and Thermal Novel View Synthesis for Building Facades using Multimodal Neural Radiance Fields | Mar 18, 2024 | 3D geometry, Image Generation | Code Available | 2 |
| Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning | Mar 18, 2024 | class-incremental learning, Class Incremental Learning | Code Available | 2 |
| Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail | Mar 18, 2024 | Lifelike 3D Human Generation | Code Available | 2 |
| Fed3DGS: Scalable 3D Gaussian Splatting with Federated Learning | Mar 18, 2024 | 3DGS, 3D Reconstruction | Code Available | 2 |
| A Versatile Framework for Multi-scene Person Re-identification | Mar 17, 2024 | Data Augmentation, Person Re-Identification | Code Available | 2 |
| Bilateral Propagation Network for Depth Completion | Mar 17, 2024 | Depth Completion | Code Available | 2 |
| Stylized Face Sketch Extraction via Generative Prior with Limited Data | Mar 17, 2024 | Face Sketch Synthesis | Code Available | 2 |
| DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation | Mar 17, 2024 | Semantic Segmentation, Weakly supervised Semantic Segmentation | Code Available | 2 |