| NetTrack: Tracking Highly Dynamic Objects with a Net | Mar 17, 2024 | Multi-Object Tracking, Object | Code Available | 2 |
| Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model | Mar 17, 2024 | Image Restoration, Zero-shot Generalization | Code Available | 2 |
| Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks | Mar 17, 2024 | 3D Molecule Generation | Code Available | 2 |
| BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis | Mar 17, 2024 | 3D Generation, Text to 3D | Code Available | 2 |
| MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data | Mar 17, 2024 | Image Retrieval, Retrieval | Code Available | 2 |
| Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework | Mar 17, 2024 | All, Data Augmentation | Code Available | 2 |
| Neural Markov Random Field for Stereo Matching | Mar 17, 2024 | Domain Generalization, Inductive Bias | Code Available | 2 |
| CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations | Mar 17, 2024 | Object, object-detection | Code Available | 2 |
| SelfIE: Self-Interpretation of Large Language Model Embeddings | Mar 16, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection | Mar 16, 2024 | channel selection, object-detection | Code Available | 2 |
| DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark | Mar 16, 2024 | | Code Available | 2 |
| MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations | Mar 16, 2024 | Intent Recognition, Multimodal Intent Recognition | Code Available | 2 |
| Boosting Flow-based Generative Super-Resolution Models via Learned Prior | Mar 16, 2024 | Image Generation, Image Super-Resolution | Code Available | 2 |
| A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment | Mar 16, 2024 | Image Quality Assessment | Code Available | 2 |
| Fast Sparse View Guided NeRF Update for Object Reconfigurations | Mar 16, 2024 | NeRF | Code Available | 2 |
| ScanTalk: 3D Talking Heads from Unregistered Scans | Mar 16, 2024 | | Code Available | 2 |
| MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections | Mar 16, 2024 | 3D Reconstruction, Denoising | Code Available | 2 |
| NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices | Mar 15, 2024 | Activity Recognition, Edge-computing | Code Available | 2 |
| Revisiting Adversarial Training under Long-Tailed Distributions | Mar 15, 2024 | Adversarial Defense, Data Augmentation | Code Available | 2 |
| Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising | Mar 15, 2024 | Denoising, Hyperspectral Image Denoising | Code Available | 2 |
| Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives | Mar 15, 2024 | Motion Synthesis | Code Available | 2 |
| Uni-SMART: Universal Science Multimodal Analysis and Research Transformer | Mar 15, 2024 | Articles | Code Available | 2 |
| Generative Region-Language Pretraining for Open-Ended Object Detection | Mar 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding | Mar 15, 2024 | 3D Generation, Image to 3D | Code Available | 2 |
| A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges | Mar 15, 2024 | | Code Available | 2 |
| DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models | Mar 15, 2024 | RAG, Retrieval | Code Available | 2 |
| MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage | Mar 15, 2024 | Music Transcription | Code Available | 2 |
| Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification | Mar 15, 2024 | Object | Code Available | 2 |
| VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Mar 15, 2024 | EgoSchema, Form | Code Available | 2 |
| BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics | Mar 15, 2024 | Audio Classification, Classification | Code Available | 2 |
| Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection | Mar 15, 2024 | DeepFake Detection, Face Swapping | Code Available | 2 |
| Robust Shape Fitting for 3D Scene Abstraction | Mar 15, 2024 | Depth Estimation, Scene Parsing | Code Available | 2 |
| RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception | Mar 15, 2024 | 3D Object Detection, 3D Object Tracking | Code Available | 2 |
| GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping | Mar 14, 2024 | Contrastive Learning, NeRF | Code Available | 2 |
| PosSAM: Panoptic Open-vocabulary Segment Anything | Mar 14, 2024 | Decoder, Open Vocabulary Panoptic Segmentation | Code Available | 2 |
| What Was Your Prompt? A Remote Keylogging Attack on AI Assistants | Mar 14, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity | Mar 14, 2024 | In-Context Learning | Code Available | 2 |
| E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection | Mar 14, 2024 | Autonomous Driving, Object | Code Available | 2 |
| Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Mar 14, 2024 | Math, Reinforcement Learning (RL) | Code Available | 2 |
| An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models | Mar 14, 2024 | | Code Available | 2 |
| OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments | Mar 14, 2024 | Zero-Shot Learning | Code Available | 2 |
| VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation | Mar 14, 2024 | Image Segmentation, Mamba | Code Available | 2 |
| Faceptor: A Generalist Model for Face Perception | Mar 14, 2024 | Age Estimation, Attribute | Code Available | 2 |
| Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference | Mar 14, 2024 | Text Generation | Code Available | 2 |
| AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting | Mar 14, 2024 | | Code Available | 2 |
| Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph | Mar 14, 2024 | 3D Generation, 3DGS | Code Available | 2 |
| RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems | Mar 14, 2024 | Decoder, Question Answering | Code Available | 2 |
| MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models | Mar 14, 2024 | 3D Face Animation, Diversity | Code Available | 2 |
| Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts | Mar 14, 2024 | Denoising, Mixture-of-Experts | Code Available | 2 |
| CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | Mar 14, 2024 | Classification, Crowd Counting | Code Available | 2 |