| Frustratingly Easy Test-Time Adaptation of Vision-Language Models | May 28, 2024 | Test-time Adaptation | Code Available | 2 |
| REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment | May 28, 2024 | Image to 3D, Object | Code Available | 2 |
| Color Shift Estimation-and-Correction for Image Enhancement | May 28, 2024 | Exposure Correction, Image Enhancement | Code Available | 2 |
| TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | May 28, 2024 | Machine Translation, speech-recognition | Code Available | 2 |
| ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention | May 28, 2024 | GPU, Representation Learning | Code Available | 2 |
| DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention | May 28, 2024 | GPU, Mamba | Code Available | 2 |
| Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving | May 28, 2024 | Autonomous Driving, Bilevel Optimization | Code Available | 2 |
| Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment | May 28, 2024 | | Code Available | 2 |
| Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning | May 28, 2024 | | Code Available | 2 |
| AutoPSV: Automated Process-Supervised Verifier | May 27, 2024 | | Code Available | 2 |
| NoteLLM-2: Multimodal Large Representation Models for Recommendation | May 27, 2024 | In-Context Learning | Code Available | 2 |
| Multi-Behavior Generative Recommendation | May 27, 2024 | Sequential Recommendation | Code Available | 2 |
| BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments | May 27, 2024 | AI Agent, Bayesian Optimization | Code Available | 2 |
| Memorize What Matters: Emergent Scene Decomposition from Multitraverse | May 27, 2024 | 3D Reconstruction, Neural Rendering | Code Available | 2 |
| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | May 27, 2024 | Benchmarking, GSM8K | Code Available | 2 |
| Saturn: Sample-efficient Generative Molecular Design using Memory Manipulation | May 27, 2024 | Data Augmentation, Drug Discovery | Code Available | 2 |
| Spectral-Refiner: Accurate Fine-Tuning of Spatiotemporal Fourier Neural Operator for Turbulent Flows | May 27, 2024 | Computational Efficiency, De-aliasing | Code Available | 2 |
| DMPlug: A Plug-in Method for Solving Inverse Problems with Diffusion Models | May 27, 2024 | | Code Available | 2 |
| Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs | May 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | May 27, 2024 | Decoder, Language Modeling | Code Available | 2 |
| Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning | May 27, 2024 | Question Answering, RAG | Code Available | 2 |
| TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction | May 27, 2024 | Mamba, Prediction | Code Available | 2 |
| Position: Foundation Agents as the Paradigm Shift for Decision Making | May 27, 2024 | Decision Making, Position | Code Available | 2 |
| MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities | May 27, 2024 | Autonomous Driving, Out-of-Distribution Detection | Code Available | 2 |
| Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning | May 27, 2024 | Gym halfcheetah-medium, Gym halfcheetah-medium-expert | Code Available | 2 |
| EASI-Tex: Edge-Aware Mesh Texturing from Single Image | May 27, 2024 | | Code Available | 2 |
| Autoformalizing Euclidean Geometry | May 27, 2024 | Math | Code Available | 2 |
| Are Self-Attentions Effective for Time Series Forecasting? | May 27, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training | May 27, 2024 | | Code Available | 2 |
| Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models | May 27, 2024 | Segmentation, Semantic correspondence | Code Available | 2 |
| VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models | May 27, 2024 | Object | Code Available | 2 |
| DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos | May 27, 2024 | 3DGS, Autonomous Vehicles | Code Available | 2 |
| Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth | May 27, 2024 | | Code Available | 2 |
| M^3CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought | May 26, 2024 | | Code Available | 2 |
| Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge | May 26, 2024 | Classification, Edge Classification | Code Available | 2 |
| Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation | May 26, 2024 | feature selection, Mixture-of-Experts | Code Available | 2 |
| AdaFisher: Adaptive Second Order Optimization via Fisher Information | May 26, 2024 | Computational Efficiency, image-classification | Code Available | 2 |
| LoQT: Low-Rank Adapters for Quantized Pretraining | May 26, 2024 | GPU, Language Modeling | Code Available | 2 |
| A Survey of Multimodal Large Language Model from A Data-centric Perspective | May 26, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models | May 26, 2024 | | Code Available | 2 |
| Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians | May 26, 2024 | 3D Reconstruction, Simultaneous Localization and Mapping | Code Available | 2 |
| Crafting Interpretable Embeddings by Asking LLMs Questions | May 26, 2024 | Question Answering | Code Available | 2 |
| KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge | May 26, 2024 | Graph Embedding, Informativeness | Code Available | 2 |
| MambaTS: Improved Selective State Space Models for Long-term Time Series Forecasting | May 26, 2024 | Mamba, State Space Models | Code Available | 2 |
| REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation | May 25, 2024 | Graph Generation, Object | Code Available | 2 |
| DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution | May 25, 2024 | Attribute | Code Available | 2 |
| Continuous Temporal Domain Generalization | May 25, 2024 | Domain Generalization | Code Available | 2 |
| MoEUT: Mixture-of-Experts Universal Transformers | May 25, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization | May 25, 2024 | continuous-control, Continuous Control | Code Available | 2 |
| Underwater Image Enhancement by Diffusion Model with Customized CLIP-Classifier | May 25, 2024 | Image Enhancement, Image Generation | Code Available | 2 |