| RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback | Feb 6, 2024 | reinforcement-learning, Reinforcement Learning (RL) | Code Available | 2 |
| U-shaped Vision Mamba for Single Image Dehazing | Feb 6, 2024 | Image Dehazing, Image Restoration | Code Available | 2 |
| MOMENT: A Family of Open Time-series Foundation Models | Feb 6, 2024 | Time Series, Time Series Analysis | Code Available | 2 |
| Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection | Feb 6, 2024 | 3D Object Detection, Denoising | Code Available | 2 |
| CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model | Feb 6, 2024 | Decoder, Image Segmentation | Code Available | 2 |
| LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K | Feb 6, 2024 | 16k, Benchmarking | Code Available | 2 |
| Fine-Tuned Language Models Generate Stable Inorganic Materials as Text | Feb 6, 2024 | | Code Available | 2 |
| QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning | Feb 6, 2024 | Image Generation, Model Compression | Code Available | 2 |
| A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation | Feb 6, 2024 | Out-of-Distribution Generalization, Prompt Learning | Code Available | 2 |
| Linear-time Minimum Bayes Risk Decoding with Reference Aggregation | Feb 6, 2024 | Text Generation | Code Available | 2 |
| Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback | Feb 6, 2024 | Video-based Generative Performance Benchmarking | Code Available | 2 |
| Privacy Leakage on DNNs: A Survey of Model Inversion Attacks and Defenses | Feb 6, 2024 | | Code Available | 2 |
| Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods | Feb 5, 2024 | | Code Available | 2 |
| 4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes | Feb 5, 2024 | GPU, Novel View Synthesis | Code Available | 2 |
| InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions | Feb 5, 2024 | Video Generation | Code Available | 2 |
| Rethinking Optimization and Architecture for Tiny Language Models | Feb 5, 2024 | Language Modelling | Code Available | 2 |
| HASSOD: Hierarchical Adaptive Self-Supervised Object Detection | Feb 5, 2024 | Object, object-detection | Code Available | 2 |
| Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives | Feb 5, 2024 | Continual Learning, Multi-Task Learning | Code Available | 2 |
| Position: What Can Large Language Models Tell Us about Time Series Analysis | Feb 5, 2024 | Decision Making, Position | Code Available | 2 |
| Light and Optimal Schrödinger Bridge Matching | Feb 5, 2024 | | Code Available | 2 |
| Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models | Feb 5, 2024 | Data Augmentation, Data Poisoning | Code Available | 2 |
| Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector | Feb 5, 2024 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available | 2 |
| Guidance with Spherical Gaussian Constraint for Conditional Diffusion | Feb 5, 2024 | Denoising | Code Available | 2 |
| How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning | Feb 5, 2024 | In-Context Learning, Metric Learning | Code Available | 2 |
| nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model | Feb 5, 2024 | 3D Medical Imaging Segmentation, Image Segmentation | Code Available | 2 |
| FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition | Feb 5, 2024 | Action Recognition, Open Vocabulary Action Recognition | Code Available | 2 |
| See More Details: Efficient Image Super-Resolution by Experts Mining | Feb 5, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective | Feb 5, 2024 | Anomaly Detection, Time Series | Code Available | 2 |
| Flora: Low-Rank Adapters Are Secretly Gradient Compressors | Feb 5, 2024 | | Code Available | 2 |
| Graph-enhanced Large Language Models in Asynchronous Plan Reasoning | Feb 5, 2024 | | Code Available | 2 |
| Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models | Feb 5, 2024 | Medical Diagnosis | Code Available | 2 |
| Training-Free Consistent Text-to-Image Generation | Feb 5, 2024 | Diversity, Image Generation | Code Available | 2 |
| Large Language Models are Geographically Biased | Feb 5, 2024 | Fairness | Code Available | 2 |
| Retrieval-Augmented Score Distillation for Text-to-3D Generation | Feb 5, 2024 | 3D Generation, 3D geometry | Code Available | 2 |
| LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | Feb 4, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Robot Trajectron: Trajectory Prediction-based Shared Control for Robot Manipulation | Feb 4, 2024 | Position, Robot Manipulation | Code Available | 2 |
| GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering | Feb 4, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Minusformer: Improving Time Series Forecasting by Progressively Learning Residuals | Feb 4, 2024 | Ensemble Learning, Time Series | Code Available | 2 |
| KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion | Feb 4, 2024 | In-Context Learning, Knowledge Graph Completion | Code Available | 2 |
| Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning | Feb 4, 2024 | Contact-rich Manipulation, Zero-shot Generalization | Code Available | 2 |
| Jailbreaking Attack against Multimodal Large Language Model | Feb 4, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Federated Learning with New Knowledge: Fundamentals, Advances, and Futures | Feb 3, 2024 | Federated Learning, Privacy Preserving | Code Available | 2 |
| Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models | Feb 3, 2024 | Instruction Following, Safety Alignment | Code Available | 2 |
| More Agents Is All You Need | Feb 3, 2024 | All | Code Available | 2 |
| Affordable Generative Agents | Feb 3, 2024 | | Code Available | 2 |
| Change Point Detection with Copula Entropy based Two-Sample Test | Feb 3, 2024 | Change Point Detection, Time Series | Code Available | 2 |
| EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Feb 3, 2024 | Benchmarking, Code Completion | Code Available | 2 |
| ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image Segmentation | Feb 3, 2024 | Decoder, Image Segmentation | Code Available | 2 |
| GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning | Feb 3, 2024 | Link Prediction, Node Classification | Code Available | 2 |
| Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance | Feb 3, 2024 | Denoising | Code Available | 2 |