| An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection | Jun 10, 2024 | Backdoor Attack, Code Completion | Code Available | 2 |
| ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization | Jun 10, 2024 | | Code Available | 2 |
| Recurrent Context Compression: Efficiently Expanding the Context Window of LLM | Jun 10, 2024 | Long-Context Understanding, Question Answering | Code Available | 2 |
| VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft | Jun 9, 2024 | Management, Minecraft | Code Available | 2 |
| Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language | Jun 9, 2024 | Contrastive Learning, Cross-Modal Retrieval | Code Available | 2 |
| Hello Again! LLM-powered Personalized Agent for Long-term Dialogue | Jun 9, 2024 | Response Generation, Retrieval | Code Available | 2 |
| How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States | Jun 9, 2024 | Safety Alignment | Code Available | 2 |
| Binarized Diffusion Model for Image Super-Resolution | Jun 9, 2024 | Attribute, Binarization | Code Available | 2 |
| F-LMM: Grounding Frozen Large Multimodal Models | Jun 9, 2024 | General Knowledge, Instruction Following | Code Available | 2 |
| A DeNoising FPN With Transformer R-CNN for Tiny Object Detection | Jun 9, 2024 | Contrastive Learning, Denoising | Code Available | 2 |
| WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark | Jun 9, 2024 | text-to-speech, Text to Speech | Code Available | 2 |
| Bits-to-Photon: End-to-End Learned Scalable Point Cloud Compression for Direct Rendering | Jun 9, 2024 | Decoder | Code Available | 2 |
| Attention as a Hypernetwork | Jun 9, 2024 | | Code Available | 2 |
| Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples | Jun 9, 2024 | ARC, Diversity | Code Available | 2 |
| Medical Vision Generalist: Unifying Medical Imaging Tasks in Context | Jun 8, 2024 | Conditional Image Generation, Denoising | Code Available | 2 |
| LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models | Jun 7, 2024 | | Code Available | 2 |
| Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models | Jun 7, 2024 | Denoising, Image Generation | Code Available | 2 |
| CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning | Jun 7, 2024 | Instruction Following, Math | Code Available | 2 |
| Predictive Dynamic Fusion | Jun 7, 2024 | Decision Making | Code Available | 2 |
| The Russian Legislative Corpus | Jun 7, 2024 | | Code Available | 2 |
| Spectrum: Targeted Training on Signal to Noise Ratio | Jun 7, 2024 | GPU | Code Available | 2 |
| Hibou: A Family of Foundational Vision Transformers for Pathology | Jun 7, 2024 | Diagnostic, whole slide images | Code Available | 2 |
| Mixed-Curvature Decision Trees and Random Forests | Jun 7, 2024 | | Code Available | 2 |
| LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model | Jun 7, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting | Jun 7, 2024 | motion retargeting | Code Available | 2 |
| Faster Than Lies: Real-time Deepfake Detection using Binary Neural Networks | Jun 7, 2024 | DeepFake Detection, Face Swapping | Code Available | 2 |
| Split-and-Fit: Learning B-Reps via Structure-Aware Voronoi Partitioning | Jun 7, 2024 | Binary Classification | Code Available | 2 |
| 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination | Jun 7, 2024 | Hallucination | Code Available | 2 |
| MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models | Jun 7, 2024 | FAD, Text-to-Music Generation | Code Available | 2 |
| MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks | Jun 7, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 2 |
| Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis | Jun 7, 2024 | Audio Synthesis | Code Available | 2 |
| Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting | Jun 6, 2024 | Computational Efficiency, Data Integration | Code Available | 2 |
| Tool-Planner: Task Planning with Clusters across Multiple Tools | Jun 6, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| GenAI Arena: An Open Evaluation Platform for Generative Models | Jun 6, 2024 | Image Generation, Instruction Following | Code Available | 2 |
| Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data | Jun 6, 2024 | Denoising, Language Modeling | Code Available | 2 |
| Parameter-Inverted Image Pyramid Networks | Jun 6, 2024 | Computational Efficiency, image-classification | Code Available | 2 |
| Simplified and Generalized Masked Diffusion for Discrete Data | Jun 6, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| BLSP-Emo: Towards Empathetic Large Speech-Language Models | Jun 6, 2024 | Emotion Recognition, Instruction Following | Code Available | 2 |
| Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning | Jun 6, 2024 | Multi-Task Learning, Vulnerability Detection | Code Available | 2 |
| How Far Can We Compress Instant-NGP-Based NeRF? | Jun 6, 2024 | NeRF | Code Available | 2 |
| MAIRA-2: Grounded Radiology Report Generation | Jun 6, 2024 | Text Generation | Code Available | 2 |
| UltraMedical: Building Specialized Generalists in Biomedicine | Jun 6, 2024 | | Code Available | 2 |
| Evaluating the World Model Implicit in a Generative Model | Jun 6, 2024 | Logical Reasoning, model | Code Available | 2 |
| Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt | Jun 6, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis | Jun 6, 2024 | Conditional Text-to-Image Synthesis, Image Generation | Code Available | 2 |
| DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data | Jun 6, 2024 | 3D Generation, Text to 3D | Code Available | 2 |
| Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction | Jun 6, 2024 | NeRF | Code Available | 2 |
| Mini Honor of Kings: A Lightweight Environment for Multi-Agent Reinforcement Learning | Jun 6, 2024 | Multi-agent Reinforcement Learning | Code Available | 2 |
| CDMamba: Incorporating Local Clues into Mamba for Remote Sensing Image Binary Change Detection | Jun 6, 2024 | Change Detection, Mamba | Code Available | 2 |
| VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling | Jun 6, 2024 | Diversity, Music Generation | Code Available | 2 |