| Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography | May 20, 2024 | Breast Cancer Detection, Diversity | Code Available | 2 |
| ProtT3: Protein-to-Text Generation for Text-based Protein Understanding | May 21, 2024 | Property Prediction, Question Answering | Code Available | 2 |
| ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles | May 22, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| Efficient Visual State Space Model for Image Deblurring | May 23, 2024 | Deblurring, Image Deblurring | Code Available | 2 |
| OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance | Nov 13, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models | May 23, 2024 | Benchmarking | Code Available | 2 |
| Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | May 24, 2024 | Common Sense Reasoning, Language Modelling | Code Available | 2 |
| Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models | May 24, 2024 | Atari Games, Mathematical Reasoning | Code Available | 2 |
| Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models | May 24, 2024 | Image Generation, Machine Unlearning | Code Available | 2 |
| KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge | May 26, 2024 | Graph Embedding, Informativeness | Code Available | 2 |
| Yuan 2.0-M32: Mixture of Experts with Attention Router | May 28, 2024 | ARC, Math | Code Available | 2 |
| Multi-Behavior Generative Recommendation | May 27, 2024 | Sequential Recommendation | Code Available | 2 |
| NoteLLM-2: Multimodal Large Representation Models for Recommendation | May 27, 2024 | In-Context Learning | Code Available | 2 |
| Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment | May 28, 2024 | cross-modal alignment | Code Available | 2 |
| XTrack: Multimodal Training Boosts RGB-X Video Object Trackers | May 28, 2024 | Inductive Bias, Mixture-of-Experts | Code Available | 2 |
| Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving | May 28, 2024 | Autonomous Driving, Bilevel Optimization | Code Available | 2 |
| Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models | May 28, 2024 | All, Computational Efficiency | Code Available | 2 |
| Easy Problems That LLMs Get Wrong | May 30, 2024 | Common Sense Reasoning, Logical Reasoning | Code Available | 2 |
| SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | May 31, 2024 | | Code Available | 2 |
| Hybrid Fourier Score Distillation for Efficient One Image to 3D Object Generation | May 31, 2024 | 3D Generation, Image Generation | Code Available | 2 |
| Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | May 31, 2024 | Red Teaming | Code Available | 2 |
| BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models | Jun 19, 2023 | Instruction Following, Text Generation | Code Available | 2 |
| DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs | Jun 3, 2024 | Management, Quantization | Code Available | 2 |
| Audio Mamba: Bidirectional State Space Model for Audio Representation Learning | Jun 5, 2024 | Audio Classification, Classification | Code Available | 2 |
| Poisoning Attacks and Defenses in Recommender Systems: A Survey | Jun 3, 2024 | Recommendation Systems, Survey | Code Available | 2 |
| Composer's Assistant 2: Interactive Multi-Track MIDI Infilling with Fine-Grained User Control | Jul 19, 2024 | | Code Available | 2 |
| Evaluating the World Model Implicit in a Generative Model | Jun 6, 2024 | Logical Reasoning, model | Code Available | 2 |
| How Far Can We Compress Instant-NGP-Based NeRF? | Jun 6, 2024 | NeRF | Code Available | 2 |
| BLSP-Emo: Towards Empathetic Large Speech-Language Models | Jun 6, 2024 | Emotion Recognition, Instruction Following | Code Available | 2 |
| MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models | Jun 7, 2024 | FAD, Text-to-Music Generation | Code Available | 2 |
| QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Jun 11, 2024 | | Code Available | 2 |
| Split-and-Fit: Learning B-Reps via Structure-Aware Voronoi Partitioning | Jun 7, 2024 | Binary Classification | Code Available | 2 |
| MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models | Jun 10, 2024 | Language Modelling | Code Available | 2 |
| GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection | Jun 11, 2024 | Anomaly Detection, Denoising | Code Available | 2 |
| D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval | Jan 1, 2024 | Image Retrieval, Retrieval | Code Available | 2 |
| LVBench: An Extreme Long Video Understanding Benchmark | Jun 12, 2024 | Decision Making, Video Understanding | Code Available | 2 |
| Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models | Jun 12, 2024 | Image Compression | Code Available | 2 |
| Real-world Image Dehazing with Coherence-based Pseudo Labeling and Cooperative Unfolding Network | Jun 12, 2024 | Image Dehazing | Code Available | 2 |
| Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees | Jun 11, 2024 | | Code Available | 2 |
| Classic GNNs are Strong Baselines: Reassessing GNNs for Node Classification | Jun 13, 2024 | Node Classification, Node Property Prediction | Code Available | 2 |
| On Softmax Direct Preference Optimization for Recommendation | Jun 13, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Fredformer: Frequency Debiased Transformer for Time Series Forecasting | Jun 13, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| Understanding Hallucinations in Diffusion Models through Mode Interpolation | Jun 13, 2024 | Hallucination, Image Generation | Code Available | 2 |
| Toward Controlled Generation of Text | Mar 2, 2017 | Attribute, Sentence | Code Available | 2 |
| Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction | Jun 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Extracting Prompts by Inverting LLM Outputs | May 23, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes | Jul 16, 2024 | Human Instance Segmentation, Instance Segmentation | Code Available | 2 |
| Optimal Transport Aggregation for Visual Place Recognition | Nov 27, 2023 | Re-Ranking, Visual Place Recognition | Code Available | 2 |
| Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions | Jun 27, 2024 | Navigate, Vision and Language Navigation | Code Available | 2 |
| HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending | Oct 16, 2023 | Attribute | Code Available | 2 |