| DiffCLIP: Few-shot Language-driven Multimodal Classifier | Dec 10, 2024 | Few-Shot Learning | Code Available | 1 |
| Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation | Dec 10, 2024 | FAD, Music Generation | Code Available | 1 |
| ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer | Dec 10, 2024 | Denoising, Image Generation | Code Available | 1 |
| RAP-SR: RestorAtion Prior Enhancement in Diffusion Models for Realistic Image Super-Resolution | Dec 10, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 1 |
| Bridging the Gap for Test-Time Multimodal Sentiment Analysis | Dec 10, 2024 | Multimodal Sentiment Analysis, Pseudo Label | Code Available | 1 |
| The Pitfalls of Memorization: When Memorization Hurts Generalization | Dec 10, 2024 | Memorization | Code Available | 1 |
| Reinforcement Learning Policy as Macro Regulator Rather than Macro Placer | Dec 10, 2024 | reinforcement-learning, Reinforcement Learning | Code Available | 1 |
| ArtFormer: Controllable Generation of Diverse 3D Articulated Objects | Dec 10, 2024 | | Code Available | 1 |
| Neural Garment Dynamic Super-Resolution | Dec 9, 2024 | Super-Resolution | Code Available | 1 |
| Enhancing Scene Coordinate Regression with Efficient Keypoint Detection and Sequential Information | Dec 9, 2024 | Camera Pose Estimation, Computational Efficiency | Code Available | 1 |
| PyPulse: A Python Library for Biosignal Imputation | Dec 9, 2024 | Imputation | Code Available | 1 |
| PolytopeWalk: Sparse MCMC Sampling over Polytopes | Dec 9, 2024 | Uncertainty Quantification | Code Available | 1 |
| Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation | Dec 9, 2024 | Object Localization, Vision and Language Navigation | Code Available | 1 |
| From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding | Dec 9, 2024 | | Code Available | 1 |
| Understanding Gradient Descent through the Training Jacobian | Dec 9, 2024 | | Code Available | 1 |
| iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models | Dec 9, 2024 | | Code Available | 1 |
| Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs | Dec 9, 2024 | Mamba | Code Available | 1 |
| Source Separation & Automatic Transcription for Music | Dec 9, 2024 | Music Transcription, Speech Enhancement | Code Available | 1 |
| Continual Learning for Segment Anything Model Adaptation | Dec 9, 2024 | Continual Learning, model | Code Available | 1 |
| ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models | Dec 9, 2024 | Graph Generation, Scene Graph Generation | Code Available | 1 |
| XLSTM-HVED: Cross-Modal Brain Tumor Segmentation and MRI Reconstruction Method Using Vision XLSTM and Heteromodal Variational Encoder-Decoder | Dec 9, 2024 | Brain Tumor Segmentation, Decoder | Code Available | 1 |
| AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation | Dec 9, 2024 | | Code Available | 1 |
| Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters | Dec 9, 2024 | Image Generation, Navigate | Code Available | 1 |
| VOPy: A Framework for Black-box Vector Optimization | Dec 9, 2024 | | Code Available | 1 |
| I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token | Dec 9, 2024 | World Knowledge | Code Available | 1 |
| Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment | Dec 9, 2024 | Sound Source Localization | Code Available | 1 |
| ECGtizer: a fully automated digitizing and signal recovery pipeline for electrocardiograms | Dec 9, 2024 | Diagnostic | Code Available | 1 |
| Enhanced Multi-Object Tracking Using Pose-based Virtual Markers in 3x3 Basketball | Dec 9, 2024 | Active Learning, Multi-Object Tracking | Code Available | 1 |
| GenAI4UQ: A Software for Inverse Uncertainty Quantification Using Conditional Generative Models | Dec 9, 2024 | parameter estimation, Uncertainty Quantification | Code Available | 1 |
| Digital Transformation in the Water Distribution System based on the Digital Twins Concept | Dec 9, 2024 | Decision Making, Scheduling | Code Available | 1 |
| Ranking-aware adapter for text-driven image ordering with CLIP | Dec 9, 2024 | Age Estimation, Image Quality Assessment | Code Available | 1 |
| Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation | Dec 9, 2024 | Domain Adaptation, Image Segmentation | Code Available | 1 |
| AutoReason: Automatic Few-Shot Reasoning Decomposition | Dec 9, 2024 | StrategyQA | Code Available | 1 |
| LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Dec 9, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Multi-Behavior Recommendation with Personalized Directed Acyclic Behavior Graphs | Dec 9, 2024 | Benchmarking, Computational Efficiency | Code Available | 1 |
| PowerMamba: A Deep State Space Model and Comprehensive Benchmark for Time Series Prediction in Electric Power Systems | Dec 9, 2024 | Benchmarking, Prediction | Code Available | 1 |
| MCP-MedSAM: A Powerful Lightweight Medical Segment Anything Model Trained with a Single GPU in Just One Day | Dec 8, 2024 | GPU, Image Segmentation | Code Available | 1 |
| KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models | Dec 8, 2024 | Instruction Following, Natural Language Understanding | Code Available | 1 |
| Post-hoc Probabilistic Vision-Language Models | Dec 8, 2024 | Active Learning, Uncertainty Quantification | Code Available | 1 |
| DapperFL: Domain Adaptive Federated Learning with Model Fusion Pruning for Edge Devices | Dec 8, 2024 | Edge-computing, Federated Learning | Code Available | 1 |
| [CLS] Token Tells Everything Needed for Training-free Efficient MLLMs | Dec 8, 2024 | | Code Available | 1 |
| TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model | Dec 8, 2024 | Cell Detection | Code Available | 1 |
| Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models | Dec 8, 2024 | | Code Available | 1 |
| BiDM: Pushing the Limit of Quantization for Diffusion Models | Dec 8, 2024 | Binarization, Image Generation | Code Available | 1 |
| FlexDiT: Dynamic Token Density Control for Diffusion Transformer | Dec 8, 2024 | Computational Efficiency, Denoising | Code Available | 1 |
| On Diffusion Posterior Sampling via Sequential Monte Carlo for Zero-Shot Scaffolding of Protein Motifs | Dec 8, 2024 | | Code Available | 1 |
| Multispecies Animal Re-ID Using a Large Community-Curated Dataset | Dec 7, 2024 | | Code Available | 1 |
| KG-Retriever: Efficient Knowledge Indexing for Retrieval-Augmented Large Language Models | Dec 7, 2024 | Multi-hop Question Answering, Navigate | Code Available | 1 |
| CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images | Dec 7, 2024 | Segmentation | Code Available | 1 |
| Temporally Compressed 3D Gaussian Splatting for Dynamic Scenes | Dec 7, 2024 | Quantization | Code Available | 1 |