| Frequency-Based Alignment of EEG and Audio Signals Using Contrastive Learning and SincNet for Auditory Attention Detection | Mar 6, 2025 | Contrastive Learning, EEG | Code Available | 1 |
| ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images | Mar 6, 2025 | 3D Place Recognition, Loop Closure Detection | Code Available | 1 |
| L^2M: Mutual Information Scaling Law for Long-Context Language Modeling | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| GBT-SAM: Adapting a Foundational Deep Learning Model for Generalizable Brain Tumor Segmentation via Efficient Integration of Multi-Parametric MRI Data | Mar 6, 2025 | Brain Tumor Segmentation, Image Segmentation | Code Available | 1 |
| Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments | Mar 6, 2025 | Image Segmentation, Management | Code Available | 1 |
| WeakMedSAM: Weakly-Supervised Medical Image Segmentation via SAM with Sub-Class Exploration and Prompt Affinity Mining | Mar 6, 2025 | Image Segmentation, Medical Image Segmentation | Code Available | 1 |
| Question-Aware Gaussian Experts for Audio-Visual Question Answering | Mar 6, 2025 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 1 |
| CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization | Mar 5, 2025 | Autonomous Driving | Code Available | 1 |
| DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering | Mar 5, 2025 | 3D Question Answering (3D-QA), Question Answering | Code Available | 1 |
| Unified Human Localization and Trajectory Prediction with Monocular Vision | Mar 5, 2025 | Prediction, Trajectory Prediction | Code Available | 1 |
| DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Mar 5, 2025 | Domain Adaptation, Image to text | Code Available | 1 |
| SEAL: Safety Enhanced Trajectory Planning and Control Framework for Quadrotor Flight in Complex Environments | Mar 5, 2025 | Model Predictive Control, Trajectory Planning | Code Available | 1 |
| DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Mar 5, 2025 | 3D Object Detection, BEV Segmentation | Code Available | 1 |
| LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models | Mar 5, 2025 | | Code Available | 1 |
| PiEEG kit -- bioscience Lab in home for your Brain and Body | Mar 5, 2025 | EEG | Code Available | 1 |
| Rethinking Video Tokenization: A Conditioned Diffusion-based Approach | Mar 5, 2025 | Decoder, Video Compression | Code Available | 1 |
| Cross-modal Causal Relation Alignment for Video Question Grounding | Mar 5, 2025 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Tackling Few-Shot Segmentation in Remote Sensing via Inpainting Diffusion Model | Mar 5, 2025 | Image Inpainting, Segmentation | Code Available | 1 |
| UnPuzzle: A Unified Framework for Pathology Image Analysis | Mar 5, 2025 | Benchmarking, Diagnostic | Code Available | 1 |
| Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Mar 5, 2025 | Computational Efficiency, Descriptive | Code Available | 1 |
| The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models | Mar 5, 2025 | | Code Available | 1 |
| DiRe-JAX: A JAX based Dimensionality Reduction Algorithm for Large-scale Data | Mar 5, 2025 | Computational Efficiency, Dimensionality Reduction | Code Available | 1 |
| CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP | Mar 5, 2025 | Adversarial Robustness, Image-text matching | Code Available | 1 |
| State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models | Mar 5, 2025 | parameter-efficient fine-tuning, State Space Models | Code Available | 1 |
| Distilling Dataset into Neural Field | Mar 5, 2025 | Dataset Distillation | Code Available | 1 |
| Improving LLM Safety Alignment with Dual-Objective Optimization | Mar 5, 2025 | Safety Alignment | Code Available | 1 |
| Full-DoF Egomotion Estimation for Event Cameras Using Geometric Solvers | Mar 5, 2025 | Relation | Code Available | 1 |
| Optimizing for the Shortest Path in Denoising Diffusion Model | Mar 5, 2025 | Denoising | Code Available | 1 |
| Bridging Molecular Graphs and Large Language Models | Mar 5, 2025 | Few-Shot Learning | Code Available | 1 |
| Multi-Agent Systems Powered by Large Language Models: Applications in Swarm Intelligence | Mar 5, 2025 | | Code Available | 1 |
| MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving | Mar 5, 2025 | Automated Theorem Proving, Transfer Learning | Code Available | 1 |
| Wyckoff Transformer: Generation of Symmetric Crystals | Mar 4, 2025 | Inductive Bias, Property Prediction | Code Available | 1 |
| Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization | Mar 4, 2025 | | Code Available | 1 |
| SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models | Mar 4, 2025 | Image Description | Code Available | 1 |
| Evaluating Knowledge Generation and Self-Refinement Strategies for LLM-based Column Type Annotation | Mar 4, 2025 | Column Type Annotation, In-Context Learning | Code Available | 1 |
| Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts | Mar 4, 2025 | Image Generation, Text to Image Generation | Code Available | 1 |
| Multimodal AI predicts clinical outcomes of drug combinations from preclinical data | Mar 4, 2025 | Large Language Model | Code Available | 1 |
| InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model | Mar 4, 2025 | es-en, Language Modeling | Code Available | 1 |
| Monocular visual simultaneous localization and mapping: (r)evolution from geometry to deep learning-based pipelines | Mar 4, 2025 | Simultaneous Localization and Mapping | Code Available | 1 |
| Words or Vision: Do Vision-Language Models Have Blind Faith in Text? | Mar 4, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| SeqFusion: Sequential Fusion of Pre-Trained Models for Zero-Shot Time-Series Forecasting | Mar 4, 2025 | Time Series, Time Series Forecasting | Code Available | 1 |
| XFMamba: Cross-Fusion Mamba for Multi-View Medical Image Classification | Mar 4, 2025 | Classification, image-classification | Code Available | 1 |
| The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models | Mar 4, 2025 | | Code Available | 1 |
| Diverse Controllable Diffusion Policy with Signal Temporal Logic | Mar 4, 2025 | | Code Available | 1 |
| SAGE: Steering and Refining Dialog Generation with State-Action Augmentation | Mar 4, 2025 | Dialogue Generation, Emotional Intelligence | Code Available | 1 |
| LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset | Mar 4, 2025 | Classification, object-detection | Code Available | 1 |
| MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation | Mar 4, 2025 | Font Generation, Mixture-of-Experts | Code Available | 1 |
| PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Mar 4, 2025 | GSM8K, Math | Code Available | 1 |
| ARINAR: Bi-Level Autoregressive Feature-by-Feature Generative Models | Mar 4, 2025 | Image Generation | Code Available | 1 |
| Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs | Mar 4, 2025 | | Code Available | 1 |