| Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs using Semantic Space | Mar 14, 2025 | | Code Available | 1 |
| A Novel Decomposed Feature-Oriented Framework for Open-Set Semantic Segmentation on LiDAR Data | Mar 14, 2025 | Anomaly Detection, Decoder | Code Available | 1 |
| Image-Goal Navigation Using Refined Feature Guidance and Scene Graph Enhancement | Mar 14, 2025 | | Code Available | 1 |
| Simulating Dual-Pixel Images From Ray Tracing For Depth Estimation | Mar 14, 2025 | Deblurring, Depth Estimation | Code Available | 1 |
| Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models | Mar 14, 2025 | Audio Tagging, Event Detection | Code Available | 1 |
| Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages | Mar 14, 2025 | parameter-efficient fine-tuning | Code Available | 1 |
| Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models | Mar 14, 2025 | Image Generation, Novel View Synthesis | Code Available | 1 |
| CoLLMLight: Cooperative Large Language Model Agents for Network-Wide Traffic Signal Control | Mar 14, 2025 | Computational Efficiency, Language Modeling | Code Available | 1 |
| BEVDiffLoc: End-to-End LiDAR Global Localization in BEV View based on Diffusion Model | Mar 14, 2025 | Autonomous Driving, Data Augmentation | Code Available | 1 |
| GMG: A Video Prediction Method Based on Global Focus and Motion Guided | Mar 14, 2025 | Video Prediction, Weather Forecasting | Code Available | 1 |
| VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search | Mar 13, 2025 | Image Retrieval, Math | Code Available | 1 |
| Interpretable Image Classification via Non-parametric Part Prototype Learning | Mar 13, 2025 | image-classification, Image Classification | Code Available | 1 |
| Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM | Mar 13, 2025 | Autonomous Driving, Decoder | Code Available | 1 |
| OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding | Mar 13, 2025 | Object, Video Grounding | Code Available | 1 |
| OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary | Mar 13, 2025 | Out-of-Distribution Detection, Out of Distribution (OOD) Detection | Code Available | 1 |
| Label Unbalance in High-frequency Trading | Mar 13, 2025 | Deep Learning | Code Available | 1 |
| Enhancing Facial Privacy Protection via Weakening Diffusion Purification | Mar 13, 2025 | Face Recognition | Code Available | 1 |
| Large-scale Pre-training for Grounded Video Caption Generation | Mar 13, 2025 | Caption Generation | Code Available | 1 |
| From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM | Mar 13, 2025 | Translation | Code Available | 1 |
| VisTai: Benchmarking Vision-Language Models for Traditional Chinese in Taiwan | Mar 13, 2025 | Benchmarking, Dialogue Generation | Code Available | 1 |
| Automatic quality control in multi-centric fetal brain MRI super-resolution reconstruction | Mar 13, 2025 | Medical Image Analysis, Super-Resolution | Code Available | 1 |
| OCCUQ: Exploring Efficient Uncertainty Quantification for 3D Occupancy Prediction | Mar 13, 2025 | Autonomous Driving, Navigate | Code Available | 1 |
| UVE: Are MLLMs Unified Evaluators for AI-Generated Videos? | Mar 13, 2025 | | Code Available | 1 |
| StableFusion: Continual Video Retrieval via Frame Adaptation | Mar 13, 2025 | Continual Learning, Mixture-of-Experts | Code Available | 1 |
| TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention | Mar 13, 2025 | Hallucination, Object Hallucination | Code Available | 1 |
| EFC++: Elastic Feature Consolidation with Prototype Re-balancing for Cold Start Exemplar-free Incremental Learning | Mar 13, 2025 | class-incremental learning, Class Incremental Learning | Code Available | 1 |
| MetricGrids: Arbitrary Nonlinear Approximation with Elementary Metric Grids based Implicit Neural Representation | Mar 13, 2025 | Decoder | Code Available | 1 |
| Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores | Mar 13, 2025 | Mixture-of-Experts | Code Available | 1 |
| Low Complexity Point Tracking of the Myocardium in 2D Echocardiography | Mar 13, 2025 | GPU, Point Tracking | Code Available | 1 |
| Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology | Mar 13, 2025 | Contrastive Learning | Code Available | 1 |
| AI-assisted Early Detection of Pancreatic Ductal Adenocarcinoma on Contrast-enhanced CT | Mar 13, 2025 | | Code Available | 1 |
| CoSTA: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing | Mar 13, 2025 | | Code Available | 1 |
| Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings | Mar 13, 2025 | Speaker Identification, speech-recognition | Code Available | 1 |
| A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection | Mar 13, 2025 | object-detection, Object Detection | Code Available | 1 |
| KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception | Mar 13, 2025 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available | 1 |
| Mamba time series forecasting with uncertainty quantification | Mar 13, 2025 | Mamba, Probabilistic Time Series Forecasting | Code Available | 1 |
| AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption | Mar 13, 2025 | Image Generation | Code Available | 1 |
| How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game | Mar 13, 2025 | Multimodal Reasoning, Question Answering | Code Available | 1 |
| Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation | Mar 13, 2025 | Earth Observation | Code Available | 1 |
| Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks | Mar 13, 2025 | Federated Learning, Privacy Preserving | Code Available | 1 |
| TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models | Mar 13, 2025 | | Code Available | 1 |
| An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation | Mar 13, 2025 | MuJoCo | Code Available | 1 |
| RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation | Mar 13, 2025 | 3D Object Detection, object-detection | Code Available | 1 |
| High-Resolution Uplink Sensing in Millimeter-Wave ISAC Systems | Mar 13, 2025 | Integrated sensing and communication, ISAC | Code Available | 1 |
| Image Quality Assessment: From Human to Machine Preference | Mar 13, 2025 | Image Quality Assessment | Code Available | 1 |
| Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification | Mar 13, 2025 | Clustering, Diversity | Code Available | 1 |
| ZeroMerge: Parameter-Free KV Cache Compression for Memory-Efficient Long-Context LLMs | Mar 13, 2025 | | Code Available | 1 |
| The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation | Mar 13, 2025 | | Code Available | 1 |
| ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning | Mar 13, 2025 | Image Retrieval, Retrieval | Code Available | 1 |
| Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement | Mar 12, 2025 | Graph Representation Learning, Node Classification | Code Available | 1 |