| RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection | Dec 17, 2024 | 3D Object Detection, Decoder | Code Available | 1 |
| Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training | Dec 17, 2024 | Mamba, Token Reduction | Code Available | 1 |
| Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Dec 17, 2024 | audio-visual event localization, audio-visual learning | Code Available | 1 |
| CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics | Dec 17, 2024 | Object, object-detection | Code Available | 1 |
| Differential Alignment for Domain Adaptive Object Detection | Dec 17, 2024 | Object, object-detection | Code Available | 1 |
| ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding | Dec 17, 2024 | cross-modal alignment | Code Available | 1 |
| CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models | Dec 17, 2024 | Multimodal Reasoning | Code Available | 1 |
| Boosting Fine-Grained Visual Anomaly Detection with Coarse-Knowledge-Aware Adversarial Learning | Dec 17, 2024 | Anomaly Detection | Code Available | 1 |
| GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models | Dec 17, 2024 | Long-range modeling | Code Available | 1 |
| MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants | Dec 17, 2024 | Image Captioning, Question Answering | Code Available | 1 |
| XPath Agent: An Efficient XPath Programming Agent Based on LLM for Web Crawler | Dec 17, 2024 | | Code Available | 1 |
| Assessing the Limitations of Large Language Models in Clinical Fact Decomposition | Dec 17, 2024 | Fact Verification, Sentence | Code Available | 1 |
| RCLMuFN: Relational Context Learning and Multiplex Fusion Network for Multimodal Sarcasm Detection | Dec 17, 2024 | Sarcasm Detection | Code Available | 1 |
| TimeCHEAT: A Channel Harmony Strategy for Irregularly Sampled Multivariate Time Series Analysis | Dec 17, 2024 | Multivariate Time Series Forecasting, Time Series | Code Available | 1 |
| DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation | Dec 17, 2024 | Contrastive Learning, Image Segmentation | Code Available | 1 |
| A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis | Dec 17, 2024 | Diagnostic, Specificity | Code Available | 1 |
| ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation | Dec 17, 2024 | Instance Segmentation, Segmentation | Code Available | 1 |
| Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script | Dec 17, 2024 | Adversarial Attack, Adversarial Robustness | Code Available | 1 |
| MT-LENS: An all-in-one Toolkit for Better Machine Translation Evaluation | Dec 16, 2024 | All, Benchmarking | Code Available | 1 |
| Re-Attentional Controllable Video Diffusion Editing | Dec 16, 2024 | Denoising, Video Editing | Code Available | 1 |
| Cross-View Geo-Localization with Street-View and VHR Satellite Imagery in Decentrality Settings | Dec 16, 2024 | Disaster Response, geo-localization | Code Available | 1 |
| Universal Domain Adaptive Object Detection via Dual Probabilistic Alignment | Dec 16, 2024 | Domain Adaptation, object-detection | Code Available | 1 |
| SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models | Dec 16, 2024 | Instruction Following | Code Available | 1 |
| TS-SatFire: A Multi-Task Satellite Image Time-Series Dataset for Wildfire Detection and Prediction | Dec 16, 2024 | Earth Observation, Fire Detection | Code Available | 1 |
| Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves | Dec 16, 2024 | Transfer Learning | Code Available | 1 |
| SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types | Dec 16, 2024 | Question Answering | Code Available | 1 |
| Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture | Dec 16, 2024 | Mixture-of-Experts, Position | Code Available | 1 |
| Beyond Graph Convolution: Multimodal Recommendation with Topology-aware MLPs | Dec 16, 2024 | Multimodal Recommendation, Recommendation Systems | Code Available | 1 |
| 3D^2-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling | Dec 16, 2024 | 3D Reconstruction, Denoising | Code Available | 1 |
| GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training | Dec 16, 2024 | Geometry Problem Solving | Code Available | 1 |
| AMI-Net: Adaptive Mask Inpainting Network for Industrial Anomaly Detection and Localization | Dec 16, 2024 | Anomaly Detection | Code Available | 1 |
| SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval | Dec 16, 2024 | Form, Information Retrieval | Code Available | 1 |
| Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation | Dec 16, 2024 | Diversity, Semantic Segmentation | Code Available | 1 |
| Data-driven Precipitation Nowcasting Using Satellite Imagery | Dec 16, 2024 | Precipitation Forecasting | Code Available | 1 |
| Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments | Dec 16, 2024 | Clinical Knowledge, College Medicine | Code Available | 1 |
| Conditional Diffusion Models Based Conditional Independence Testing | Dec 16, 2024 | | Code Available | 1 |
| MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models | Dec 16, 2024 | Quantization | Code Available | 1 |
| RAG Playground: A Framework for Systematic Evaluation of Retrieval Strategies and Prompt Engineering in RAG Systems | Dec 16, 2024 | Prompt Engineering, RAG | Code Available | 1 |
| Does VLM Classification Benefit from LLM Description Semantics? | Dec 16, 2024 | Classification, image-classification | Code Available | 1 |
| Region-Based Optimization in Continual Learning for Audio Deepfake Detection | Dec 16, 2024 | Audio Deepfake Detection, Continual Learning | Code Available | 1 |
| IDEA-Bench: How Far are Generative Models from Professional Designing? | Dec 16, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP | Dec 16, 2024 | Few-Shot Learning | Code Available | 1 |
| Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces | Dec 16, 2024 | All, Drug Design | Code Available | 1 |
| Aligning Visual and Semantic Interpretability through Visually Grounded Concept Bottleneck Models | Dec 16, 2024 | Specificity | Code Available | 1 |
| Deep Random Features for Scalable Interpolation of Spatiotemporal Data | Dec 16, 2024 | Earth Observation, Gaussian Processes | Code Available | 1 |
| Relation-Guided Adversarial Learning for Data-free Knowledge Transfer | Dec 16, 2024 | Data-free Knowledge Distillation, Data Free Quantization | Code Available | 1 |
| StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors | Dec 16, 2024 | Diversity, Text to 3D | Code Available | 1 |
| IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation | Dec 16, 2024 | Image Generation | Code Available | 1 |
| RL-LLM-DT: An Automatic Decision Tree Generation Method Based on RL Evaluation and LLM Enhancement | Dec 16, 2024 | Reinforcement Learning (RL) | Code Available | 1 |
| Spatiotemporal Blind-Spot Network with Calibrated Flow Alignment for Self-Supervised Video Denoising | Dec 16, 2024 | Denoising, Optical Flow Estimation | Code Available | 1 |