| TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification | Mar 9, 2025 | Robot Navigation, STS | Code Available | 1 |
| Dynamic Dictionary Learning for Remote Sensing Image Segmentation | Mar 9, 2025 | Dictionary Learning, Image Segmentation | Code Available | 1 |
| TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Mar 9, 2025 | Action Localization, Boundary Detection | Code Available | 1 |
| M^3amba: CLIP-driven Mamba Model for Multi-modal Remote Sensing Classification | Mar 9, 2025 | Computational Efficiency, Hyperspectral Image Classification | Code Available | 1 |
| Dynamic Updates for Language Adaptation in Visual-Language Tracking | Mar 9, 2025 | Large Language Model | Code Available | 1 |
| Online Dense Point Tracking with Streaming Memory | Mar 9, 2025 | Optical Flow Estimation, Point Tracking | Code Available | 1 |
| Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning | Mar 9, 2025 | Federated Learning | Code Available | 1 |
| Sign Language Translation using Frame and Event Stream: Benchmark Dataset and Algorithms | Mar 9, 2025 | Sign Language Translation, Translation | Code Available | 1 |
| Dynamics-Invariant Quadrotor Control using Scale-Aware Deep Reinforcement Learning | Mar 9, 2025 | Deep Reinforcement Learning, reinforcement-learning | Code Available | 1 |
| One-Step Diffusion Model for Image Motion-Deblurring | Mar 9, 2025 | Deblurring, Denoising | Code Available | 1 |
| QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation | Mar 9, 2025 | Quantization, Video Generation | Code Available | 1 |
| BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling | Mar 8, 2025 | Meta-Learning, Time Series | Code Available | 1 |
| CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset | Mar 8, 2025 | Multiple-choice | Code Available | 1 |
| PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model | Mar 8, 2025 | Denoising, Image Generation | Code Available | 1 |
| Secure On-Device Video OOD Detection Without Backpropagation | Mar 8, 2025 | Autonomous Driving, Federated Learning | Code Available | 1 |
| Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models | Mar 8, 2025 | | Code Available | 1 |
| SRM-Hair: Single Image Head Mesh Reconstruction via 3D Morphable Hair | Mar 8, 2025 | | Code Available | 1 |
| DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation | Mar 8, 2025 | Video Generation | Code Available | 1 |
| Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | Mar 8, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available | 1 |
| GeoLangBind: Unifying Earth Observation with Agglomerative Vision-Language Foundation Models | Mar 8, 2025 | Earth Observation | Code Available | 1 |
| Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices | Mar 8, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior | Mar 8, 2025 | Dichotomous Image Segmentation, Image Segmentation | Code Available | 1 |
| Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning | Mar 8, 2025 | Deep Reinforcement Learning, Representation Learning | Code Available | 1 |
| HealthiVert-GAN: A Novel Framework of Pseudo-Healthy Vertebral Image Synthesis for Interpretable Compression Fracture Grading | Mar 8, 2025 | Computed Tomography (CT), Diagnostic | Code Available | 1 |
| STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification | Mar 8, 2025 | Disentanglement, Pseudo Label | Code Available | 1 |
| VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion | Mar 8, 2025 | 3D Semantic Scene Completion, Autonomous Driving | Code Available | 1 |
| Spatial Distillation based Distribution Alignment (SDDA) for Cross-Headset EEG Classification | Mar 7, 2025 | Brain Computer Interface, Domain Adaptation | Code Available | 1 |
| Strategy Coopetition Explains the Emergence and Transience of In-Context Learning | Mar 7, 2025 | In-Context Learning | Code Available | 1 |
| Ontology Generation using Large Language Models | Mar 7, 2025 | | Code Available | 1 |
| FastMap: Fast Queries Initialization Based Vectorized HD Map Reconstruction Framework | Mar 7, 2025 | Autonomous Driving, Computational Efficiency | Code Available | 1 |
| AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data | Mar 7, 2025 | Diversity, Fairness | Code Available | 1 |
| AutoIOT: LLM-Driven Automated Natural Language Programming for AIoT Applications | Mar 7, 2025 | Program Synthesis | Code Available | 1 |
| Semantic Shift Estimation via Dual-Projection and Classifier Reconstruction for Exemplar-Free Class-Incremental Learning | Mar 7, 2025 | class-incremental learning, Class Incremental Learning | Code Available | 1 |
| Novel Object 6D Pose Estimation with a Single Reference View | Mar 7, 2025 | 6D Pose Estimation, Pose Estimation | Code Available | 1 |
| MPTSNet: Integrating Multiscale Periodic Local Patterns and Global Dependencies for Multivariate Time Series Classification | Mar 7, 2025 | Action Recognition, EEG | Code Available | 1 |
| QArtSR: Quantization via Reverse-Module and Timestep-Retraining in One-Step Diffusion based Image Super-Resolution | Mar 7, 2025 | Denoising, Image Super-Resolution | Code Available | 1 |
| When can we get away with using the two-way fixed effects regression? | Mar 7, 2025 | Econometrics, Navigate | Code Available | 1 |
| On a Connection Between Imitation Learning and RLHF | Mar 7, 2025 | Imitation Learning | Code Available | 1 |
| AVA: Attentive VLM Agent for Mastering StarCraft II | Mar 7, 2025 | Retrieval-augmented Generation, SMAC | Code Available | 1 |
| RocketEval: Efficient Automated LLM Evaluation via Grading Checklist | Mar 7, 2025 | | Code Available | 1 |
| FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data | Mar 7, 2025 | Benchmarking, Federated Learning | Code Available | 1 |
| From Theory to Application: A Practical Introduction to Neural Operators in Scientific Computing | Mar 7, 2025 | Bayesian Inference | Code Available | 1 |
| Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation | Mar 6, 2025 | Decoder, GPU | Code Available | 1 |
| WeakMedSAM: Weakly-Supervised Medical Image Segmentation via SAM with Sub-Class Exploration and Prompt Affinity Mining | Mar 6, 2025 | Image Segmentation, Medical Image Segmentation | Code Available | 1 |
| ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images | Mar 6, 2025 | 3D Place Recognition, Loop Closure Detection | Code Available | 1 |
| The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation | Mar 6, 2025 | Semantic Compression, Video Generation | Code Available | 1 |
| TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge | Mar 6, 2025 | Prediction, regression | Code Available | 1 |
| Subgraph Federated Learning for Local Generalization | Mar 6, 2025 | Federated Learning | Code Available | 1 |
| Question-Aware Gaussian Experts for Audio-Visual Question Answering | Mar 6, 2025 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 1 |
| Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments | Mar 6, 2025 | Image Segmentation, Management | Code Available | 1 |