| Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data | Jul 3, 2025 | Benchmarking, Representation Learning | Code Available | 1 |
| Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection | Jul 3, 2025 | Face Swapping | Code Available | 1 |
| Cautious Next Token Prediction | Jul 3, 2025 | Prediction | Code Available | 1 |
| Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection | Jul 3, 2025 | | Code Available | 1 |
| Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics | Jul 3, 2025 | image-classification, Image Classification | Code Available | 1 |
| Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer | Jul 2, 2025 | | Code Available | 1 |
| DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy | Jul 2, 2025 | Data Augmentation, Generalized Referring Expression Segmentation | Code Available | 1 |
| Re-examining the Legendre-Gauss-Lobatto Pseudospectral Methods for Optimal Control | Jul 2, 2025 | | Code Available | 1 |
| Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment | Jul 1, 2025 | Action Recognition, One-Shot 3D Action Recognition | Code Available | 1 |
| TABASCO: A Fast, Simplified Model for Molecular Generation with Improved Physical Quality | Jul 1, 2025 | Drug Design | Code Available | 1 |
| LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling | Jul 1, 2025 | Image Restoration, Unified Image Restoration | Code Available | 1 |
| LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs | Jul 1, 2025 | Large Language Model | Code Available | 1 |
| UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions | Jul 1, 2025 | Domain Adaptation, Object Tracking | Code Available | 1 |
| Real-Time Inverse Kinematics for Generating Multi-Constrained Movements of Virtual Human Characters | Jul 1, 2025 | | Code Available | 1 |
| Improve Underwater Object Detection through YOLOv12 Architecture and Physics-informed Augmentation | Jun 30, 2025 | Autonomous Navigation, Computational Efficiency | Code Available | 1 |
| Dataset Distillation via Vision-Language Category Prototype | Jun 30, 2025 | Dataset Distillation, Descriptive | Code Available | 1 |
| FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation | Jun 30, 2025 | Computational Efficiency, Dataset Distillation | Code Available | 1 |
| Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking | Jun 30, 2025 | Mamba, Object Tracking | Code Available | 1 |
| Refine Any Object in Any Scene | Jun 30, 2025 | Novel View Synthesis, Object | Code Available | 1 |
| Datasets for Fairness in Language Models: An In-Depth Survey | Jun 29, 2025 | Fairness | Code Available | 1 |
| CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation | Jun 29, 2025 | Organ Segmentation | Code Available | 1 |
| SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting | Jun 29, 2025 | 3D Reconstruction, Scene Understanding | Code Available | 1 |
| CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation | Jun 29, 2025 | Image Generation, Image-to-Image Translation | Code Available | 1 |
| TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure | Jun 29, 2025 | Music Generation | Code Available | 1 |
| Where, What, Why: Towards Explainable Driver Attention Prediction | Jun 29, 2025 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 |
| Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder | Jun 28, 2025 | Image Segmentation, Large Language Model | Code Available | 1 |
| UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields | Jun 27, 2025 | Hyperspectral Unmixing, Material Segmentation | Code Available | 1 |
| CaO_2: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation | Jun 27, 2025 | Dataset Distillation | Code Available | 1 |
| Boosting Domain Generalized and Adaptive Detection with Diffusion Models: Fitness, Generalization, and Transferability | Jun 26, 2025 | Domain Generalization, Robust Object Detection | Code Available | 1 |
| DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion | Jun 26, 2025 | 3D Reconstruction | Code Available | 1 |
| Post-training for Deepfake Speech Detection | Jun 26, 2025 | Face Swapping, Self-Supervised Learning | Code Available | 1 |
| FaSTA^*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing | Jun 26, 2025 | | Code Available | 1 |
| Out-of-Distribution Semantic Occupancy Prediction | Jun 26, 2025 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 1 |
| Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform | Jun 26, 2025 | | Code Available | 1 |
| AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification | Jun 26, 2025 | Brain Computer Interface, EEG | Code Available | 1 |
| Ad-Hoc Human-AI Coordination Challenge | Jun 26, 2025 | | Code Available | 1 |
| Learning to Skip the Middle Layers of Transformers | Jun 26, 2025 | Mixture-of-Experts | Code Available | 1 |
| CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | Jun 26, 2025 | Benchmarking, Drug Design | Code Available | 1 |
| ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation | Jun 26, 2025 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 1 |
| Exploring the Design Space of 3D MLLMs for CT Report Generation | Jun 26, 2025 | | Code Available | 1 |
| IRanker: Towards Ranking Foundation Model | Jun 25, 2025 | GSM8K, model | Code Available | 1 |
| Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision | Jun 25, 2025 | Contrastive Learning, Feature Correlation | Code Available | 1 |
| High-Resolution Live Fuel Moisture Content (LFMC) Maps for Wildfire Risk from Multimodal Earth Observation Data | Jun 25, 2025 | Earth Observation | Code Available | 1 |
| OTSurv: A Novel Multiple Instance Learning Framework for Survival Prediction with Heterogeneity-aware Optimal Transport | Jun 25, 2025 | Multiple Instance Learning, Survival Prediction | Code Available | 1 |
| ReCode: Updating Code API Knowledge with Reinforcement Learning | Jun 25, 2025 | Code Generation, reinforcement-learning | Code Available | 1 |
| PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models | Jun 25, 2025 | | Code Available | 1 |
| GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching | Jun 25, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Recursive KalmanNet: Analyse des capacités de généralisation d'un réseau de neurones récurrent guidé par un filtre de Kalman | Jun 25, 2025 | | Code Available | 1 |
| The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind | Jun 25, 2025 | Multi-agent Reinforcement Learning, Navigate | Code Available | 1 |
| Loss-Aware Automatic Selection of Structured Pruning Criteria for Deep Neural Network Acceleration | Jun 25, 2025 | | Code Available | 1 |