| A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules | Mar 17, 2025 | | Code Available | 1 |
| Sampling Innovation-Based Adaptive Compressive Sensing | Mar 17, 2025 | Compressive Sensing, Image Reconstruction | Code Available | 1 |
| Atlas: Multi-Scale Attention Improves Long Context Image Modeling | Mar 16, 2025 | | Code Available | 1 |
| GS-I^3: Gaussian Splatting for Surface Reconstruction from Illumination-Inconsistent Images | Mar 16, 2025 | 3DGS, Computational Efficiency | Code Available | 1 |
| Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Mar 16, 2025 | Autonomous Driving, RAG | Code Available | 1 |
| BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis | Mar 16, 2025 | 3D Semantic Segmentation, Data Augmentation | Code Available | 1 |
| EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera | Mar 16, 2025 | Gesture Recognition | Code Available | 1 |
| SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models | Mar 16, 2025 | Drug Discovery | Code Available | 1 |
| Modality-Composable Diffusion Policy via Inference-Time Distribution-level Composition | Mar 16, 2025 | | Code Available | 1 |
| History-Aware Transformation of ReID Features for Multiple Object Tracking | Mar 16, 2025 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 1 |
| Exploring Contextual Attribute Density in Referring Expression Counting | Mar 16, 2025 | Attribute, Referring Expression | Code Available | 1 |
| Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma? | Mar 16, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network | Mar 16, 2025 | Emotion Recognition | Code Available | 1 |
| DPF-Net: Physical Imaging Model Embedded Data-Driven Underwater Image Enhancement | Mar 16, 2025 | Image Enhancement | Code Available | 1 |
| VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility | Mar 16, 2025 | Spatial Reasoning | Code Available | 1 |
| EXAONE Deep: Reasoning Enhanced Language Models | Mar 16, 2025 | Math | Code Available | 1 |
| LLM-Driven Multi-step Translation from C to Rust using Static Analysis | Mar 16, 2025 | Translation | Code Available | 1 |
| Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition | Mar 16, 2025 | Caption Generation, Image Captioning | Code Available | 1 |
| Semi-Decision-Focused Learning with Deep Ensembles: A Practical Framework for Robust Portfolio Optimization | Mar 16, 2025 | Portfolio Optimization | Code Available | 1 |
| Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models | Mar 16, 2025 | Data Augmentation, GSM8K | Code Available | 1 |
| Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation | Mar 16, 2025 | Image Generation, Specificity | Code Available | 1 |
| TERL: Large-Scale Multi-Target Encirclement Using Transformer-Enhanced Reinforcement Learning | Mar 16, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 1 |
| Hyperbolic Safety-Aware Vision-Language Models | Mar 15, 2025 | | Code Available | 1 |
| Revisiting Training-Inference Trigger Intensity in Backdoor Attacks | Mar 15, 2025 | | Code Available | 1 |
| QDM: Quadtree-Based Region-Adaptive Sparse Diffusion Models for Efficient Image Super-Resolution | Mar 15, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 1 |
| Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing | Mar 15, 2025 | Emotion Recognition | Code Available | 1 |
| SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning | Mar 15, 2025 | Decision Making, Management | Code Available | 1 |
| Bench2FreeAD: A Benchmark for Vision-based End-to-end Navigation in Unstructured Robotic Environments | Mar 15, 2025 | Autonomous Driving, Robot Navigation | Code Available | 1 |
| Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training | Mar 15, 2025 | Autonomous Driving, Bench2Drive | Code Available | 1 |
| Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis | Mar 15, 2025 | | Code Available | 1 |
| SEAL: Semantic Aware Image Watermarking | Mar 15, 2025 | | Code Available | 1 |
| O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models | Mar 15, 2025 | | Code Available | 1 |
| 3D Gaussian Splatting against Moving Objects for High-Fidelity Street Scene Reconstruction | Mar 15, 2025 | 3D Reconstruction, Autonomous Driving | Code Available | 1 |
| Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction | Mar 14, 2025 | Semantic Segmentation, Video Reconstruction | Code Available | 1 |
| CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning | Mar 14, 2025 | Long-Context Understanding | Code Available | 1 |
| UStyle: Waterbody Style Transfer of Underwater Scenes by Depth-Guided Feature Synthesis | Mar 14, 2025 | Style Transfer | Code Available | 1 |
| Observation-only learning of neural mapping schemes for gappy satellite-derived ocean colour parameters | Mar 14, 2025 | | Code Available | 1 |
| MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Mar 14, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Code Available | 1 |
| DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation | Mar 14, 2025 | 3D geometry, Autonomous Driving | Code Available | 1 |
| Integrating Dynamical Systems Modeling with Spatiotemporal scRNA-seq Data Analysis | Mar 14, 2025 | Time Series | Code Available | 1 |
| Variational Bayesian Personalized Ranking | Mar 14, 2025 | Collaborative Filtering, Contrastive Learning | Code Available | 1 |
| LuSeg: Efficient Negative and Positive Obstacles Segmentation via Contrast-Driven Multi-Modal Feature Fusion on the Lunar | Mar 14, 2025 | Contrastive Learning, Segmentation | Code Available | 1 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning | Mar 14, 2025 | Machine Unlearning | Code Available | 1 |
| A Survey of Cross-domain Graph Learning: Progress and Future Directions | Mar 14, 2025 | Graph Learning, Survey | Code Available | 1 |
| GNNs as Predictors of Agentic Workflow Performances | Mar 14, 2025 | Benchmarking, Position | Code Available | 1 |
| GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior | Mar 14, 2025 | | Code Available | 1 |
| Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty? | Mar 14, 2025 | Attribute | Code Available | 1 |
| APLA: A Simple Adaptation Method for Vision Transformers | Mar 14, 2025 | Classification, GPU | Code Available | 1 |
| Similarity-Aware Token Pruning: Your VLM but Faster | Mar 14, 2025 | | Code Available | 1 |