| LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | Mar 18, 2025 | Automated Feature Engineering, Feature Engineering | Code Available | 2 |
| Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting | Mar 18, 2025 | Instance Segmentation, Object | Code Available | 2 |
| LEGNet: Lightweight Edge-Gaussian Driven Network for Low-Quality Remote Sensing Image Object Detection | Mar 18, 2025 | Computational Efficiency, object-detection | Code Available | 2 |
| Reinforcement learning-based motion imitation for physiologically plausible musculoskeletal motor control | Mar 18, 2025 | Humanoid Control, Motion Synthesis | Code Available | 2 |
| PENCIL: Long Thoughts with Short Memory | Mar 18, 2025 | | Code Available | 2 |
| MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling | Mar 17, 2025 | GPU, Language Modeling | Code Available | 2 |
| ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models | Mar 17, 2025 | Computational Efficiency, Hallucination | Code Available | 2 |
| HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model | Mar 17, 2025 | Image Segmentation, Segmentation | Code Available | 2 |
| ViSpeak: Visual Instruction Feedback in Streaming Videos | Mar 17, 2025 | Streaming video understanding, Video Understanding | Code Available | 2 |
| All You Need to Know About Training Image Retrieval Models | Mar 17, 2025 | All, Image Retrieval | Code Available | 2 |
| Free-form language-based robotic reasoning and grasping | Mar 17, 2025 | Form, Robotic Grasping | Code Available | 2 |
| DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry | Mar 17, 2025 | valid | Code Available | 2 |
| Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation | Mar 17, 2025 | Domain Adaptation, Domain Generalization | Code Available | 2 |
| GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching | Mar 17, 2025 | Autonomous Driving, Image Generation | Code Available | 2 |
| φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation | Mar 17, 2025 | | Code Available | 2 |
| Open3DBench: Open-Source Benchmark for 3D-IC Backend Implementation and PPA Evaluation | Mar 17, 2025 | | Code Available | 2 |
| Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception | Mar 17, 2025 | Future prediction, Scene Generation | Code Available | 2 |
| Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation | Mar 17, 2025 | Data Interaction, Scene Understanding | Code Available | 2 |
| TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM | Mar 17, 2025 | Video Grounding | Code Available | 2 |
| Multi-modal Time Series Analysis: A Tutorial and Survey | Mar 17, 2025 | Survey, Time Series | Code Available | 2 |
| RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars | Mar 17, 2025 | | Code Available | 2 |
| WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes | Mar 17, 2025 | 3D Reconstruction, 4D reconstruction | Code Available | 2 |
| Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process | Mar 17, 2025 | Anomaly Detection | Code Available | 2 |
| DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding | Mar 17, 2025 | Domain Generalization, Multimodal Reasoning | Code Available | 2 |
| MTGS: Multi-Traversal Gaussian Splatting | Mar 16, 2025 | Navigate, Novel View Synthesis | Code Available | 2 |
| AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration | Mar 16, 2025 | Camera Calibration | Code Available | 2 |
| Discovering uncertainty: Gaussian constitutive neural networks with correlated weights | Mar 16, 2025 | | Code Available | 2 |
| RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds | Mar 16, 2025 | GPU | Code Available | 2 |
| AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding | Mar 16, 2025 | Video Understanding | Code Available | 2 |
| STEVE: A Step Verification Pipeline for Computer-use Agent Training | Mar 16, 2025 | | Code Available | 2 |
| MambaIC: State Space Models for High-Performance Learned Image Compression | Mar 16, 2025 | Image Compression, State Space Models | Code Available | 2 |
| Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View | Mar 16, 2025 | 3D Scene Reconstruction, Decoder | Code Available | 2 |
| A Comprehensive Survey on Knowledge Distillation | Mar 15, 2025 | Knowledge Distillation, Survey | Code Available | 2 |
| ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object | Mar 15, 2025 | Domain Adaptation, Interactive Segmentation | Code Available | 2 |
| SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering | Mar 15, 2025 | Scene Generation, Video Generation | Code Available | 2 |
| Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection | Mar 15, 2025 | Image Generation, Text to Image Generation | Code Available | 2 |
| Datrics Text2SQL. A Framework for Natural Language to SQL Query Generation | Mar 15, 2025 | Natural Language Queries, RAG | Code Available | 2 |
| A Survey on Federated Fine-tuning of Large Language Models | Mar 15, 2025 | Federated Learning, parameter-efficient fine-tuning | Code Available | 2 |
| FastVID: Dynamic Density Pruning for Fast Video Large Language Models | Mar 14, 2025 | | Code Available | 2 |
| Towards a Unified Copernicus Foundation Model for Earth Vision | Mar 14, 2025 | Earth Observation | Code Available | 2 |
| Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery | Mar 14, 2025 | Management | Code Available | 2 |
| Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models | Mar 14, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption | Mar 14, 2025 | Full reference image quality assessment, Full-Reference Image Quality Assessment | Code Available | 2 |
| Generative Modeling for Mathematical Discovery | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards | Mar 14, 2025 | Denoising, Image Generation | Code Available | 2 |
| Cloud2BIM: An open-source automatic pipeline for efficient conversion of large-scale point clouds into IFC format | Mar 14, 2025 | | Code Available | 2 |
| How Can Time Series Analysis Benefit From Multiple Modalities? A Survey and Outlook | Mar 14, 2025 | Time Series, Time Series Analysis | Code Available | 2 |
| AQUA-SLAM: Tightly-Coupled Underwater Acoustic-Visual-Inertial SLAM with Sensor Calibration | Mar 14, 2025 | Simultaneous Localization and Mapping | Code Available | 2 |
| TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing | Mar 14, 2025 | | Code Available | 2 |
| RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors | Mar 13, 2025 | 3DGS | Code Available | 2 |