| Modifying Large Language Model Post-Training for Diverse Creative Writing | Mar 21, 2025 | Diversity, Language Modeling | Code Available | 2 |
| Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID | Mar 21, 2025 | | Code Available | 2 |
| Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Mar 21, 2025 | Benchmarking, Video Generation | Code Available | 2 |
| Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping | Mar 21, 2025 | GPU, Motion Estimation | Code Available | 2 |
| Dereflection Any Image with Diffusion Priors and Diversified Data | Mar 21, 2025 | Diversity, Reflection Removal | Code Available | 2 |
| Learning Multi-Level Features with Matryoshka Sparse Autoencoders | Mar 21, 2025 | | Code Available | 2 |
| CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | Mar 21, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Mar 21, 2025 | GSM8K, Question Answering | Code Available | 2 |
| OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Mar 21, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 2 |
| MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow | Mar 21, 2025 | Diagnostic, Logical Reasoning | Code Available | 2 |
| Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images | Mar 21, 2025 | Image Segmentation, Mamba | Code Available | 2 |
| Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting | Mar 21, 2025 | | Code Available | 2 |
| NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes | Mar 20, 2025 | Scene Generation | Code Available | 2 |
| Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens | Mar 20, 2025 | 3D Generation | Code Available | 2 |
| Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning | Mar 20, 2025 | Classification, Few-Shot Learning | Code Available | 2 |
| Ultra-Resolution Adaptation with Ease | Mar 20, 2025 | 2k, 4k | Code Available | 2 |
| Single Image Iterative Subject-driven Generation and Editing | Mar 20, 2025 | Image Generation | Code Available | 2 |
| IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes | Mar 20, 2025 | Scene Understanding, Spatial Reasoning | Code Available | 2 |
| Mixture of Lookup Experts | Mar 20, 2025 | Mixture-of-Experts | Code Available | 2 |
| DnLUT: Ultra-Efficient Color Image Denoising via Channel-Aware Lookup Tables | Mar 20, 2025 | Color Image Denoising, Denoising | Code Available | 2 |
| DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding | Mar 20, 2025 | GPU | Code Available | 2 |
| SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer | Mar 20, 2025 | Decoder, Mamba | Code Available | 2 |
| Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures | Mar 20, 2025 | Deblurring, Zero-shot Generalization | Code Available | 2 |
| EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation | Mar 20, 2025 | Optical Flow Estimation, Video Frame Interpolation | Code Available | 2 |
| Tokenize Image as a Set | Mar 20, 2025 | Image Generation | Code Available | 2 |
| WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching | Mar 20, 2025 | Speech Synthesis | Code Available | 2 |
| Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation | Mar 20, 2025 | | Code Available | 2 |
| Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model | Mar 20, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis | Mar 20, 2025 | Document Layout Analysis, Document Summarization | Code Available | 2 |
| M3: 3D-Spatial MultiModal Memory | Mar 20, 2025 | Feature Splatting | Code Available | 2 |
| Rapid patient-specific neural networks for intraoperative X-ray to volume registration | Mar 20, 2025 | | Code Available | 2 |
| The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation | Mar 19, 2025 | Change Detection, Earth Observation | Code Available | 2 |
| Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching | Mar 19, 2025 | Image-text matching, Text Matching | Code Available | 2 |
| LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning | Mar 19, 2025 | Instruction Following, Multimodal Reasoning | Code Available | 2 |
| High-Order Control Barrier Functions: Insights and a Truncated Taylor-Based Formulation | Mar 19, 2025 | Collision Avoidance | Code Available | 2 |
| Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology | Mar 19, 2025 | Cross-Modal Retrieval, Diagnostic | Code Available | 2 |
| VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning | Mar 19, 2025 | Benchmarking, Language Modeling | Code Available | 2 |
| DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis | Mar 19, 2025 | | Code Available | 2 |
| Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels | Mar 18, 2025 | GPU, Language Modeling | Code Available | 2 |
| SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing | Mar 18, 2025 | Denoising, Motion Generation | Code Available | 2 |
| LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models | Mar 18, 2025 | compressed sensing, Video Generation | Code Available | 2 |
| DAPO: An Open-Source LLM Reinforcement Learning System at Scale | Mar 18, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 2 |
| Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning | Mar 18, 2025 | Autonomous Driving, Motion Planning | Code Available | 2 |
| Where do Large Vision-Language Models Look at when Answering Questions? | Mar 18, 2025 | Question Answering, Visual Question Answering | Code Available | 2 |
| Advances in 4D Generation: A Survey | Mar 18, 2025 | Autonomous Driving, Computational Efficiency | Code Available | 2 |
| DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal | Mar 18, 2025 | | Code Available | 2 |
| LEGNet: Lightweight Edge-Gaussian Driven Network for Low-Quality Remote Sensing Image Object Detection | Mar 18, 2025 | Computational Efficiency, object-detection | Code Available | 2 |
| Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting | Mar 18, 2025 | Instance Segmentation, Object | Code Available | 2 |
| Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models | Mar 18, 2025 | Anatomy, Attribute | Code Available | 2 |
| PET-MAD, a universal interatomic potential for advanced materials modeling | Mar 18, 2025 | Diversity | Code Available | 2 |