| U-REPA: Aligning Diffusion U-Nets to ViTs | Mar 24, 2025 | Image Generation | Code Available | 1 |
| Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Mar 24, 2025 | Benchmarking, Humanitarian | Code Available | 1 |
| Panorama Generation From NFoV Image Done Right | Mar 24, 2025 | distortion correction | Code Available | 1 |
| Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations | Mar 24, 2025 | cross-modal alignment, Image Classification | Code Available | 1 |
| xKV: Cross-Layer SVD for KV-Cache Compression | Mar 24, 2025 | | Code Available | 1 |
| TrackID3x3: A Dataset and Algorithm for Multi-Player Tracking with Identification and Pose Estimation in 3x3 Basketball Full-court Videos | Mar 24, 2025 | Game State Reconstruction, Multi-Object Tracking | Code Available | 1 |
| SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking | Mar 24, 2025 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 1 |
| Efficient Self-Supervised Adaptation for Medical Image Analysis | Mar 24, 2025 | GPU, Medical Image Analysis | Code Available | 1 |
| Global-Local Tree Search in VLMs for 3D Indoor Scene Generation | Mar 24, 2025 | Common Sense Reasoning, Object | Code Available | 1 |
| LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty | Mar 24, 2025 | Machine Unlearning, Memorization | Code Available | 1 |
| SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction | Mar 24, 2025 | Video Generation, Video Prediction | Code Available | 1 |
| Minimum Volume Conformal Sets for Multivariate Regression | Mar 24, 2025 | Conformal Prediction, Prediction | Code Available | 1 |
| Context-Enhanced Memory-Refined Transformer for Online Action Detection | Mar 24, 2025 | Action Detection, Decoder | Code Available | 1 |
| Diff-Palm: Realistic Palmprint Generation with Polynomial Creases and Intra-Class Variation Controllable Diffusion Models | Mar 24, 2025 | | Code Available | 1 |
| Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models | Mar 24, 2025 | Image Generation, Super-Resolution | Code Available | 1 |
| Do Your Best and Get Enough Rest for Continual Learning | Mar 24, 2025 | Continual Learning, Incremental Learning | Code Available | 1 |
| AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration | Mar 24, 2025 | | Code Available | 1 |
| WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation | Mar 24, 2025 | Articles, Informativeness | Code Available | 1 |
| Sun-Shine: A Large Language Model for Tibetan Culture | Mar 24, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization | Mar 24, 2025 | Navigate, Scheduling | Code Available | 1 |
| AMD-Hummingbird: Towards an Efficient Text-to-Video Model | Mar 24, 2025 | Computational Efficiency, Video Generation | Code Available | 1 |
| CoMP: Continual Multimodal Pre-training for Vision Foundation Models | Mar 24, 2025 | cross-modal alignment | Code Available | 1 |
| PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model | Mar 24, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining | Mar 24, 2025 | Rain Removal | Code Available | 1 |
| Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Mar 24, 2025 | Diversity, Large Language Model | Code Available | 1 |
| Bootstrapped Model Predictive Control | Mar 24, 2025 | continuous-control, Continuous Control | Code Available | 1 |
| Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition | Mar 24, 2025 | | Code Available | 1 |
| Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning | Mar 24, 2025 | Contrastive Learning | Code Available | 1 |
| Equivariant Image Modeling | Mar 24, 2025 | Image Generation, Zero-shot Generalization | Code Available | 1 |
| CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI | Mar 24, 2025 | Synthetic Image Detection | Code Available | 1 |
| LookAhead Tuning: Safer Language Models via Partial Answer Previews | Mar 24, 2025 | Position, Safety Alignment | Code Available | 1 |
| Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition | Mar 24, 2025 | Contrastive Learning, Scene Text Recognition | Code Available | 1 |
| InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment | Mar 24, 2025 | | Code Available | 1 |
| MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning | Mar 24, 2025 | parameter-efficient fine-tuning, Representation Learning | Code Available | 1 |
| Language Model Uncertainty Quantification with Attention Chain | Mar 24, 2025 | Computational Efficiency, Language Modeling | Code Available | 1 |
| FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images | Mar 24, 2025 | 3D Canonicalization, Zero-shot Generalization | Code Available | 1 |
| Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness | Mar 24, 2025 | Benchmarking, Semantic Segmentation | Code Available | 1 |
| TensoFlow: Tensorial Flow-based Sampler for Inverse Rendering | Mar 24, 2025 | Inverse Rendering | Code Available | 1 |
| LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning | Mar 23, 2025 | Continual Learning, Exemplar-Free | Code Available | 1 |
| SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction | Mar 23, 2025 | Prediction | Code Available | 1 |
| Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities | Mar 23, 2025 | | Code Available | 1 |
| MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps | Mar 23, 2025 | Scene Segmentation, Video Understanding | Code Available | 1 |
| M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving | Mar 23, 2025 | Autonomous Driving, Decoder | Code Available | 1 |
| PG-SAM: Prior-Guided SAM with Medical for Multi-organ Segmentation | Mar 23, 2025 | Image Segmentation, Medical Image Segmentation | Code Available | 1 |
| GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks | Mar 23, 2025 | Benchmarking, Hallucination | Code Available | 1 |
| PHT-CAD: Efficient CAD Parametric Primitive Analysis with Progressive Hierarchical Tuning | Mar 23, 2025 | ARC | Code Available | 1 |
| HyperNOs: Automated and Parallel Library for Neural Operators Research | Mar 23, 2025 | Hyperparameter Optimization | Code Available | 1 |
| End-to-End Implicit Neural Representations for Classification | Mar 23, 2025 | Classification, NeRF | Code Available | 1 |
| DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation | Mar 23, 2025 | 3D Face Animation | Code Available | 1 |
| Real-World Remote Sensing Image Dehazing: Benchmark and Baseline | Mar 23, 2025 | Image Dehazing, Self-Supervised Learning | Code Available | 1 |