| Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait | Mar 17, 2025 | Computational Efficiency, Diversity | Code Available | 3 |
| A Survey on the Optimization of Large Language Model-based Agents | Mar 16, 2025 | Decision Making, Language Modeling | Code Available | 3 |
| SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression | Mar 16, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Mar 16, 2025 | CPU, GPU | Code Available | 3 |
| Falcon: A Remote Sensing Vision-Language Foundation Model | Mar 14, 2025 | Image Captioning, image-classification | Code Available | 3 |
| Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering | Mar 14, 2025 | Audio Question Answering, Question Answering | Code Available | 3 |
| Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model | Mar 14, 2025 | Image to Video Generation, Video Generation | Code Available | 3 |
| GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction | Mar 13, 2025 | Autonomous Driving, Surface Reconstruction | Code Available | 3 |
| PyGDA: A Python Library for Graph Domain Adaptation | Mar 13, 2025 | Domain Adaptation, GRAPH DOMAIN ADAPTATION | Code Available | 3 |
| GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing | Mar 13, 2025 | Image Generation, Language Modeling | Code Available | 3 |
| SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment | Mar 12, 2025 | Autonomous Driving, Bench2Drive | Code Available | 3 |
| RFUAV: A Benchmark Dataset for Unmanned Aerial Vehicle Detection and Identification | Mar 12, 2025 | Audio Signal Recognition, Classification | Code Available | 3 |
| MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System | Mar 12, 2025 | Chunking, Computational Efficiency | Code Available | 3 |
| BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes | Mar 11, 2025 | Point Cloud Registration | Code Available | 3 |
| nnInteractive: Redefining 3D Promptable Segmentation | Mar 11, 2025 | Benchmarking, Interactive Segmentation | Code Available | 3 |
| DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness | Mar 11, 2025 | Diversity | Code Available | 3 |
| Robust Latent Matters: Boosting Image Generation with Sampling Error | Mar 11, 2025 | Benchmarking, Image Generation | Code Available | 3 |
| AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning | Mar 10, 2025 | Autonomous Driving, Common Sense Reasoning | Code Available | 3 |
| From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers | Mar 10, 2025 | | Code Available | 3 |
| PE3R: Perception-Efficient 3D Reconstruction | Mar 10, 2025 | 3D Reconstruction, Zero-shot Generalization | Code Available | 3 |
| Motion Anything: Any to Motion Generation | Mar 10, 2025 | Motion Generation, Motion Synthesis | Code Available | 3 |
| Automated Movie Generation via Multi-Agent CoT Planning | Mar 10, 2025 | Video Generation | Code Available | 3 |
| CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution | Mar 10, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 3 |
| AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP | Mar 9, 2025 | Anomaly Detection, Anomaly Localization | Code Available | 3 |
| Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning | Mar 8, 2025 | Reranking | Code Available | 3 |