| A Survey on Human Interaction Motion Generation | Mar 17, 2025 | Human Dynamics, Motion Generation | Code Available | 3 |
| SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression | Mar 16, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| A Survey on the Optimization of Large Language Model-based Agents | Mar 16, 2025 | Decision Making, Language Modeling | Code Available | 3 |
| ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Mar 16, 2025 | CPU, GPU | Code Available | 3 |
| Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering | Mar 14, 2025 | Audio Question Answering, Question Answering | Code Available | 3 |
| Falcon: A Remote Sensing Vision-Language Foundation Model | Mar 14, 2025 | Image Captioning, image-classification | Code Available | 3 |
| Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model | Mar 14, 2025 | Image to Video Generation, Video Generation | Code Available | 3 |
| GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction | Mar 13, 2025 | Autonomous Driving, Surface Reconstruction | Code Available | 3 |
| PyGDA: A Python Library for Graph Domain Adaptation | Mar 13, 2025 | Domain Adaptation, GRAPH DOMAIN ADAPTATION | Code Available | 3 |
| GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing | Mar 13, 2025 | Image Generation, Language Modeling | Code Available | 3 |
| MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System | Mar 12, 2025 | Chunking, Computational Efficiency | Code Available | 3 |
| RFUAV: A Benchmark Dataset for Unmanned Aerial Vehicle Detection and Identification | Mar 12, 2025 | Audio Signal Recognition, Classification | Code Available | 3 |
| SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment | Mar 12, 2025 | Autonomous Driving, Bench2Drive | Code Available | 3 |
| BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes | Mar 11, 2025 | Point Cloud Registration | Code Available | 3 |
| nnInteractive: Redefining 3D Promptable Segmentation | Mar 11, 2025 | Benchmarking, Interactive Segmentation | Code Available | 3 |
| Robust Latent Matters: Boosting Image Generation with Sampling Error | Mar 11, 2025 | Benchmarking, Image Generation | Code Available | 3 |
| DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness | Mar 11, 2025 | Diversity | Code Available | 3 |
| Motion Anything: Any to Motion Generation | Mar 10, 2025 | Motion Generation, Motion Synthesis | Code Available | 3 |
| AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning | Mar 10, 2025 | Autonomous Driving, Common Sense Reasoning | Code Available | 3 |
| PE3R: Perception-Efficient 3D Reconstruction | Mar 10, 2025 | 3D Reconstruction, Zero-shot Generalization | Code Available | 3 |
| Automated Movie Generation via Multi-Agent CoT Planning | Mar 10, 2025 | Video Generation | Code Available | 3 |
| From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers | Mar 10, 2025 | | Code Available | 3 |
| CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution | Mar 10, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 3 |
| AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP | Mar 9, 2025 | Anomaly Detection, Anomaly Localization | Code Available | 3 |
| Learning and discovering multiple solutions using physics-informed neural networks with random initialization and deep ensemble | Mar 8, 2025 | Uncertainty Quantification | Code Available | 3 |
| GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images | Mar 8, 2025 | cross-modal alignment, Diagnostic | Code Available | 3 |
| Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning | Mar 8, 2025 | Reranking | Code Available | 3 |
| Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs | Mar 8, 2025 | | Code Available | 3 |
| GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving | Mar 7, 2025 | Autonomous Driving, Denoising | Code Available | 3 |
| MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio | Mar 7, 2025 | Video Generation | Code Available | 3 |
| Simulating the Real World: A Unified Survey of Multimodal Generative Models | Mar 6, 2025 | 3D Generation, Survey | Code Available | 3 |
| SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing | Mar 6, 2025 | Articles, Survey | Code Available | 3 |
| L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | Mar 6, 2025 | | Code Available | 3 |
| EgoLife: Towards Egocentric Life Assistant | Mar 5, 2025 | Question Answering, Video Understanding | Code Available | 3 |
| Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems | Mar 5, 2025 | Decision Making, Language Modeling | Code Available | 3 |
| All-atom Diffusion Transformers: Unified generative modelling of molecules and materials | Mar 5, 2025 | All, Unconditional Crystal Generation | Code Available | 3 |
| OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale | Mar 4, 2025 | Text to SQL, Text-To-SQL | Code Available | 3 |
| Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection | Mar 4, 2025 | Anomaly Detection, Multi-class Anomaly Detection | Code Available | 3 |
| Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation | Mar 4, 2025 | Contact-rich Manipulation, Imitation Learning | Code Available | 3 |
| A Phylogenetic Approach to Genomic Language Modeling | Mar 4, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding | Mar 4, 2025 | HumanEval, mbpp | Code Available | 3 |
| Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models | Mar 4, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures | Mar 3, 2025 | Crack Segmentation, Mamba | Code Available | 3 |
| MUSt3R: Multi-view Network for Stereo 3D Reconstruction | Mar 3, 2025 | 3D Reconstruction, Articles | Code Available | 3 |
| UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface | Mar 3, 2025 | Instance Segmentation, Reasoning Segmentation | Code Available | 3 |
| Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs | Mar 3, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| LiteGS: A High-Performance Modular Framework for Gaussian Splatting Training | Mar 3, 2025 | 3DGS, GPU | Code Available | 3 |
| Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation | Mar 3, 2025 | 3D Generation, 3D Reconstruction | Code Available | 3 |
| PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization | Mar 3, 2025 | | Code Available | 3 |
| Proteina: Scaling Flow-based Protein Structure Generative Models | Mar 2, 2025 | Protein Design | Code Available | 3 |