| Universal Narrative Model: an Author-centric Storytelling Framework for Generative AI | Mar 5, 2025 | | Code Available | 2 |
| MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems | Mar 5, 2025 | | Code Available | 2 |
| BANet: Bilateral Aggregation Network for Mobile Stereo Matching | Mar 5, 2025 | Stereo Matching | Code Available | 2 |
| BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving | Mar 5, 2025 | Autonomous Driving, Motion Planning | Code Available | 2 |
| Golden Cudgel Network for Real-Time Semantic Segmentation | Mar 5, 2025 | Real-Time Semantic Segmentation, Semantic Segmentation | Code Available | 2 |
| BHViT: Binarized Hybrid Vision Transformer | Mar 4, 2025 | Binarization, Quantization | Code Available | 2 |
| WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation | Mar 4, 2025 | Hallucination | Code Available | 2 |
| ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish | Mar 4, 2025 | Activity Prediction, Multivariate Time Series Forecasting | Code Available | 2 |
| Technique Inference Engine: A Recommender Model to Support Cyber Threat Hunting | Mar 4, 2025 | | Code Available | 2 |
| MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments | Mar 4, 2025 | 2D Panoptic Segmentation, Graph Generation | Code Available | 2 |
| MPO: Boosting LLM Agents with Meta Plan Optimization | Mar 4, 2025 | | Code Available | 2 |
| DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models | Mar 4, 2025 | Diversity, GPU | Code Available | 2 |
| LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications | Mar 4, 2025 | Action Generation | Code Available | 2 |
| h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform | Mar 4, 2025 | | Code Available | 2 |
| Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs | Mar 4, 2025 | | Code Available | 2 |
| Composed Multi-modal Retrieval: A Survey of Approaches and Applications | Mar 3, 2025 | Cross-Modal Retrieval, Data Augmentation | Code Available | 2 |
| AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning | Mar 3, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution | Mar 3, 2025 | Autonomous Driving, Image Super-Resolution | Code Available | 2 |
| An Approach for Air Drawing Using Background Subtraction and Contour Extraction | Mar 3, 2025 | Hand Detection, Optical Character Recognition (OCR) | Code Available | 2 |
| Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning | Mar 3, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| Large-Scale Data Selection for Instruction Tuning | Mar 3, 2025 | | Code Available | 2 |
| Interactive Debugging and Steering of Multi-Agent AI Systems | Mar 3, 2025 | AI Agent | Code Available | 2 |
| Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG | Mar 3, 2025 | RAG, Retrieval | Code Available | 2 |
| Forgetting Transformer: Softmax Attention with a Forget Gate | Mar 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Liger: Linearizing Large Language Models to Gated Recurrent Structures | Mar 3, 2025 | | Code Available | 2 |
| Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation | Mar 3, 2025 | Representation Learning, Retrieval | Code Available | 2 |
| Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator | Mar 3, 2025 | Image Generation | Code Available | 2 |
| FlowDec: A flow-based full-band general audio codec with high perceptual quality | Mar 3, 2025 | FAD | Code Available | 2 |
| MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism | Mar 3, 2025 | Object Detection | Code Available | 2 |
| Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas | Mar 3, 2025 | Spatial Reasoning | Code Available | 2 |
| OptMetaOpenFOAM: Large Language Model Driven Chain of Thought for Sensitivity Analysis and Parameter Optimization based on CFD | Mar 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking | Mar 2, 2025 | Fact Checking, Fact Verification | Code Available | 2 |
| Patch-wise Structural Loss for Time Series Forecasting | Mar 2, 2025 | Time Series, Time Series Forecasting | Code Available | 2 |
| From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization | Mar 2, 2025 | Cross-Modal Person Re-Identification, Person Re-Identification | Code Available | 2 |
| Predictive Data Selection: The Data That Predicts Is the Data That Teaches | Mar 2, 2025 | | Code Available | 2 |
| Geodesic Diffusion Models for Medical Image-to-Image Generation | Mar 2, 2025 | Denoising, Image Denoising | Code Available | 2 |
| Streaming Video Question-Answering with In-context Video KV-Cache Retrieval | Mar 1, 2025 | GPU, Question Answering | Code Available | 2 |
| LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement | Mar 1, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning | Mar 1, 2025 | Scene Understanding | Code Available | 2 |
| Flow Matching for Medical Image Synthesis: Bridging the Gap Between Speed and Quality | Mar 1, 2025 | Image Enhancement, Image Generation | Code Available | 2 |
| UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search | Mar 1, 2025 | Neural Architecture Search, Speech Enhancement | Code Available | 2 |
| Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions | Mar 1, 2025 | Information Retrieval, RAG | Code Available | 2 |
| PodAgent: A Comprehensive Framework for Podcast Generation | Mar 1, 2025 | Audio Generation, Speech Synthesis | Code Available | 2 |
| Adaptive Rectangular Convolution for Remote Sensing Pansharpening | Mar 1, 2025 | Pansharpening | Code Available | 2 |
| What Makes a Good Diffusion Planner for Decision Making? | Mar 1, 2025 | Action Generation, Decision Making | Code Available | 2 |
| Remasking Discrete Diffusion Models with Inference-Time Scaling | Mar 1, 2025 | | Code Available | 2 |
| BodyGen: Advancing Towards Efficient Embodiment Co-Design | Mar 1, 2025 | | Code Available | 2 |
| UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection | Feb 28, 2025 | Anomaly Detection, Image Classification | Code Available | 2 |
| SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models | Feb 28, 2025 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| Neural Posterior Estimation for Cataloging Astronomical Images with Spatially Varying Backgrounds and Point Spread Functions | Feb 28, 2025 | Variational Inference | Code Available | 2 |