| OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation | Dec 12, 2024 | | Code Available | 2 |
| Auto-Regressive Moving Diffusion Models for Time Series Forecasting | Dec 12, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| Elevating Flow-Guided Video Inpainting with Reference Generation | Dec 12, 2024 | 2k, Video Inpainting | Code Available | 2 |
| Phi-4 Technical Report | Dec 12, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| MaskTerial: A Foundation Model for Automated 2D Material Flake Detection | Dec 12, 2024 | Instance Segmentation, Semantic Segmentation | Code Available | 2 |
| MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction | Dec 12, 2024 | 3D Reconstruction, Motion Estimation | Code Available | 2 |
| Owl-1: Omni World Model for Consistent Long Video Generation | Dec 12, 2024 | Video Generation | Code Available | 2 |
| Doe-1: Closed-Loop Autonomous Driving with Large World Model | Dec 12, 2024 | Autonomous Driving, Decision Making | Code Available | 2 |
| Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning | Dec 12, 2024 | Decision Making | Code Available | 2 |
| Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation | Dec 12, 2024 | Image Augmentation, Image Generation | Code Available | 2 |
| MPAX: Mathematical Programming in JAX | Dec 12, 2024 | | Code Available | 2 |
| Foundational Large Language Models for Materials Research | Dec 12, 2024 | Domain Adaptation, Model Selection | Code Available | 2 |
| DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving | Dec 12, 2024 | 4D reconstruction, Autonomous Driving | Code Available | 2 |
| Diffusion Predictive Control with Constraints | Dec 12, 2024 | Denoising | Code Available | 2 |
| Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Dec 12, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| GPD-1: Generative Pre-training for Driving | Dec 11, 2024 | Autonomous Driving, Decision Making | Code Available | 2 |
| Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming | Dec 11, 2024 | Text to 3D, Text-to-Image Generation | Code Available | 2 |
| Predicting Human Brain States with Transformer | Dec 11, 2024 | Language Modelling, Music Generation | Code Available | 2 |
| ConDSeg: A General Medical Image Segmentation Framework via Contrast-Driven Feature Enhancement | Dec 11, 2024 | Decoder, Image Segmentation | Code Available | 2 |
| Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation | Dec 11, 2024 | image-classification, Image Classification | Code Available | 2 |
| BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion | Dec 11, 2024 | | Code Available | 2 |
| SAFIRE: Segment Any Forged Image Region | Dec 11, 2024 | | Code Available | 2 |
| LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations | Dec 11, 2024 | Attribute, Image Generation | Code Available | 2 |
| GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek | Dec 11, 2024 | Dependency Parsing, Morphological Tagging | Code Available | 2 |
| SegFace: Face Segmentation of Long-Tail Classes | Dec 11, 2024 | Face Parsing, Face Swapping | Code Available | 2 |
| Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting | Dec 11, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data | Dec 10, 2024 | Offline RL, Reinforcement Learning (RL) | Code Available | 2 |
| MAGE: A Multi-Agent Engine for Automated RTL Code Generation | Dec 10, 2024 | Code Generation, Navigate | Code Available | 2 |
| Video Motion Transfer with Diffusion Transformers | Dec 10, 2024 | Denoising | Code Available | 2 |
| Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly | Dec 10, 2024 | | Code Available | 2 |
| FlashRNN: Optimizing Traditional RNNs on Modern Hardware | Dec 10, 2024 | GPU, Logical Reasoning | Code Available | 2 |
| Maya: An Instruction Finetuned Multilingual Multimodal Model | Dec 10, 2024 | model | Code Available | 2 |
| Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models | Dec 10, 2024 | Video Generation | Code Available | 2 |
| BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | Dec 10, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 2 |
| Granite Guardian | Dec 10, 2024 | Hallucination, Language Modeling | Code Available | 2 |
| Pix2Poly: A Sequence Prediction Method for End-to-end Polygonal Building Footprint Extraction from Remote Sensing Imagery | Dec 10, 2024 | Decoder, Extracting Buildings In Remote Sensing Images | Code Available | 2 |
| DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Dec 10, 2024 | All, Autonomous Driving | Code Available | 2 |
| From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos | Dec 10, 2024 | 3D Reconstruction, Novel View Synthesis | Code Available | 2 |
| Bridging the Divide: Reconsidering Softmax and Linear Attention | Dec 9, 2024 | | Code Available | 2 |
| Toward AI-Driven Digital Organism: Multiscale Foundation Models for Predicting, Simulating and Programming Biology at All Levels | Dec 9, 2024 | All | Code Available | 2 |
| How to Merge Your Multimodal Models Over Time? | Dec 9, 2024 | | Code Available | 2 |
| Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation | Dec 9, 2024 | 3D Generation, Image to 3D | Code Available | 2 |
| Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis | Dec 9, 2024 | Gesture Generation, RAG | Code Available | 2 |
| Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video | Dec 9, 2024 | 3DGS, 4D reconstruction | Code Available | 2 |
| Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty | Dec 9, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity | Dec 9, 2024 | Anomaly Detection, text annotation | Code Available | 2 |
| ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks | Dec 9, 2024 | GPU, Imitation Learning | Code Available | 2 |
| Splatter-360: Generalizable 360° Gaussian Splatting for Wide-baseline Panoramic Images | Dec 9, 2024 | 3DGS, NeRF | Code Available | 2 |
| MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization | Dec 9, 2024 | Visual Question Answering (VQA) | Code Available | 2 |
| ProcessBench: Identifying Process Errors in Mathematical Reasoning | Dec 9, 2024 | GSM8K, Math | Code Available | 2 |