| The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities | Jan 23, 2025 | General Knowledge, Instruction Following | Code Available | 3 |
| HAC++: Towards 100X Compression of 3D Gaussian Splatting | Jan 21, 2025 | 3DGS, Attribute | Code Available | 3 |
| VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | Jan 21, 2025 | Image Generation, Instruction Following | Code Available | 3 |
| How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks? | Jan 20, 2025 | Computed Tomography (CT), GPU | Code Available | 3 |
| CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation | Jan 20, 2025 | Video Generation, Virtual Try-on | Code Available | 3 |
| The OpenLAM Challenges | Jan 20, 2025 | valid | Code Available | 3 |
| CoverM: Read alignment statistics for metagenomics | Jan 20, 2025 | Computational Efficiency | Code Available | 3 |
| Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption | Jan 18, 2025 | Infrared And Visible Image Fusion | Code Available | 3 |
| Universal Actions for Enhanced Embodied Foundation Models | Jan 17, 2025 | | Code Available | 3 |
| A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks | Jan 17, 2025 | Survey | Code Available | 3 |
| X-Dyna: Expressive Dynamic Human Image Animation | Jan 17, 2025 | Image Animation | Code Available | 3 |
| Foundations of Large Language Models | Jan 16, 2025 | | Code Available | 3 |
| OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking | Jan 16, 2025 | Articles, Retrieval-augmented Generation | Code Available | 3 |
| DEFOM-Stereo: Depth Foundation Model Based Stereo Matching | Jan 16, 2025 | Depth Estimation, Disparity Estimation | Code Available | 3 |
| Karatsuba Matrix Multiplication and its Efficient Custom Hardware Implementations | Jan 15, 2025 | | Code Available | 3 |
| FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors | Jan 14, 2025 | Image to Video Generation, Video Generation | Code Available | 3 |
| Do generative video models understand physical principles? | Jan 14, 2025 | Video Generation | Code Available | 3 |
| In-situ graph reasoning and knowledge expansion using Graph-PReFLexOR | Jan 14, 2025 | Knowledge Graphs, Language Modeling | Code Available | 3 |
| Lifelong Learning of Large Language Model based Agents: A Roadmap | Jan 13, 2025 | Incremental Learning, Language Modeling | Code Available | 3 |
| A General Framework for Inference-time Scaling and Steering of Diffusion Models | Jan 12, 2025 | Protein Design | Code Available | 3 |
| ELIZA Reanimated: The world's first chatbot restored on the world's first time sharing system | Jan 12, 2025 | Chatbot | Code Available | 3 |
| LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs | Jan 10, 2025 | 4k, Visual Reasoning | Code Available | 3 |
| Valley2: Exploring Multimodal Models with Scalable Vision-Language Design | Jan 10, 2025 | Image Captioning, Language Modeling | Code Available | 3 |
| BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response | Jan 10, 2025 | All, Building change detection for remote sensing images | Code Available | 3 |
| Relative Pose Estimation through Affine Corrections of Monocular Depth Priors | Jan 9, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 3 |
| 3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering | Jan 9, 2025 | Image Generation, Text to Image Generation | Code Available | 3 |
| RadGPT: Constructing 3D Image-Text Tumor Datasets | Jan 8, 2025 | AI Agent, Anatomy | Code Available | 3 |
| GLiREL -- Generalist Model for Zero-Shot Relation Extraction | Jan 6, 2025 | model, named-entity-recognition | Code Available | 3 |
| LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases | Jan 6, 2025 | Fairness, Language Modeling | Code Available | 3 |
| Visual Large Language Models for Generalized and Specialized Applications | Jan 6, 2025 | Ethics | Code Available | 3 |
| The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features | Jan 6, 2025 | Feature Engineering, Time Series | Code Available | 3 |
| Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera | Jan 5, 2025 | Data Augmentation, Depth Estimation | Code Available | 3 |
| UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility | Jan 4, 2025 | | Code Available | 3 |
| ROLO-SLAM: Rotation-Optimized LiDAR-Only SLAM in Uneven Terrain with Ground Vehicle | Jan 4, 2025 | Pose Estimation | Code Available | 3 |
| Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap | Jan 3, 2025 | Recommendation Systems, World Knowledge | Code Available | 3 |
| JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing | Jan 3, 2025 | 3D Reconstruction, Face Generation | Code Available | 3 |
| CryptoMamba: Leveraging State Space Models for Accurate Bitcoin Price Prediction | Jan 2, 2025 | Mamba, State Space Models | Code Available | 3 |
| MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Jan 2, 2025 | Contrastive Learning, Key Detection | Code Available | 3 |
| VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging | Jan 1, 2025 | Interactive Segmentation, Segmentation | Code Available | 3 |
| Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization | Jan 1, 2025 | News Retrieval, Retrieval | Code Available | 3 |
| Dataset Distillation with Neural Characteristic Function: A Minmax Perspective | Jan 1, 2025 | Computational Efficiency, Dataset Distillation | Code Available | 3 |
| VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM | Dec 31, 2024 | Object, Video Understanding | Code Available | 3 |
| DiC: Rethinking Conv3x3 Designs in Diffusion Models | Dec 31, 2024 | Decoder | Code Available | 3 |
| STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Dec 31, 2024 | Dynamic Reconstruction, Scene Flow Estimation | Code Available | 3 |
| Efficiently Serving LLM Reasoning Programs with Certaindex | Dec 30, 2024 | Code Generation, Mathematical Problem-Solving | Code Available | 3 |
| SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection | Dec 30, 2024 | object-detection, Object Detection | Code Available | 3 |
| VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation | Dec 30, 2024 | Video Generation, Video Quality Assessment | Code Available | 3 |
| Towards Visual Grounding: A Survey | Dec 28, 2024 | Phrase Grounding, Referring Expression | Code Available | 3 |
| Calibre: Towards Fair and Accurate Personalized Federated Learning with Self-Supervised Learning | Dec 28, 2024 | Fairness, Federated Learning | Code Available | 3 |
| DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | Dec 27, 2024 | Autonomous Driving, Video Generation | Code Available | 3 |