| ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning | Jan 11, 2025 | Drug Discovery | Code Available | 2 |
| Test-time Alignment of Diffusion Models without Reward Over-optimization | Jan 10, 2025 | Diversity | Code Available | 2 |
| VideoRAG: Retrieval-Augmented Generation over Video Corpus | Jan 10, 2025 | RAG, Response Generation | Code Available | 2 |
| AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery | Jan 10, 2025 | | Code Available | 2 |
| xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement | Jan 10, 2025 | Mamba, Speech Enhancement | Code Available | 2 |
| TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios | Jan 10, 2025 | Aerial Scene Classification, CPU | Code Available | 2 |
| Russian Financial Statements Database: A firm-level collection of the universe of financial statements | Jan 10, 2025 | Imputation | Code Available | 2 |
| Do we actually understand the impact of renewables on electricity prices? A causal inference approach | Jan 10, 2025 | Causal Inference | Code Available | 2 |
| ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding | Jan 9, 2025 | Visual Question Answering (VQA), Visual Reasoning | Code Available | 2 |
| Mechanistic understanding and validation of large AI models with SemanticLens | Jan 9, 2025 | Decision Making | Code Available | 2 |
| FOCUS: Towards Universal Foreground Segmentation | Jan 9, 2025 | Camouflaged Object Segmentation, Defocus Blur Detection | Code Available | 2 |
| UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Jan 9, 2025 | Decision Making, Language Modeling | Code Available | 2 |
| CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models | Jan 9, 2025 | Cell Segmentation, Dataset Generation | Code Available | 2 |
| MambaHSI: Spatial-Spectral Mamba for Hyperspectral Image Classification | Jan 9, 2025 | Classification, Hyperspectral Image Classification | Code Available | 2 |
| V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer | Jan 9, 2025 | | Code Available | 2 |
| FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching | Jan 9, 2025 | Audio Super-Resolution, Computational Efficiency | Code Available | 2 |
| OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? | Jan 9, 2025 | Benchmarking, Video Understanding | Code Available | 2 |
| Generative AI for Cel-Animation: A Survey | Jan 8, 2025 | Colorization, Layout Design | Code Available | 2 |
| LLM4SR: A Survey on Large Language Models for Scientific Research | Jan 8, 2025 | Survey | Code Available | 2 |
| InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | Jan 8, 2025 | | Code Available | 2 |
| Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems | Jan 8, 2025 | Bayesian Inference, Variational Inference | Code Available | 2 |
| FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian Splatting with Depth-Feature Consistency | Jan 8, 2025 | Novel View Synthesis, Surface Reconstruction | Code Available | 2 |
| OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis | Jan 8, 2025 | Decoder, Emotional Speech Synthesis | Code Available | 2 |
| URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics | Jan 8, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| A Plug-and-Play Bregman ADMM Module for Inferring Event Branches in Temporal Point Processes | Jan 8, 2025 | Point Processes | Code Available | 2 |