| ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning | Jan 11, 2025 | Drug Discovery | Code Available | 2 |
| TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios | Jan 10, 2025 | Aerial Scene Classification, CPU | Code Available | 2 |
| Do we actually understand the impact of renewables on electricity prices? A causal inference approach | Jan 10, 2025 | Causal Inference | Code Available | 2 |
| Test-time Alignment of Diffusion Models without Reward Over-optimization | Jan 10, 2025 | Diversity | Code Available | 2 |
| xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement | Jan 10, 2025 | Mamba, Speech Enhancement | Code Available | 2 |
| Russian Financial Statements Database: A firm-level collection of the universe of financial statements | Jan 10, 2025 | Imputation | Code Available | 2 |
| VideoRAG: Retrieval-Augmented Generation over Video Corpus | Jan 10, 2025 | RAG, Response Generation | Code Available | 2 |
| AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery | Jan 10, 2025 | | Code Available | 2 |
| FOCUS: Towards Universal Foreground Segmentation | Jan 9, 2025 | Camouflaged Object Segmentation, Defocus Blur Detection | Code Available | 2 |
| V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer | Jan 9, 2025 | | Code Available | 2 |
| OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? | Jan 9, 2025 | Benchmarking, Video Understanding | Code Available | 2 |
| FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching | Jan 9, 2025 | Audio Super-Resolution, Computational Efficiency | Code Available | 2 |
| UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Jan 9, 2025 | Decision Making, Language Modeling | Code Available | 2 |
| CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models | Jan 9, 2025 | Cell Segmentation, Dataset Generation | Code Available | 2 |
| Mechanistic understanding and validation of large AI models with SemanticLens | Jan 9, 2025 | Decision Making | Code Available | 2 |
| ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding | Jan 9, 2025 | Visual Question Answering (VQA), Visual Reasoning | Code Available | 2 |
| MambaHSI: Spatial-Spectral Mamba for Hyperspectral Image Classification | Jan 9, 2025 | Classification, Hyperspectral Image Classification | Code Available | 2 |
| OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis | Jan 8, 2025 | Decoder, Emotional Speech Synthesis | Code Available | 2 |
| A Plug-and-Play Bregman ADMM Module for Inferring Event Branches in Temporal Point Processes | Jan 8, 2025 | Point Processes | Code Available | 2 |
| Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems | Jan 8, 2025 | Bayesian Inference, Variational Inference | Code Available | 2 |
| TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training | Jan 8, 2025 | State Space Models | Code Available | 2 |
| MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration | Jan 8, 2025 | Deblurring, Denoising | Code Available | 2 |
| URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics | Jan 8, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Generative AI for Cel-Animation: A Survey | Jan 8, 2025 | Colorization, Layout Design | Code Available | 2 |
| FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian Splatting with Depth-Feature Consistency | Jan 8, 2025 | Novel View Synthesis, Surface Reconstruction | Code Available | 2 |
| FrontierNet: Learning Visual Cues to Explore | Jan 8, 2025 | Object Discovery | Code Available | 2 |
| Grokking at the Edge of Numerical Stability | Jan 8, 2025 | | Code Available | 2 |
| LLM4SR: A Survey on Large Language Models for Scientific Research | Jan 8, 2025 | Survey | Code Available | 2 |
| InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | Jan 8, 2025 | | Code Available | 2 |
| Graph-Based Multimodal and Multi-view Alignment for Keystep Recognition | Jan 7, 2025 | Graph Learning, Node Classification | Code Available | 2 |
| Realistic Test-Time Adaptation of Vision-Language Models | Jan 7, 2025 | Test-time Adaptation | Code Available | 2 |
| Deep Learning-based Compression Detection for explainable Face Image Quality Assessment | Jan 7, 2025 | Face Image Quality, Face Image Quality Assessment | Code Available | 2 |
| MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems | Jan 7, 2025 | RAG, Retrieval | Code Available | 2 |
| Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers | Jan 7, 2025 | Diversity, Text-to-Video Generation | Code Available | 2 |
| LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes | Jan 7, 2025 | Mixture-of-Experts, Representation Learning | Code Available | 2 |
| LightGNN: Simple Graph Neural Network for Recommendation | Jan 6, 2025 | Computational Efficiency, Graph Neural Network | Code Available | 2 |
| Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction | Jan 6, 2025 | | Code Available | 2 |
| PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | Jan 6, 2025 | Decision Making | Code Available | 2 |
| Sim-to-Real Transfer for Mobile Robots with Reinforcement Learning: from NVIDIA Isaac Sim to Gazebo and Real ROS 2 Robots | Jan 6, 2025 | Deep Reinforcement Learning, Reinforcement Learning (RL) | Code Available | 2 |
| Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks | Jan 6, 2025 | Decoder, Quantization | Code Available | 2 |
| Revolutionizing Encrypted Traffic Classification with MH-Net: A Multi-View Heterogeneous Graph Model | Jan 5, 2025 | Contrastive Learning, Traffic Classification | Code Available | 2 |
| LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models | Jan 5, 2025 | Decision Making, RAG | Code Available | 2 |
| Punch Out Model Synthesis: A Stochastic Algorithm for Constraint Based Tiling Generation | Jan 5, 2025 | | Code Available | 2 |
| Test-time Computing: from System-1 Thinking to System-2 Thinking | Jan 5, 2025 | | Code Available | 2 |
| DepthMaster: Taming Diffusion Models for Monocular Depth Estimation | Jan 5, 2025 | Denoising, Depth Estimation | Code Available | 2 |
| DiffGraph: Heterogeneous Graph Diffusion Model | Jan 4, 2025 | Denoising, Graph Generation | Code Available | 2 |
| Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers | Jan 4, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph | Jan 4, 2025 | TextVQA | Code Available | 2 |
| Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies | Jan 4, 2025 | Edge-computing, Knowledge Distillation | Code Available | 2 |
| GNSS/GPS Spoofing and Jamming Identification Using Machine Learning and Deep Learning | Jan 4, 2025 | Deep Learning | Code Available | 2 |