| The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis | Feb 13, 2025 | Safety Alignment | Code Available | 3 |
| MetaDE: Evolving Differential Evolution by Differential Evolution | Feb 13, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| MDCrow: Automating Molecular Dynamics Workflows with Large Language Models | Feb 13, 2025 | | Code Available | 3 |
| Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning | Feb 12, 2025 | RAG, Text to SQL | Code Available | 3 |
| Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Feb 12, 2025 | cross-modal alignment, multimodal generation | Code Available | 3 |
| FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents | Feb 11, 2025 | | Code Available | 3 |
| GENERator: A Long-Context Generative Genomic Foundation Model | Feb 11, 2025 | model | Code Available | 3 |
| Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Feb 11, 2025 | Automated Theorem Proving, Large Language Model | Code Available | 3 |
| EVEv2: Improved Baselines for Encoder-Free Vision-Language Models | Feb 10, 2025 | Decoder | Code Available | 3 |
| Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Feb 10, 2025 | Math | Code Available | 3 |
| History-Guided Video Diffusion | Feb 10, 2025 | Video Generation | Code Available | 3 |
| PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map | Feb 9, 2025 | | Code Available | 3 |
| Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Feb 9, 2025 | Image Captioning, Image-text Retrieval | Code Available | 3 |
| ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | Feb 8, 2025 | Q-Learning, Safe Exploration | Code Available | 3 |
| FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Feb 7, 2025 | Computational Efficiency, Text-to-Video Generation | Code Available | 3 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Feb 7, 2025 | 4k, General Knowledge | Code Available | 3 |
| ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks | Feb 7, 2025 | Benchmarking | Code Available | 3 |
| VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Feb 7, 2025 | Hallucination, Position | Code Available | 3 |
| MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot | Feb 6, 2025 | Diagnostic, Large Language Model | Code Available | 3 |
| ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | Feb 6, 2025 | Image Segmentation, Segmentation | Code Available | 3 |
| Ola: Pushing the Frontiers of Omni-Modal Language Model | Feb 6, 2025 | cross-modal alignment, Language Modeling | Code Available | 3 |
| Multi-agent Architecture Search via Agentic Supernet | Feb 6, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| Demystifying Long Chain-of-Thought Reasoning in LLMs | Feb 5, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation | Feb 4, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 3 |
| ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization | Feb 4, 2025 | Quantization | Code Available | 3 |
| Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries | Feb 4, 2025 | GPU | Code Available | 3 |
| Flow Q-Learning | Feb 4, 2025 | Action Generation, D4RL | Code Available | 3 |
| mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition | Feb 3, 2025 | Audio-Visual Speech Recognition, Decoder | Code Available | 3 |
| GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation | Feb 3, 2025 | Graph Neural Network, Knowledge Graphs | Code Available | 3 |
| Safety at Scale: A Comprehensive Survey of Large Model Safety | Feb 2, 2025 | Autonomous Driving, Data Poisoning | Code Available | 3 |
| Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective | Feb 2, 2025 | Multi-Task Learning | Code Available | 3 |
| OneForecast: A Universal Framework for Global and Regional Weather Forecasting | Feb 1, 2025 | Weather Forecasting | Code Available | 3 |
| MambaGlue: Fast and Robust Local Feature Matching With Mamba | Feb 1, 2025 | Mamba | Code Available | 3 |
| M+: Extending MemoryLLM with Scalable Long-Term Memory | Feb 1, 2025 | 16k, GPU | Code Available | 3 |
| Rethinking Early Stopping: Refine, Then Calibrate | Jan 31, 2025 | Decision Making | Code Available | 3 |
| Test-Time Training Scaling Laws for Chemical Exploration in Drug Design | Jan 31, 2025 | Drug Design, Drug Discovery | Code Available | 3 |
| Partially Rewriting a Transformer in Natural Language | Jan 31, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| Decoding-based Regression | Jan 31, 2025 | Density Estimation, regression | Code Available | 3 |
| Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models | Jan 30, 2025 | Action Recognition, Domain Adaptation | Code Available | 3 |
| LLMs can see and hear without any training | Jan 30, 2025 | Audio captioning, Image Generation | Code Available | 3 |
| Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation | Jan 29, 2025 | | Code Available | 3 |
| Molecular Fingerprints Are Strong Models for Peptide Function Prediction | Jan 29, 2025 | Graph Classification, Graph Regression | Code Available | 3 |
| Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting | Jan 28, 2025 | Specificity, Time Series | Code Available | 3 |
| DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation | Jan 28, 2025 | 3D Generation | Code Available | 3 |
| Deformable Beta Splatting | Jan 27, 2025 | 3DGS, Novel View Synthesis | Code Available | 3 |
| Parametric Retrieval Augmented Generation | Jan 27, 2025 | Domain Adaptation, RAG | Code Available | 3 |
| MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents | Jan 24, 2025 | Benchmarking | Code Available | 3 |
| HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Jan 24, 2025 | Autonomous Driving, Language Modeling | Code Available | 3 |
| OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Jan 23, 2025 | Emotion Recognition, Event Detection | Code Available | 3 |
| The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities | Jan 23, 2025 | General Knowledge, Instruction Following | Code Available | 3 |