| Retrieval-augmented generation in multilingual settings | Jul 1, 2024 | Prompt Engineering, RAG | Code Available | 3 |
| Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective | Jul 9, 2024 | Information Retrieval, Retrieval | Code Available | 3 |
| A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights | Jul 11, 2024 | Motion Generation, Survey | Code Available | 3 |
| LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models | Jul 12, 2024 | Image Enhancement, Low-Light Image Enhancement | Code Available | 3 |
| An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases | Jul 15, 2024 | Attribute, counterfactual | Code Available | 3 |
| Learning Dynamics of LLM Finetuning | Jul 15, 2024 | Hallucination | Code Available | 3 |
| Reinforcement Learning Meets Visual Odometry | Jul 22, 2024 | Decision Making, reinforcement-learning | Code Available | 3 |
| Comgra: A Tool for Analyzing and Debugging Neural Networks | Jul 31, 2024 | | Code Available | 3 |
| ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models | Jul 31, 2024 | Domain Generalization, Prompt Learning | Code Available | 3 |
| VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents | Aug 12, 2024 | | Code Available | 3 |
| SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners | Aug 29, 2024 | Segmentation | Code Available | 3 |
| VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters | Aug 30, 2024 | Image Reconstruction, Time Series | Code Available | 3 |
| Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching | Sep 5, 2024 | | Code Available | 3 |
| SpatialBot: Precise Spatial Understanding with Vision Language Models | Jun 19, 2024 | Spatial Reasoning | Code Available | 3 |
| Colorful Diffuse Intrinsic Image Decomposition in the Wild | Sep 20, 2024 | Color Constancy, Intrinsic Image Decomposition | Code Available | 3 |
| Generative Modeling of Molecular Dynamics Trajectories | Sep 26, 2024 | | Code Available | 3 |
| SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios | Oct 2, 2024 | Speech Enhancement, Speech Separation | Code Available | 3 |
| Multi-Level Speaker Representation for Target Speaker Extraction | Oct 21, 2024 | Target Speaker Extraction | Code Available | 3 |
| PDL: A Declarative Prompt Programming Language | Oct 24, 2024 | RAG | Code Available | 3 |
| Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders | Oct 27, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Nov 20, 2024 | Computational Efficiency, Position | Code Available | 3 |
| OSDFace: One-Step Diffusion Model for Face Restoration | Nov 26, 2024 | Face Recognition, Generative Adversarial Network | Code Available | 3 |
| CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos | Nov 26, 2024 | Common Sense Reasoning, Imitation Learning | Code Available | 3 |
| Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking | May 16, 2025 | Benchmarking, Management | Code Available | 3 |
| Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications | Dec 3, 2024 | Benchmarking, Disaster Response | Code Available | 3 |
| Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization | Dec 11, 2024 | Pose Estimation, Visual Localization | Code Available | 3 |
| Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance | Dec 17, 2024 | Image Generation, Object | Code Available | 3 |
| CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up | Dec 20, 2024 | 8k, GPU | Code Available | 3 |
| UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility | Jan 4, 2025 | | Code Available | 3 |
| LLMs can see and hear without any training | Jan 30, 2025 | Audio captioning, Image Generation | Code Available | 3 |
| LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs | Jan 10, 2025 | 4k, Visual Reasoning | Code Available | 3 |
| PETR: Position Embedding Transformation for Multi-View 3D Object Detection | Mar 10, 2022 | 3D Object Detection, Object | Code Available | 3 |
| EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models | Aug 14, 2023 | knowledge editing | Code Available | 3 |
| Improved Denoising Diffusion Probabilistic Models | Feb 18, 2021 | Denoising, Image Generation | Code Available | 3 |
| Pareto Front Approximation for Multi-Objective Session-Based Recommender Systems | Jul 23, 2024 | Recommendation Systems | Code Available | 3 |
| Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Feb 11, 2025 | Automated Theorem Proving, Large Language Model | Code Available | 3 |
| Stonefish: Supporting Machine Learning Research in Marine Robotics | Feb 17, 2025 | Optical Flow Estimation | Code Available | 3 |
| Soundwave: Less is More for Speech-Text Alignment in LLMs | Feb 18, 2025 | | Code Available | 3 |
| Slamming: Training a Speech Language Model on One GPU in a Day | Feb 19, 2025 | GPU, Language Modeling | Code Available | 3 |
| AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay | Feb 24, 2025 | | Code Available | 3 |
| Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs | Feb 24, 2025 | Computer Security | Code Available | 3 |
| Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction | Feb 24, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| CrossOver: 3D Scene Cross-Modal Alignment | Feb 20, 2025 | cross-modal alignment, Object | Code Available | 3 |
| Harnessing Multiple Large Language Models: A Survey on LLM Ensemble | Feb 25, 2025 | Survey | Code Available | 3 |
| BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction | Feb 26, 2025 | Benchmarking, Time Series | Code Available | 3 |
| GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving | Mar 7, 2025 | Autonomous Driving, Denoising | Code Available | 3 |
| Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering | Mar 14, 2025 | Audio Question Answering, Question Answering | Code Available | 3 |
| Falcon: A Remote Sensing Vision-Language Foundation Model | Mar 14, 2025 | Image Captioning, image-classification | Code Available | 3 |
| A Survey on Latent Reasoning | Jul 8, 2025 | Survey | Code Available | 3 |
| Vision-Speech Models: Teaching Speech Models to Converse about Images | Mar 19, 2025 | parameter-efficient fine-tuning | Code Available | 3 |