| Paper | Date | Tasks | Code | |
|---|---|---|---|---|
| NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | May 2, 2024 | model, parameter-efficient fine-tuning | Code Available | 4 |
| OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning | May 2, 2024 | Autonomous Driving, counterfactual | Code Available | 4 |
| Self-Play Preference Optimization for Language Model Alignment | May 1, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| RAPIDFlow: Recurrent Adaptable Pyramids with Iterative Decoding for Efficient Optical Flow Estimation | May 1, 2024 | Optical Flow Estimation | Code Available | 4 |
| Visual Mamba: A Survey and New Outlooks | Apr 29, 2024 | Mamba, Survey | Code Available | 4 |
| A Survey on Diffusion Models for Time Series and Spatio-Temporal Data | Apr 29, 2024 | Anomaly Detection, Imputation | Code Available | 4 |
| Hallucination of Multimodal Large Language Models: A Survey | Apr 29, 2024 | Hallucination, Survey | Code Available | 4 |
| Mamba-FETrack: Frame-Event Tracking via State Space Model | Apr 28, 2024 | GPU, Mamba | Code Available | 4 |
| MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | Apr 26, 2024 | 2k, Question Answering | Code Available | 4 |
| PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Apr 25, 2024 | Dense Captioning, MVBench | Code Available | 4 |
| Continual Learning of Large Language Models: A Comprehensive Survey | Apr 25, 2024 | Continual Learning, Survey | Code Available | 4 |
| A Survey on Visual Mamba | Apr 24, 2024 | Image Registration, Image Restoration | Code Available | 4 |
| Autonomous LLM-driven research from data to human-verifiable research papers | Apr 24, 2024 | scientific discovery | Code Available | 4 |
| FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent | Apr 23, 2024 | Novel View Synthesis, Optical Flow Estimation | Code Available | 4 |
| SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation | Apr 22, 2024 | Image Generation | Code Available | 4 |
| Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Apr 19, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| StyleBooth: Image Style Editing with Multimodal Instruction | Apr 18, 2024 | | Code Available | 4 |
| AgentKit: Structured LLM Reasoning with Dynamic Graphs | Apr 17, 2024 | | Code Available | 4 |
| State Space Model for New-Generation Network Alternative to Transformers: A Survey | Apr 15, 2024 | | Code Available | 4 |
| Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models | Apr 15, 2024 | Image Generation, Image Restoration | Code Available | 4 |
| Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length | Apr 12, 2024 | State Space Models | Code Available | 4 |
| JetMoE: Reaching Llama2 Performance with 0.1M Dollars | Apr 11, 2024 | GPU, Mixture-of-Experts | Code Available | 4 |
| ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback | Apr 11, 2024 | SSIM | Code Available | 4 |
| RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | Apr 11, 2024 | Language Modelling | Code Available | 4 |
| A Foundation Model for Zero-shot Logical Query Reasoning | Apr 10, 2024 | Complex Query Answering, Knowledge Graph Completion | Code Available | 4 |
| Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Apr 10, 2024 | Book summarization, Language Modeling | Code Available | 4 |
| FLEX: FLEXible Federated Learning Framework | Apr 9, 2024 | Federated Learning | Code Available | 4 |
| Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences | Apr 9, 2024 | | Code Available | 4 |
| No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation | Apr 5, 2024 | Few-Shot Learning, Scene Segmentation | Code Available | 4 |
| Sailor: Open Language Models for South-East Asia | Apr 4, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| ChangeMamba: Remote Sensing Change Detection With Spatiotemporal State Space Model | Apr 4, 2024 | 2D Semantic Segmentation, Attribute | Code Available | 4 |
| MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Apr 4, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| AutoWebGLM: A Large Language Model-based Web Navigating Agent | Apr 4, 2024 | Decision Making, Language Modeling | Code Available | 4 |
| The largest EEG-based BCI reproducibility study for open science: the MOABB benchmark | Apr 3, 2024 | EEG, Motor Imagery | Code Available | 4 |
| Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization | Apr 2, 2024 | RAG, Retrieval | Code Available | 4 |
| CameraCtrl: Enabling Camera Control for Text-to-Video Generation | Apr 2, 2024 | Text-to-Video Generation, Video Generation | Code Available | 4 |
| A Survey on Large Language Model-Based Game Agents | Apr 2, 2024 | Decision Making, Language Modeling | Code Available | 4 |
| SnAG: Scalable and Accurate Video Grounding | Apr 2, 2024 | Video Grounding, Video Understanding | Code Available | 4 |
| PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning | Mar 31, 2024 | Binary Classification, Deep Learning | Code Available | 4 |
| End-to-End Autonomous Driving through V2X Cooperation | Mar 31, 2024 | Autonomous Driving | Code Available | 4 |
| Croissant: A Metadata Format for ML-Ready Datasets | Mar 28, 2024 | Friction, Management | Code Available | 4 |
| JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models | Mar 28, 2024 | | Code Available | 4 |
| Tiny Machine Learning: Progress and Futures | Mar 28, 2024 | Deep Learning | Code Available | 4 |
| Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models | Mar 28, 2024 | | Code Available | 4 |
| Long-form factuality in large language models | Mar 27, 2024 | 16k, Form | Code Available | 4 |
| BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text | Mar 27, 2024 | Articles, Language Modeling | Code Available | 4 |
| TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos | Mar 26, 2024 | 3D Human Pose Estimation | Code Available | 4 |
| Deepfake Generation and Detection: A Benchmark and Survey | Mar 26, 2024 | Attribute, Face Generation | Code Available | 4 |
| Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians | Mar 26, 2024 | NeRF, Neural Rendering | Code Available | 4 |
| DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing | Mar 26, 2024 | 3D Reconstruction, Depth Estimation | Code Available | 4 |