| WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models | Jan 25, 2024 | | Code Available | 5 |
| SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Jan 24, 2024 | text-to-speech; Text to Speech | Code Available | 5 |
| Differentiable Tree Search Network | Jan 22, 2024 | Decision Making; Inductive Bias | Code Available | 5 |
| Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs | Jan 22, 2024 | Diffusion Personalization Tuning Free; Image Generation | Code Available | 5 |
| Large Language Model based Multi-Agents: A Survey of Progress and Challenges | Jan 21, 2024 | Decision Making; Language Modeling | Code Available | 5 |
| OMG-Seg: Is One Model Good Enough For All Segmentation? | Jan 18, 2024 | All; Decoder | Code Available | 5 |
| Scalable Pre-training of Large Autoregressive Image Models | Jan 16, 2024 | Image Classification | Code Available | 5 |
| SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | Jan 16, 2024 | Image Generation | Code Available | 5 |
| Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis | Jan 16, 2024 | 3D Reconstruction; Face Generation | Code Available | 5 |
| Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding | Jan 15, 2024 | Language Modeling; Language Modelling | Code Available | 5 |
| Secrets of RLHF in Large Language Models Part II: Reward Modeling | Jan 11, 2024 | Contrastive Learning; Meta-Learning | Code Available | 5 |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Jan 11, 2024 | Language Modelling; Large Language Model | Code Available | 5 |
| Extreme Compression of Large Language Models via Additive Quantization | Jan 11, 2024 | CPU; GPU | Code Available | 5 |
| Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Jan 10, 2024 | Task Planning | Code Available | 5 |
| Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects | Jan 7, 2024 | Language Modeling; Language Modelling | Code Available | 5 |
| Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions | Jan 7, 2024 | Benchmarking; Image Segmentation | Code Available | 5 |
| Latte: Latent Diffusion Transformer for Video Generation | Jan 5, 2024 | Text-to-Video Generation; Video Generation | Code Available | 5 |
| Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively | Jan 5, 2024 | image-classification; Image Classification | Code Available | 5 |
| Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting | Jan 2, 2024 | Autonomous Driving; NeRF | Code Available | 5 |
| A Comprehensive Study of Knowledge Editing for Large Language Models | Jan 2, 2024 | knowledge editing; Model Editing | Code Available | 5 |
| Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | Jan 2, 2024 | | Code Available | 5 |
| UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition | Jan 1, 2024 | Time Series; Time Series Forecasting | Code Available | 5 |
| Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models | Jan 1, 2024 | Code Generation; parameter-efficient fine-tuning | Code Available | 5 |
| Point Transformer V3: Simpler Faster Stronger | Jan 1, 2024 | Representation Learning | Code Available | 5 |
| VGGSfM: Visual Geometry Grounded Deep Structure From Motion | Jan 1, 2024 | Camera Calibration; Point Tracking | Code Available | 5 |
| Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling | Jan 1, 2024 | NeRF | Code Available | 5 |
| GenCast: Diffusion-based ensemble forecasting for medium-range weather | Dec 25, 2023 | Decision Making; Weather Forecasting | Code Available | 5 |
| DUSt3R: Geometric 3D Vision Made Easy | Dec 21, 2023 | 3D Reconstruction; Camera Calibration | Code Available | 5 |
| AppAgent: Multimodal Agents as Smartphone Users | Dec 21, 2023 | Navigate | Code Available | 5 |
| StarVector: Generating Scalable Vector Graphics Code from Images and Text | Dec 17, 2023 | Code Generation; Language Modeling | Code Available | 5 |
| PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Dec 16, 2023 | CPU; GPU | Code Available | 5 |
| MobileSAMv2: Faster Segment Anything to Everything | Dec 15, 2023 | Decoder; Knowledge Distillation | Code Available | 5 |
| CogAgent: A Visual Language Model for GUI Agents | Dec 14, 2023 | Language Modeling | Code Available | 5 |
| Weakly Supervised Detection of Hallucinations in LLM Activations | Dec 5, 2023 | Hallucination; Language Modeling | Code Available | 5 |
| TaskWeaver: A Code-First Agent Framework | Nov 29, 2023 | Natural Language Understanding | Code Available | 5 |
| Human Gaussian Splatting: Real-time Rendering of Animatable Avatars | Nov 28, 2023 | | Code Available | 5 |
| Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | Nov 28, 2023 | Electrical Engineering; Experimental Design | Code Available | 5 |
| Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following | Nov 28, 2023 | Attribute; Denoising | Code Available | 5 |
| MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI | Nov 27, 2023 | Complex Query Answering; Logical Reasoning | Code Available | 5 |
| Structure-Aware Sparse-View X-ray 3D Reconstruction | Nov 18, 2023 | 3D Reconstruction; CT Reconstruction | Code Available | 5 |
| Instruction-Following Evaluation for Large Language Models | Nov 14, 2023 | Instruction Following | Code Available | 5 |
| LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models | Nov 8, 2023 | 8k; GPU | Code Available | 5 |
| CogVLM: Visual Expert for Pretrained Language Models | Nov 6, 2023 | 1 Image, 2*2 Stitching; FS-MEVQA | Code Available | 5 |
| VideoCrafter1: Open Diffusion Models for High-Quality Video Generation | Oct 30, 2023 | Text-to-Video Generation; Video Generation | Code Available | 5 |
| Zephyr: Direct Distillation of LM Alignment | Oct 25, 2023 | 2D Cyclist Detection; Few-Shot Learning | Code Available | 5 |
| MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning | Oct 24, 2023 | | Code Available | 5 |
| Wonder3D: Single Image to 3D using Cross-Domain Diffusion | Oct 23, 2023 | 3D geometry; Image to 3D | Code Available | 5 |
| NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails | Oct 16, 2023 | Dialogue Management; Management | Code Available | 5 |
| CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving | Oct 11, 2023 | Language Modeling; Language Modelling | Code Available | 5 |
| Ferret: Refer and Ground Anything Anywhere at Any Granularity | Oct 11, 2023 | Hallucination; Language Modeling | Code Available | 5 |