| Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | Apr 3, 2023 | Chatbot, Language Modeling | Code Available | 4 |
| Video Seal: Open and Efficient Video Watermarking | Dec 12, 2024 | Video Compression, Video Editing | Code Available | 4 |
| MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization | Aug 5, 2024 | | Code Available | 4 |
| TimeGPT-1 | Oct 5, 2023 | Deep Learning, Time Series | Code Available | 4 |
| Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Jul 2, 2024 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 4 |
| ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models | Apr 19, 2022 | Fairness, Few-Shot Image Classification | Code Available | 4 |
| UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer | Apr 15, 2025 | Image Animation | Code Available | 4 |
| AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving | Feb 22, 2023 | Deep Learning | Code Available | 4 |
| ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback | Apr 11, 2024 | SSIM | Code Available | 4 |
| VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks | May 18, 2023 | Decoder, Language Modeling | Code Available | 4 |
| Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Apr 18, 2023 | Audio Generation, Expressive Speech Synthesis | Code Available | 4 |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | Feb 25, 2024 | Code Generation, HumanEval | Code Available | 4 |
| A New Formulation of Lipschitz Constrained With Functional Gradient Learning for GANs | Jan 20, 2025 | Diversity, Image Generation | Code Available | 4 |
| Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves | Nov 7, 2023 | | Code Available | 4 |
| AnyDoor: Zero-shot Object-level Image Customization | Jul 18, 2023 | Object, Virtual Try-on | Code Available | 4 |
| GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS | Aug 2, 2024 | GPU, Navigate | Code Available | 4 |
| S^3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving | May 30, 2024 | 3DGS, 3D Reconstruction | Code Available | 4 |
| Tracking Everything Everywhere All at Once | Jun 8, 2023 | All, Motion Estimation | Code Available | 4 |
| Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion | Jan 27, 2023 | GPU, Image Generation | Code Available | 4 |
| sbi reloaded: a toolkit for simulation-based inference workflows | Nov 26, 2024 | Bayesian Inference, Diagnostic | Code Available | 4 |
| Enhancing Chat Language Models by Scaling High-quality Instructional Conversations | May 23, 2023 | Diversity | Code Available | 4 |
| Zero-shot forecasting of chaotic systems | Sep 24, 2024 | Attribute, In-Context Learning | Code Available | 4 |
| One Embedder, Any Task: Instruction-Finetuned Text Embeddings | Dec 19, 2022 | Information Retrieval, Learning Word Embeddings | Code Available | 4 |
| JetMoE: Reaching Llama2 Performance with 0.1M Dollars | Apr 11, 2024 | GPU, Mixture-of-Experts | Code Available | 4 |
| LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models | Jan 31, 2025 | Caption Generation, Language Modeling | Code Available | 4 |
| A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective | May 8, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 4 |
| T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models | Feb 16, 2023 | Image Generation, Style Transfer | Code Available | 4 |
| Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | Jun 5, 2023 | Language Modeling, Language Modelling | Code Available | 4 |
| Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators | Mar 23, 2023 | Image Generation, Text-to-Video Generation | Code Available | 4 |
| HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal | Feb 6, 2024 | Red Teaming | Code Available | 4 |
| MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | May 30, 2024 | Image Animation, Video Generation | Code Available | 4 |
| Generalizable Humanoid Manipulation with 3D Diffusion Policies | Oct 14, 2024 | Camera Calibration, Point Cloud Segmentation | Code Available | 4 |
| LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | Sep 4, 2024 | Question Answering, Sentence | Code Available | 4 |
| No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | Oct 31, 2024 | 3D Reconstruction, Generalizable Novel View Synthesis | Code Available | 4 |
| Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models | Jul 30, 2023 | Hallucination, Prompt Engineering | Code Available | 4 |
| Multimodal Chain-of-Thought Reasoning in Language Models | Feb 2, 2023 | Hallucination, Language Modelling | Code Available | 4 |
| Efficient Automated Deep Learning for Time Series Forecasting | May 11, 2022 | AutoML, Bayesian Optimization | Code Available | 4 |
| SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM | Dec 4, 2023 | Camera Pose Estimation, Novel View Synthesis | Code Available | 4 |
| Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection | Feb 23, 2023 | Code Completion, Computer Security | Code Available | 4 |
| Lean Workbook: A large-scale Lean problem set formalized from natural language math problems | Jun 6, 2024 | Automated Theorem Proving, Math | Code Available | 4 |
| GeoCalib: Learning Single-image Calibration with Geometric Optimization | Sep 10, 2024 | 3D geometry, Visual Localization | Code Available | 4 |
| ManimML: Communicating Machine Learning Architectures with Animation | Jun 29, 2023 | | Code Available | 4 |
| Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving | Jun 6, 2024 | Autonomous Driving, Bench2Drive | Code Available | 4 |
| TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Dec 30, 2024 | Audio Generation, GPU | Code Available | 4 |
| SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models | Feb 13, 2025 | Question Answering, RAG | Code Available | 4 |
| Reasoning with Language Model is Planning with World Model | May 24, 2023 | Language Modeling, Language Modelling | Code Available | 4 |
| Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Sep 17, 2024 | Conditional Image Generation, Depth Estimation | Code Available | 4 |
| DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | May 7, 2024 | Binarization, Deblurring | Code Available | 4 |
| PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | Sep 30, 2023 | GPU | Code Available | 4 |
| Flamingo: a Visual Language Model for Few-Shot Learning | Apr 29, 2022 | Few-Shot Learning, Generative Visual Question Answering | Code Available | 4 |