| R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | May 5, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Jun 13, 2024 | Instruction Following, Math | Code Available | 3 |
| A Survey on the Optimization of Large Language Model-based Agents | Mar 16, 2025 | Decision Making, Language Modeling | Code Available | 3 |
| SOAP: Style-Omniscient Animatable Portraits | May 8, 2025 | Image to 3D | Code Available | 3 |
| wgatools: an ultrafast toolkit for manipulating whole genome alignments | Sep 13, 2024 | | Code Available | 3 |
| Detecting Twenty-thousand Classes using Image-level Supervision | Jan 7, 2022 | Cross-Domain Few-Shot Object Detection, image-classification | Code Available | 3 |
| VidTok: A Versatile and Open-Source Video Tokenizer | Dec 17, 2024 | Quantization, SSIM | Code Available | 3 |
| Transformers Can Do Arithmetic with the Right Embeddings | May 27, 2024 | GPU, Position | Code Available | 3 |
| StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On | Dec 4, 2023 | Semantic correspondence, Virtual Try-on | Code Available | 3 |
| A General Framework for Inference-time Scaling and Steering of Diffusion Models | Jan 12, 2025 | Protein Design | Code Available | 3 |
| Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis | Oct 10, 2024 | Feature Compression, Image Generation | Code Available | 3 |
| GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images | Mar 8, 2025 | cross-modal alignment, Diagnostic | Code Available | 3 |
| Unified Source-Free Domain Adaptation | Mar 12, 2024 | Domain Adaptation, Language Modelling | Code Available | 3 |
| A Python library for efficient computation of molecular fingerprints | Mar 27, 2024 | Drug Discovery, Molecular Property Prediction | Code Available | 3 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Feb 7, 2025 | 4k, General Knowledge | Code Available | 3 |
| SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | Feb 5, 2024 | 3D Semantic Segmentation, Camera Pose Estimation | Code Available | 3 |
| MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Jan 2, 2025 | Contrastive Learning, Key Detection | Code Available | 3 |
| Language-Codec: Bridging Discrete Codec Representations and Speech Language Models | Feb 19, 2024 | Audio Compression, Audio Generation | Code Available | 3 |
| ROLAND: Graph Learning Framework for Dynamic Graphs | Aug 15, 2022 | Graph Learning, Graph Representation Learning | Code Available | 3 |
| DiC: Rethinking Conv3x3 Designs in Diffusion Models | Dec 31, 2024 | Decoder | Code Available | 3 |
| LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Oct 14, 2024 | Benchmarking, Large Language Model | Code Available | 3 |
| BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | Feb 6, 2024 | Binarization, GPU | Code Available | 3 |
| MotionGPT: Human Motion as a Foreign Language | Jun 26, 2023 | Language Modeling, Language Modelling | Code Available | 3 |
| HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly | Oct 3, 2024 | RAG | Code Available | 3 |
| AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation | Mar 26, 2024 | 3D Multi-Person Mesh Recovery, All | Code Available | 3 |
| Efficient Agent Training for Computer Use | May 20, 2025 | | Code Available | 3 |
| Agent Workflow Memory | Sep 11, 2024 | AI Agent, Language Modeling | Code Available | 3 |
| LaViDa: A Large Diffusion Language Model for Multimodal Understanding | May 22, 2025 | Instruction Following, Language Modeling | Code Available | 3 |
| Aquila2 Technical Report | Aug 14, 2024 | Management | Code Available | 3 |
| The Flan Collection: Designing Data and Methods for Effective Instruction Tuning | Jan 31, 2023 | | Code Available | 3 |
| DUFOMap: Efficient Dynamic Awareness Mapping | Mar 3, 2024 | Computational Efficiency | Code Available | 3 |
| Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent, Automatic Speech Recognition | Code Available | 3 |
| UnMarker: A Universal Attack on Defensive Image Watermarking | May 14, 2024 | DeepFake Detection, Denoising | Code Available | 3 |
| AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models | Oct 3, 2024 | knowledge editing, Model Editing | Code Available | 3 |
| PaliGemma 2: A Family of Versatile VLMs for Transfer | Dec 4, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | Jul 26, 2024 | Benchmarking, Code Generation | Code Available | 3 |
| 5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks | Aug 15, 2024 | image-classification, Image Classification | Code Available | 3 |
| Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing | Apr 30, 2025 | Image Generation | Code Available | 3 |
| StableIdentity: Inserting Anybody into Anywhere at First Sight | Jan 29, 2024 | 3D Generation | Code Available | 3 |
| WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences | Jun 13, 2023 | Language Modeling, Language Modelling | Code Available | 3 |
| GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation | Jan 8, 2024 | 3D Generation, Text to 3D | Code Available | 3 |
| From Sora What We Can See: A Survey of Text-to-Video Generation | May 17, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| Diffusion-TS: Interpretable Diffusion for General Time Series Generation | Mar 4, 2024 | Audio Synthesis, Decoder | Code Available | 3 |
| TapeAgents: a Holistic Framework for Agent Development and Optimization | Dec 11, 2024 | | Code Available | 3 |
| MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with 0.1K Parameters | Oct 2, 2024 | Multivariate Time Series Forecasting, Time Series | Code Available | 3 |
| DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks | Apr 15, 2025 | | Code Available | 3 |
| Adversarial Cheap Talk | Nov 20, 2022 | Meta-Learning, Reinforcement Learning (RL) | Code Available | 3 |
| Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Jun 6, 2024 | 3D Scene Reconstruction, Depth Estimation | Code Available | 3 |
| EscherNet: A Generative Model for Scalable View Synthesis | Feb 6, 2024 | 3D Reconstruction, GPU | Code Available | 3 |
| 3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering | Jan 9, 2025 | Image Generation, Text to Image Generation | Code Available | 3 |