| Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models | Jan 16, 2024 | GPU, Quantization | Code Available | 3 |
| DF40: Toward Next-Generation Deepfake Detection | Jun 19, 2024 | DeepFake Detection, Face Reenactment | Code Available | 3 |
| Rho-1: Not All Tokens Are What You Need | Apr 11, 2024 | All, Continual Pretraining | Code Available | 3 |
| multiGradICON: A Foundation Model for Multimodal Medical Image Registration | Aug 1, 2024 | Anatomy, Deep Learning | Code Available | 3 |
| MANTIS: Interleaved Multi-Image Instruction Tuning | May 2, 2024 | | Code Available | 3 |
| GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization | Jun 24, 2024 | Image Manipulation, Image Manipulation Detection | Code Available | 3 |
| HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis | Jun 23, 2024 | Benchmarking, Representation Learning | Code Available | 3 |
| AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors | Jun 26, 2024 | Diversity | Code Available | 3 |
| YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation | Jul 5, 2024 | Drum Transcription, Drum Transcription in Music (DTM) | Code Available | 3 |
| EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Jun 28, 2024 | Interactive Segmentation, Language Modeling | Code Available | 3 |
| Evaluation of Text-to-Video Generation Models: A Dynamics Perspective | Jul 1, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Apr 9, 2025 | 2k, Decision Making | Code Available | 3 |
| OneRestore: A Universal Restoration Framework for Composite Degradation | Jul 5, 2024 | Image Dehazing, Image Restoration | Code Available | 3 |
| WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks | Jul 7, 2024 | Arithmetic Reasoning | Code Available | 3 |
| Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts | Jul 9, 2024 | 3D Object Editing, 3D Reconstruction | Code Available | 3 |
| Unified Approach for Hedging Impermanent Loss of Liquidity Provision | Jul 6, 2024 | | Code Available | 3 |
| Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation | Jul 10, 2024 | 3D human pose and shape estimation | Code Available | 3 |
| rLLM: Relational Table Learning with LLMs | Jul 29, 2024 | Classification, Node Classification | Code Available | 3 |
| WildGaussians: 3D Gaussian Splatting in the Wild | Jul 11, 2024 | 3DGS, 3D Scene Reconstruction | Code Available | 3 |
| VISA: Reasoning Video Object Segmentation via Large Language Models | Jul 16, 2024 | Decoder, Object | Code Available | 3 |
| Scaling Retrieval-Based Language Models with a Trillion-Token Datastore | Jul 9, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Compact Language Models via Pruning and Knowledge Distillation | Jul 19, 2024 | Knowledge Distillation, Language Modeling | Code Available | 3 |
| PyABSA: A Modularized Framework for Reproducible Aspect-based Sentiment Analysis | Aug 2, 2022 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | Code Available | 3 |
| Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection | Jul 30, 2024 | object-detection, Object Detection | Code Available | 3 |
| MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine | Aug 6, 2024 | Medical Visual Question Answering, Organ Detection | Code Available | 3 |
| 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Aug 7, 2024 | 16k, 2k | Code Available | 3 |
| NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices | Aug 19, 2024 | Optical Flow Estimation | Code Available | 3 |
| ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | Aug 16, 2024 | GPU, Model Compression | Code Available | 3 |
| LoopSplat: Loop Closure by Registering 3D Gaussian Splats | Aug 19, 2024 | 3DGS, Point Cloud Registration | Code Available | 3 |
| Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | Aug 19, 2024 | Image Generation, Video Generation | Code Available | 3 |
| AnyGraph: Graph Foundation Model in the Wild | Aug 20, 2024 | Graph Learning, Mixture-of-Experts | Code Available | 3 |
| LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs | Aug 24, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| 3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt | Sep 19, 2024 | 3DGS, GPU | Code Available | 3 |
| PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions | Sep 23, 2024 | Image Generation, Image Restoration | Code Available | 3 |
| TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control | Sep 24, 2024 | Clustering, Language Modelling | Code Available | 3 |
| Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts | Sep 25, 2024 | CAD Reconstruction, Text to 3D | Code Available | 3 |
| ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation | Sep 20, 2024 | Descriptive, Question Answering | Code Available | 3 |
| Results of the Big ANN: NeurIPS'23 competition | Sep 25, 2024 | Diversity | Code Available | 3 |
| Diffusion Models are Evolutionary Algorithms | Oct 3, 2024 | Denoising, Evolutionary Algorithms | Code Available | 3 |
| ControlAR: Controllable Image Generation with Autoregressive Models | Oct 3, 2024 | Image Generation | Code Available | 3 |
| CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control | Oct 4, 2024 | Motion Generation, Reinforcement Learning (RL) | Code Available | 3 |
| DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Oct 17, 2024 | Talking Head Generation, Video Generation | Code Available | 3 |
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models | Oct 23, 2024 | In-Context Learning, Language Modeling | Code Available | 3 |
| ZipNN: Lossless Compression for AI Models | Nov 7, 2024 | Model Compression | Code Available | 3 |
| TEXGen: a Generative Diffusion Model for Mesh Textures | Nov 22, 2024 | model, Texture Synthesis | Code Available | 3 |
| BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence | Nov 22, 2024 | 3D visual grounding, Visual Grounding | Code Available | 3 |
| Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Nov 29, 2024 | Decision Making, RAG | Code Available | 3 |
| TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation | Dec 4, 2024 | Image Generation, Image Reconstruction | Code Available | 3 |
| TryOffAnyone: Tiled Cloth Generation from a Dressed Person | Dec 11, 2024 | Image-to-Image Translation, Virtual Try-Off | Code Available | 3 |
| InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders | Nov 13, 2024 | | Code Available | 3 |