| Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Sep 25, 2024 | GPU, Token Reduction | Code Available | 2 |
| Occupancy-Based Dual Contouring | Sep 20, 2024 | 3D Reconstruction, GPU | Code Available | 2 |
| Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization | Sep 19, 2024 | GPU, Language Modeling | Code Available | 2 |
| RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Sep 16, 2024 | CPU, GPU | Code Available | 2 |
| Super Monotonic Alignment Search | Sep 12, 2024 | CPU, GPU | Code Available | 2 |
| Enhancing Privacy in Federated Learning: Secure Aggregation for Real-World Healthcare Applications | Sep 2, 2024 | CPU, Federated Learning | Code Available | 2 |
| Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation | Sep 2, 2024 | GPU | Code Available | 2 |
| LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models | Aug 31, 2024 | 8k, GPU | Code Available | 2 |
| MemLong: Memory-Augmented Retrieval for Long Text Modeling | Aug 30, 2024 | 4k, Decoder | Code Available | 2 |
| deepmriprep: Voxel-based Morphometry (VBM) Preprocessing via Deep Neural Networks | Aug 20, 2024 | GPU, Image Registration | Code Available | 2 |
| Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters | Aug 7, 2024 | GPU | Code Available | 2 |
| Palu: Compressing KV-Cache with Low-Rank Projection | Jul 30, 2024 | GPU, Quantization | Code Available | 2 |
| HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Jul 26, 2024 | Depth Estimation, GPU | Code Available | 2 |
| ESOD: Efficient Small Object Detection on High-Resolution Images | Jul 23, 2024 | GPU, Object | Code Available | 2 |
| Forecasting GPU Performance for Deep Learning Training and Inference | Jul 18, 2024 | Deep Learning, GPU | Code Available | 2 |
| Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale | Jul 17, 2024 | GPU, LAMBADA | Code Available | 2 |
| Differentiable Voxelization and Mesh Morphing | Jul 15, 2024 | GPU | Code Available | 2 |
| From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients | Jul 15, 2024 | GPU | Code Available | 2 |
| Gradient Boosting Reinforcement Learning | Jul 11, 2024 | GPU, reinforcement-learning | Code Available | 2 |
| MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis | Jul 10, 2024 | GPU, Image Generation | Code Available | 2 |
| P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds | Jul 7, 2024 | 3D Single Object Tracking, GPU | Code Available | 2 |
| MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | Jun 21, 2024 | GPU, Language Modeling | Code Available | 2 |
| GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation | Jun 21, 2024 | 3D Generation, GPU | Code Available | 2 |
| Duoduo CLIP: Efficient 3D Understanding with Multi-View Images | Jun 17, 2024 | GPU, Object | Code Available | 2 |
| GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion | Jun 14, 2024 | 3D Generation, GPU | Code Available | 2 |
| Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Jun 13, 2024 | Benchmarking, GPU | Code Available | 2 |
| Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models | Jun 11, 2024 | Diversity, GPU | Code Available | 2 |
| Low-Rank Quantization-Aware Training for LLMs | Jun 10, 2024 | GPU, parameter-efficient fine-tuning | Code Available | 2 |
| Spectrum: Targeted Training on Signal to Noise Ratio | Jun 7, 2024 | GPU | Code Available | 2 |
| Latent Neural Operator for Solving Forward and Inverse PDE Problems | Jun 6, 2024 | Computational Efficiency, GPU | Code Available | 2 |
| Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rate Control | Jun 4, 2024 | Bandwidth Extension, CPU | Code Available | 2 |
| SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM | Jun 3, 2024 | Decoder, GPU | Code Available | 2 |
| ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation | Jun 3, 2024 | GPU, Video Generation | Code Available | 2 |
| Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow | Jun 3, 2024 | GPU, Language Modeling | Code Available | 2 |
| Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations | May 28, 2024 | GPU | Code Available | 2 |
| ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention | May 28, 2024 | GPU, Representation Learning | Code Available | 2 |
| Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference | May 28, 2024 | GPU, Text Generation | Code Available | 2 |
| DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention | May 28, 2024 | GPU, Mamba | Code Available | 2 |
| LoQT: Low-Rank Adapters for Quantized Pretraining | May 26, 2024 | GPU, Language Modeling | Code Available | 2 |
| What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions | May 22, 2024 | Data Valuation, GPU | Code Available | 2 |
| SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | May 20, 2024 | Audio Classification, GPU | Code Available | 2 |
| Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | May 19, 2024 | 6D Pose Estimation, GPU | Code Available | 2 |
| MAMCA -- Optimal on Accuracy and Efficiency for Automatic Modulation Classification with Extended Signal Length | May 18, 2024 | Denoising, GPU | Code Available | 2 |
| Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model | May 15, 2024 | GPU, Language Modeling | Code Available | 2 |
| Preble: Efficient Distributed Prompt Scheduling for LLM Serving | May 8, 2024 | GPU, Scheduling | Code Available | 2 |
| FeNNol: an Efficient and Flexible Library for Building Force-field-enhanced Neural Network Potentials | May 2, 2024 | GPU | Code Available | 2 |
| MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors | May 2, 2024 | 3D Object Captioning, 3D Object Classification | Code Available | 2 |
| MicroDreamer: Efficient 3D Generation in 20 Seconds by Score-based Iterative Reconstruction | Apr 30, 2024 | 3D Generation, 3D Reconstruction | Code Available | 2 |
| HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis | Apr 29, 2024 | CPU, Edge-computing | Code Available | 2 |
| Partial Large Kernel CNNs for Efficient Super-Resolution | Apr 18, 2024 | Computational Efficiency, GPU | Code Available | 2 |