| Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference | Dec 6, 2024 | GPU, Language Modeling | Unverified | 0 |
| GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments | Dec 6, 2024 | GPU | Unverified | 0 |
| SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization | Dec 5, 2024 | Clustering, GPU | Unverified | 0 |
| Assessing and Learning Alignment of Unimodal Vision and Language Models | Dec 5, 2024 | GPU, Semantic Segmentation | Unverified | 0 |
| CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning | Dec 4, 2024 | GPU, Representation Learning | Unverified | 0 |
| Unifying KV Cache Compression for Large Language Models with LeanKV | Dec 4, 2024 | GPU, Quantization | Unverified | 0 |
| Diffusion-VLA: Generalizable and Interpretable Robot Foundation Model via Self-Generated Reasoning | Dec 4, 2024 | GPU | Unverified | 0 |
| FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness | Dec 4, 2024 | GPU, Quantization | Unverified | 0 |
| Can't Slow me Down: Learning Robust and Hardware-Adaptive Object Detectors against Latency Attacks for Edge Devices | Dec 3, 2024 | Autonomous Driving, GPU | Unverified | 0 |
| SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection | Dec 3, 2024 | GPU, Image Segmentation | Code Available | 0 |
| MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | Dec 2, 2024 | Animal Pose Estimation, GPU | Unverified | 0 |
| Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification | Dec 2, 2024 | GPU, Quantization | Unverified | 0 |
| Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control | Dec 2, 2024 | Autonomous Driving, Decision Making | Unverified | 0 |
| Improving feature interactions at Pinterest under industry constraints | Dec 2, 2024 | GPU, Recommendation Systems | Unverified | 0 |
| SPILDL: A Scalable and Parallel Inductive Learner in Description Logic | Dec 1, 2024 | CPU, GPU | Unverified | 0 |
| HT-HEDL: High-Throughput Hypothesis Evaluation in Description Logic | Dec 1, 2024 | CPU, GPU | Unverified | 0 |
| BlendPCR: Seamless and Efficient Rendering of Dynamic Point Clouds captured by Multiple RGB-D Cameras | Dec 1, 2024 | GPU, NeRF | Code Available | 0 |
| PAL -- Parallel active learning for machine-learned potentials | Nov 30, 2024 | Active Learning, CPU | Code Available | 0 |
| Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing | Nov 29, 2024 | All, Form | Unverified | 0 |
| Open source Differentiable ODE Solving Infrastructure | Nov 29, 2024 | GPU | Unverified | 0 |
| BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching | Nov 29, 2024 | GPU, Management | Unverified | 0 |
| A Simple Sparse Matrix Vector Multiplication Approach to Padded Convolution | Nov 29, 2024 | CPU, GPU | Code Available | 0 |
| Puzzle: Distillation-Based NAS for Inference-Optimized LLMs | Nov 28, 2024 | GPU, Knowledge Distillation | Unverified | 0 |
| An Integrated Artificial Intelligence Operating System for Advanced Low-Altitude Aviation Applications | Nov 28, 2024 | Computational Efficiency, CPU | Unverified | 0 |
| PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers | Nov 28, 2024 | GPU | Unverified | 0 |
| Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads | Nov 28, 2024 | GPU, Language Modeling | Unverified | 0 |
| Differentiable Topology Estimating from Curvatures for 3D Shapes | Nov 28, 2024 | GPU, Topological Data Analysis | Unverified | 0 |
| Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach | Nov 28, 2024 | GPU | Unverified | 0 |
| Towards Chunk-Wise Generation for Long Videos | Nov 27, 2024 | Denoising, GPU | Unverified | 0 |
| A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs | Nov 27, 2024 | Computational Efficiency, CPU | Code Available | 0 |
| k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation | Nov 26, 2024 | CPU, GPU | Code Available | 0 |
| Automatic Skull Reconstruction by Deep Learnable Symmetry Enforcement | Nov 26, 2024 | GPU | Unverified | 0 |
| A High Energy-Efficiency Multi-core Neuromorphic Architecture for Deep SNN Training | Nov 26, 2024 | Federated Learning, GPU | Unverified | 0 |
| Knowledge-aware Evolutionary Graph Neural Architecture Search | Nov 26, 2024 | GPU, Graph Neural Network | Code Available | 0 |
| A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network Inference | Nov 25, 2024 | CPU, GPU | Unverified | 0 |
| SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE | Nov 25, 2024 | 3D Generation, GPU | Unverified | 0 |
| Plastic Arbor: a modern simulation framework for synaptic plasticity – from single synapses to networks of morphological neurons | Nov 25, 2024 | CPU, GPU | Code Available | 0 |
| MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking | Nov 24, 2024 | GPU, Image Enhancement | Unverified | 0 |
| Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format | Nov 24, 2024 | GPU | Unverified | 0 |
| Enabling Efficient Serverless Inference Serving for LLM (Large Language Model) in the Cloud | Nov 23, 2024 | GPU, Language Modeling | Unverified | 0 |
| Reassessing Layer Pruning in LLMs: New Insights and Methods | Nov 23, 2024 | Benchmarking, GPU | Code Available | 0 |
| Multi-scale Cascaded Large-Model for Whole-body ROI Segmentation | Nov 23, 2024 | Computational Efficiency, GPU | Code Available | 0 |
| Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers | Nov 22, 2024 | Data Augmentation, GPU | Unverified | 0 |
| Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting | Nov 21, 2024 | GPU | Unverified | 0 |
| Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction | Nov 21, 2024 | 3D Generation, GPU | Unverified | 0 |
| Deep operator network models for predicting post-burn contraction | Nov 21, 2024 | CPU, GPU | Unverified | 0 |
| Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training | Nov 20, 2024 | GPU | Unverified | 0 |
| FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting | Nov 20, 2024 | Dimensionality Reduction, GPU | Unverified | 0 |
| Faster Multi-GPU Training with PPLL: A Pipeline Parallelism Framework Leveraging Local Learning | Nov 19, 2024 | GPU | Unverified | 0 |