| CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation | Oct 7, 2024 | GPU, Machine Translation | Unverified | 0 |
| PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing | Oct 7, 2024 | GPU | Code Available | 1 |
| Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective | Oct 6, 2024 | CPU, GPU | Code Available | 1 |
| PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms | Oct 5, 2024 | Benchmarking, GPU | Unverified | 0 |
| Fast Object Detection with a Machine Learning Edge Device | Oct 5, 2024 | Autonomous Navigation, CPU | Unverified | 0 |
| High-Speed Stereo Visual SLAM for Low-Powered Computing Devices | Oct 5, 2024 | GPU | Code Available | 3 |
| Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning | Oct 4, 2024 | CPU, Deep Learning | Unverified | 0 |
| Compute Or Load KV Cache? Why Not Both? | Oct 4, 2024 | GPU | Unverified | 0 |
| SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Oct 4, 2024 | 16k, Code Generation | Code Available | 3 |
| LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy | Oct 4, 2024 | GPU, Low-rank compression | Unverified | 0 |
| Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach | Oct 3, 2024 | Energy management, GPU | Code Available | 0 |
| Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping | Oct 3, 2024 | GPU, Mixture-of-Experts | Unverified | 0 |
| Efficient Semantic Segmentation via Lightweight Multiple-Information Interaction Network | Oct 3, 2024 | GPU, Real-Time Semantic Segmentation | Unverified | 0 |
| LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences | Oct 3, 2024 | GPU, Graph Neural Network | Unverified | 0 |
| LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services | Oct 3, 2024 | Benchmarking, GPU | Code Available | 1 |
| Learning from Offline Foundation Features with Tensor Augmentations | Oct 3, 2024 | GPU | Unverified | 0 |
| An Efficient Inference Frame for SMLM (Single-Molecule Localization Microscopy) | Oct 3, 2024 | Deep Learning, GPU | Code Available | 0 |
| Contextual Document Embeddings | Oct 3, 2024 | Contrastive Learning, Document Embedding | Unverified | 0 |
| Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Oct 2, 2024 | Depth Estimation, GPU | Code Available | 9 |
| A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts | Oct 2, 2024 | 4k, GPU | Unverified | 0 |
| Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices | Oct 2, 2024 | GPU, Language Modeling | Code Available | 1 |
| ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | Oct 2, 2024 | Benchmarking, Document Summarization | Unverified | 0 |
| Replacement Learning: Training Vision Tasks with Fewer Learnable Parameters | Oct 2, 2024 | GPU | Unverified | 0 |
| TorchSISSO: A PyTorch-Based Implementation of the Sure Independence Screening and Sparsifying Operator for Efficient and Interpretable Model Discovery | Oct 2, 2024 | GPU, Model Discovery | Code Available | 1 |
| FlashMask: Efficient and Rich Mask Extension of FlashAttention | Oct 2, 2024 | Computational Efficiency, GPU | Unverified | 0 |