| A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness | Nov 4, 2024 | Question Answering, Text Generation | Code Available | 3 |
| Digitizing Touch with an Artificial Multimodal Fingertip | Nov 4, 2024 | ARC | Code Available | 3 |
| Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration | Nov 3, 2024 | 5-Degradation Blind All-in-One Image Restoration, Blind All-in-One Image Restoration | Code Available | 3 |
| FilterNet: Harnessing Frequency Filters for Time Series Forecasting | Nov 3, 2024 | Time Series, Time Series Forecasting | Code Available | 3 |
| Rule Based Rewards for Language Model Safety | Nov 2, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| ZIM: Zero-Shot Image Matting for Anything | Nov 1, 2024 | Image Inpainting, Image Matting | Code Available | 3 |
| Face Anonymization Made Simple | Nov 1, 2024 | Attribute, Face Anonymization | Code Available | 3 |
| GameGen-X: Interactive Open-world Game Video Generation | Nov 1, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement | Nov 1, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making | Oct 31, 2024 | Decision Making, Diagnostic | Code Available | 3 |
| SelfCodeAlign: Self-Alignment for Code Generation | Oct 31, 2024 | Code Generation, HumanEval | Code Available | 3 |
| XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM | Oct 31, 2024 | 3DGS, Benchmarking | Code Available | 3 |
| PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks | Oct 31, 2024 | | Code Available | 3 |
| AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | Oct 31, 2024 | Benchmarking | Code Available | 3 |
| OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | Oct 30, 2024 | Natural Language Visual Grounding | Code Available | 3 |
| PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Oct 29, 2024 | 3DGS, 3D Reconstruction | Code Available | 3 |
| Data Generation for Hardware-Friendly Post-Training Quantization | Oct 29, 2024 | Data Augmentation, GPU | Code Available | 3 |
| Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework | Oct 28, 2024 | Image Generation, Image Manipulation | Code Available | 3 |
| ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Oct 28, 2024 | CPU | Code Available | 3 |
| Modular Duality in Deep Learning | Oct 28, 2024 | Deep Learning, GPU | Code Available | 3 |
| AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions | Oct 27, 2024 | Feature Engineering | Code Available | 3 |
| Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders | Oct 27, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Centaur: a foundation model of human cognition | Oct 26, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Improving Model Evaluation using SMART Filtering of Benchmark Datasets | Oct 26, 2024 | Chatbot, Diversity | Code Available | 3 |
| OGBench: Benchmarking Offline Goal-Conditioned RL | Oct 26, 2024 | Benchmarking, reinforcement-learning | Code Available | 3 |
| Paint Bucket Colorization Using Anime Character Color Design Sheets | Oct 25, 2024 | Colorization, Line Art Colorization | Code Available | 3 |
| COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training | Oct 25, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models | Oct 25, 2024 | | Code Available | 3 |
| Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances | Oct 24, 2024 | Benchmarking, Image to Video Generation | Code Available | 3 |
| A Joint Representation Using Continuous and Discrete Features for Cardiovascular Diseases Risk Prediction on Chest CT Scans | Oct 24, 2024 | | Code Available | 3 |
| PDL: A Declarative Prompt Programming Language | Oct 24, 2024 | RAG | Code Available | 3 |
| Scaling up Masked Diffusion Models on Text | Oct 24, 2024 | GSM8K, Language Modeling | Code Available | 3 |
| Large Spatial Model: End-to-end Unposed Images to Semantic 3D | Oct 24, 2024 | 3D Reconstruction, Attribute | Code Available | 3 |
| 3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation | Oct 24, 2024 | 3D Generation, 3D geometry | Code Available | 3 |
| SMITE: Segment Me In TimE | Oct 24, 2024 | Segmentation, Semantic Segmentation | Code Available | 3 |
| DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes | Oct 23, 2024 | Scene Generation | Code Available | 3 |
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models | Oct 23, 2024 | In-Context Learning, Language Modeling | Code Available | 3 |
| LEADS: Lightweight Embedded Assisted Driving System | Oct 23, 2024 | | Code Available | 3 |
| VoiceBench: Benchmarking LLM-Based Voice Assistants | Oct 22, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 |
| LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Oct 22, 2024 | Token Reduction, Video Question Answering | Code Available | 3 |
| Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss | Oct 22, 2024 | GPU, Representation Learning | Code Available | 3 |
| MagicPIG: LSH Sampling for Efficient LLM Generation | Oct 21, 2024 | CPU, GPU | Code Available | 3 |
| Generalizing Motion Planners with Mixture of Experts for Autonomous Driving | Oct 21, 2024 | Autonomous Driving, Data Augmentation | Code Available | 3 |
| Multi-Level Speaker Representation for Target Speaker Extraction | Oct 21, 2024 | Target Speaker Extraction | Code Available | 3 |
| Pipeline Gradient-based Model Training on Analog In-memory Accelerators | Oct 19, 2024 | | Code Available | 3 |
| A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends | Oct 19, 2024 | All, Image Restoration | Code Available | 3 |
| Streaming Deep Reinforcement Learning Finally Works | Oct 18, 2024 | Atari Games, Deep Reinforcement Learning | Code Available | 3 |
| DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Oct 17, 2024 | Talking Head Generation, Video Generation | Code Available | 3 |
| FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model | Oct 17, 2024 | Computational Efficiency, Image Cropping | Code Available | 3 |
| An Evolved Universal Transformer Memory | Oct 17, 2024 | | Code Available | 3 |