| SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation | Nov 26, 2024 | Diversity, Image Segmentation | Code Available | 3 |
| CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos | Nov 26, 2024 | Common Sense Reasoning, Imitation Learning | Code Available | 3 |
| Star Attention: Efficient LLM Inference over Long Sequences | Nov 26, 2024 | Computational Efficiency | Code Available | 3 |
| On the Efficiency of NLP-Inspired Methods for Tabular Deep Learning | Nov 26, 2024 | Computational Efficiency, Deep Learning | Code Available | 3 |
| A Distractor-Aware Memory for Visual Object Tracking with SAM2 | Nov 26, 2024 | Object Tracking, Semi-Supervised Video Object Segmentation | Code Available | 3 |
| SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving | Nov 25, 2024 | 3DGS, Autonomous Driving | Code Available | 3 |
| Cautious Optimizers: Improving Training with One Line of Code | Nov 25, 2024 | | Code Available | 3 |
| BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment | Nov 25, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM | Nov 25, 2024 | Autonomous Driving, Novel View Synthesis | Code Available | 3 |
| Nimbus: Secure and Efficient Two-Party Inference for Transformers | Nov 24, 2024 | | Code Available | 3 |
| MobileMamba: Lightweight Multi-Receptive Visual Mamba Network | Nov 24, 2024 | GPU, Mamba | Code Available | 3 |
| BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence | Nov 22, 2024 | 3D visual grounding, Visual Grounding | Code Available | 3 |
| TEXGen: a Generative Diffusion Model for Mesh Textures | Nov 22, 2024 | model, Texture Synthesis | Code Available | 3 |
| MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | Nov 22, 2024 | image-classification, Image Classification | Code Available | 3 |
| 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes | Nov 22, 2024 | 3DGS, Novel View Synthesis | Code Available | 3 |
| Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data Processing | Nov 22, 2024 | Computational Efficiency, CPU | Code Available | 3 |
| SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model | Nov 21, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | Nov 21, 2024 | Visual Reasoning | Code Available | 3 |
| Stable Flow: Vital Layers for Training-Free Image Editing | Nov 21, 2024 | Text-based Image Editing | Code Available | 3 |
| Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Nov 20, 2024 | GPU, MME | Code Available | 3 |
| When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Nov 20, 2024 | Computational Efficiency, Position | Code Available | 3 |
| REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents | Nov 20, 2024 | GPU, Video Generation | Code Available | 3 |
| Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline | Nov 19, 2024 | Image Segmentation, Interactive Segmentation | Code Available | 3 |
| ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses | Nov 18, 2024 | | Code Available | 3 |
| DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes | Nov 18, 2024 | Autonomous Driving, Surface Reconstruction | Code Available | 3 |
| FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations | Nov 16, 2024 | Visual Storytelling | Code Available | 3 |
| Model Inversion Attacks: A Survey of Approaches and Countermeasures | Nov 15, 2024 | Survey | Code Available | 3 |
| WavChat: A Survey of Spoken Dialogue Models | Nov 15, 2024 | speech-recognition, Speech Recognition | Code Available | 3 |
| Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey | Nov 14, 2024 | | Code Available | 3 |
| Caravan MultiMet: Extending Caravan with Multiple Weather Nowcasts and Forecasts | Nov 14, 2024 | Benchmarking | Code Available | 3 |
| InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders | Nov 13, 2024 | | Code Available | 3 |
| CameraHMR: Aligning People with Perspective | Nov 12, 2024 | 3D human pose and shape estimation | Code Available | 3 |
| MureObjectStitch: Multi-reference Image Composition | Nov 12, 2024 | Object | Code Available | 3 |
| The Surprising Effectiveness of Test-Time Training for Few-Shot Learning | Nov 11, 2024 | ARC, Few-Shot Learning | Code Available | 3 |
| General Geospatial Inference with a Population Dynamics Foundation Model | Nov 11, 2024 | Benchmarking, Graph Neural Network | Code Available | 3 |
| SplatFormer: Point Transformer for Robust 3D Gaussian Splatting | Nov 10, 2024 | 3DGS, Novel View Synthesis | Code Available | 3 |
| Game-theoretic LLM: Agent Workflow for Negotiation Games | Nov 8, 2024 | Decision Making | Code Available | 3 |
| Effects of charging and discharging capabilities on trade-offs between model accuracy and computational efficiency in pumped thermal electricity storage | Nov 8, 2024 | Computational Efficiency | Code Available | 3 |
| SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications | Nov 7, 2024 | Code Generation, Language Modeling | Code Available | 3 |
| DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation | Nov 7, 2024 | Object Localization | Code Available | 3 |
| ZipNN: Lossless Compression for AI Models | Nov 7, 2024 | Model Compression | Code Available | 3 |
| MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views | Nov 7, 2024 | 3DGS, 3D Reconstruction | Code Available | 3 |
| Classification Done Right for Vision-Language Pre-Training | Nov 5, 2024 | Classification | Code Available | 3 |
| ADOPT: Modified Adam Can Converge with Any β_2 with the Optimal Rate | Nov 5, 2024 | Deep Reinforcement Learning, image-classification | Code Available | 3 |
| HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems | Nov 5, 2024 | Hallucination, RAG | Code Available | 3 |
| Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Nov 5, 2024 | Benchmarking, Hallucination | Code Available | 3 |
| AutoVFX: Physically Realistic Video Editing from Natural Language Instructions | Nov 4, 2024 | Code Generation, Video Editing | Code Available | 3 |
| A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness | Nov 4, 2024 | Question Answering, Text Generation | Code Available | 3 |
| ElasTST: Towards Robust Varied-Horizon Forecasting with Elastic Time-Series Transformer | Nov 4, 2024 | Position, Time Series | Code Available | 3 |
| Addressing Representation Collapse in Vector Quantized Models with One Linear Layer | Nov 4, 2024 | Quantization, Representation Learning | Code Available | 3 |