| Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation | Dec 9, 2024 | 3D Generation, Image to 3D | Code Available | 2 |
| Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty | Dec 9, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video | Dec 9, 2024 | 3DGS, 4D reconstruction | Code Available | 2 |
| M^3-20M: A Large-Scale Multi-Modal Molecule Dataset for AI-driven Drug Design and Discovery | Dec 8, 2024 | Drug Design, Molecular Property Prediction | Code Available | 2 |
| TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action | Dec 7, 2024 | Depth Estimation, Mathematical Reasoning | Code Available | 2 |
| Perceptually Transparent Binaural Auralization of Simulated Sound Fields | Dec 6, 2024 | | Code Available | 2 |
| PanoDreamer: Optimization-Based Single Image to 360 3D Scene With Diffusion | Dec 6, 2024 | 3D Scene Reconstruction, Depth Estimation | Code Available | 2 |
| LinVT: Empower Your Image-level Large Language Model to Understand Videos | Dec 6, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection | Dec 6, 2024 | Object, object-detection | Code Available | 2 |
| Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model | Dec 6, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| Wavelet Diffusion Neural Operator | Dec 6, 2024 | | Code Available | 2 |
| DreamColour: Controllable Video Colour Editing without Training | Dec 6, 2024 | Instance Segmentation, Semantic Segmentation | Code Available | 2 |
| C^2LEVA: Toward Comprehensive and Contamination-Free Language Model Evaluation | Dec 6, 2024 | Language Model Evaluation, Language Modeling | Code Available | 2 |
| Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction | Dec 6, 2024 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 2 |
| SoRA: Singular Value Decomposed Low-Rank Adaptation for Domain Generalizable Representation Learning | Dec 5, 2024 | Domain Adaptation, Domain Generalization | Code Available | 2 |
| Federated Learning in Mobile Networks: A Comprehensive Case Study on Traffic Forecasting | Dec 5, 2024 | Federated Learning, Management | Code Available | 2 |
| Monet: Mixture of Monosemantic Experts for Transformers | Dec 5, 2024 | Dictionary Learning, Mixture-of-Experts | Code Available | 2 |
| QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos | Dec 5, 2024 | Attribute, Quantization | Code Available | 2 |
| Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Dec 5, 2024 | Image Comprehension, Representation Learning | Code Available | 2 |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Dec 5, 2024 | Descriptive, Visual Question Answering | Code Available | 2 |
| Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models | Dec 5, 2024 | | Code Available | 2 |
| SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model | Dec 5, 2024 | DeepFake Detection, Face Swapping | Code Available | 2 |
| Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation | Dec 5, 2024 | Image Segmentation, Open Vocabulary Semantic Segmentation | Code Available | 2 |
| Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos | Dec 5, 2024 | Robot Manipulation | Code Available | 2 |
| HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting | Dec 5, 2024 | 3DGS, Novel View Synthesis | Code Available | 2 |
| EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding | Dec 5, 2024 | Prediction, Scene Understanding | Code Available | 2 |
| Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation | Dec 5, 2024 | Semantic Segmentation, Time Series | Code Available | 2 |
| ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality | Dec 5, 2024 | Image Generation | Code Available | 2 |
| Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning | Dec 4, 2024 | Federated Learning | Code Available | 2 |
| CleanDIFT: Diffusion Features without Noise | Dec 4, 2024 | Semantic correspondence | Code Available | 2 |
| Volumetrically Consistent 3D Gaussian Rasterization | Dec 4, 2024 | 3DGS, SSIM | Code Available | 2 |
| AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning | Dec 4, 2024 | Video Understanding | Code Available | 2 |
| JPC: Flexible Inference for Predictive Coding Networks in JAX | Dec 4, 2024 | | Code Available | 2 |
| FLAIR: VLM with Fine-grained Language-informed Image Representations | Dec 4, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Good practices for evaluation of machine learning systems | Dec 4, 2024 | | Code Available | 2 |
| Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion | Dec 4, 2024 | Autonomous Vehicles, Lidar Scene Completion | Code Available | 2 |
| How to Correctly do Semantic Backpropagation on Language-based Agentic Systems | Dec 4, 2024 | GSM8K | Code Available | 2 |
| MmCows: A Multimodal Dataset for Dairy Cattle Monitoring | Dec 4, 2024 | | Code Available | 2 |
| Video Quality Assessment: A Comprehensive Survey | Dec 4, 2024 | Benchmarking, Survey | Code Available | 2 |
| HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset | Dec 3, 2024 | 3D Generation | Code Available | 2 |
| OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Dec 3, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 |
| Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis | Dec 3, 2024 | Image Generation | Code Available | 2 |
| Diffusion-based Visual Anagram as Multi-task Learning | Dec 3, 2024 | Denoising, Multi-Task Learning | Code Available | 2 |
| ProbPose: A Probabilistic Approach to 2D Human Pose Estimation | Dec 3, 2024 | 2D Human Pose Estimation, Data Augmentation | Code Available | 2 |
| VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation | Dec 3, 2024 | Script Generation, Video Generation | Code Available | 2 |
| Enhanced Photovoltaic Power Forecasting: An iTransformer and LSTM-Based Model Integrating Temporal and Covariate Interactions | Dec 3, 2024 | energy management, Management | Code Available | 2 |
| Conformal Symplectic Optimization for Stable Reinforcement Learning | Dec 3, 2024 | Atari Games, Deep Reinforcement Learning | Code Available | 2 |
| Hacking CTFs with Plain Agents | Dec 3, 2024 | | Code Available | 2 |
| Many-MobileNet: Multi-Model Augmentation for Robust Retinal Disease Classification | Dec 3, 2024 | Computational Efficiency, Data Augmentation | Code Available | 2 |
| OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows | Dec 2, 2024 | Audio Synthesis, Image Generation | Code Available | 2 |