| Faithful Logical Reasoning via Symbolic Chain-of-Thought | May 28, 2024 | Logical Reasoning | Code Available | 3 |
| Low-Rank Few-Shot Adaptation of Vision-Language Models | May 28, 2024 | Few-Shot Learning, parameter-efficient fine-tuning | Code Available | 3 |
| Tool Learning with Large Language Models: A Survey | May 28, 2024 | Response Generation, Survey | Code Available | 3 |
| ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling | May 28, 2024 | Prompt Engineering | Code Available | 3 |
| CHESS: Contextual Harnessing for Efficient SQL Synthesis | May 27, 2024 | Large Language Model, Privacy Preserving | Code Available | 3 |
| Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving | May 27, 2024 | Autonomous Driving, Benchmarking | Code Available | 3 |
| Hawk: Learning to Understand Open-World Video Anomalies | May 27, 2024 | Anomaly Detection, Question Answering | Code Available | 3 |
| Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels | May 27, 2024 | 4D reconstruction | Code Available | 3 |
| GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns | May 27, 2024 | | Code Available | 3 |
| MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | May 27, 2024 | 4D reconstruction, Pose Estimation | Code Available | 3 |
| RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control | May 27, 2024 | | Code Available | 3 |
| Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention | May 27, 2024 | GPU, Language Modeling | Code Available | 3 |
| Transformers Can Do Arithmetic with the Right Embeddings | May 27, 2024 | GPU, Position | Code Available | 3 |
| GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping | May 27, 2024 | Depth Estimation, Diversity | Code Available | 3 |
| vHeat: Building Vision Models upon Heat Conduction | May 26, 2024 | Computational Efficiency, GPU | Code Available | 3 |
| GRAG: Graph Retrieval-Augmented Generation | May 26, 2024 | Entity Retrieval, Knowledge Graphs | Code Available | 3 |
| Demystify Mamba in Vision: A Linear Attention Perspective | May 26, 2024 | image-classification, Image Classification | Code Available | 3 |
| HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting | May 24, 2024 | NeRF, Novel View Synthesis | Code Available | 3 |
| Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach | May 24, 2024 | Clustering, Self-Supervised Learning | Code Available | 3 |
| NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer | May 24, 2024 | Novel View Synthesis | Code Available | 3 |
| GroundGrid: LiDAR Point Cloud Ground Segmentation and Terrain Estimation | May 24, 2024 | Autonomous Vehicles, Segmentation | Code Available | 3 |
| Pipeline Parallelism with Controllable Memory | May 24, 2024 | | Code Available | 3 |
| SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction | May 24, 2024 | Autonomous Driving, Motion Generation | Code Available | 3 |
| Scalable Optimization in the Modular Norm | May 23, 2024 | | Code Available | 3 |
| PuzzleAvatar: Assembling 3D Avatars from Personal Albums | May 23, 2024 | Language Modelling, Text to 3D | Code Available | 3 |
| From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step | May 23, 2024 | GSM8K | Code Available | 3 |
| DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis | May 23, 2024 | Image Generation, Mamba | Code Available | 3 |
| Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 |
| Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection | May 23, 2024 | Anomaly Detection, Multi-class Anomaly Detection | Code Available | 3 |
| Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization | May 23, 2024 | | Code Available | 3 |
| Deep Learning for Protein-Ligand Docking: Are We There Yet? | May 23, 2024 | Deep Learning, Drug Discovery | Code Available | 3 |
| RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models | May 23, 2024 | Hallucination, Sentence | Code Available | 3 |
| A Declarative System for Optimizing AI Workloads | May 23, 2024 | | Code Available | 3 |
| Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer | May 23, 2024 | 3D Generation, 3D Reconstruction | Code Available | 3 |
| 360Zhinao Technical Report | May 22, 2024 | 4k | Code Available | 3 |
| Intervention-Aware Forecasting: Breaking Historical Limits from a System Perspective | May 22, 2024 | Data Integration, Sensitivity | Code Available | 3 |
| DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus | May 22, 2024 | 3DGS, 3D Reconstruction | Code Available | 3 |
| UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization | May 20, 2024 | Visual Localization | Code Available | 3 |
| A Foundation Model for the Earth System | May 20, 2024 | Computational Efficiency, Deep Learning | Code Available | 3 |
| MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | May 20, 2024 | Continual Pretraining, Mathematical Reasoning | Code Available | 3 |
| FIFO-Diffusion: Generating Infinite Videos from Text without Training | May 19, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| On the Trajectory Regularity of ODE-based Diffusion Sampling | May 18, 2024 | Denoising, Image Generation | Code Available | 3 |
| Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | May 17, 2024 | 16k, Benchmarking | Code Available | 3 |
| From Sora What We Can See: A Survey of Text-to-Video Generation | May 17, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| Efficient Multimodal Large Language Models: A Survey | May 17, 2024 | Edge-computing, Question Answering | Code Available | 3 |
| CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation | May 17, 2024 | Decoder, Mamba | Code Available | 3 |
| Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis | May 16, 2024 | Language Modelling, Large Language Model | Code Available | 3 |
| 4D Panoptic Scene Graph Generation | May 16, 2024 | 4D Panoptic Segmentation, Graph Generation | Code Available | 3 |
| How Far Are We From AGI: Are LLMs All We Need? | May 16, 2024 | All | Code Available | 3 |
| SARATR-X: Toward Building A Foundation Model for SAR Target Recognition | May 15, 2024 | 2D Object Detection, Earth Observation | Code Available | 3 |