| TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition | Dec 2, 2024 | Image Generation; Optical Character Recognition (OCR) | Code Available | 2 |
| RGBDS-SLAM: A RGB-D Semantic Dense SLAM Based on 3D Multi Level Pyramid Gaussian Splatting | Dec 2, 2024 | | Code Available | 2 |
| Global Estimation of Building-Integrated Facade and Rooftop Photovoltaic Potential by Integrating 3D Building Footprint and Spatio-Temporal Datasets | Dec 2, 2024 | | Code Available | 2 |
| Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective | Dec 2, 2024 | Density Estimation; Offline RL | Code Available | 2 |
| SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames | Dec 2, 2024 | geo-localization; Pose Estimation | Code Available | 2 |
| LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant | Dec 2, 2024 | Contrastive Learning; Information Retrieval | Code Available | 2 |
| SfM-Free 3D Gaussian Splatting via Hierarchical Training | Dec 2, 2024 | 3DGS; Novel View Synthesis | Code Available | 2 |
| OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows | Dec 2, 2024 | Audio Synthesis; Image Generation | Code Available | 2 |
| NLPrompt: Noise-Label Prompt Learning for Vision-Language Models | Dec 2, 2024 | Learning Theory; Learning with noisy labels | Code Available | 2 |
| V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction | Dec 2, 2024 | Prediction | Code Available | 2 |
| Commit0: Library Generation from Scratch | Dec 2, 2024 | Benchmarking; Code Generation | Code Available | 2 |
| Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs | Dec 2, 2024 | All; Language Modeling | Code Available | 2 |
| InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences | Dec 2, 2024 | | Code Available | 2 |
| LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences | Dec 2, 2024 | Embodied Question Answering; Question Answering | Code Available | 2 |
| TinyFusion: Diffusion Transformers Learned Shallow | Dec 2, 2024 | Image Generation | Code Available | 2 |
| Refine3DNet: Scaling Precision in 3D Object Reconstruction from Multi-View RGB Images using Attention | Dec 1, 2024 | 3D Object Reconstruction; 3D Reconstruction | Code Available | 2 |
| CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking | Dec 1, 2024 | Bug fixing; Code Generation | Code Available | 2 |
| BIGCity: A Universal Spatiotemporal Model for Unified Trajectory and Traffic State Data Analysis | Dec 1, 2024 | | Code Available | 2 |
| Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification | Dec 1, 2024 | GPU; Visual Question Answering | Code Available | 2 |
| Scaling New Frontiers: Insights into Large Recommendation Models | Dec 1, 2024 | Recommendation Systems | Code Available | 2 |
| A Comprehensive Guide to Explainable AI: From Classical Models to LLMs | Dec 1, 2024 | Causal Inference; counterfactual | Code Available | 2 |
| 2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification | Dec 1, 2024 | Computational Efficiency; image-classification | Code Available | 2 |
| Playable Game Generation | Dec 1, 2024 | GPU; Image Generation | Code Available | 2 |
| Ref-GS: Directional Factorization for 2D Gaussian Splatting | Dec 1, 2024 | | Code Available | 2 |
| Real-Time Metric-Semantic Mapping for Autonomous Navigation in Outdoor Environments | Nov 30, 2024 | Autonomous Navigation; GPU | Code Available | 2 |
| Automatic Differentiation-based Full Waveform Inversion with Flexible Workflows | Nov 30, 2024 | Dynamic Time Warping | Code Available | 2 |
| PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation | Nov 30, 2024 | Text-to-Video Generation; Video Generation | Code Available | 2 |
| Forensics Adapter: Unleashing CLIP for Generalizable Face Forgery Detection | Nov 29, 2024 | Prompt Learning | Code Available | 2 |
| VLSBench: Unveiling Visual Leakage in Multimodal Safety | Nov 29, 2024 | | Code Available | 2 |
| OpenQDC: Open Quantum Data Commons | Nov 29, 2024 | Benchmarking | Code Available | 2 |
| Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning | Nov 29, 2024 | Mathematical Reasoning | Code Available | 2 |
| KV Shifting Attention Enhances Language Modeling | Nov 29, 2024 | In-Context Learning; Language Modeling | Code Available | 2 |
| LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | Nov 29, 2024 | Boundary Detection; Dense Video Captioning | Code Available | 2 |
| RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Nov 29, 2024 | Robot Task Planning; Scheduling | Code Available | 2 |
| DeMo: Decoupled Momentum Optimization | Nov 29, 2024 | 10-shot image generation; 1 Image, 2*2 Stitchi | Code Available | 2 |
| TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting | Nov 29, 2024 | Denoising; Image Generation | Code Available | 2 |
| L4acados: Learning-based models for acados, applied to Gaussian process-based predictive control | Nov 28, 2024 | Computational Efficiency; Gaussian Processes | Code Available | 2 |
| SADG: Segment Any Dynamic Gaussian Without Object Trackers | Nov 28, 2024 | 3D Reconstruction; Autonomous Driving | Code Available | 2 |
| SuperGaussians: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors | Nov 28, 2024 | Novel View Synthesis | Code Available | 2 |
| Lost & Found: Tracking Changes from Egocentric Observations in 3D Dynamic Scene Graphs | Nov 28, 2024 | Object | Code Available | 2 |
| Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition | Nov 28, 2024 | Action Recognition; Skeleton Based Action Recognition | Code Available | 2 |
| GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Nov 28, 2024 | Benchmarking; Object Counting | Code Available | 2 |
| Auto-Encoded Supervision for Perceptual Image Super-Resolution | Nov 28, 2024 | Image Super-Resolution; Super-Resolution | Code Available | 2 |
| Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation | Nov 28, 2024 | Segmentation | Code Available | 2 |
| Det-SAM2: Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2 | Nov 28, 2024 | Video Segmentation; Video Semantic Segmentation | Code Available | 2 |
| AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Nov 28, 2024 | Audio captioning; Audio to Text Retrieval | Code Available | 2 |
| ETAP: Event-based Tracking of Any Point | Nov 28, 2024 | Motion Estimation | Code Available | 2 |
| OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration | Nov 28, 2024 | Depth Completion | Code Available | 2 |
| Monocular Obstacle Avoidance Based on Inverse PPO for Fixed-wing UAVs | Nov 27, 2024 | Collision Avoidance; Deep Reinforcement Learning | Code Available | 2 |
| GaussianSpeech: Audio-Driven Gaussian Avatars | Nov 27, 2024 | 3DGS | Code Available | 2 |