| GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Nov 16, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Nov 15, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 |
| MCL: Multi-view Enhanced Contrastive Learning for Chest X-ray Report Generation | Nov 15, 2024 | Contrastive Learning, Diagnostic | Code Available | 2 |
| SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction | Nov 15, 2024 | 3D Reconstruction, Depth Estimation | Code Available | 2 |
| SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning | Nov 15, 2024 | Image Quality Assessment, Language Modeling | Code Available | 2 |
| SymbolFit: Automatic Parametric Modeling with Symbolic Regression | Nov 15, 2024 | Form, regression | Code Available | 2 |
| Number it: Temporal Grounding Videos like Flipping Manga | Nov 15, 2024 | Highlight Detection, Moment Retrieval | Code Available | 2 |
| M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation | Nov 15, 2024 | Image Generation, Mamba | Code Available | 2 |
| Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era | Nov 15, 2024 | Survey | Code Available | 2 |
| Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation | Nov 14, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 |
| Golden Noise for Diffusion Models: A Learning Framework | Nov 14, 2024 | Prompt Learning | Code Available | 2 |
| LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Nov 14, 2024 | Earth Observation, Instruction Following | Code Available | 2 |
| Image Matching Filtering and Refinement by Planes and Beyond | Nov 14, 2024 | Deep Learning, Template Matching | Code Available | 2 |
| Squeezed Attention: Accelerating Long Context Length LLM Inference | Nov 14, 2024 | Code Generation, Large Language Model | Code Available | 2 |
| Physics Informed Distillation for Diffusion Models | Nov 13, 2024 | Dataset Generation, Image Generation | Code Available | 2 |
| A Short Note on Evaluating RepNet for Temporal Repetition Counting in Videos | Nov 13, 2024 | Repetitive Action Counting | Code Available | 2 |
| MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation | Nov 13, 2024 | 3DGS, Camera Localization | Code Available | 2 |
| Isotropic Correlation Models for the Cross-Section of Equity Returns | Nov 13, 2024 | | Code Available | 2 |
| Graph Neural Networks in Supply Chain Analytics and Optimization: Concepts, Perspectives, Dataset and Benchmarks | Nov 13, 2024 | Anomaly Detection, Demand Forecasting | Code Available | 2 |
| Deep Learning Accelerated Quantum Transport Simulations in Nanoelectronics: From Break Junctions to Field-Effect Transistors | Nov 13, 2024 | Computational Efficiency | Code Available | 2 |
| BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis | Nov 13, 2024 | NeRF, Novel View Synthesis | Code Available | 2 |
| OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance | Nov 13, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| Searching Latent Program Spaces | Nov 13, 2024 | ARC, Program induction | Code Available | 2 |
| PyGen: A Collaborative Human-AI Approach to Python Package Creation | Nov 13, 2024 | Code Generation | Code Available | 2 |
| LogLLM: Log-based Anomaly Detection Using Large Language Models | Nov 13, 2024 | Anomaly Detection, Decoder | Code Available | 2 |