| GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Nov 16, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning | Nov 15, 2024 | Image Quality Assessment, Language Modeling | Code Available | 2 |
| Number it: Temporal Grounding Videos like Flipping Manga | Nov 15, 2024 | Highlight Detection, Moment Retrieval | Code Available | 2 |
| SymbolFit: Automatic Parametric Modeling with Symbolic Regression | Nov 15, 2024 | Form, regression | Code Available | 2 |
| M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation | Nov 15, 2024 | Image Generation, Mamba | Code Available | 2 |
| Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era | Nov 15, 2024 | Survey | Code Available | 2 |
| SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction | Nov 15, 2024 | 3D Reconstruction, Depth Estimation | Code Available | 2 |
| CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Nov 15, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 |
| MCL: Multi-view Enhanced Contrastive Learning for Chest X-ray Report Generation | Nov 15, 2024 | Contrastive Learning, Diagnostic | Code Available | 2 |
| Squeezed Attention: Accelerating Long Context Length LLM Inference | Nov 14, 2024 | Code Generation, Large Language Model | Code Available | 2 |
| Image Matching Filtering and Refinement by Planes and Beyond | Nov 14, 2024 | Deep Learning, Template Matching | Code Available | 2 |
| Golden Noise for Diffusion Models: A Learning Framework | Nov 14, 2024 | Prompt Learning | Code Available | 2 |
| Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation | Nov 14, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 |
| LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Nov 14, 2024 | Earth Observation, Instruction Following | Code Available | 2 |
| Isotropic Correlation Models for the Cross-Section of Equity Returns | Nov 13, 2024 | | Code Available | 2 |
| A Short Note on Evaluating RepNet for Temporal Repetition Counting in Videos | Nov 13, 2024 | Repetitive Action Counting | Code Available | 2 |
| PyGen: A Collaborative Human-AI Approach to Python Package Creation | Nov 13, 2024 | Code Generation | Code Available | 2 |
| Searching Latent Program Spaces | Nov 13, 2024 | ARC, Program induction | Code Available | 2 |
| BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis | Nov 13, 2024 | NeRF, Novel View Synthesis | Code Available | 2 |
| MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation | Nov 13, 2024 | 3DGS, Camera Localization | Code Available | 2 |
| OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance | Nov 13, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| LogLLM: Log-based Anomaly Detection Using Large Language Models | Nov 13, 2024 | Anomaly Detection, Decoder | Code Available | 2 |
| Graph Neural Networks in Supply Chain Analytics and Optimization: Concepts, Perspectives, Dataset and Benchmarks | Nov 13, 2024 | Anomaly Detection, Demand Forecasting | Code Available | 2 |
| Deep Learning Accelerated Quantum Transport Simulations in Nanoelectronics: From Break Junctions to Field-Effect Transistors | Nov 13, 2024 | Computational Efficiency | Code Available | 2 |
| Physics Informed Distillation for Diffusion Models | Nov 13, 2024 | Dataset Generation, Image Generation | Code Available | 2 |
| V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion | Nov 13, 2024 | 3D Object Detection, Denoising | Code Available | 2 |
| Retrieval Augmented Time Series Forecasting | Nov 12, 2024 | RAG, Retrieval | Code Available | 2 |
| GTA: Global Tracklet Association for Multi-Object Tracking in Sports | Nov 12, 2024 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 2 |
| Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings | Nov 12, 2024 | Attribute, Computational Efficiency | Code Available | 2 |
| RedCode: Risky Code Execution and Generation Benchmark for Code Agents | Nov 12, 2024 | | Code Available | 2 |
| Tucano: Advancing Neural Text Generation for Portuguese | Nov 12, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection | Nov 12, 2024 | Optical Flow Estimation, Out-of-Distribution Detection | Code Available | 2 |
| TIPO: Text to Image with Text Presampling for Prompt Optimization | Nov 12, 2024 | Image Generation, Language Modeling | Code Available | 2 |
| Large Language Models Can Self-Improve in Long-context Reasoning | Nov 12, 2024 | | Code Available | 2 |
| AEROMamba: An efficient architecture for audio super-resolution using generative adversarial networks and state space models | Nov 11, 2024 | Audio Super-Resolution, GPU | Code Available | 2 |
| The Super Weight in Large Language Models | Nov 11, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Nov 11, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 2 |
| Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis | Nov 11, 2024 | Attribute, Image Generation | Code Available | 2 |
| ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | Nov 11, 2024 | image-classification, Image Classification | Code Available | 2 |
| Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents | Nov 10, 2024 | model | Code Available | 2 |
| InvisMark: Invisible and Robust Watermarking for AI-generated Image Provenance | Nov 10, 2024 | SSIM | Code Available | 2 |
| Reaction-conditioned De Novo Enzyme Design with GENzyme | Nov 10, 2024 | | Code Available | 2 |
| Graph Neural Network Surrogates to leverage Mechanistic Expert Knowledge towards Reliable and Immediate Pandemic Response | Nov 10, 2024 | Decision Making, Graph Neural Network | Code Available | 2 |
| Community Research Earth Digital Intelligence Twin (CREDIT) | Nov 9, 2024 | | Code Available | 2 |
| Reliable-loc: Robust sequential LiDAR global localization in large-scale street scenes based on verifiable cues | Nov 9, 2024 | | Code Available | 2 |
| Concept Bottleneck Language Models For protein design | Nov 9, 2024 | Decision Making, Drug Discovery | Code Available | 2 |
| GFT: Graph Foundation Model with Transferable Tree Vocabulary | Nov 9, 2024 | Drug Discovery, Graph Learning | Code Available | 2 |
| Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks | Nov 8, 2024 | Emotion Recognition | Code Available | 2 |
| End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Nov 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| LLM-PySC2: Starcraft II learning environment for Large Language Models | Nov 8, 2024 | Decision Making, Language Modelling | Code Available | 2 |