| PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs | Nov 24, 2024 | Image Generation | Code Available | 1 |
| PromptHSI: Universal Hyperspectral Image Restoration with Vision-Language Modulated Frequency Adaptation | Nov 24, 2024 | Image Restoration, Language Modeling | Code Available | 1 |
| GSurf: 3D Reconstruction via Signed Distance Fields with Direct Gaussian Supervision | Nov 24, 2024 | 3DGS, 3D Reconstruction | Code Available | 1 |
| Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching | Nov 24, 2024 | Object, Pose Estimation | Code Available | 1 |
| Optimizing Brain Tumor Segmentation with MedNeXt: BraTS 2024 SSA and Pediatrics | Nov 24, 2024 | Brain Tumor Segmentation, Tumor Segmentation | Code Available | 1 |
| TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language Models | Nov 24, 2024 | Problem Decomposition, Time Series | Code Available | 1 |
| Towards RAW Object Detection in Diverse Conditions | Nov 24, 2024 | Object, object-detection | Code Available | 1 |
| Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning | Nov 24, 2024 | | Code Available | 1 |
| Peritumoral Expansion Radiomics for Improved Lung Cancer Classification | Nov 24, 2024 | 3D Classification, Cancer Classification | Code Available | 1 |
| Highly Efficient and Unsupervised Framework for Moving Object Detection in Satellite Videos | Nov 24, 2024 | Moving Object Detection, object-detection | Code Available | 1 |
| LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions | Nov 24, 2024 | | Code Available | 1 |
| Navigating the Effect of Parametrization for Dimensionality Reduction | Nov 24, 2024 | Dimensionality Reduction | Code Available | 1 |
| VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding | Nov 24, 2024 | Hallucination, Language Modeling | Code Available | 1 |
| LRSAA: Large-scale Remote Sensing Image Target Recognition and Automatic Annotation | Nov 24, 2024 | Ensemble Learning, Object | Code Available | 1 |
| Medical Slice Transformer: Improved Diagnosis and Explainability on 3D Medical Images with DINOv2 | Nov 24, 2024 | Classification, Diagnostic | Code Available | 1 |
| ROOT: VLM based System for Indoor Scene Understanding and Beyond | Nov 24, 2024 | Scene Generation, Scene Understanding | Code Available | 1 |
| MulModSeg: Enhancing Unpaired Multi-Modal Medical Image Segmentation with Modality-Conditioned Text Embedding and Alternating Training | Nov 23, 2024 | Computed Tomography (CT), Image Segmentation | Code Available | 1 |
| FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation | Nov 23, 2024 | Anatomy, Image Captioning | Code Available | 1 |
| GeoAI-Enhanced Community Detection on Spatial Networks with Graph Deep Learning | Nov 23, 2024 | Attribute, Community Detection | Code Available | 1 |
| TKG-DM: Training-free Chroma Key Content Generation Diffusion Model | Nov 23, 2024 | | Code Available | 1 |
| Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark | Nov 23, 2024 | Image Generation, Text to Image Generation | Code Available | 1 |
| Accelerated Hydration Site Localization and Thermodynamic Profiling | Nov 23, 2024 | | Code Available | 1 |
| Multi-label Sequential Sentence Classification via Large Language Model | Nov 23, 2024 | Contrastive Learning, Extractive Summarization | Code Available | 1 |
| Revelio: Interpreting and leveraging semantic information in diffusion models | Nov 23, 2024 | Denoising, Language Modeling | Code Available | 1 |
| Sample- and Parameter-Efficient Auto-Regressive Image Models | Nov 23, 2024 | | Code Available | 1 |
| Scaling Structure Aware Virtual Screening to Billions of Molecules with SPRINT | Nov 23, 2024 | Drug Discovery, Molecular Docking | Code Available | 1 |
| LDM-Morph: Latent diffusion model guided deformable image registration | Nov 23, 2024 | Computational Efficiency, Image Registration | Code Available | 1 |
| OCDet: Object Center Detection via Bounding Box-Aware Heatmap Prediction on Edge Devices with NPUs | Nov 23, 2024 | Keypoint Detection, Object | Code Available | 1 |
| Seed-Free Synthetic Data Generation Framework for Instruction-Tuning LLMs: A Case Study in Thai | Nov 23, 2024 | Diversity, Question Answering | Code Available | 1 |
| FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification | Nov 22, 2024 | Diagnostic, Few-Shot Learning | Code Available | 1 |
| AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution | Nov 22, 2024 | | Code Available | 1 |
| Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models | Nov 22, 2024 | Data Augmentation, Information Retrieval | Code Available | 1 |
| CardioLab: Laboratory Values Estimation and Monitoring from Electrocardiogram Signals -- A Multimodal Deep Learning Approach | Nov 22, 2024 | Medical Diagnosis, Multimodal Deep Learning | Code Available | 1 |
| A Plug-and-Play Temporal Normalization Module for Robust Remote Photoplethysmography | Nov 22, 2024 | | Code Available | 1 |
| FastGrasp: Efficient Grasp Synthesis with Diffusion | Nov 22, 2024 | Diversity | Code Available | 1 |
| PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision | Nov 22, 2024 | | Code Available | 1 |
| ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos | Nov 22, 2024 | Language-Based Temporal Localization, Language Modeling | Code Available | 1 |
| Optimized Vessel Segmentation: A Structure-Agnostic Approach with Small Vessel Enhancement and Morphological Correction | Nov 22, 2024 | Segmentation | Code Available | 1 |
| Multi-granularity Interest Retrieval and Refinement Network for Long-Term User Behavior Modeling in CTR Prediction | Nov 22, 2024 | Click-Through Rate Prediction, Retrieval | Code Available | 1 |
| AI Foundation Models for Wearable Movement Data in Mental Health Research | Nov 22, 2024 | | Code Available | 1 |
| FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data | Nov 22, 2024 | Federated Learning | Code Available | 1 |
| Lie-Equivariant Quantum Graph Neural Networks | Nov 22, 2024 | Binary Classification, Graph Neural Network | Code Available | 1 |
| Exploring Foundation Models Fine-Tuning for Cytology Classification | Nov 22, 2024 | Classification, Few-Shot Learning | Code Available | 1 |
| Expert-guided protein language models enable accurate and blazingly fast fitness prediction | Nov 22, 2024 | CPU, Multiple Sequence Alignment | Code Available | 1 |
| Recursive Gaussian Process State Space Model | Nov 22, 2024 | Computational Efficiency, Hyperparameter Optimization | Code Available | 1 |
| A Benchmark Dataset for Collaborative SLAM in Service Environments | Nov 22, 2024 | | Code Available | 1 |
| There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks | Nov 22, 2024 | In-Context Learning | Code Available | 1 |
| Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers | Nov 22, 2024 | Avg, Deep Reinforcement Learning | Code Available | 1 |
| CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation | Nov 21, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 1 |
| Learning to Cooperate with Humans using Generative Agents | Nov 21, 2024 | Multi-agent Reinforcement Learning | Code Available | 1 |