| Paint by Inpaint: Learning to Add Image Objects by Removing Them First | Apr 28, 2024 | Image Inpainting, Language Modeling | Code Available | 2 |
| WorldGPT: Empowering LLM as Multimodal World Model | Apr 28, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment | Apr 28, 2024 | Cross-Modal Retrieval, Image Retrieval | Code Available | 2 |
| S^2Mamba: A Spatial-spectral State Space Model for Hyperspectral Image Classification | Apr 28, 2024 | Hyperspectral Image Classification, image-classification | Code Available | 2 |
| FRAME: A Modular Framework for Autonomous Map Merging: Advancements in the Field | Apr 27, 2024 | Point Cloud Registration | Code Available | 2 |
| LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing | Apr 27, 2024 | Log Parsing | Code Available | 2 |
| Generative Diffusion-based Downscaling for Climate | Apr 27, 2024 | Super-Resolution | Code Available | 2 |
| Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations | Apr 26, 2024 | Imitation Learning | Code Available | 2 |
| Embedded FPGA Developments in 130nm and 28nm CMOS for Machine Learning in Particle Detector Readout | Apr 26, 2024 | | Code Available | 2 |
| UniRGB-IR: A Unified Framework for RGB-Infrared Semantic Tasks via Adapter Tuning | Apr 26, 2024 | Multispectral Object Detection, Pedestrian Detection | Code Available | 2 |
| PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games | Apr 26, 2024 | Decision Making, Language Modeling | Code Available | 2 |
| OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search | Apr 25, 2024 | Entity Embeddings, Image Captioning | Code Available | 2 |
| REBEL: Reinforcement Learning via Regressing Relative Rewards | Apr 25, 2024 | continuous-control, Continuous Control | Code Available | 2 |
| CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions | Apr 25, 2024 | Mamba, Multispectral Object Detection | Code Available | 2 |
| Learning Visuotactile Skills with Two Multifingered Hands | Apr 25, 2024 | | Code Available | 2 |
| Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents | Apr 25, 2024 | Decision Making, Specificity | Code Available | 2 |
| A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation | Apr 25, 2024 | Autonomous Driving, Evolutionary Algorithms | Code Available | 2 |
| Commonsense Prototype for Outdoor Unsupervised 3D Object Detection | Apr 25, 2024 | 3D Object Detection, Object | Code Available | 2 |
| IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages | Apr 25, 2024 | Cross-Lingual Question Answering, Diversity | Code Available | 2 |
| EEG-Deformer: A Dense Convolutional Transformer for Brain-computer Interfaces | Apr 25, 2024 | EEG, Electroencephalogram (EEG) | Code Available | 2 |
| List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs | Apr 25, 2024 | Visual Grounding, Visual Question Answering | Code Available | 2 |
| DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting | Apr 25, 2024 | Exemplar-Free Counting, Few-shot Object Counting and Detection | Code Available | 2 |
| Multimodal Information Interaction for Medical Image Segmentation | Apr 25, 2024 | Heart Segmentation, Image Segmentation | Code Available | 2 |
| Weak-to-Strong Extrapolation Expedites Alignment | Apr 25, 2024 | | Code Available | 2 |
| TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models | Apr 25, 2024 | Denoising, Image to Video Generation | Code Available | 2 |
| Multi-Scale Representations by Varying Window Attention for Semantic Segmentation | Apr 25, 2024 | Decoder, Semantic Segmentation | Code Available | 2 |
| Latent Modulated Function for Computational Optimal Continuous Image Representation | Apr 25, 2024 | Computational Efficiency, Super-Resolution | Code Available | 2 |
| The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models | Apr 24, 2024 | Diversity, Navigate | Code Available | 2 |
| Gradformer: Graph Transformer with Exponential Decay | Apr 24, 2024 | Graph Classification, Graph Neural Network | Code Available | 2 |
| From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models | Apr 24, 2024 | Instruction Following | Code Available | 2 |
| A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution | Apr 24, 2024 | Blind Super-Resolution, Image Restoration | Code Available | 2 |
| MaGGIe: Masked Guided Gradual Human Instance Matting | Apr 24, 2024 | Image Matting, Video Matting | Code Available | 2 |
| Let's Think Dot by Dot: Hidden Computation in Transformer Language Models | Apr 24, 2024 | | Code Available | 2 |
| Telco-RAG: Navigating the Challenges of Retrieval-Augmented Language Models for Telecommunications | Apr 24, 2024 | RAG, Retrieval | Code Available | 2 |
| Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges | Apr 24, 2024 | Drug Design, Inductive Bias | Code Available | 2 |
| zkLLM: Zero Knowledge Proofs for Large Language Models | Apr 24, 2024 | | Code Available | 2 |
| Facilitating Advanced Sentinel-2 Analysis Through a Simplified Computation of Nadir BRDF Adjusted Reflectance | Apr 24, 2024 | | Code Available | 2 |
| Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization | Apr 23, 2024 | global-optimization, Optical Flow Estimation | Code Available | 2 |
| From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation | Apr 23, 2024 | Image Generation | Code Available | 2 |
| Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model | Apr 23, 2024 | 3D Point Cloud Classification, Mamba | Code Available | 2 |
| Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering | Apr 23, 2024 | Graph Question Answering, Hallucination | Code Available | 2 |
| GSCo: Towards Generalizable AI in Medicine via Generalist-Specialist Collaboration | Apr 23, 2024 | Collaborative Inference, In-Context Learning | Code Available | 2 |
| SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | Apr 23, 2024 | 3D Human Pose Estimation, Pose Estimation | Code Available | 2 |
| An empirical study of LLaMA3 quantization: from LLMs to MLLMs | Apr 22, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| Graphic Design with Large Multimodal Model | Apr 22, 2024 | Layout Generation, model | Code Available | 2 |
| Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs | Apr 22, 2024 | Misinformation | Code Available | 2 |
| Deep Learning-Based Point Cloud Registration: A Comprehensive Survey and Taxonomy | Apr 22, 2024 | Autonomous Driving, Deep Learning | Code Available | 2 |
| SpaceByte: Towards Deleting Tokenization from Large Language Modeling | Apr 22, 2024 | Decoder, Language Modeling | Code Available | 2 |
| SwinFuSR: an image fusion-inspired model for RGB-guided thermal image super-resolution | Apr 22, 2024 | Image Super-Resolution, SSIM | Code Available | 2 |
| CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding | Apr 22, 2024 | Attribute | Code Available | 2 |