SOTAVerified

Taming Large Multimodal Agents for Ultra-low Bitrate Semantically Disentangled Image Compression

2025-03-01 · Code Available

Juan Song, Lijie Yang, Mingtao Feng


Abstract

Compressing images at ultra-low bitrates while preserving both semantic consistency and high perceptual quality remains a significant challenge. In this paper we propose a novel image compression framework, Semantically Disentangled Image Compression (SEDIC). SEDIC leverages large multimodal models (LMMs) to disentangle an image into several essential pieces of semantic information: an extremely compressed reference image, overall and object-level text descriptions, and semantic masks. A multi-stage semantic decoder progressively restores the transmitted reference image object by object, ultimately producing high-quality, perceptually consistent reconstructions. At each decoding stage, a pre-trained controllable diffusion model restores object details on the reference image, conditioned on the text descriptions and semantic masks. Experimental results demonstrate that SEDIC significantly outperforms state-of-the-art approaches, achieving superior perceptual quality and semantic consistency at ultra-low bitrates (≤ 0.05 bpp). Our code is available at https://github.com/yang-xidian/SEDIC.
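To give a concrete sense of what the 0.05 bpp budget means for the three transmitted components (compressed reference image, text descriptions, semantic masks), here is a minimal back-of-envelope sketch. The component sizes below are hypothetical illustrations, not numbers reported in the paper.

```python
# Back-of-envelope bitrate check for a semantically disentangled
# bitstream (hypothetical component sizes, not from the paper).

def bits_per_pixel(total_bytes: int, width: int, height: int) -> float:
    """Convert a payload size in bytes to bits per pixel (bpp)."""
    return total_bytes * 8 / (width * height)

# Example: a 768x512 image with three transmitted components.
components = {
    "reference_image_bytes": 1800,   # extremely compressed reference image
    "text_descriptions_bytes": 400,  # overall + object-level captions
    "semantic_masks_bytes": 200,     # compactly coded object masks
}

total = sum(components.values())       # 2400 bytes
bpp = bits_per_pixel(total, 768, 512)  # 19200 bits / 393216 pixels

print(f"total payload: {total} bytes, {bpp:.4f} bpp")
print("within ultra-low budget:", bpp <= 0.05)
```

Under these assumed sizes, the entire semantic bitstream fits in roughly 2.4 KB for a 768×512 image, which illustrates why the heavy lifting of reconstruction must come from the pre-trained diffusion decoder rather than from transmitted pixel data.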
