Catalogue Search | MBRL

Side information-driven image coding for hybrid machine–human vision

by Peng, Wen-Hsiao; Zhang, Zhongpeng; Liu, Ying in Biometrics, Codec, Engineering

2025

With the development of machine learning, advanced photography, and image transmission systems, images are increasingly processed by machines, which has given rise to image coding for machines (ICM). After an image codec compresses and transmits an image, the image is handed over to machine vision task networks for tasks such as image classification and semantic segmentation. We propose a side information-driven image coding for hybrid machine–human vision (SICMH) framework that serves not only machine vision tasks but also human vision-oriented image reconstruction. The proposed SICMH framework can perform image classification, semantic segmentation, and coarse image reconstruction using only the side information, and it can perform fine image reconstruction by additionally using the residue information. In particular, we propose a multi-scale feature fusion block to enhance the usage of side information, and a novel semantic segmentation network named modified TrSeg to generate better semantic segmentation maps. The experimental results demonstrate the effectiveness of the proposed framework. SICMH achieves the same image classification and semantic segmentation accuracy as existing traditional or learning-based multi-task ICM frameworks at the lowest bitrate. For the image reconstruction task, SICMH achieves the same PSNR as existing learning-based multi-task hybrid ICM frameworks and the traditional image codec BPG, again at the lowest bitrate.

Journal Article

Lossless Image Coding Using Non-MMSE Algorithms to Calculate Linear Prediction Coefficients

by Ulacha, Grzegorz; Łazoryszczak, Mirosław in Algorithms, Binary codes, Codec

2023

This paper presents a lossless image compression method with a fast decoding time and flexible adjustment of coder parameters affecting its implementation complexity. A comparison of several approaches for computing non-MMSE prediction coefficients with different levels of complexity was made. The data modeling stage of the proposed codec was based on linear (calculated by the non-MMSE method) and non-linear (complemented by a context-dependent constant component removal block) predictions. Prediction error coding uses a two-stage compression: an adaptive Golomb code and a binary arithmetic code. The proposed solution results in 30% shorter decoding times and a lower bit average than competing solutions (by 7.9% relative to the popular JPEG-LS codec).
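The prediction-plus-Golomb stage described in this abstract can be illustrated with a minimal sketch. It assumes a trivial left-neighbour predictor and a fixed Rice parameter `k` (a Golomb code with a power-of-two divisor). The paper's non-MMSE linear predictor, its adaptive parameter selection, and the binary arithmetic coding stage are not reproduced here, and all function names are hypothetical.

```python
def zigzag(e):
    """Map a signed residual to a non-negative integer (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return 2 * e if e >= 0 else -2 * e - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(u, k):
    """Rice code: unary quotient, a terminating 0, then k remainder bits."""
    q, r = u >> k, u & ((1 << k) - 1)
    bits = "1" * q + "0"
    if k:
        bits += format(r, "0{}b".format(k))
    return bits


def rice_decode(bits, pos, k):
    """Decode one Rice codeword starting at pos; return (value, next position)."""
    q = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1  # skip the terminating 0
    r = int(bits[pos:pos + k], 2) if k else 0
    return (q << k) | r, pos + k


def encode_row(pixels, k=2):
    """Predict each pixel from its left neighbour (assumed predictor) and code the error."""
    prev, out = 0, []
    for p in pixels:
        out.append(rice_encode(zigzag(p - prev), k))
        prev = p
    return "".join(out)


def decode_row(bits, count, k=2):
    """Rebuild the pixel row exactly from the coded prediction errors."""
    prev, pos, pixels = 0, 0, []
    for _ in range(count):
        u, pos = rice_decode(bits, pos, k)
        prev += unzigzag(u)
        pixels.append(prev)
    return pixels
```

Because the decoder repeats the same prediction as the encoder, the row is recovered without loss; smooth rows give small residuals and therefore short codewords.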

Journal Article

Towards 360° image compression for machines via modulating pixel significance

by Zheng, Silin; Zhang, Qiudan; Shen, Xuelin in 1247: Recent Advances in AI-Powered Multimedia Visual Computing and Multimodal Signal Processing for Metaverse Era, Affine transformations, Codec

2024

The rapid growth of computer vision-based applications, including smart cities and autonomous driving, has created a pressing demand for efficient 360° image compression and computer vision analytics. In most circumstances, 360° image compression and computer vision face challenges arising from the oversampling inherent in the Equirectangular Projection (ERP). However, these two fields often employ divergent technological approaches. Image compression aims to reduce redundancy, whereas computer vision analytics attempts to compensate for the semantic distortion caused by the projection process, resulting in a potential conflict between the two objectives. This paper explores a potential route, i.e. 360° Image Coding for Machine (360-ICM), which offers an image processing framework that addresses both object deformation and oversampling redundancy within a unified framework. The key innovation lies in inferring a pixel-wise significance map by jointly considering the requirements of redundancy removal and object deformation offsetting. The significance map is subsequently fed to a deformation-aware image compression network, guiding the bit allocation process as an external condition. More specifically, we employ a deformation-aware image compression network characterized by the Spatial Feature Transform (SFT) layer, which can perform complex affine transformations of high-level semantic features and is essential in dealing with the deformation. The image compression network and significance inference network are jointly trained under the supervision of a 360° image-specific object detection network, obtaining a compact representation that is both analytics-oriented and deformation-aware. Extensive experimental results demonstrate the superiority of the proposed method over existing state-of-the-art image codecs in terms of rate-analytics performance.
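The conditioning step named in this abstract, an SFT layer driven by a significance map, amounts to a spatially varying affine transform of the features. The sketch below is an illustration under stated assumptions only: in a trained network the scale and shift are predicted by learned layers, whereas here they are hypothetical fixed linear functions of the significance value (`scale_weight` and `shift_weight` are invented for the example).

```python
def sft_modulate(features, significance, scale_weight=0.5, shift_weight=0.1):
    """Apply a per-pixel affine transform: out = gamma * feature + beta.

    gamma and beta depend on the significance map. The linear forms used
    here are placeholders for the learned mappings of a real SFT layer.
    """
    out = []
    for feat_row, sig_row in zip(features, significance):
        out_row = []
        for f, s in zip(feat_row, sig_row):
            gamma = 1.0 + scale_weight * s  # hypothetical scale from condition
            beta = shift_weight * s         # hypothetical shift from condition
            out_row.append(gamma * f + beta)
        out.append(out_row)
    return out
```

With a significance of zero the feature passes through unchanged, so the map acts as an external condition that only alters regions it marks as important.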

Journal Article

An Integrated Approach for Lossless Image Compression Using CLAHE, Two-Channel Encoding and Adaptive Arithmetic Coding

by Kumar, P. R. Rajesh; Prabhakar, M. in Adaptive algorithms, Advances in Computational Approaches for Image Processing, Algorithms

2024

Lossless image compression techniques play a crucial role in preserving image quality while reducing storage space and transmission bandwidth. This paper proposes a novel hybrid integrated method for lossless image compression that combines Contrast Limited Adaptive Histogram Equalization (CLAHE), two-channel encoding, and adaptive arithmetic coding to achieve highly efficient compression without any loss of image information. The first step of the proposed approach applies CLAHE to enhance the local contrast of the image. This pre-processing step aids in reducing the entropy and increasing the redundancy in the image, creating a more favourable environment for subsequent compression algorithms. Next, the image is divided into two channels: one channel focuses on encoding essential structural information, while the other handles the finer details. This segregation leverages the inherent properties of images to improve compression efficiency. To achieve further compression gains, an adaptive arithmetic coding algorithm is used to encode the data in each channel. Adaptive arithmetic coding adapts its probability model during the encoding process, leading to improved compression performance compared to traditional static coding methods. The proposed method offers significant potential in various applications: medical imaging, where large volumes of high-resolution images are generated during procedures such as MRI, CT scans, or digital pathology; transmission of high-quality images in resource-constrained environments; and image processing tasks requiring precise data preservation. CLAHE can be a valuable tool in medical imaging to enhance essential diagnostic information in medical images before compression. By improving contrast and visibility of structures, CLAHE may aid in achieving better compression efficiency and reduce the risk of introducing compression artifacts. To assess the effectiveness of the proposed method, comprehensive experiments are conducted on various benchmark image datasets. Performance parameters such as compressed image size, compression ratio, coding efficiency, compression gain, and bit rate are evaluated. The results demonstrate that the proposed approach achieves superior compression ratios while ensuring lossless reconstruction of the original image. The incorporation of CLAHE enhances the compression efficiency by exploiting local image characteristics, while two-channel encoding and adaptive arithmetic coding work synergistically to achieve high compression gains.
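The remark that adaptive arithmetic coding "adapts its probability model during the encoding process" can be made concrete with a small sketch. It does not implement an arithmetic coder. Assuming a simple frequency-count model (a hypothetical choice, not taken from the paper), it estimates the ideal code length in bits that such a coder would approach.

```python
import math


def adaptive_code_length(symbols, alphabet_size=256):
    """Ideal code length in bits under an adaptive frequency-count model.

    Every symbol starts with a count of 1; after each symbol is coded at a
    cost of -log2(p), its count is incremented, so the model tracks the data.
    """
    counts = [1] * alphabet_size
    total = alphabet_size
    bits = 0.0
    for s in symbols:
        bits -= math.log2(counts[s] / total)
        counts[s] += 1
        total += 1
    return bits
```

For strongly skewed data the estimate falls well below the 8 bits per symbol of a fixed uniform model, because frequently seen symbols become progressively cheaper to code.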

Journal Article

Evaluating the Coding Performance of 360° Image Projection Formats Using Objective Quality Metrics

by Choi, Seungcheol; Hussain, Ikram; Kwon, Oh-Jin in Codec, Coding standards, Efficiency

2021

Recently, 360° content has emerged as a new method for offering real-life interaction. Ultra-high resolution 360° content is mapped to the two-dimensional plane to adjust to the input of existing generic coding standards for transmission. Many formats have been proposed, and tremendous work is being done to investigate 360° videos in the Joint Video Exploration Team using projection-based coding. However, the standardization activities for quality assessment of 360° images are limited. In this study, we evaluate the coding performance of various projection formats, including recently-proposed formats adapting to the input of JPEG and JPEG 2000 content. We present an overview of the nine state-of-the-art formats considered in the evaluation. We also propose an evaluation framework for reducing the bias toward the native equi-rectangular (ERP) format. We consider the downsampled ERP image as the ground truth image. Firstly, format conversions are applied to the ERP image. Secondly, each converted image is subjected to the JPEG and JPEG 2000 image coding standards, then decoded and converted back to the downsampled ERP to find the coding gain of each format. The quality metrics designed for 360° content and conventional 2D metrics have been used for both end-to-end distortion measurement and codec level, in two subsampling modes, i.e., YUV (4:2:0 and 4:4:4). Our evaluation results prove that the hybrid equi-angular format and equatorial cylindrical format achieve better coding performance among the compared formats. Our work presents evidence to find the coding gain of these formats over ERP, which is useful for identifying the best image format for a future standard.

Journal Article

Improved JPEG Coding by Filtering 8 × 8 DCT Blocks

by Kwon, Oh-Jin; Iqbal, Yasir in Algorithms, Arithmetic coding, Buffers

2021

The JPEG format, consisting of a set of image compression techniques, is one of the most commonly used image coding standards for both lossy and lossless image encoding. In this format, various techniques are used to improve image transmission and storage. In the final step of lossy image coding, JPEG uses either arithmetic or Huffman entropy coding modes to further compress data processed by lossy compression. Both modes encode all the 8 × 8 DCT blocks without filtering empty ones. An end-of-block marker is coded for empty blocks, and these empty blocks cause an unnecessary increase in file size when they are stored with the rest of the data. In this paper, we propose a modified version of the JPEG entropy coding. In the proposed version, instead of storing an end-of-block code for empty blocks with the rest of the data, we store their location in a separate buffer and then compress the buffer with an efficient lossless method to achieve a higher compression ratio. The size of the additional buffer, which keeps the information of location for the empty and non-empty blocks, was considered during the calculation of bits per pixel for the test images. In image compression, peak signal-to-noise ratio versus bits per pixel has been a major measure for evaluating the coding performance. Experimental results indicate that the proposed modified algorithm achieves lower bits per pixel while retaining quality.
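The idea of recording empty-block locations in a separate buffer can be sketched as follows. This is a minimal illustration that assumes an "empty" block is one whose 64 quantized coefficients are all zero and represents the location buffer as a plain list of flags. The lossless compression of that buffer and the actual JPEG entropy coding are omitted, and the function names are hypothetical.

```python
def split_blocks(blocks):
    """Separate empty-block locations from the blocks that carry data.

    Returns (flags, kept): flags[i] is 1 if block i is empty (all zeros),
    and kept holds only the non-empty blocks in their original order.
    """
    flags = [0 if any(block) else 1 for block in blocks]
    kept = [block for block in blocks if any(block)]
    return flags, kept


def merge_blocks(flags, kept, block_size=64):
    """Rebuild the full block sequence from the flags and non-empty blocks."""
    remaining = iter(kept)
    return [[0] * block_size if f else next(remaining) for f in flags]
```

Only the non-empty blocks would then go to the entropy coder, while the flag buffer, which tends to contain long runs, is compressed on its own.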

Journal Article

Perceptual image hashing based on structural fractal features of image coding and ring partition

by He, HongJie; Fares, Khelaifi in Encryption, Feature extraction, Fractals

2020

Perceptual image hashing is attracting increasing attention in several multimedia security applications. However, balancing the two most important properties of image hashing, robustness and discrimination, remains the most persistent challenge in hashing schemes. In this study, a robust image hashing technique is proposed by incorporating ring partition and fractal image coding. The scheme starts by normalizing the image to help in extracting its local features. Then the concept of ring partition is introduced to make the hash rotation invariant: the image is divided into 5 different rings to form a secondary image that possesses the invariant property. Further, image coding is introduced by extracting the structural fractal features to exploit dimensionality reduction and compression, hence generating a robust hash. To ensure the system's security, encryption is performed on the generated fractal elements before the final hash construction. We conduct a series of experiments to evaluate the performance of our scheme. The results show that our scheme is robust against several content-preserving attacks such as image rotation, JPEG compression, gamma correction, Gaussian low-pass filtering, image scaling, cropping, brightness adjustment, and contrast adjustment. In addition, the receiver operating characteristic (ROC) is used to show the discriminative capability and robustness of our scheme compared with other state-of-the-art schemes in the literature.
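Why a ring partition yields rotation invariance can be seen in a short sketch. It assumes a square greyscale image given as a list of rows, rings of equal radial width inside the inscribed circle, and the ring mean as the per-ring feature. The paper's fractal features, secondary-image construction, and encryption step are not reproduced, and the function names are hypothetical.

```python
import math


def ring_means(image, n_rings=5):
    """Mean pixel value of each concentric ring inside the inscribed circle."""
    n = len(image)
    centre = (n - 1) / 2.0
    radius = n / 2.0
    sums = [0] * n_rings
    counts = [0] * n_rings
    for y in range(n):
        for x in range(n):
            r = math.sqrt((x - centre) ** 2 + (y - centre) ** 2)
            if r >= radius:
                continue  # corner pixels outside the circle are ignored
            k = min(int(r / radius * n_rings), n_rings - 1)
            sums[k] += image[y][x]
            counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def rotate90(image):
    """Rotate a square image by 90 degrees."""
    n = len(image)
    return [[image[n - 1 - x][y] for x in range(n)] for y in range(n)]
```

A rotation about the centre moves each pixel to another position at the same distance from the centre, so every pixel stays in its ring and the per-ring statistics are unchanged. The check below uses a 90 degree rotation, for which this holds exactly on the pixel grid.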

Journal Article

Conditional Encoder-Based Adaptive Deep Image Compression with Classification-Driven Semantic Awareness

by Lin, Chaoheng; Su, Minxian; Lei, Zhongyue in Accuracy, Adaptation, Adaptive algorithms

2023

This paper proposes a new algorithm for adaptive deep image compression (DIC) that can compress images for different purposes or contexts at different rates. The algorithm can compress images with semantic awareness, which means classification-related semantic features are better protected in lossy image compression. It builds on the existing conditional encoder-based DIC method and adds two features: a model-based rate-distortion-classification-perception (RDCP) framework to control the trade-off between rate and performance for different contexts, and a mechanism to generate coding conditions based on image complexity and semantic importance. The algorithm outperforms the QMAP2021 benchmark on the ImageNet dataset. Over the tested rate range, it improves the classification accuracy by 11% and the perceptual quality by 12.4%, 32%, and 1.3% on average for NIQE, LPIPS, and FSIM metrics, respectively.

Journal Article

A Sound-Image Coding Method Inspired by an Acousto-Optic Electronic Piano

by Qu, Shaocheng; Liu, Qi in Acousto-optics, Image coding, interactive design

2020

This paper first points out that there seems to be a lack of a direct mapping bridge method for interconversion between sound and image, because many methods invented for scientific research on sound visualization are based on the sound characteristics of pitch, duration, loudness and timbre. Secondly, it designs an acousto-optic electronic piano that produces both sound and color effects through the application of music visualization, and presents a sound-image coding method inspired by this design that can transform musical notes into picture pixels. Thirdly, it transforms a simple piece of music called "I am a painting master" into a corresponding picture with the mapping bridge method using Matlab software. More coding bits should be added to the method for complex music. The sound-image coding method serves as a mapping bridge for sound-image interactive design, which may help to find the essential mental relationship between music and pictures.

Journal Article

Computationally efficient wavelet-based low memory image coder for WMSNs/IoT

by Pinheiro, Antonio; Tausif, Mohd; Khan, Ekram in Algorithms, Artificial Intelligence, Body area networks

2023

This paper proposes a simple and efficient modification to the state-of-the-art zero memory set partitioned embedded block (ZM-SPECK) image coding algorithm to reduce its computational complexity without any significant increase in memory. It has been observed that comparing every element of the blocks/sets with the current threshold in every bit-plane is one of the most time-consuming processes in the ZM-SPECK algorithm. The main contribution of this paper is to avoid this computationally complex process by using the magnitude of the largest coefficient in each subband, which is searched and stored while computing the DWT, prior to the coding. Moreover, the peak signal-to-noise ratio (PSNR) of the proposed technique is exactly the same as that obtained by ZM-SPECK. The simulation results show that the proposed method can reduce the complexity of ZM-SPECK by approximately 29%, making it suitable for resource-constrained sensor nodes in wireless multimedia sensor networks (WMSNs), Internet of things (IoT), body area networks, etc.
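The saving described in this abstract, skipping per-coefficient threshold tests by consulting a stored subband maximum, can be sketched in isolation. This is not an implementation of ZM-SPECK: set partitioning, refinement passes, and bitstream output are omitted, subbands are plain lists of integer coefficients, and the maxima are computed in a separate pass rather than during the DWT as in the paper. The function names are hypothetical.

```python
def significance_naive(subbands, top_plane):
    """Test every coefficient against the threshold in every bit-plane."""
    comparisons, result = 0, []
    for n in range(top_plane, -1, -1):
        threshold = 1 << n
        for band in subbands:
            for c in band:
                comparisons += 1
                result.append(abs(c) >= threshold)
    return result, comparisons


def significance_with_max(subbands, top_plane):
    """Skip a whole subband while its stored maximum is below the threshold."""
    maxima = [max((abs(c) for c in band), default=0) for band in subbands]
    comparisons, result = 0, []
    for n in range(top_plane, -1, -1):
        threshold = 1 << n
        for band, largest in zip(subbands, maxima):
            comparisons += 1  # single test against the stored maximum
            if largest < threshold:
                result.extend([False] * len(band))
                continue
            for c in band:
                comparisons += 1
                result.append(abs(c) >= threshold)
    return result, comparisons
```

Both functions produce identical significance decisions, which is consistent with the abstract's statement that PSNR is unchanged; only the number of comparisons differs, and the gap grows when many subbands hold small coefficients.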

Journal Article
