Catalogue Search | MBRL
Explore the vast range of titles available.
34,715 result(s) for "Video compression."
Image and video compression : fundamentals, techniques, and applications
"Preface: This book is intended primarily for courses in image compression techniques for undergraduate through postgraduate students, research scholars, and engineers working in the field. It presents the basic concepts and technologies in a student-friendly manner. The major techniques in image compression are explained with informative illustrations, and the concepts are developed from the basics. Practical implementation is demonstrated with MATLAB
Handbook of image and video processing
2005
55% new material in the latest edition of this "must-have" for students and practitioners of image and video processing! This Handbook is intended to serve as the basic reference point on image and video processing: in the field, in the research laboratory, and in the classroom.
Masked Feature Residual Coding for Neural Video Compression
2025
In neural video compression, an approximation of the target frame is predicted, and a mask is subsequently applied to it. Then, the masked predicted frame is subtracted from the target frame and fed into the encoder along with the conditional information. However, this structure has two limitations. First, in the pixel domain, even if the mask is perfectly predicted, the residuals cannot be significantly reduced. Second, reconstructed features with abundant temporal context information cannot be used as references for compressing the next frame. To address these problems, we propose Conditional Masked Feature Residual (CMFR) Coding. We extract features from the target frame and the predicted features using neural networks. Then, we predict the mask and subtract the masked predicted features from the target features. Thereafter, the difference is fed into the encoder with the conditional information. Moreover, to more effectively remove conditional information from the target frame, we introduce a Scaled Feature Fusion (SFF) module. In addition, we introduce a Motion Refiner to enhance the quality of the decoded optical flow. Experimental results show that our model achieves an 11.76% bit saving over the model without the proposed methods, averaged over all HEVC test sequences, demonstrating the effectiveness of the proposed methods.
Journal Article
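The masked feature-residual step described in the abstract above can be sketched in a few lines. This is only a structural illustration with scalar "features" and hand-picked values; the paper's CMFR uses learned feature extractors, a predicted mask, and entropy coding, and every name here is hypothetical.

```python
def masked_feature_residual(target_feat, pred_feat, mask):
    """Encoder side: residual = target - mask * prediction (element-wise)."""
    return [t - m * p for t, p, m in zip(target_feat, pred_feat, mask)]

def reconstruct_features(residual, pred_feat, mask):
    """Decoder side: invert the subtraction exactly."""
    return [r + m * p for r, p, m in zip(residual, pred_feat, mask)]

target = [4, 7, 3, 9]   # toy target-frame features
pred   = [3, 8, 0, 9]   # toy predicted features
mask   = [1, 1, 0, 1]   # 0 where the prediction is judged unreliable

residual = masked_feature_residual(target, pred, mask)
assert reconstruct_features(residual, pred, mask) == target
```

Where the mask is 1 and the prediction is good, the residual is near zero and cheap to code; where the mask is 0, the target feature passes through unchanged.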
Efficient Video Compression Using Afterimage Representation
2024
Recent advancements in large-scale video data have highlighted the growing need for efficient data compression techniques to enhance video processing performance. In this paper, we propose an afterimage-based video compression method that significantly reduces video data volume while maintaining analytical performance. The proposed approach utilizes optical flow to adaptively select the number of keyframes based on scene complexity, optimizing compression efficiency. Additionally, object movement masks extracted from keyframes are accumulated over time using alpha blending to generate the final afterimage. Experiments on the UCF-Crime dataset demonstrated that the proposed method achieved a 95.97% compression ratio. In binary classification experiments on normal/abnormal behaviors, the compressed videos maintained performance comparable to the original videos, while in multi-class classification, they outperformed the originals. Notably, classification experiments focused exclusively on abnormal behaviors exhibited a significant 4.25% improvement in performance. Moreover, further experiments showed that large language models (LLMs) can interpret the temporal context of original videos from single afterimages. These findings confirm that the proposed afterimage-based compression technique effectively preserves spatiotemporal information while significantly reducing data size.
Journal Article
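The alpha-blended mask accumulation described above can be sketched as follows. The frame shapes, alpha value, and binary-mask format are assumptions for illustration; the actual method extracts movement masks from adaptively selected keyframes via optical flow and operates on real video frames.

```python
def accumulate_afterimage(frames, masks, alpha=0.5):
    """Blend masked regions of successive frames onto a running canvas.

    frames/masks are lists of 2-D grids of equal size; masks[i][y][x] is
    truthy where an object moved in frame i.
    """
    canvas = [row[:] for row in frames[0]]  # start from the first keyframe
    for frame, mask in zip(frames[1:], masks[1:]):
        for y, row in enumerate(mask):
            for x, m in enumerate(row):
                if m:  # blend only where the object-movement mask is set
                    canvas[y][x] = (1 - alpha) * canvas[y][x] + alpha * frame[y][x]
    return canvas

# Two 1x2 "frames": the second frame's left pixel is masked as moving.
frames = [[[0.0, 0.0]], [[1.0, 1.0]]]
masks  = [[[1, 1]], [[1, 0]]]
assert accumulate_afterimage(frames, masks) == [[0.5, 0.0]]
```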
End-to-end video compression for surveillance and conference videos
by Wang, Shenhao; Gao, Han; Li, Shuai
in 1221: Deep Learning for Image/Video Compression and Visual Quality Assessment
Subjects: Codec; Compressive strength
2022
The storage and transmission of surveillance and conference videos are an important branch of video compression. Because surveillance and conference videos have strong inter-frame correlation, there is considerable continuity between consecutive frames at both the image and motion levels. However, traditional video codecs cannot fully exploit these characteristics during compression. Therefore, based on the DVC video codec framework, we propose an "MV residual + MV optimization" coding strategy for surveillance and conference videos to further reduce the bitrate and improve the quality of compressed video frames. During the testing stage, an online update strategy is applied that adapts the network's parameters to different surveillance and conference videos. Our contribution is an optical-flow residual coding method for videos with strong inter-frame correlation, with optical-flow optimization implemented at the decoding end and the online update strategy at the encoding end. Experiments show that our method outperforms the DVC framework, especially on the CUHK Square surveillance video, with a 1.2 dB improvement.
Journal Article
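The "MV residual" idea, transmitting the difference between the current motion vectors and those predicted from the previous frame rather than the vectors themselves, can be sketched as below. Quantization and entropy coding are omitted, and the function names are illustrative, not the paper's API.

```python
def encode_mv_residual(mv_current, mv_predicted):
    """Per-block residual between current and predicted motion vectors."""
    return [(cx - px, cy - py)
            for (cx, cy), (px, py) in zip(mv_current, mv_predicted)]

def decode_mv(residual, mv_predicted):
    """Recover the current motion vectors from residual + prediction."""
    return [(rx + px, ry + py)
            for (rx, ry), (px, py) in zip(residual, mv_predicted)]

current   = [(5, 2), (4, 1), (0, 0)]   # toy per-block motion vectors
predicted = [(5, 1), (4, 1), (1, 0)]

residual = encode_mv_residual(current, predicted)
assert residual == [(0, 1), (0, 0), (-1, 0)]
assert decode_mv(residual, predicted) == current
```

For strongly correlated content such as static-camera surveillance, most residuals are (0, 0), which is far cheaper to entropy-code than the raw vectors.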
Ultra-Low Bitrate Predictive Portrait Video Compression with Diffusion Models
2025
Deep neural video compression codecs have shown great promise in recent years, yet ultra-low-bitrate video coding still poses considerable challenges. Inspired by recent attempts to apply diffusion models to image and video compression, we leverage diffusion models for ultra-low-bitrate portrait video compression. In this paper, we propose a predictive portrait video compression method that exploits the temporal prediction capabilities of diffusion models. Specifically, we develop a temporal diffusion predictor based on a conditional latent diffusion model, with the predicted results serving as decoded frames. We symmetrically integrate a temporal diffusion predictor at both the encoding and decoding sides. When the perceptual quality of the predicted results at the encoding end falls below a predefined threshold, a new frame sequence is employed for prediction, while the predictor at the decoding side directly generates predicted frames as reconstructions based on the evaluation results. This symmetry ensures that the prediction frames generated at the decoding end are consistent with those at the encoding end. We also design an adaptive coding strategy that incorporates frame quality assessment and adaptive keyframe control. To ensure consistent quality of subsequent predicted frames and achieve high perceptual reconstruction quality, this strategy dynamically evaluates the visual quality of the predicted results during encoding, retains the predicted frames that meet the quality threshold, and adaptively adjusts the length of the keyframe sequence based on motion complexity. Experimental results demonstrate that, compared with traditional video codecs and other popular methods, the proposed scheme provides superior compression performance at ultra-low bitrates while remaining competitive in visual quality, achieving more than 24% bitrate savings over VVC in terms of perceptual distortion.
Journal Article
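The adaptive keyframe control described in the abstract can be sketched as a simple thresholding loop: keep using predicted frames while their assessed quality stays above a threshold, and start a new keyframe when it drops. The quality scores, threshold, and 'K'/'P' decision labels are placeholders, not the paper's actual criterion.

```python
def adaptive_keyframe_schedule(qualities, threshold=0.9):
    """Per-frame decisions: 'K' = code a keyframe, 'P' = keep the prediction.

    qualities[i] is an assumed perceptual-quality score for frame i's
    predicted version, in [0, 1].
    """
    if not qualities:
        return []
    decisions = ['K']  # the first frame is always a keyframe
    for q in qualities[1:]:
        decisions.append('P' if q >= threshold else 'K')
    return decisions

# Frame 3's prediction falls below threshold, so a fresh keyframe is sent.
assert adaptive_keyframe_schedule([1.0, 0.95, 0.92, 0.85, 0.97]) == ['K', 'P', 'P', 'K', 'P']
```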
Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations
2021
Image/video compression and communication need to serve both human vision and machine vision. To address this need, we propose a scalable image compression solution. We assume that machine vision needs less information, mostly related to semantics, whereas human vision needs more information in order to reconstruct the signal. We then propose semantics-to-signal scalable compression, where a partial bitstream is decodable for machine vision and the entire bitstream is decodable for human vision. Our method is inspired by the scalable image coding standard JPEG2000 and similarly adopts subband-wise representations. We first design a trainable and revertible transform based on the lifting structure, which converts an image into a pyramid of multiple subbands; the transform is trained to make the partial representations useful for multiple machine vision tasks. We then design an end-to-end optimized encoding/decoding network for compressing the multiple subbands, jointly optimizing compression ratio, semantic analysis accuracy, and signal reconstruction quality. We experiment with two datasets, CUB200-2011 and FGVC-Aircraft, taking coarse-to-fine image classification as an example task. Experimental results demonstrate that our proposed method achieves semantics-to-signal scalable compression and outperforms JPEG2000 in compression efficiency. The proposed method sheds light on a generic approach to image/video coding for humans and machines.
Journal Article
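The revertibility of a lifting-structure transform, which underpins the semantics-to-signal design above, can be illustrated with a Haar-like integer lifting step: a predict step forms detail coefficients, an update step forms the approximation, and both invert exactly. The paper trains the predict/update steps as neural networks; this fixed integer version is only a structural sketch.

```python
def lift_forward(x):
    """Split an even-length signal into (approximation, detail) subbands."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for o, e in zip(odd, even)]          # predict step
    approx = [e + d // 2 for e, d in zip(even, detail)]  # update step
    return approx, detail

def lift_inverse(approx, detail):
    """Undo the lifting steps in reverse order -- exactly revertible."""
    even = [a - d // 2 for a, d in zip(approx, detail)]
    odd  = [d + e for d, e in zip(detail, even)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out

x = [5, 7, 2, 8]
assert lift_forward(x) == ([6, 5], [2, 6])
assert lift_inverse(*lift_forward(x)) == x   # perfect reconstruction
```

Because each step only adds or subtracts a function of the *other* half of the samples, inversion never loses information, even with integer arithmetic.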
Model-based portrait video compression with spatial constraint and adaptive pose processing
2024
Motion-model-based video coding, which employs sparse sets of keypoints instead of dense optical flow, can efficiently compress videos at ultra-low bitrates. Such schemes obtain notable performance gains over traditional video codecs in face-centric scenarios such as video conferencing. However, due to the high complexity of human poses, there is still a lack of research on motion-model-based human-body video coding, especially under large pose variations. To overcome this limitation, we present a thin-plate spline motion-model-based portrait video compression framework oriented to adaptive pose processing. First, a more flexible thin-plate spline transformation, rather than a simple affine transformation, is adopted for motion estimation, since its nonlinearity allows it to represent more complex motions. Meanwhile, spatial constraints are incorporated into the keypoint detector to generate keypoints that are more consistent with human poses, thus yielding more accurate optical flow. In addition, a motion intensity evaluation module at the encoder side dynamically evaluates inter-frame motion intensity. An Adaptive Reference Frame Selection algorithm at the decoder side then selects the reconstruction scheme appropriate to the intensity of portrait motion. Finally, a multi-frame reconstruction module is introduced for large pose variations to improve the consistency of the human pose and the subjective quality. Experimental results demonstrate that, compared with the state-of-the-art video coding standard Versatile Video Coding and existing motion-model-based compression techniques, our proposed scheme copes better with large pose variations and achieves better objective and subjective quality at similar bitrates, with higher temporal consistency.
Journal Article
Modified Hilbert Curve for Rectangles and Cuboids and Its Application in Entropy Coding for Image and Video Compression
2021
In our previous work, by combining the Hilbert scan with a symbol grouping method, efficient run-length-based entropy coding was developed, and high-efficiency image compression algorithms based on this entropy coding were obtained. However, the 2-D Hilbert curves, which are a critical part of the above-mentioned entropy coding, are defined on squares whose side length is a power of 2, i.e., 2^n, while a subband is normally a rectangle of arbitrary size. It is not straightforward to modify the Hilbert curve from squares of side length 2^n to an arbitrary rectangle. In this short article, we provide the details of constructing the modified 2-D Hilbert curve for rectangles of arbitrary size. Furthermore, we extend the method from a 2-D rectangle to a 3-D cuboid. The 3-D modified Hilbert curves are used in a novel 3-D transform video compression algorithm that employs the run-length-based entropy coding. Additionally, the modified 2-D and 3-D Hilbert curves introduced in this short article could be useful for other applications in the future.
Journal Article
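For reference, the classical 2-D Hilbert curve on a 2^n x 2^n square, the base case that the article generalizes to rectangles and cuboids, maps a 1-D scan index to 2-D coordinates as follows. This is the standard construction, not the paper's modified curve.

```python
def hilbert_d2xy(n, d):
    """Map index d along the Hilbert curve to (x, y) on a 2^n x 2^n grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << n):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                     # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx                     # move into the right quadrant
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# The order-1 curve visits the 2x2 grid in a U shape.
assert [hilbert_d2xy(1, d) for d in range(4)] == [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Consecutive indices always map to grid-adjacent cells, which is exactly the locality property the run-length entropy coder exploits when scanning subband coefficients.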
Information Bottleneck Driven Deep Video Compression—IBOpenDVCW
2024
Video compression remains a challenging task despite significant advances in end-to-end optimized deep networks for video coding. This study, inspired by information bottleneck (IB) theory, introduces a novel approach that combines IB theory with the wavelet transform. We perform a comprehensive analysis of information and mutual information across various mother wavelets and decomposition levels. Additionally, we replace the conventional average pooling layers with a discrete wavelet transform, creating more advanced pooling methods, and investigate their effects on information and mutual information. Our results demonstrate that the proposed model and training technique outperform existing state-of-the-art video compression methods, delivering competitive rate-distortion performance relative to the AVC/H.264 and HEVC/H.265 codecs.
Journal Article
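The substitution of average pooling by a discrete wavelet transform can be illustrated with a single 2x2 Haar step: the low-low (LL) coefficient equals the 2x2 average up to a factor of 2, while the detail coefficients retain exactly what plain average pooling discards. This pure-Python toy is a sketch under that simplification, not the study's implementation, which operates on learned feature maps.

```python
def haar_pool(block):
    """One 2x2 block [[a, b], [c, d]] -> (LL, LH, HL, HH) Haar coefficients."""
    a, b = block[0]
    c, d = block[1]
    ll = (a + b + c + d) / 2.0   # low-low: 2x the 2x2 average
    lh = (a + b - c - d) / 2.0   # vertical detail
    hl = (a - b + c - d) / 2.0   # horizontal detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

block = [[1.0, 2.0],
         [3.0, 4.0]]
ll, lh, hl, hh = haar_pool(block)
assert ll == 5.0                 # 2x the average (2.5) of the block
assert (lh, hl, hh) == (-2.0, -1.0, 0.0)
```

Average pooling would keep only 2.5 per block; the DWT-based pooling keeps the same low-pass information plus the three detail channels, so no information is thrown away at the pooling stage.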