Catalogue Search | MBRL

Efficient Transformer-Based Compressed Video Modeling via Informative Patch Selection

by Tomoyuki Suzuki, Yoshimitsu Aoki in Accuracy, action recognition, Analysis

2022

Recently, Transformer-based video recognition models have achieved state-of-the-art results on major video recognition benchmarks. However, their high inference cost significantly limits research speed and practical use. In video compression, methods considering small motions and residuals that are less informative and assigning short code lengths to them (e.g., MPEG4) have successfully reduced the redundancy of videos. Inspired by this idea, we propose Informative Patch Selection (IPS), which efficiently reduces the inference cost by excluding redundant patches from the input of the Transformer-based video model. The redundancy of each patch is calculated from motions and residuals obtained while decoding a compressed video. The proposed method is simple and effective in that it can dynamically reduce the inference cost depending on the input without any policy model or additional loss term. Extensive experiments on action recognition demonstrated that our method could significantly improve the trade-off between the accuracy and inference cost of the Transformer-based video model. Although the method does not require any policy model or additional loss term, its performance approaches that of existing methods that do require them.
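The patch-selection idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring rule (motion-vector magnitude plus a weighted residual magnitude), the `alpha` weight, the fixed `threshold`, and the function names are all assumptions made for illustration. It only shows how a per-patch score derived from decoder side information lets the number of kept patches vary with the input.

```python
import math
from typing import List, Sequence, Tuple


def patch_scores(
    motions: Sequence[Tuple[float, float]],
    residuals: Sequence[float],
    alpha: float = 1.0,
) -> List[float]:
    """Hypothetical informativeness score per patch (an assumption, not the paper's formula)."""
    if len(motions) != len(residuals):
        raise ValueError("need one motion vector and one residual value per patch")
    # Score = motion-vector magnitude + alpha * residual magnitude.
    return [math.hypot(dx, dy) + alpha * abs(r) for (dx, dy), r in zip(motions, residuals)]


def select_patches(scores: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of patches whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy input: four patches with (dx, dy) motion vectors and mean residual magnitudes.
motions = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.1), (1.0, 0.0)]
residuals = [0.0, 0.5, 0.05, 2.0]
kept = select_patches(patch_scores(motions, residuals), threshold=1.0)
print(kept)
```

Only the kept indices would be passed to the Transformer, so a nearly static clip yields fewer input tokens than a clip with a lot of motion.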

Journal Article

Fast-MFQE: A Fast Approach for Multi-Frame Quality Enhancement on Compressed Video

by Zeng, Huanqiang; Shen, Xueyuan; Chen, Kemi in Bandwidths, Coding standards, compressed video enhancement

2023

For compressed images and videos, quality enhancement is essential. Though there have been remarkable achievements related to deep learning, deep learning models are too large to apply to real-time tasks. Therefore, a fast multi-frame quality enhancement method for compressed video, named Fast-MFQE, is proposed to meet the requirement of video-quality enhancement for real-time applications. There are three main modules in this method. One is the image pre-processing building module (IPPB), which is used to reduce redundant information of input images. The second one is the spatio-temporal fusion attention (STFA) module. It is introduced to effectively merge temporal and spatial information of input video frames. The third one is the feature reconstruction network (FRN), which is developed to effectively reconstruct and enhance the spatio-temporal information. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods in terms of lightweight parameters, inference speed, and quality enhancement performance. Even at a resolution of 1080p, the Fast-MFQE achieves a remarkable inference speed of over 25 frames per second, while providing a PSNR increase of 19.6% on average when QP = 37.

Journal Article

Edge-Oriented Compressed Video Super-Resolution

by Wang, Zheng; Quan, Guancheng; He, Gang in compressed video super-resolution, Deep learning, edge-oriented

2023

With the proliferation of video data in Internet of Things (IoT) systems, most social media platforms downsample high-resolution (HR) videos before video coding to reduce the data burden. Consequently, the loss of detail and the introduction of additional artifacts seriously compromise the quality of experience (QoE). Recently, the task of compressed video super-resolution (CVSR) has garnered significant attention, aiming to simultaneously eliminate compression artifacts and enhance the resolution of compressed videos. In this paper, we propose an edge-oriented compressed video super-resolution network (EOCVSR), which focuses on reconstructing higher-quality details, to effectively address the CVSR task. Firstly, we devised a motion-guided alignment module (MGAM) to achieve precise bidirectional motion compensation in a multi-scale manner. Secondly, we introduced an edge-oriented recurrent block (EORB) to reconstruct edge information by combining the merits of explicit and implicit edge extraction. In addition, benefiting from the recurrent structure, the receptive field of EOCVSR can be enhanced and the features can be effectively refined without introducing additional parameters. Extensive experiments conducted on benchmark datasets demonstrate that our method surpasses the performance of state-of-the-art (SOTA) approaches in both quantitative and qualitative evaluations. Our approach can provide users with high-quality and cost-effective HR videos by integrating with sensors and codecs.

Journal Article

PixRevive: Latent Feature Diffusion Model for Compressed Video Quality Enhancement

by Jing, Minge; Wang, Weiran; Weng, Wei in Algorithms, Coding standards, compressed video restoration

2024

In recent years, the rapid prevalence of high-definition video in Internet of Things (IoT) systems has been directly facilitated by advances in imaging sensor technology. To adapt to limited uplink bandwidth, most media platforms opt to compress videos to bitrate streams for transmission. However, this compression often leads to significant texture loss and artifacts, which severely degrade the Quality of Experience (QoE). We propose a latent feature diffusion model (LFDM) for compressed video quality enhancement, which comprises a compact edge latent feature prior network (ELPN) and a conditional noise prediction network (CNPN). Specifically, we first pre-train ELPNet to construct a latent feature space that captures rich detail information for representing sharpness latent variables. Second, we incorporate these latent variables into the prediction network to iteratively guide the generation direction, thus resolving the problem that the direct application of diffusion models to temporal prediction disrupts inter-frame dependencies, thereby completing the modeling of temporal correlations. Lastly, we innovatively develop a Grouped Domain Fusion module that effectively addresses the challenges of diffusion distortion caused by naive cross-domain information fusion. Comparative experiments on the MFQEv2 benchmark validate our algorithm’s superior performance in terms of both objective and subjective metrics. By integrating with codecs and image sensors, our method can provide higher video quality.

Journal Article

Video steganography: recent advances and challenges

by Subramanian, Nandhini; Al-Maadeed, Somaya; Bouridane, Ahmed in Computer Communication Networks, Computer Science, Data Structures and Information Theory

2023

Video steganography enables hiding chunks of secret information inside video sequences. The features of video sequences, including high capacity and complex structure, make them preferable as cover media over other media such as image, text, or audio. Video steganography is a prominent and evolving field in the information security domain, and a significant number of video steganography methods have been proposed in recent years. This article provides a comprehensive review of video steganography methods proposed in the literature. It first reviews various raw domain-based video steganography methods, which include spatial domain approaches such as least significant bits (LSB) and transform domain-based methods such as the discrete wavelet transform and the discrete cosine transform. Furthermore, the article looks into various compressed domain steganography methods. A critical comparative analysis is included to analyze and contrast the steganography methods proposed in the literature. A brief description of various evaluation metrics for video steganography methods is provided. Moreover, a brief introduction to steganalysis and video steganalysis is provided. The article concludes with a discussion of the limitations and challenges of video steganography methods, along with a brief insight into future directions in video steganography systems.
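As a concrete illustration of the spatial-domain LSB approach named in this abstract, here is a minimal Python sketch. It is a generic textbook version under stated assumptions, not a method taken from the reviewed article: the frame is modelled as a flat list of 8-bit sample values, one secret bit is written per sample, and the function names are hypothetical.

```python
from typing import List


def embed_lsb(samples: List[int], bits: List[int]) -> List[int]:
    """Write each secret bit into the least significant bit of one 8-bit sample."""
    if len(bits) > len(samples):
        raise ValueError("not enough cover samples for the payload")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the secret bit.
        stego[i] = (stego[i] & 0xFE) | (bit & 1)
    return stego


def extract_lsb(samples: List[int], n_bits: int) -> List[int]:
    """Read back the first n_bits least significant bits."""
    return [s & 1 for s in samples[:n_bits]]


cover = [200, 201, 54, 55]   # toy pixel values from one frame
secret = [1, 0, 1, 1]
stego = embed_lsb(cover, secret)
recovered = extract_lsb(stego, len(secret))
```

Each sample changes by at most 1, which is why LSB embedding is hard to see, and also why it is fragile under lossy compression.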

Journal Article

Compressed Video Sensing Based on Deep Generative Adversarial Network

by Azghani, Masoumeh; Nezhad, Valiyeh Ansarian; Marvasti, Farokh in Algorithms, Approximation, Circuits and Systems

2024

This paper considers the deep-learning-aided compressed video sensing problem. To this end, a deep generative adversarial network has been proposed to provide an approximation of the non-reference frame using its corresponding reference frame. The tests confirm the superiority of this scheme over the conventional methods used earlier. Furthermore, two scenarios have been suggested for deep compressed video sensing and recovery. In the first scenario, the difference between the non-reference frame and its approximation obtained from the pre-trained network is compressively sampled and transmitted to the receiver where the proposed residual reconstruction network is adopted to reconstruct the signal. The second scenario utilizes a pre-trained network followed by an augmented layer to approximate the non-reference frames. In the transmitter, the parameters of the augmented layer are trained for the current non-reference block. Instead of transmitting the samples of the block, the parameters of its trained augmented layer are sent to the receiver where the reconstruction is done using the same pre-trained network. The performances of the proposed scenarios demonstrate their objective and subjective superiority over the state-of-the-art algorithms in both the reconstruction quality and run time.

Journal Article

Study and investigation of video steganography over uncompressed and compressed domain: a comprehensive review

by Lad, Kalpesh; Patel, Rachna; Patel, Mukesh in Algorithms, Audio data, Bit error rate

2021

In the technological era, the primary source of information is digital data, which has to be secured while it is stored or transmitted over an unsecured network. Different approaches are used to secure digital data such as text, audio, image, and video. This paper first explains security systems such as cryptography, watermarking, and steganography and compares them on different characteristics, viz. satisfaction level of objective, type of carrier object and secret information to be used, dependency of security level, and quality assessment parameters. This review article focuses on steganography methods applied to video. It briefly describes the various methods implemented for video steganography in the compressed domain, viz. inter-frame and intra-frame prediction, motion vector estimation, entropy coding (CAVLC and CABAC), and transformed and quantized coefficients of DCT, DST, and DWT, as well as the methods based on the spatial and transform domains for uncompressed video. This is followed by a detailed analysis of related work by various researchers in video steganography and the experimental results obtained. Furthermore, confidential data hiding in compressed videos is explained using Moving Picture Experts Group (MPEG-1, MPEG-2, MPEG-4), Advanced Video Coding (AVC)/H.264, and High-Efficiency Video Coding (HEVC)/H.265, covering both the spatial and transform domains. This paper summarizes and explains the detailed investigations of numerous video steganography techniques based on a comprehensive literature survey. The methods used to assess the performance of video steganography are analyzed based on quality assessment parameters: imperceptibility, measured by peak signal-to-noise ratio (PSNR), mean square error (MSE), and structural similarity (SSIM); robustness, measured by bit error rate (BER) and similarity (Sim); and embedding capacity, measured by hiding ratio. The overall review of past literature provides in-depth knowledge for advancing video steganography.
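Three of the quality assessment parameters named in this abstract have standard definitions, sketched below in Python. This is a minimal illustration, not code from the reviewed article: it assumes 8-bit samples (peak value 255) held in flat lists, and the function names are hypothetical.

```python
import math
from typing import Sequence


def mse(cover: Sequence[float], stego: Sequence[float]) -> float:
    """Mean square error between two equal-length sample sequences."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(cover, stego)) / len(cover)


def psnr(cover: Sequence[float], stego: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = mse(cover, stego)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


def bit_error_rate(sent: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of payload bits that differ after extraction."""
    if len(sent) != len(received) or not sent:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a != b for a, b in zip(sent, received)) / len(sent)
```

A higher PSNR (lower MSE) indicates better imperceptibility, while a lower BER indicates better robustness of the hidden payload.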

Journal Article

Dynamic gesture recognition based on compressed video for UAV control

by Yang, Yuchen in Aircraft control, Algorithms, Artificial neural networks

2023

As multifunctional aircraft with small size, low cost, and easy control, UAVs can be used in many fields such as gardening, plant protection, mapping, logistics, and the military. For more convenient human-machine interaction, the user interacts with UAVs through dynamic gestures. The current traditional approach is to train convolutional neural networks with images as direct inputs to the system for the purpose of controlling UAVs. However, this approach loses the temporal information between images, resulting in poor training results or over-reliance on computing resources. Therefore, this paper proposes an efficient processing strategy: for simple gesture tasks, an optimized frame extraction algorithm is adopted for picture-based gesture recognition; for complex gesture tasks, compressed video is used as system input for video-based gesture recognition. Training results on the action recognition datasets UCF-101 and HMDB-51 verify the efficiency of compressed video in gesture recognition tasks, which can be applied to UAV dynamic gesture control.

Journal Article

Convolutional neural network with spatio-temporal-channel attention for remote heart rate estimation

by Hu, Meng; Feng, Yuanjing; Li, Yongqiang in Artificial Intelligence, Artificial neural networks, Blood

2023

Remote photoplethysmography (rPPG), which measures human heart rate without physical contact with the skin, has become an active research area in recent years. Neural networks have been introduced into rPPG for accurate pulse measurement and have achieved overwhelming results. However, there is a lack of in-depth analysis of the key components of neural networks that have a crucial impact on pulse extraction from video. In this paper, we present a network with attention and spatio-temporal convolutional block (ASTNet), exploring the impact of key factors including different spatio-temporal convolutions, attention mechanism, the number of convolutional layers, and receptive field sizes. The novel attention module named spatio-temporal-channel (STC) attention is designed to jointly learn weights in spatial, temporal, and channel dimensions in a more efficient way. Extensive experiments have been conducted on two uncompressed datasets and one compressed dataset. Results show that ASTNet outperforms state-of-the-art methods in accuracy and computational time. Specifically, networks with larger receptive field sizes and more spatio-temporal blocks generally achieve better performance. Networks with pseudo 3D convolution outperform those with convolutional 3D in static videos, and the opposite is true in motion videos. The results exhibit a similar tendency on both uncompressed and compressed datasets. Compared to PhysNet (the second-best approach among the compared methods), the proposed method improves pulse signal quality, with the signal-to-noise ratio increased by 7.03%, 10.19%, 4.79%, the mean absolute error decreased by 17.95%, 14.17%, 22.76%, and the root-mean-square error decreased by 21.43%, 2.73%, 25.43%, on the PURE, Self-rPPG, and COHFACE datasets, respectively.

Journal Article

Compressed video quality enhancement algorithm based on 3D-CNNs

by Liu, Pengyu; Wang, Sirong; Zhang, Yue in Algorithms, Artificial neural networks, Coding standards

2024

By exploring the current block-based lossy video coding process and compressed videos, this paper identifies two unique characteristics, namely quality fluctuation and pixel deficiency. We use a 3D convolutional neural network (3D-CNN) to make full use of the limited temporal and spatial information in compressed video and build a compressed video quality enhancement network (CVQENet) to improve compressed video quality. The experimental results show that, compared with videos encoded by High Efficiency Video Coding (HEVC/H.265), the mean Peak Signal-to-Noise Ratio (PSNR) of the enhanced videos improves by 0.4652 dB under the Low Delay (LD) configuration with the Quantization Parameter (QP) set to 37.

Journal Article
