23 result(s) for "Mantiuk, Rafał K."
Overview and evaluation of the JPEG XT HDR image compression standard
Standards play an important role in providing a common set of specifications and allowing interoperability between devices and systems. Until recently, no standard for high-dynamic-range (HDR) image coding had been adopted by the market, and HDR imaging relied on proprietary and vendor-specific formats which are unsuitable for storage or exchange of such images. To resolve this situation, the JPEG Committee is developing a new coding standard called JPEG XT that is backward compatible with the popular JPEG compression, allowing it to be implemented using standard 8-bit JPEG coding hardware or software. In this paper, we present the design principles and technical details of JPEG XT. It is based on a two-layer design: a base layer containing a low-dynamic-range image accessible to legacy implementations, and an extension layer providing the full dynamic range. The paper introduces three of the currently defined profiles in JPEG XT, each constraining the common decoder architecture to a subset of allowable configurations. We assess the coding efficiency of each profile extensively through subjective assessments, using 24 naïve subjects to evaluate 20 images, and objective evaluations, using 106 images with five different tone-mapping operators and at 100 different bit rates. The objective results (based on benchmarking with subjective scores) demonstrate that JPEG XT can encode HDR images at bit rates varying from 1.1 to 1.9 bit/pixel for estimated mean opinion score (MOS) values above 4.5 out of 5, which is considered fully transparent in many applications. This corresponds to a 23-fold reduction in bitstream size compared to lossless OpenEXR PIZ compression.
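The reported bit rates translate directly into compressed file sizes. A minimal sketch of the arithmetic (the 4K frame dimensions below are illustrative assumptions, not taken from the paper):

```python
def bitstream_size_mb(width: int, height: int, bits_per_pixel: float) -> float:
    """Compressed bitstream size in megabytes at a given bit rate."""
    return width * height * bits_per_pixel / 8 / 1e6

# A hypothetical 4K frame encoded at the paper's reported
# transparent-quality bit-rate range of 1.1-1.9 bit/pixel.
low = bitstream_size_mb(3840, 2160, 1.1)   # lower end of the range
high = bitstream_size_mb(3840, 2160, 1.9)  # upper end of the range
```

Under these assumed dimensions, transparent-quality JPEG XT encodings would span roughly 1.1 to 2.0 MB per 4K frame.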
Resolution limit of the eye — how many pixels can we see?
As large engineering efforts go towards improving the resolution of mobile, augmented reality (AR), and virtual reality (VR) displays, it is important to know the maximum resolution at which further improvements bring no noticeable benefit. This limit is often referred to as the retinal resolution, although the limiting factor may not necessarily be attributed to the retina. To determine the ultimate resolution at which an image appears sharp to our eyes with no perceivable blur, we created an experimental setup with a sliding display, which allows for continuous control of the resolution. The lack of such control was the main limitation of the previous studies. We measure achromatic (black-white) and chromatic (red-green and yellow-violet) resolution limits for foveal vision, and at two eccentricities (10° and 20°). Our results demonstrate that the resolution limit is higher than what was previously believed, reaching 94 pixels per degree (ppd) for foveal achromatic vision, 89 ppd for red-green patterns, and 53 ppd for yellow-violet patterns. We also observe a much larger drop in the resolution limit for chromatic patterns (red-green and yellow-violet) than for achromatic patterns. Our results provide quantitative benchmarks for display development, with implications for future imaging, rendering, and video coding technologies. This work determines the maximum display resolution that the human eye can discriminate. The limit was determined for achromatic and chromatic vision, for both the fovea and periphery.
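The resolution limits above are expressed in pixels per degree, which depends only on viewing geometry. As an illustration, the angular resolution a display delivers can be computed from its pixel density and viewing distance (the display parameters below are hypothetical, and `pixels_per_degree` is an illustrative helper, not from the paper):

```python
import math

def pixels_per_degree(pixels_per_cm: float, viewing_distance_cm: float) -> float:
    """Angular resolution of a display as seen by the viewer.

    One degree of visual angle subtends approximately
    2 * d * tan(0.5 deg) centimetres at viewing distance d.
    """
    cm_per_degree = 2 * viewing_distance_cm * math.tan(math.radians(0.5))
    return pixels_per_cm * cm_per_degree

# Assumed example: a 3840-pixel-wide display, 60 cm wide, viewed from 70 cm
ppd = pixels_per_degree(3840 / 60, 70)
```

Under these assumed conditions the display delivers about 78 ppd, below the 94 ppd foveal achromatic limit reported above, so a resolution increase would still be noticeable.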
Joint estimation of spectral illuminant and reflectance from an RGB image
This paper addresses the challenge of accurately estimating both the full illuminant spectral power distribution (SPD) and the per-pixel spectral reflectance from an RGB image captured with a known camera. Full spectral information allows us to perform more accurate white balance, or render the scene under another illuminant. Traditional color constancy methods focus on predicting the illuminant color as a 3-dimensional projection of the infinite-dimensional illuminant SPD onto the camera spectral sensitivity functions (SSFs) space. Because different illuminants can have the same projection in the 3-dimensional SSFs space, those traditional methods cannot differentiate between such illuminants (metamers in the camera response space) and hence cannot fully remove the color cast resulting from the illumination. We reconstruct the spectral information using a neural network with two interconnected branches: one branch predicts the illuminant SPD, and the other reconstructs the per-pixel spectral reflectance by integrating the predicted SPD within its intermediate layers. Experimental results demonstrate that our framework achieves superior performance in estimating both the illuminant SPD and the per-pixel spectral reflectance compared to the previous approach.
A Neural Quality Metric for BRDF Models
Accurately evaluating the quality of bidirectional reflectance distribution function (BRDF) models is essential for photo-realistic rendering. Traditional BRDF-space metrics often employ numerical error measures that fail to capture perceptual differences evident in rendered images. In this paper, we introduce the first perceptually informed neural quality metric for BRDF evaluation that operates directly in BRDF space, eliminating the need for rendering during quality assessment. Our metric is implemented as a compact multi-layer perceptron (MLP), trained on a dataset of measured BRDFs supplemented with synthetically generated data and labelled using a perceptually validated image-space metric. The network takes as input paired samples of reference and approximated BRDFs and predicts their perceptual quality in terms of just-objectionable-difference (JOD) scores. We show that our neural metric achieves significantly higher correlation with human judgments than existing BRDF-space metrics. While its performance as a loss function for BRDF fitting remains limited, the proposed metric offers a perceptually grounded alternative for evaluating BRDF models.
Stereoscopic depth perception through foliage
Both humans and computational methods struggle to discriminate the depths of objects hidden beneath foliage. However, such discrimination becomes feasible when we combine computational optical synthetic aperture sensing with the human ability to fuse stereoscopic images. For object identification tasks, as required in search and rescue, wildlife observation, surveillance, and early wildfire detection, depth assists in differentiating true from false findings, such as people, animals, or vehicles vs. sun-heated patches at the ground level or in the tree crowns, or ground fires vs. tree trunks. We used video captured by a drone above dense woodland to test users' ability to discriminate depth. We found that this is impossible when viewing monoscopic video and relying on motion parallax. The same was true with stereoscopic video because of the occlusions caused by foliage. However, when synthetic aperture sensing was used to reduce occlusions and disparity-scaled stereoscopic video was presented, human observers successfully discriminated depth, whereas computational (stereoscopic matching) methods remained unsuccessful. This shows the potential of systems which exploit the synergy between computational methods and human vision to perform tasks that neither can perform alone.
Robust estimation of exposure ratios in multi-exposure image stacks
Merging multi-exposure image stacks into a high dynamic range (HDR) image requires knowledge of accurate exposure times. When exposure times are inaccurate, for example, when they are extracted from a camera's EXIF metadata, the reconstructed HDR images reveal banding artifacts at smooth gradients. To remedy this, we propose to estimate exposure ratios directly from the input images. We derive the exposure time estimation as an optimization problem, in which pixels are selected from pairs of exposures to minimize estimation error caused by camera noise. When pixel values are represented in the logarithmic domain, the problem can be solved efficiently using a linear solver. We demonstrate that the estimation can be easily made robust to pixel misalignment caused by camera or object motion by collecting pixels from multiple spatial tiles. The proposed automatic exposure estimation and alignment eliminates banding artifacts in popular datasets and is essential for applications that require physically accurate reconstructions, such as measuring the modulation transfer function of a display. The code for the method is available.
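The paper formulates exposure ratio estimation as an optimization that selects pixels to minimize noise-driven error. As a much simplified sketch of the core idea (not the authors' method), the log-domain least-squares estimate of a single exposure ratio between two linear images reduces to a mean of log differences over reliable pixels:

```python
import numpy as np

def estimate_exposure_ratio(img_short, img_long, lo=0.05, hi=0.95):
    """Estimate the exposure ratio between two linear images of the
    same scene. In the log domain the ratio appears as a constant
    offset, so a least-squares fit reduces to the mean log difference
    over reliable (neither underexposed nor saturated) pixels.

    Simplified sketch: ignores pixel selection for noise minimization
    and the multi-tile robustness to misalignment described above.
    """
    a = np.asarray(img_short, dtype=np.float64)
    b = np.asarray(img_long, dtype=np.float64)
    mask = (a > lo) & (a < hi) & (b > lo) & (b < hi)
    log_ratio = np.log(b[mask]) - np.log(a[mask])
    return float(np.exp(log_ratio.mean()))

# Synthetic check: the second exposure is 4x the first, plus mild noise
rng = np.random.default_rng(0)
scene = rng.uniform(0.01, 0.2, size=(64, 64))
short = np.clip(scene + rng.normal(0, 1e-3, scene.shape), 0, 1)
long_ = np.clip(4 * scene + rng.normal(0, 1e-3, scene.shape), 0, 1)
ratio = estimate_exposure_ratio(short, long_)
```

Replacing EXIF-derived exposure times with ratios estimated this way is what removes the banding at smooth gradients described above.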
Single-frame Regularization for Temporally Stable CNNs
Convolutional neural networks (CNNs) can model complicated non-linear relations between images. However, they are notoriously sensitive to small changes in the input. Most CNNs trained to describe image-to-image mappings generate temporally unstable results when applied to video sequences, leading to flickering artifacts and other inconsistencies over time. In order to use CNNs for video material, previous methods have relied on estimating dense frame-to-frame motion information (optical flow) in the training and/or the inference phase, or by exploring recurrent learning structures. We take a different approach to the problem, posing temporal stability as a regularization of the cost function. The regularization is formulated to account for different types of motion that can occur between frames, so that temporally stable CNNs can be trained without the need for video material or expensive motion estimation. The training can be performed as a fine-tuning operation, without architectural modifications of the CNN. Our evaluation shows that the training strategy leads to large improvements in temporal smoothness. Moreover, for small datasets the regularization can help in boosting the generalization performance to a much larger extent than what is possible with naïve augmentation strategies.
Distilling Style from Image Pairs for Global Forward and Inverse Tone Mapping
Many image enhancement or editing operations, such as forward and inverse tone mapping or color grading, do not have a unique solution, but instead a range of solutions, each representing a different style. Despite this, existing learning-based methods attempt to learn a unique mapping, disregarding this style. In this work, we show that information about the style can be distilled from collections of image pairs and encoded into a 2- or 3-dimensional vector. This gives us not only an efficient representation but also an interpretable latent space for editing the image style. We represent the global color mapping between a pair of images as a custom normalizing flow, conditioned on a polynomial basis of the pixel color. We show that such a network is more effective than PCA or VAE at encoding image style in low-dimensional space and lets us obtain an accuracy close to 40 dB, which is about a 7–10 dB improvement over the state-of-the-art methods.
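The paper's mapping is a conditional normalizing flow; as a loose illustration of the underlying idea of a global color mapping expressed on a polynomial basis of pixel color (a deliberate simplification, not the authors' model), one can fit a linear map over degree-2 polynomial features by least squares:

```python
import numpy as np

def poly_features(rgb):
    """Degree-2 polynomial basis of RGB pixels; shape (N, 10)."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack([np.ones_like(r), r, g, b,
                     r * r, g * g, b * b, r * g, g * b, r * b], axis=1)

def fit_global_color_map(src, dst):
    """Least-squares fit of a global map dst ~= poly_features(src) @ C.

    A crude stand-in for the paper's conditional normalizing flow;
    only the notion of a global mapping on a polynomial color basis
    is shared. Returns a (10, 3) coefficient matrix.
    """
    coeffs, *_ = np.linalg.lstsq(poly_features(src.reshape(-1, 3)),
                                 dst.reshape(-1, 3), rcond=None)
    return coeffs

# Recover a known global "style": a per-channel quadratic tone curve
rng = np.random.default_rng(1)
src = rng.uniform(0.0, 1.0, size=(32, 32, 3))
dst = 0.8 * src + 0.2 * src ** 2
C = fit_global_color_map(src, dst)
pred = poly_features(src.reshape(-1, 3)) @ C
mse = float(np.mean((pred - dst.reshape(-1, 3)) ** 2))
```

Because the synthetic tone curve lies exactly in the polynomial span, the fit recovers it to numerical precision; real image pairs would of course leave a residual, which is where the flow-based model above earns its accuracy.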
HDR image reconstruction from a single exposure using deep CNNs
Camera sensors can only capture a limited range of luminance simultaneously, and in order to create high dynamic range (HDR) images a set of different exposures are typically combined. In this paper we address the problem of predicting information that have been lost in saturated image areas, in order to enable HDR reconstruction from a single exposure. We show that this problem is well-suited for deep learning algorithms, and propose a deep convolutional neural network (CNN) that is specifically designed taking into account the challenges in predicting HDR values. To train the CNN we gather a large dataset of HDR images, which we augment by simulating sensor saturation for a range of cameras. To further boost robustness, we pre-train the CNN on a simulated HDR dataset created from a subset of the MIT Places database. We demonstrate that our approach can reconstruct high-resolution visually convincing HDR results in a wide range of situations, and that it generalizes well to reconstruction of images captured with arbitrary and low-end cameras that use unknown camera response functions and post-processing. Furthermore, we compare to existing methods for HDR expansion, and show high quality results also for image based lighting. Finally, we evaluate the results in a subjective experiment performed on an HDR display. This shows that the reconstructed HDR images are visually convincing, with large improvements as compared to existing methods.