Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
231 result(s) for "Chan, Kelvin C. K"
Exploiting Diffusion Prior for Real-World Image Super-Resolution
by Yue, Zongsheng; Zhou, Shangchen; Loy, Chen Change
in Computer vision; Controllability; Image resolution
2024
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution. Specifically, by employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost. To remedy the loss of fidelity caused by the inherent stochasticity of diffusion models, we employ a controllable feature wrapping module that allows users to balance quality and fidelity by simply adjusting a scalar value during the inference process. Moreover, we develop a progressive aggregation sampling strategy to overcome the fixed-size constraints of pre-trained diffusion models, enabling adaptation to resolutions of any size. A comprehensive evaluation of our method using both synthetic and real-world benchmarks demonstrates its superiority over current state-of-the-art approaches. Code and models are available at https://github.com/IceClear/StableSR.
Journal Article
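The scalar-controlled quality-fidelity trade-off described in the abstract above can be pictured as a simple feature interpolation. Below is a minimal PyTorch sketch, not the authors' implementation; the module name `FeatureWrapper`, the learned transform, and the parameter `w` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeatureWrapper(nn.Module):
    """Illustrative scalar-controlled blend of two feature maps.

    `enc_feat` stands for fidelity-oriented features from a degraded-image
    encoder, `gen_feat` for features from the frozen generative decoder.
    """
    def __init__(self, channels: int):
        super().__init__()
        # A small learned transform of the encoder features before blending.
        self.transform = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, gen_feat: torch.Tensor, enc_feat: torch.Tensor, w: float) -> torch.Tensor:
        # w = 0 keeps the purely generative features (higher perceptual quality);
        # w = 1 leans fully on the encoder features (higher fidelity).
        return gen_feat + w * (self.transform(enc_feat) - gen_feat)

# Usage: the scalar is adjusted at inference time, with no retraining.
wrapper = FeatureWrapper(channels=64)
gen = torch.randn(1, 64, 32, 32)
enc = torch.randn(1, 64, 32, 32)
balanced = wrapper(gen, enc, w=0.5)
```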
Temporally consistent video colorization with deep feature propagation and self-regularization learning
by Zhao, Hengyuan; Qiao, Yu; Dong, Chao
in Artificial Intelligence; Colorization; Computer Graphics
2024
Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single-image colorization, video colorization has received relatively little research effort, and existing methods often suffer from severe flickering artifacts (temporal inconsistency) or unsatisfactory colorization. We address this problem from a new perspective, jointly considering colorization and temporal consistency in a unified framework. Specifically, we propose a novel temporally consistent video colorization (TCVC) framework. TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization. Furthermore, TCVC introduces a self-regularization learning (SRL) scheme to minimize the differences between predictions obtained using different time steps. SRL does not require any ground-truth color videos for training and can further improve temporal consistency. Experiments demonstrate that our method not only produces visually pleasing colorized videos but also achieves clearly better temporal consistency than state-of-the-art methods. A video demo is provided at https://www.youtube.com/watch?v=c7dczMs-olE, and code is available at https://github.com/lyh-18/TCVC-Temporally-Consistent-Video-Colorization.
Journal Article
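The self-regularization idea above, penalizing disagreement between predictions of the same frame obtained along different time steps or propagation directions, can be sketched as a simple consistency loss. A hypothetical minimal version (the function name and the forward/backward pairing are assumptions, not the TCVC code):

```python
import torch
import torch.nn.functional as F

def self_regularization_loss(pred_a: torch.Tensor, pred_b: torch.Tensor) -> torch.Tensor:
    """Penalize differences between two colorization predictions of the same
    frame (e.g., obtained from forward vs. backward propagation).
    No ground-truth color video is needed for this term."""
    return F.l1_loss(pred_a, pred_b)

# Toy usage: two predictions of the same frame, shape (N, C, H, W).
pred_forward = torch.rand(1, 2, 64, 64)   # e.g., predicted chroma channels
pred_backward = torch.rand(1, 2, 64, 64)
loss = self_regularization_loss(pred_forward, pred_backward)
```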
Skin fungal community and its correlation with bacterial community of urban Chinese individuals
by Chan, Kelvin C. K.; Lee, Patrick K. H.; Leung, Marcus H. Y.
in Analysis; Arthrodermataceae - classification; Arthrodermataceae - genetics
2016
Background
High-throughput sequencing has led to increased insights into the human skin microbiome. Currently, most skin microbiome investigations are limited to characterizing prokaryotic communities, and our understanding of the skin fungal community (mycobiome) is limited, more so for cohorts outside the western hemisphere. Here, the skin mycobiome of healthy Chinese individuals in Hong Kong is characterized.
Results
Based on a curated fungal reference database designed for skin mycobiome analyses, previously documented common skin colonizers are also abundant and prevalent in this cohort. However, genera associated with local terrains, food, and medicine are also detected. Fungal community composition shows interpersonal (Bray-Curtis ANOSIM R = 0.398) and household (Bray-Curtis ANOSIM R = 0.134) clustering. The effects of gender and age on diversity are test- and site-specific, and, contrary to bacteria, the effect of household on dissimilarity in fungal community composition between samples is insignificant. Site-specific, cross-domain positive and negative correlations at both the community and operational taxonomic unit levels may uncover potential relationships between fungi and bacteria on skin.
Conclusions
The studied Chinese population presents the same major fungal skin colonizers that are common in western populations, but local outdoor environments and lifestyles may also contribute to the mycobiomes of specific cohorts. Cohabitation plays an insignificant role in shaping mycobiome differences between individuals in this cohort. Increased understanding of the fungal communities of non-western cohorts will contribute to understanding the size of the global skin pan-mycobiome, which will ultimately help elucidate the relationships between environmental exposures, microbial populations, and human health worldwide.
Journal Article
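The clustering statistics quoted in the Results above (Bray-Curtis ANOSIM R) combine a Bray-Curtis dissimilarity matrix with a rank-based ANOSIM statistic. The sketch below is illustrative only, with made-up abundance data and the permutation p-value omitted for brevity; it is not the study's analysis pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import rankdata

def anosim_r(dist_condensed: np.ndarray, groups: np.ndarray) -> float:
    """ANOSIM R statistic from a condensed distance vector and group labels."""
    n = len(groups)
    ranks = rankdata(dist_condensed)          # rank all pairwise dissimilarities
    iu = np.triu_indices(n, k=1)              # same pair ordering as pdist output
    same = groups[iu[0]] == groups[iu[1]]     # within-group pairs
    r_within = ranks[same].mean()
    r_between = ranks[~same].mean()
    m = n * (n - 1) / 2                       # total number of pairs
    return (r_between - r_within) / (m / 2)   # R in [-1, 1]; > 0 means clustering

# Toy example: 6 skin samples x 5 fungal OTUs, two individuals (3 samples each).
rng = np.random.default_rng(0)
abundances = rng.integers(0, 50, size=(6, 5)).astype(float)
groups = np.array(["A", "A", "A", "B", "B", "B"])
bray_curtis = pdist(abundances, metric="braycurtis")
print(anosim_r(bray_curtis, groups))
```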
A Convex Model for Edge-Histogram Specification with Applications to Edge-Preserving Smoothing
by Nikolova, Mila; Chan, Kelvin C. K.; Chan, Raymond H.
in Algorithms; Constraint modelling; Convexity
2018
The goal of edge-histogram specification is to find an image whose edge image has a histogram that matches a given edge-histogram as closely as possible. Mignotte proposed a non-convex model for this problem in 2012. In that work, edge magnitudes of an input image are first modified by histogram specification to match the given edge-histogram. Then, a non-convex model is minimized to find an output image whose edge-histogram matches the modified edge-histogram. The non-convexity of the model hinders computation and the inclusion of useful constraints such as the dynamic-range constraint. In this paper, instead of considering edge magnitudes, we consider the image gradients directly and propose a convex model based on them. Furthermore, we include additional constraints in our model depending on the application. The convexity of our model allows us to compute the output image efficiently using either the Alternating Direction Method of Multipliers or the Fast Iterative Shrinkage-Thresholding Algorithm. We consider several applications in edge-preserving smoothing, including image abstraction, edge extraction, detail exaggeration, and document scan-through removal. Numerical results illustrate that our method produces good results efficiently.
Journal Article
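The first step the abstract describes, remapping gradient or edge magnitudes so their distribution matches a target edge-histogram, is ordinary histogram specification. A small NumPy sketch of that step only (the function name and the exponential target are assumptions; the paper's convex model and its ADMM/FISTA solvers are not reproduced here):

```python
import numpy as np

def histogram_specification(values: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Remap `values` so their empirical distribution matches that of `target`.

    Here `values` would be gradient magnitudes of the input image and
    `target` samples drawn from the desired edge-histogram.
    """
    order = np.argsort(values.ravel())
    matched = np.empty(values.size)
    # Assign sorted target quantiles to the correspondingly ranked inputs.
    matched[order] = np.quantile(target.ravel(), np.linspace(0.0, 1.0, values.size))
    return matched.reshape(values.shape)

# Toy usage: match random gradient magnitudes to an exponential edge-histogram.
grads = np.abs(np.random.randn(64, 64))
target_samples = np.random.exponential(scale=0.5, size=10000)
specified = histogram_specification(grads, target_samples)
```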
Collaborative Diffusion for Multi-Modal Face Generation and Editing
2023
Diffusion models have recently emerged as a powerful generative tool. Despite this great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary regarding the latent denoising steps, upon which bilateral connections can be established. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only combines the generation capabilities of uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.
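The core mechanism, blending the denoising predictions of several pre-trained uni-modal models with spatially varying weights, can be sketched roughly as below. This is a hypothetical simplification: `InfluenceBlend`, the single-conv influence network, and the omission of timestep conditioning are assumptions, not the paper's dynamic diffuser.

```python
import torch
import torch.nn as nn

class InfluenceBlend(nn.Module):
    """Blend per-model noise predictions with per-pixel influence maps."""
    def __init__(self, num_models: int, channels: int):
        super().__init__()
        # A tiny network predicting one influence map per collaborating model,
        # conditioned on the noisy latent (timestep conditioning omitted here).
        self.influence_net = nn.Conv2d(channels, num_models, kernel_size=3, padding=1)

    def forward(self, x_t: torch.Tensor, noise_preds: list) -> torch.Tensor:
        # Softmax across models so the influence maps sum to 1 at every pixel.
        maps = torch.softmax(self.influence_net(x_t), dim=1)   # (N, M, H, W)
        stacked = torch.stack(noise_preds, dim=1)               # (N, M, C, H, W)
        return (maps.unsqueeze(2) * stacked).sum(dim=1)         # (N, C, H, W)

# Toy usage with two "uni-modal" predictions (e.g., text-driven and mask-driven).
blend = InfluenceBlend(num_models=2, channels=4)
x_t = torch.randn(1, 4, 32, 32)
eps_text, eps_mask = torch.randn(1, 4, 32, 32), torch.randn(1, 4, 32, 32)
eps_combined = blend(x_t, [eps_text, eps_mask])
```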
ReVersion: Diffusion-Based Relation Inversion from Images
2024
Diffusion models are gaining popularity for their generative capabilities. Recently, there has been a surge of interest in generating customized images by inverting diffusion models from exemplar images, and existing inversion methods mainly focus on capturing object appearances (i.e., the "look"). However, how to invert object relations, another important pillar of the visual world, remains unexplored. In this work, we propose the Relation Inversion task, which aims to learn a specific relation (represented as a "relation prompt") from exemplar images. Specifically, we learn a relation prompt with a frozen pre-trained text-to-image diffusion model. The learned relation prompt can then be applied to generate relation-specific images with new objects, backgrounds, and styles. To tackle the Relation Inversion task, we propose the ReVersion framework. Specifically, we propose a novel "relation-steering contrastive learning" scheme to steer the relation prompt towards relation-dense regions and disentangle it from object appearances. We further devise "relation-focal importance sampling" to emphasize high-level interactions over low-level appearances (e.g., texture, color). To comprehensively evaluate this new task, we contribute the ReVersion Benchmark, which provides various exemplar images with diverse relations. Extensive experiments validate the superiority of our approach over existing methods across a wide range of visual relations. Our proposed task and method could inspire future research in domains such as generative inversion, few-shot learning, and visual relation detection.
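One reading of relation-focal importance sampling is that diffusion timesteps are drawn from a distribution skewed toward larger, noisier steps, where high-level structure dominates over low-level texture. A minimal sketch under that reading; the sampler name, the power-law form, and the skew parameter are assumptions, not the paper's exact scheme.

```python
import torch

def sample_timesteps_skewed(batch_size: int, num_steps: int = 1000, skew: float = 2.0) -> torch.Tensor:
    """Sample diffusion timesteps with probability increasing in t.

    With skew > 1 the density is proportional to (t / num_steps) ** (skew - 1),
    so large, noise-dominated timesteps (high-level structure) are favored
    over small ones (low-level texture and color).
    """
    u = torch.rand(batch_size)
    # Inverse-CDF sampling of p(t) proportional to t^(skew-1) on [0, 1].
    t = u.pow(1.0 / skew)
    return (t * (num_steps - 1)).long()

# Toy usage: timesteps for a training batch of 8 exemplar crops.
timesteps = sample_timesteps_skewed(batch_size=8)
```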
Exploring CLIP for Assessing the Look and Feel of Images
by Loy, Chen Change; Wang, Jianyi; Chan, Kelvin C. K.
in Computer vision; Image quality; Mathematical models
2022
Measuring the perception of visual content is a long-standing problem in computer vision. Many mathematical models have been developed to evaluate the look or quality of an image. Despite the effectiveness of such tools in quantifying degradations such as noise and blurriness levels, such quantification is loosely coupled with human language. When it comes to more abstract perception of the feel of visual content, existing methods can only rely on supervised models that are explicitly trained with labeled data collected via laborious user studies. In this paper, we go beyond the conventional paradigms by exploring the rich visual language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner. In particular, we discuss effective prompt designs and show an effective prompt pairing strategy to harness the prior. We also provide extensive experiments on controlled datasets and Image Quality Assessment (IQA) benchmarks. Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments. Code is available at https://github.com/IceClear/CLIP-IQA.
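The prompt-pairing idea can be sketched with an off-the-shelf CLIP model: score an image against an antonym prompt pair and take the softmax weight of the positive prompt as a zero-shot quality ("look") score. The sketch below uses the Hugging Face transformers CLIP wrappers; the checkpoint and the prompt texts are illustrative choices, not necessarily those used in the paper.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_quality_score(image: Image.Image) -> float:
    """Zero-shot 'look' score from an antonym prompt pair."""
    prompts = ["Good photo.", "Bad photo."]          # illustrative antonym pair
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image    # shape (1, 2)
    probs = logits.softmax(dim=-1)
    return probs[0, 0].item()                        # weight of the positive prompt

# Usage (assuming a local image file):
# score = clip_quality_score(Image.open("example.jpg").convert("RGB"))
```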
Effective Adapter for Face Recognition in the Wild
2024
In this paper, we tackle the challenge of face recognition in the wild, where images often suffer from low quality and real-world distortions. Traditional heuristic approaches, training models directly on these degraded images or on their enhanced counterparts produced by face restoration techniques, have proven ineffective, primarily due to the degradation of facial features and the discrepancy in image domains. To overcome these issues, we propose an effective adapter for augmenting existing face recognition models trained on high-quality facial datasets. The key idea of our adapter is to process both the unrefined and the enhanced images using two similar structures, one fixed and the other trainable. This design confers two benefits. First, the dual-input system minimizes the domain gap while providing varied perspectives for the face recognition model, where the enhanced image can be regarded as a complex non-linear transformation of the original one by the restoration model. Second, both structures can be initialized from pre-trained models without discarding past knowledge. Extensive experiments in zero-shot settings show the effectiveness of our method, which surpasses baselines by about 3%, 4%, and 7% on three datasets. Our code will be made publicly available.
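The dual-input design described above, a frozen branch for the degraded image and a trainable twin for its restored counterpart, can be sketched roughly as follows. All module names and the additive fusion are assumptions, and the restoration model is treated as a black box; this is not the paper's architecture.

```python
import copy
import torch
import torch.nn as nn

class DualBranchAdapter(nn.Module):
    """Frozen + trainable copies of the same feature extractor, fused by summation."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.frozen = copy.deepcopy(backbone)
        self.trainable = copy.deepcopy(backbone)     # same initialization, kept trainable
        for p in self.frozen.parameters():
            p.requires_grad = False

    def forward(self, degraded: torch.Tensor, restored: torch.Tensor) -> torch.Tensor:
        # The frozen branch sees the unrefined input; the trainable branch sees
        # the restored one, and their features are fused for the recognizer.
        return self.frozen(degraded) + self.trainable(restored)

# Toy usage with a stand-in backbone.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.AdaptiveAvgPool2d(1), nn.Flatten())
adapter = DualBranchAdapter(backbone)
feats = adapter(torch.randn(2, 3, 112, 112), torch.randn(2, 3, 112, 112))
```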
Dual Associated Encoder for Face Restoration
2024
Restoring facial details from low-quality (LQ) images has remained a challenging problem due to its ill-posedness induced by various degradations in the wild. The existing codebook prior mitigates the ill-posedness by leveraging an autoencoder and learned codebook of high-quality (HQ) features, achieving remarkable quality. However, existing approaches in this paradigm frequently depend on a single encoder pre-trained on HQ data for restoring HQ images, disregarding the domain gap between LQ and HQ images. As a result, the encoding of LQ inputs may be insufficient, resulting in suboptimal performance. To tackle this problem, we propose a novel dual-branch framework named DAEFR. Our method introduces an auxiliary LQ branch that extracts crucial information from the LQ inputs. Additionally, we incorporate association training to promote effective synergy between the two branches, enhancing code prediction and output quality. We evaluate the effectiveness of DAEFR on both synthetic and real-world datasets, demonstrating its superior performance in restoring facial details. Project page: https://liagm.github.io/DAEFR/
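The association training mentioned above can be read as aligning features from the HQ and auxiliary LQ encoders so that both index the same codebook entries. A rough sketch of one such alignment loss (an InfoNCE-style contrastive term; the function name, temperature, and pairing scheme are assumptions, not the DAEFR code):

```python
import torch
import torch.nn.functional as F

def association_loss(hq_feats: torch.Tensor, lq_feats: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Pull paired HQ/LQ features together, push unpaired ones apart.

    hq_feats, lq_feats: (N, D) features for the same N face images, one pass
    through the HQ encoder and one through the auxiliary LQ encoder.
    """
    hq = F.normalize(hq_feats, dim=-1)
    lq = F.normalize(lq_feats, dim=-1)
    logits = hq @ lq.t() / temperature           # (N, N) cosine similarities
    targets = torch.arange(hq.size(0))           # the i-th HQ matches the i-th LQ
    return F.cross_entropy(logits, targets)

# Toy usage: 4 paired feature vectors of dimension 256.
loss = association_loss(torch.randn(4, 256), torch.randn(4, 256))
```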