Catalogue Search | MBRL
78 result(s) for "Zhang, Kaibing"
SECANet: A structure‐enhanced attention network with dual‐domain contrastive learning for scene text image super‐resolution
by Zhang, Hui; He, Xin; Zhang, Yuhong
in Design; dual‐domain contrastive learning; Image enhancement
2023
In this letter, we develop a novel Structure‐Enhanced Channel Attention Network (SECANet) for scene text image super‐resolution (STISR). SECANet integrates a group of Structure‐Enhanced Attention Modules that focus on both local and global structural features in the character regions of text images. Moreover, we formulate a Dual‐Domain Contrastive Learning framework that combines a pixel‐level contrastive loss with a semantic‐level contrastive loss to jointly optimize SECANet, so that it generates high‐quality SR images that are both more visually pleasing and easier to recognize. Because no additional prior generators are introduced in either the training or the testing stage, the method shows promising computational efficiency. Experimental results on the TextZoom dataset indicate that our method super‐resolves low‐resolution scene text images well and achieves better recognition accuracy than competing methods. The proposed structure‐enhanced channel attention network assembles a group of structure‐enhanced attention blocks to learn both global and local structure features for the detailed recovery of scene text images. Moreover, a joint dual‐domain contrastive loss function is formulated to optimize the model parameters, which helps synthesize more recognizable text images.
Journal Article
Deformable channel non‐local network for crowd counting
2023
Both global dependency and local correlation are crucial for handling scale variation in crowds. However, most previous methods fail to take both factors into consideration simultaneously. To address this issue, a deformable channel non‐local network for crowd counting, abbreviated as DCNLNet, is proposed; it simultaneously learns global context information and an adaptive local receptive field. Specifically, DCNLNet consists of two carefully designed modules: a deformable channel non‐local block (DCNL) and a spatial attention feature fusion block (SAFF). The DCNL encodes long‐range dependencies between pixels with a channel non‐local operation and adaptive local correlation with deformable convolution, which improves the spatial discrimination of features. The SAFF aggregates cross‐level information: it lets features from different depths interact and learns specific weights for the feature maps with spatial attention. Extensive experiments on three crowd counting benchmark datasets indicate that DCNLNet achieves compelling performance compared to other representative counting models. In the letter, a deformable channel non‐local network (DCNLNet) has been proposed for crowd counting. To explore global and local information, we develop a deformable channel non‐local module with two branches, a deformable convolution branch and a channel non‐local branch, which learn adaptive local correlation and long‐range dependency, respectively. Moreover, we introduce a spatial attention feature fusion module to aggregate cross‐level features obtained from the encoder and the decoder.
Journal Article
Cross-range self-attention single hyperspectral image super-resolution method based on U-Net architecture
2025
Hyperspectral image super-resolution (HSI-SR) aims to reconstruct high-resolution hyperspectral images from low-resolution inputs, which is particularly challenging due to the high dimensionality and complex spatial-spectral correlations inherent in hyperspectral data. While attention-based approaches have shown potential by enhancing contextual feature extraction, most existing methods either focus on local neighborhoods or treat spatial and spectral information independently, leading to limited modeling of long-range dependencies and suboptimal multi-scale feature fusion. Furthermore, conventional U-Net architectures are not tailored to address the unique redundancies and limited training data often observed in hyperspectral imaging tasks. To address these issues, we propose Cs_Unet, a cross-range self-attention approach for single HSI-SR built upon the U-Net framework. The core of the proposed model integrates cross-range spatial self-attention (CSA) and cross-range spectral self-attention (CSE) to explicitly capture associations between distant spatial locations and spectral bands. Our cross-range spatial-spectral self-attention interaction (CAI) module processes spatial and spectral features in parallel and fuses them for improved image reconstruction. Additionally, we incorporate a cross-range grouped convolution upsampling (GCUc) module at the top of the U-Net to enhance information flow and leverage progressive upsampling. By embedding these modules within the U-Net architecture, our model achieves effective multi-scale feature fusion and exploits both global context and fine-grained details. Extensive experiments demonstrate that Cs_Unet outperforms competing methods in terms of both visual fidelity and quantitative metrics.
Journal Article
PRC-Light YOLO: An Efficient Lightweight Model for Fabric Defect Detection
2024
Defect detection holds significant importance in improving the overall quality of fabric manufacturing. To improve the effectiveness and accuracy of fabric defect detection, we propose the PRC-Light YOLO model for fabric defect detection and establish a detection system. Firstly, we have improved YOLOv7 by integrating new convolution operators into the Extended-Efficient Layer Aggregation Network for optimized feature extraction, reducing computations while capturing spatial features effectively. Secondly, to enhance the performance of the feature fusion network, we use Receptive Field Block as the feature pyramid of YOLOv7 and introduce Content-Aware ReAssembly of FEatures as upsampling operators for PRC-Light YOLO. By generating real-time adaptive convolution kernels, this module extends the receptive field, thereby gathering vital information from contexts with richer content. To further optimize the efficiency of model training, we apply the HardSwish activation function. Additionally, the bounding box loss function adopts the Wise-IOU v3, which incorporates a dynamic non-monotonic focusing mechanism that mitigates adverse gradients from low-quality instances. Finally, in order to enhance the PRC-Light YOLO model’s generalization ability, we apply data augmentation techniques to the fabric dataset. In comparison to the YOLOv7 model, multiple experiments indicate that our proposed fabric defect detection model exhibits a decrease of 18.03% in model parameters and 20.53% in computational load. At the same time, it has a notable 7.6% improvement in mAP.
Journal Article
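The abstract above mentions the HardSwish activation function. As a minimal sketch, assuming the standard definition HardSwish(x) = x · ReLU6(x + 3) / 6 (the record itself does not spell out the formula, and the function names below are illustrative), it can be written in plain Python as:

```python
def relu6(x: float) -> float:
    """Clamp x to the interval [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_swish(x: float) -> float:
    """HardSwish under the assumed standard definition: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


if __name__ == "__main__":
    # Below -3 the output is zero; above +3 the function is the identity.
    for value in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"hard_swish({value:+.1f}) = {hard_swish(value):+.4f}")
```

The function is piecewise and needs no exponential, which is the usual reason it is chosen over Swish in lightweight models.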
DenseConv for 3D point set upsampling
2021
Recently, the data‐driven point cloud upsampling network (PU‐Net) has been used to upsample point sets from sparse to dense. However, PU‐Net first needs to downsample the point set before upsampling, and it extracts point features at fixed levels in the sampling area. Local features are therefore extracted insufficiently, and restoring the global representation from them leads to inaccurate point cloud coordinates. To address the insufficient local feature extraction, the EdgeConv module is utilised to merge each point with the information in its local neighbourhood. In addition, a dense connection is proposed to maintain the effective transmission of information across the entire network. In this letter, EdgeConv and dense connection are combined into DenseConv, which is nested for recursive use within PU‐Net to ensure the accuracy of the generated point cloud. The experimental results show that the average error of the proposed method is reduced by about 5% on the PU‐Net dataset and about 10% on the ModelNet10 dataset, which verifies the effectiveness of the proposed method.
Journal Article
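The abstract above relies on EdgeConv to merge each point with its local neighbourhood. The sketch below illustrates only the edge-feature construction commonly associated with EdgeConv, pairing each point x_i with the offsets x_j - x_i to its k nearest neighbours and then taking a channel-wise maximum. It is an assumption-laden simplification: the learned transformation normally applied to each edge feature is omitted, and the function names are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def k_nearest(points: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def edge_features(points: Sequence[Vector], k: int) -> List[List[Vector]]:
    """For each point x_i, build one feature (x_i, x_j - x_i) per neighbour x_j."""
    features = []
    for i, centre in enumerate(points):
        per_point = []
        for j in k_nearest(points, i, k):
            offset = tuple(b - a for a, b in zip(centre, points[j]))
            per_point.append(tuple(centre) + offset)
        features.append(per_point)
    return features


def max_aggregate(per_point: Sequence[Vector]) -> Vector:
    """Channel-wise maximum over one point's edge features."""
    return tuple(max(channel) for channel in zip(*per_point))


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
    for point, feats in zip(cloud, edge_features(cloud, k=2)):
        print(point, "->", max_aggregate(feats))
```

Keeping the centre coordinates alongside the offsets lets each feature carry both global position and local shape, which is the property the abstract appeals to.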
Image recognition method of cashmere and wool based on SVM-RFE selection with three types of features
2025
Cashmere and wool fibers are important raw materials in the textile industry, but their similar morphological structures make them difficult to distinguish accurately. Image preprocessing methods damage the fiber contours to some extent, resulting in the loss of feature information. Keypoint features, which do not require image preprocessing, are therefore added to the library of morphological and texture features. At the same time, existing feature selection methods often ignore the relation between the features and the classifier. Therefore, we propose a novel feature selection method based on support vector machine-recursive feature elimination (SVM-RFE). The SVM-RFE method recursively removes the features that contribute least to SVM classification, ultimately generating the optimal feature set. Our approach achieves a recognition accuracy of 98.06%, which is 8.34% higher than the traditional two-feature method and 6.12% higher than the three-feature method, both without feature selection. Experimental results demonstrate that keypoint features effectively compensate for the information loss caused by image preprocessing, while the SVM-RFE feature selection method can select the optimal feature subset relevant to the classifier so as to accurately distinguish cashmere and wool fibers.
Journal Article
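The recursive elimination described in the abstract above can be illustrated with a small sketch. It assumes a linear SVM and uses the squared weight of each feature as its contribution score, which is the usual linear SVM-RFE criterion; the paper's actual kernel, features, and stopping rule are not given in this record. The tiny subgradient-descent trainer, its hyperparameters, and the toy data are all hypothetical stand-ins chosen to keep the example dependency-free.

```python
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     epochs: int = 200, lr: float = 0.1,
                     reg: float = 0.01) -> List[float]:
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [reg * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # this sample violates the margin
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w


def svm_rfe(X: Sequence[Sequence[float]], y: Sequence[int],
            n_keep: int) -> List[int]:
    """Repeatedly retrain and drop the feature with the smallest squared weight."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda idx: w[idx] ** 2)
        del remaining[weakest]
    return remaining


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is uncorrelated
    # with the label, feature 2 is constant.
    X = [[2.0, 0.1, 0.0], [1.5, -0.1, 0.0], [-2.0, 0.1, 0.0], [-1.5, -0.1, 0.0]]
    y = [1, 1, -1, -1]
    print("kept feature indices:", svm_rfe(X, y, n_keep=1))
```

Retraining after every removal is what ties the selection to the classifier: a feature's score is always measured in the context of the features that are still present.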
Pseudo-label growth dictionary pair learning for crowd counting
2021
Crowd counting has received increasing attention in the field of video surveillance and urban security systems. However, many previous models generalize poorly to unknown samples when only limited labeled samples are available. To mitigate this weakness, we develop a novel Pseudo-label Growth Dictionary Pair Learning (PG-DPL) method for crowd counting. Specifically, we treat crowd counting as a classification task and address it with a dictionary learning-based (DL) strategy. Because a shortage of diverse training samples and an imbalanced distribution across classes in crowd scenes inevitably cause large prediction deviations in the DL model, we propose to apply pseudo-label growth (PG) and adaptive dictionary size (ADS) to improve the accuracy of crowd counting with limited labeled samples. In the proposed method, PG optimizes the initial prediction by reconstructing the discriminant term to improve the robustness of the learned dictionary, while ADS explores the imbalanced distribution among different classes to adapt the size of each class-specific dictionary. Extensive validation experiments on five benchmark databases indicate that the proposed PG-DPL achieves compelling performance compared to other state-of-the-art methods.
Journal Article
Learning stacking regressors for single image super-resolution
by Zhang, Kaibing; Xiong, Zenggang; Li, Minqi
in Feature extraction; Image enhancement; Image resolution
2020
Example learning-based single image super-resolution (SR) techniques have been widely recognized for their effectiveness in restoring a high-resolution (HR) image with finer details from a given low-resolution (LR) input. However, most popular approaches choose only one type of image feature to learn the mapping relationship between LR and HR images, making it difficult to accommodate the diversity of natural images. In this paper, we propose a novel stacking learning-based SR framework that extracts both the gradient features and the texture features of images simultaneously to train two complementary models. Since the gradient features help represent edge structures while the texture features help restore texture details, the newly proposed method combines the merits of the two complementary features and makes the resultant HR images more faithful to their original counterparts. Moreover, we enhance the SR capacity by using a residual cascaded scheme to further reduce the gap between the super-resolved images and the corresponding original images. Experimental results on seven benchmark datasets indicate that the proposed SR framework performs better than seven other state-of-the-art SR methods in both quantitative and qualitative quality assessments.
Journal Article
Modeling optimization for a typical VOCs thermal conversion process
2023
To address current environmental problems, thermal oxidation treatment of industrial VOCs emissions is a common and effective measure. This paper studies the effect of an optimization method for the direct thermal oxidation of VOCs in a color aluminum spraying production line, based on Aspen-Plus. For a direct VOCs thermal oxidation process with a circulating air volume of 30000 m³/h, flue gas reflux and coating room drainage technology are proposed. Exergy flow analysis based on the second law of thermodynamics shows that methane consumption could be reduced by 12%. Carbon emissions also decreased significantly, with a 3.42% reduction. These findings are of practical use for saving industrial production costs and protecting the environment.
Journal Article
Manifold transfer subspace learning based on double relaxed discriminative regression
2023
By leveraging the labeled data samples of the source domain to learn the unlabeled data samples of the target domain, unsupervised domain adaptation (DA) has achieved promising performance. However, dealing with cross-domain distribution mismatch remains a vital problem for unsupervised domain adaptation. Therefore, in this paper we present a new model framework for cross-domain image classification, termed manifold transfer subspace learning based on double relaxed discriminative regression (MTSL-DRDR). First, the global geometry information of the samples from the source and target domains is preserved by utilizing a low-rank constraint. Second, two transformation projections are employed to project both domains into a unified subspace, in which each data sample of the target domain can be represented by some samples from the source domain with a sparse and low-rank coefficient matrix. Third, the local structure information of data points with the same semantics from the different domains is preserved by means of an adaptive weight graph based on the low-rank coefficient matrix. Last, to fully use the discriminative information of data from the source domain, the discriminant information of the source domain based on intra-class and inter-class graphs is encoded into the target domain. Our MTSL-DRDR algorithm is evaluated on challenging benchmark datasets, and extensive experimental results show the superiority of the proposed method.
Journal Article