Catalogue Search | MBRL
Search Results
5,025 results for "Unstructured data"
Distributed Method for the Backup of Massive Unstructured Data
2021
To address the single-server performance bottleneck in backing up massive unstructured data, this paper proposes a distributed backup method. Combining a load-balancing algorithm with a distributed scheduling algorithm, the backup system assigns backup tasks across the production servers, and multiple servers back up the unstructured data held in shared storage to the backend server. The method makes full use of the performance resources of the servers in the production environment, reduces single-server performance bottlenecks, avoids resource starvation on any single server, and increases backup speed.
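The abstract does not specify the load-balancing algorithm; a minimal sketch of one plausible scheme, greedy least-loaded assignment of backup tasks to production servers, could look like this (all names and task sizes are illustrative, not from the paper):

```python
import heapq

def assign_backup_tasks(tasks, servers):
    """Greedy least-loaded assignment: each backup task goes to the
    production server with the smallest accumulated load so far."""
    # min-heap of (current_load, server_name)
    heap = [(0, s) for s in servers]
    heapq.heapify(heap)
    assignment = {s: [] for s in servers}
    # placing the largest tasks first tends to give a tighter balance
    for name, size in sorted(tasks, key=lambda t: -t[1]):
        load, server = heapq.heappop(heap)
        assignment[server].append(name)
        heapq.heappush(heap, (load + size, server))
    return assignment

tasks = [("videos", 500), ("docs", 120), ("images", 300), ("logs", 80)]
print(assign_backup_tasks(tasks, ["srv1", "srv2"]))
# → {'srv1': ['videos'], 'srv2': ['images', 'docs', 'logs']}
```

Each server ends up with roughly equal total backup volume (500 vs 500 here), which is the property the paper relies on to avoid a single-server bottleneck.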
Journal Article
Application of Deep Learning-Based Image Compression Restoration Technology in Power System Unstructured Data Management
2025
In power-system unstructured-data management, a large volume of images from inspection drones, substation cameras, and smart meters is heavily compressed due to bandwidth and storage constraints, resulting in lower resolution that hinders defect detection and maintenance decisions. Although deep-learning super-resolution (SR) techniques have made significant advances, real-world deployments still require a balance between reconstruction accuracy and model lightweightness. To meet this need, we introduce a channel-attention-embedded Transformer SR method (CAET). The approach adaptively injects channel attention into both the Transformer’s global features and the convolutional local features, harnessing their complementary strengths while dynamically enhancing critical information. Tested on five public datasets and compared with six representative algorithms, CAET achieves the best or second-best performance across all upscaling factors; at 4× enlargement, it outperforms the advanced SwinIR method by 0.09 dB in PSNR on Urban100 and by 0.30 dB on Manga109, with noticeably improved visual quality. Experiments demonstrate that CAET delivers high-precision, low-latency restoration of compressed images for the power sector while keeping model complexity low.
Journal Article
Usability enhancement model for unstructured text in big data
2023
The task of extracting insights from unstructured text poses significant challenges for big data analytics because such text contains subjective intentions, differing contextual perspectives, and information about the surrounding real world. These technical and conceptual complexities degrade its usability for analytics, and unlike for structured data, the existing literature lacks solutions that address the usability of unstructured text big data. To fill this research gap, a usability enhancement model has been developed, incorporating various usability dimensions, determinants, and rules as its key components. This paper adopted the Delphi technique to validate the model and ensure its correctness, confidentiality, and reliability. The primary goal of model validation is to assess the model's external validity and suitability through domain experts and professionals; subject matter experts from industry and academia in different countries were therefore invited to the Delphi study, providing more reliable and extensive opinions. A multistep iterative Knowledge Resource Nomination Worksheet (KRNW) process was adopted for expert identification and selection, and the Average Percent of Majority Opinions (APMO) method was used to produce the cut-off rate for determining consensus. Consensus was not achieved after the first Delphi round, where the APMO cut-off rate was 70.9%, and the model was improved based on the opinions of 10 subject matter experts. After the second round, the analysis showed majority agreement on the revised model and consensus on all improvements, validating the improved usability enhancement model. The final proposed model provides a systematic, structured approach to enhancing the usability of unstructured text big data, an outcome significant for researchers and data analysts.
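The APMO cut-off mentioned above is commonly computed as the majority-side opinions across all items divided by the total opinions expressed; the sketch below assumes that reading (the panel data and item names are invented for illustration, not from the paper):

```python
def apmo_cutoff(items):
    """Average Percent of Majority Opinions (one common reading):
    sum of majority-side votes over all votes cast, as a percentage.
    `items` maps item id -> (agree_count, disagree_count)."""
    majority = 0
    total = 0
    for agree, disagree in items.values():
        total += agree + disagree
        # ties contribute no majority opinion for that item
        if agree != disagree:
            majority += max(agree, disagree)
    return 100.0 * majority / total

# hypothetical first-round panel: 10 experts voting on 3 determinants
panel = {"D1": (8, 2), "D2": (6, 4), "D3": (3, 7)}
cutoff = apmo_cutoff(panel)
print(round(cutoff, 1))  # → 70.0
```

Items whose agreement percentage reaches the cut-off are treated as having achieved consensus; anything below it is revised and sent to the next round, as happened after round one in the study.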
Journal Article
Big Data Management Using Hadoop
2021
Today, a key issue is designing systems and software to store, manage, and process the large amounts of data produced by exponential data growth, much of which is found in unstructured form. Because of the large and complex data sizes, managing such data with traditional approaches is impractical; Hadoop is an appropriate solution for the continuous growth of data sizes. This paper suggests techniques and algorithms for handling big data, including data collection and preprocessing. The fragmentation algorithm takes on the role of a distributed implementation of the traditional time-sharing file system model, in which various users share files and storage resources. The research also uses the Hadoop framework, an Apache project for safe, scalable, distributed computing, to improve query performance and reduce response time. The results showed that Hadoop is the best way to deal with big data: the response time for a complex query was about 00:00:01, compared with 00:01:11 for the same queries under the fragmentation algorithm and 00:05:13 on the standard database. We concluded that total access time for complex queries is faster in distributed processing than in non-distributed processing.
Journal Article
HCAT: Advancing Unstructured Healthcare Data Analysis Through Hierarchical and Context-Aware Mechanisms
by
Mir, Mohammad Shuaib
,
Bhutani, Monica
,
Onn, Choo Wou
in
Accuracy
,
Architecture
,
Artificial intelligence
2025
This study presents the Hierarchical Context-Aware Transformer (HCAT), a new model for analyzing unstructured healthcare data that resolves significant problems in processing medical text. The proposed model integrates a hierarchical structure with context-sensitive mechanisms to process healthcare documents at both the sentence and document levels. HCAT incorporates domain knowledge through a dedicated attention module and uses a detailed loss function that targets classification accuracy while encouraging domain adaptation. Quantitative experiments show that HCAT produces better sentence representations than Bi-LSTM and BERT. The model attains 92.30% test accuracy on medical text classification while maintaining high computational efficiency: batch processing time is about 150 ms and memory consumption is 320 MB. The proposed architecture incorporates long-range dependencies for clinical narrative representation, while the context-sensitive layer supports a better understanding of medical language. Precision and recall are significant for a healthcare application; the model achieves 91.8% precision and 93.2% recall. These results indicate that HCAT represents significant progress in healthcare data analysis and offers a highly practical tool for real-world extraction of medical information from unformatted text.
Journal Article
A Model for Enhancing Unstructured Big Data Warehouse Execution Time
by
Farhan, Marwa Salah
,
Youssef, Amira
,
Abdelhamid, Laila
in
Analysis
,
Big Data
,
Computational linguistics
2024
Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of the data generated by current applications requires new data warehousing systems. In big data settings, existing warehouse systems must be adapted to overcome new issues and limitations. The main drawbacks of traditional Extract–Transform–Load (ETL) are that huge amounts of data cannot be processed through ETL and that execution time is very high when the data are unstructured. This paper proposes a new model consisting of four layers, Extract–Clean–Load–Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text, and aimed at reducing execution time. ECLT is implemented and tested using Spark, accessed from Python. Finally, this paper compares the execution time of ECLT with different models on two datasets. Experimental results showed that for a data size of 1 TB, the execution time of ECLT is 41.8 s; when the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
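The layer ordering is the core of ECLT: cheap cleaning happens before loading, and the expensive transformation runs last, after the data has landed. A minimal plain-Python sketch of that ordering (the paper's actual implementation uses Spark; the functions and sample text here are invented for illustration):

```python
# Minimal ECLT sketch: Extract -> Clean -> Load -> Transform.

def extract(source):
    # pull raw articles from the source
    return list(source)

def clean(docs):
    # cheap text cleaning *before* loading: trim, lowercase, drop empties
    return [d.strip().lower() for d in docs if d.strip()]

def load(docs, store):
    # land the cleaned text in the warehouse staging area
    store.extend(docs)
    return store

def transform(store):
    # the heavier transformation runs last, over already-loaded data
    return [{"text": d, "tokens": d.split()} for d in store]

raw = extract(["  Big Data Warehousing ", "", "ECLT reorders ETL  "])
rows = transform(load(clean(raw), []))
print([r["tokens"] for r in rows])
# → [['big', 'data', 'warehousing'], ['eclt', 'reorders', 'etl']]
```

Deferring the transform until after the load is what lets the warehouse engine parallelize the expensive step, which is where the paper's execution-time gains over ETL-style orderings come from.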
Journal Article
ORGANIZING SMART CITY DATA BASED ON 3D POINT CLOUD IN UNSTRUCTURED DATABASE – AN OVERVIEW
by
Mohd Ariff, S. A.
,
Ujang, U.
,
Choon, T. L.
in
Data acquisition
,
Electronic devices
,
Information technology
2022
The concept of the 3D smart city integrates smart cities with information technology. One data source for a smart city is point cloud data produced by acquisition tools such as LiDAR, terrestrial laser scanning, and unmanned aerial vehicles. Because point cloud inputs are so large, traditional databases cannot handle them efficiently, and unstructured databases have become an alternative. Furthermore, data for smart city applications are considered complex and large, and data stored in an unstructured database can easily be retrieved from various front ends such as web and mobile devices. Moreover, unstructured databases do not have the fixed schema and data types that often limit the use of 3D point cloud data in relational databases. There are four categories of data model in unstructured databases: document store, key-value, column store, and graph store, each with different characteristics and approaches to handling data. This paper therefore summarises an overview of each category and determines the most suitable data organisation and environment for the 3D point cloud of a smart city. The overview will aid developers and users in selecting and comparing the data models available in unstructured databases for handling 3D point clouds.
Journal Article
Summary of web crawler technology research
2020
With the continuous development of network information technology, a large amount of unstructured data, known as big data, has accumulated on the network. Collecting this information manually is laborious, so web crawler technology came into being. This paper explores the basic principles and characteristics of web crawlers and the classification of currently popular crawlers, introduces the key technologies involved, compares two search strategies, and surveys current crawler applications. Finally, future research directions for web crawlers are introduced.
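The two search strategies a crawler survey typically compares are breadth-first and depth-first traversal of the link graph. A minimal sketch over a toy in-memory link graph (the graph and URLs are invented; a real crawler would fetch pages and parse links):

```python
from collections import deque

# toy link graph standing in for the web
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": ["a"],
}

def crawl(seed, strategy="bfs"):
    """Visit pages reachable from `seed`; BFS pops the oldest frontier
    URL, DFS the newest. Returns the visit order."""
    frontier = deque([seed])
    seen = {seed}
    order = []
    while frontier:
        url = frontier.popleft() if strategy == "bfs" else frontier.pop()
        order.append(url)
        for nxt in LINKS[url]:
            if nxt not in seen:   # never re-enqueue a seen URL
                seen.add(nxt)
                frontier.append(nxt)
    return order

print(crawl("a", "bfs"))  # → ['a', 'b', 'c', 'd', 'e']
print(crawl("a", "dfs"))  # → ['a', 'c', 'e', 'd', 'b']
```

BFS stays close to the seed and covers broad, shallow link neighbourhoods first; DFS dives down one link chain before backtracking, which is why the two strategies reach the same pages in different orders.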
Journal Article
A comprehensive survey on feature selection in the various fields of machine learning
2022
In machine learning (ML), feature selection (FS) plays a crucial part in reducing data dimensionality and enhancing the performance of any proposed framework. However, in real-world applications, FS suffers from high dimensionality, computational and storage complexity, noisy or ambiguous data, and demanding performance requirements. The area of FS is vast and challenging by nature, and a large body of FS work has been reported across many application areas. This paper discusses the FS framework and the multiple models of FS with detailed descriptions. We also classify the various FS algorithms with respect to the data, i.e., structured or labeled data versus unstructured data, for the different applications of ML, and discuss what essential features are, the commonly used FS methods, the widely used datasets, and notable FS work in the various ML fields. We review comparative experimental results of FS work across different result discussions. This paper offers a descriptive survey of FS together with the associated real-world problem domains. Its main objective is to convey the core idea of FS work and identify how FS can be applied in various problem domains.
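One of the simplest FS methods such surveys cover is the filter approach: score each feature independently and keep the top-scoring ones. A minimal sketch using variance as the score (the feature names and values below are invented for illustration; real filters also use mutual information, chi-squared, and similar criteria):

```python
def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def select_by_variance(features, k):
    """Filter-style FS: keep the k features with the highest variance.
    `features` maps feature name -> list of observed values."""
    ranked = sorted(features, key=lambda f: variance(features[f]), reverse=True)
    return ranked[:k]

data = {
    "f_constant": [1.0, 1.0, 1.0, 1.0],   # carries no information
    "f_noisy":    [0.9, 1.1, 1.0, 1.0],
    "f_signal":   [0.0, 5.0, 0.2, 4.8],
}
print(select_by_variance(data, 2))
# → ['f_signal', 'f_noisy']
```

The constant feature is discarded first because a feature that never varies cannot help discriminate between classes, which is the intuition behind all filter-based FS scores.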
Journal Article
Unstructured Big Data Threat Intelligence Parallel Mining Algorithm
2024
To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. Open-source cybersecurity analysis reports are first collected and converted into a standardized text format; five tactics category labels are then annotated, creating a multi-label dataset for tactics classification. To address the low execution efficiency and limited scalability of the sequential deep forest algorithm, PDFMLC employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly improving its speedup ratio. PDFMLC also incorporates label mutual information from the constructed dataset as input features, capturing latent label associations and significantly improving classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results demonstrate that PDFMLC exhibits excellent node scalability and execution efficiency, while PDFMLC-TIM proficiently classifies cybersecurity analysis reports, extracting tactics entities to construct comprehensive threat intelligence formatted in STIX 2.1.
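The LZW step mentioned above is the classic dictionary coder: scan the input, emit the code for the longest prefix already in the dictionary, and add that prefix plus the next character as a new entry. A textbook compression sketch (the paper applies LZW inside its parallel pipeline; this shows only the core algorithm):

```python
def lzw_compress(text):
    """Textbook LZW: emit dictionary codes for the longest known prefix,
    growing the dictionary as the input is scanned."""
    table = {chr(i): i for i in range(256)}  # seed with single bytes
    next_code = 256
    w = ""
    out = []
    for ch in text:
        wc = w + ch
        if wc in table:
            w = wc                 # extend the current match
        else:
            out.append(table[w])   # emit code for the longest match
            table[wc] = next_code  # learn the new phrase
            next_code += 1
            w = ch
    if w:
        out.append(table[w])
    return out

print(lzw_compress("ABABABA"))  # → [65, 66, 256, 258]
```

Seven input characters compress to four codes because repeated phrases ("AB", "ABA") are replaced by single dictionary references, which is what makes LZW effective on the repetitive text of standardized report formats.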
Journal Article