2,258 results for "data preprocessing"
Method for Preprocessing Video Data for Training Deep-Learning Models for Identifying Behavioral Events in Bio-Objects
Monitoring moving bio-objects is currently of great interest for both fundamental and practical research. The advent of deep-learning algorithms has made it possible to automate the qualitative and quantitative analysis of the behavior of bio-objects recorded in video format. When processing such data, it is necessary to consider additional factors, such as background noise in the frame, the speed of the bio-object, and the need to reflect information about the previous (past) and subsequent (future) pose of the bio-object in one video frame. The preprocessed dataset must be suitable for verification by experts. This article proposes a method for preprocessing data to identify the behavior of a bio-object, a clear example of which is experiments on laboratory animals with the collection of video data. The method is based on combining information about a behavioral event presented in a sequence of frames with the addition of a native image and subsequent boundary detection using the Sobel filter. The resulting representation of a behavioral event is easily perceived by both human experts and neural networks of various architectures. The article presents the results of training several neural networks on the obtained dataset and proposes an effective neural network architecture (F1-score = 0.95) for identifying discrete events of biological objects’ behavior.
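The core idea — blend a short frame sequence so past and future poses leave a trail, add the native image back in, then run Sobel boundary detection — can be sketched as follows. This is a minimal illustration, not the paper's exact recipe: the `(T, H, W)` layout and the blending weights are assumptions.

```python
import numpy as np
from scipy import ndimage

def encode_event(frames):
    """Collapse a short grayscale frame sequence (T, H, W) in [0, 1] into one
    image that keeps a trace of earlier poses, then overlay Sobel edges.
    Illustrative weights only; the paper's exact combination may differ."""
    frames = np.asarray(frames, dtype=float)
    # Weight later frames more heavily so motion leaves a fading trail.
    weights = np.linspace(0.2, 1.0, frames.shape[0])
    blended = np.tensordot(weights / weights.sum(), frames, axes=1)
    # Add the native (most recent) image back in, as the abstract describes.
    combined = 0.5 * blended + 0.5 * frames[-1]
    # Boundary detection with the Sobel operator.
    gx = ndimage.sobel(combined, axis=0)
    gy = ndimage.sobel(combined, axis=1)
    edges = np.hypot(gx, gy)
    edges /= max(edges.max(), 1e-12)  # normalize edge magnitudes to [0, 1]
    return 0.5 * (combined + edges)
```

The output is a single image per behavioral event, suitable both for expert inspection and as network input.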
BibFusion: A Python package to integrate, deduplicate, and harmonize exported bibliographic records from Scopus and Web of Science for bibliometric analysis
Objective. The study presented BibFusion, a Python software package that harmonizes bibliographic exports from Scopus and Web of Science into a single, traceable, analysis-ready corpus for bibliometric and scientometric research. Design/Methodology/Approach. BibFusion ingests Scopus CSV and WoS TXT files, applies systematic normalization (e.g., ASCII/uppercase standardization of titles and SR keys, affiliation parsing with country extraction), and optionally enriches records via DOI-based resolution against OpenAlex to recover persistent identifiers (e.g., OpenAlex work IDs, ORCID when available, and OpenAlex author IDs). Cross-database integration employs a DOI-first deduplication cascade with a conservative fallback (title–year–first author) when a DOI is absent. Author records are disambiguated through a canonical PersonID hierarchy (ORCID → OpenAlexAuthorID → normalized name). Citation strings are cleaned and remapped to preserve consistent citation links, and journal/Scimago information is consolidated using ISSN/EISSN rules. Results. In a demonstration on an entrepreneurial marketing query, BibFusion consolidated 436 source records into 253 unique main works and materialized a unified corpus of 8,569 articles. The resulting dataset showed high identifier and geographic completeness and provided an analysis-ready citation layer. Conclusions/Value. BibFusion offers a reusable, auditable integration workflow that reduces duplicate inflation and metadata fragmentation, makes merge decisions and residual uncertainty explicit, and thereby keeps downstream analyses transparent.
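The DOI-first deduplication cascade with a title–year–first-author fallback can be sketched as below. The field names (`doi`, `title`, `year`, `first_author`) are hypothetical; BibFusion's actual schema and normalization rules may differ.

```python
def dedup_key(record):
    """Merge key for a bibliographic record dict: DOI wins when present,
    otherwise fall back to a conservative title-year-first-author key."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Alphanumeric-only uppercase title absorbs punctuation/spacing variants.
    title = "".join(ch for ch in record.get("title", "").upper() if ch.isalnum())
    return ("tyfa", title, record.get("year"), record.get("first_author", "").strip().upper())

def merge_records(records):
    """Keep the first record seen per key; a real tool would merge fields."""
    seen = {}
    for rec in records:
        seen.setdefault(dedup_key(rec), rec)
    return list(seen.values())
```

Keying on the normalized DOI first avoids the false merges a fuzzy title match can produce, while the fallback still catches DOI-less duplicates.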
Continuous Student Knowledge Tracing Using SVD and Concept Maps
One of the critical aspects of building intelligent tutoring systems concerns proper monitoring of students' activity and academic performance. This paper presents a continuous student knowledge tracing method implemented for the Tesys e-Learning platform at the Faculty of Automation, Computers and Electronics of the University of Craiova. The student's knowledge level is continuously monitored and, after each test recommended by the SVD-based mechanism, a new set of knowledge weights is computed. We aim to achieve a comprehensive monitoring environment which can provide accurate insight into the student's knowledge level at any moment. In our approach, we added weights for both students and tests to improve the monitoring of the student's evolution and provide more accurate feedback. The validation setup consisted of ten tests with eight questions per test, and we used both current- and past-year test data. Results revealed that assigning weights to questions, tests, and students and using them in the recommendation process offers a better view of the student's evolution along with more accurate recommendations. Progress in this direction will provide more insight into the available teaching materials and the SVD-based recommender system, so that an e-learning platform integrating the presented mechanism will provide a better learning experience.
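The SVD step behind such a recommender can be sketched as a low-rank reconstruction of a students × questions score matrix; missing or noisy entries are smoothed toward the dominant latent patterns. This shows only the factorization, not the platform's student/test weighting scheme.

```python
import numpy as np

def smoothed_scores(scores, rank=2):
    """Rank-r SVD reconstruction of a students x questions score matrix.
    Low `rank` keeps only the strongest latent skill patterns; the
    weighting of students and tests from the paper is not reproduced."""
    u, s, vt = np.linalg.svd(np.asarray(scores, dtype=float), full_matrices=False)
    # Broadcast the top-r singular values over the left singular vectors.
    return u[:, :rank] * s[:rank] @ vt[:rank]
```

Comparing a student's raw row against the smoothed row highlights questions where performance deviates from the latent pattern, which can drive test recommendations.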
Comparison of text preprocessing methods
Text preprocessing is not only an essential step to prepare the corpus for modeling but also a key area that directly affects the natural language processing (NLP) application results. For instance, precise tokenization increases the accuracy of part-of-speech (POS) tagging, and retaining multiword expressions improves reasoning and machine translation. The text corpus needs to be appropriately preprocessed before it is ready to serve as the input to computer models. The preprocessing requirements depend on both the nature of the corpus and the NLP application itself, that is, what researchers would like to achieve from analyzing the data. Conventional text preprocessing practices generally suffice, but there exist situations where the text preprocessing needs to be customized for better analysis results. Hence, we discuss the pros and cons of several common text preprocessing methods: removing formatting, tokenization, text normalization, handling punctuation, removing stopwords, stemming and lemmatization, n-gramming, and identifying multiword expressions. Then, we provide examples of text datasets which require special preprocessing and how previous researchers handled the challenge. We expect this article to be a starting guideline on how to select and fine-tune text preprocessing methods.
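Several of the steps the article surveys — case normalization, punctuation handling, tokenization, stopword removal, and n-gramming — can be strung together in a few lines. This is a deliberately minimal stdlib sketch; real projects would reach for a proper tokenizer and lemmatizer, and the tiny stopword list here is illustrative only.

```python
import re

# Tiny illustrative stopword list; real lists (e.g. NLTK's) are much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}

def preprocess(text, n=2):
    """Minimal pipeline: lowercase, strip punctuation via a regex tokenizer,
    drop stopwords, and emit word n-grams. Sketches the steps only."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return tokens, ngrams
```

As the article stresses, each step is a choice: keeping punctuation, stopwords, or multiword expressions intact can be the better option depending on the downstream task.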
Transforming variables to central normality
Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
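The classical maximum-likelihood Yeo–Johnson transformation the paper improves upon is available in SciPy and can be demonstrated on skewed data with outliers. The robust estimator the paper proposes would replace the MLE step below; it is not part of SciPy, so this only shows the baseline behavior.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Right-skewed data with two gross outliers appended.
x = np.concatenate([rng.lognormal(size=200), [60.0, 90.0]])

# Classical Yeo-Johnson with maximum-likelihood lambda -- the estimator
# the paper shows is sensitive to exactly this kind of outlier.
xt, lam = stats.yeojohnson(x)
```

With the robust variant, the transformation parameter would be chosen so the central bulk becomes approximately normal while the outliers are allowed to remain outlying.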
Study on Preprocessing Method of TCM Prescription Data in Data Mining
Traditional Chinese medicine (TCM) prescriptions have been developed over thousands of years. Their data forms are diverse, their content is discrete and often missing, and cultural and regional differences introduce many uncertainties, all of which makes mining TCM prescription data difficult. Taking 3,108 prescriptions for the treatment of typhoid fever as an example, this study focuses on the data cleaning and data transformation stages of preprocessing for prescriptions that combine multiple functions. It describes how to cleanse unqualified prescription records, normalize drug names, unify dosage units, and structure the data, so that the processed data can be mined effectively. This provides strong support for exploring the compatibility laws of prescriptions and for the development of new drugs.
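One cleaning step named above, drug-name normalization, amounts to mapping the many recorded variants of a herb onto one canonical name. The alias table below is a small hypothetical example (pinyin variants mapped to pharmaceutical Latin), not the study's actual dictionary.

```python
# Hypothetical alias table: recorded variants -> canonical drug name.
CANONICAL = {
    "gui zhi": "Cinnamomi Ramulus",
    "guizhi": "Cinnamomi Ramulus",
    "gan cao": "Glycyrrhizae Radix",
    "gancao": "Glycyrrhizae Radix",
}

def normalize_drugs(names):
    """Map each recorded drug-name variant to a canonical name and drop
    duplicates while preserving order -- one small piece of the cleaning
    stage described above. Unknown names pass through unchanged."""
    out = []
    for name in names:
        canon = CANONICAL.get(name.strip().lower(), name.strip())
        if canon not in out:
            out.append(canon)
    return out
```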
Evaluating the Performance of Several Data Preprocessing Methods Based on GRU in Forecasting Monthly Runoff Time Series
The optimal planning and management of modern water resources depends highly on reliable and accurate runoff forecasting. Data preprocessing technology can provide new possibilities for improving the accuracy of runoff forecasting when basic physical relationships cannot be captured using a single prediction model. Yet, few studies have so far evaluated the performance of various data preprocessing technologies in predicting monthly runoff time series. In order to fill this research gap, this paper investigates the potential of five data preprocessing techniques based on the gated recurrent unit network (GRU) model for monthly runoff prediction, namely variational mode decomposition (VMD), wavelet packet decomposition (WPD), complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), extreme-point symmetric mode decomposition (ESMD), and singular spectrum analysis (SSA). In this study, the original monthly runoff data is first decomposed into a set of subcomponents using the five data preprocessing methods; second, each component is predicted by developing an appropriate GRU model; and finally, the forecasting results of the different two-stage hybrid models are obtained by aggregating the forecasts of the corresponding subcomponents. Four performance metrics are employed as evaluation benchmarks. The experimental results from two hydropower stations in China show that all five data preprocessing techniques attain satisfying prediction results, while the VMD and WPD methods yield better performance than CEEMDAN, ESMD, and SSA in both training and testing periods in terms of all four indices. It is therefore important to carefully select an appropriate data preprocessing method according to the actual characteristics of the study area.
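The two-stage decompose–predict–aggregate pattern can be sketched with placeholders: a toy moving-average decomposition stands in for VMD/WPD/CEEMDAN/ESMD/SSA, and a persistence forecast stands in for the per-component GRU models. Only the pipeline shape is taken from the paper.

```python
import numpy as np

def decompose(x, window=12):
    """Toy stand-in for the decomposition stage: split a runoff series
    into a moving-average trend and its residual."""
    trend = np.convolve(x, np.ones(window) / window, mode="same")
    return [trend, x - trend]

def component_forecast(series, steps):
    """Persistence placeholder standing in for a GRU trained per component."""
    return np.repeat(series[-1], steps)

def two_stage_forecast(x, steps=1):
    # Stage 1: decompose; stage 2: forecast each subcomponent; then aggregate.
    return sum(component_forecast(c, steps) for c in decompose(x))
```

Because the residual is defined as the series minus the trend, the aggregated persistence forecast here collapses to the last observed value; with real GRU models per component, each subcomponent contributes its own learned dynamics.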
Battery State-of-Health Estimation Using Machine Learning and Preprocessing with Relative State-of-Charge
Because lithium-ion batteries are widely used for various purposes, it is important to estimate their state of health (SOH) to ensure their efficiency and safety. Despite the usefulness of model-based methods for SOH estimation, the difficulties of battery modeling have resulted in a greater emphasis on machine learning for SOH estimation. Furthermore, data preprocessing has received much attention because it is an important step in determining the efficiency of machine learning methods. In this paper, we propose a new preprocessing method for improving the efficiency of machine learning for SOH estimation. The proposed method consists of the relative state of charge (SOC) and data processing, which transforms time-domain data into SOC-domain data. According to the correlation analysis, SOC-domain data are more correlated with the usable capacity than time-domain data. Furthermore, we compare the estimation results of SOC-based data and time-based data in feedforward neural networks (FNNs), convolutional neural networks (CNNs), and long short-term memory (LSTM). The results show that the SOC-based preprocessing outperforms conventional time-domain data-based techniques. Furthermore, the accuracy of the simplest FNN model with the proposed method is higher than that of the CNN model and the LSTM model with a conventional method when training data are small.
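The transformation at the heart of this preprocessing — resampling time-domain measurements onto a relative-SOC grid — can be sketched with simple interpolation. This assumes a monotonically increasing SOC over the segment; the paper's exact procedure and variable choices may differ.

```python
import numpy as np

def to_soc_domain(soc, signal, n_points=101):
    """Resample a time-domain measurement (e.g. voltage) onto a uniform
    relative-SOC grid. Assumes `soc` increases monotonically over the
    segment; real charge data would need sorting and cleaning first."""
    soc = np.asarray(soc, dtype=float)
    grid = np.linspace(soc.min(), soc.max(), n_points)
    return grid, np.interp(grid, soc, np.asarray(signal, dtype=float))
```

After this step, every charge segment maps to a fixed-length feature vector regardless of charging duration, which is what lets a simple FNN compete with CNN/LSTM models on raw time series.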
Enhanced lung image segmentation using deep learning
With the advances in technology, assistive medical systems are emerging with rapid growth and helping healthcare professionals. The proactive diagnosis of diseases with artificial intelligence (AI) and its aligned technologies has been an exciting research area in the last decade. Doctors usually detect tuberculosis (TB) by checking the lungs' X-rays. Classification using deep learning algorithms can achieve accuracy comparable to a doctor's in detecting TB. It is found that the probability of detecting TB increases if classification algorithms are applied to segmented lungs instead of the whole X-ray. The paper's novelty lies in a detailed analysis and discussion of U-Net++ results and the implementation of U-Net++ for lung segmentation using X-rays. A thorough comparison of U-Net++ with three other benchmark segmentation architectures, and of the role of segmentation in diagnosing TB or other pulmonary lung diseases, is also made in this paper. To the best of our knowledge, no prior research has tried to implement U-Net++ for lung segmentation. Most papers did not even use segmentation before classification, which causes data leakage. Very few used segmentation before classification, but they used only U-Net, which U-Net++ can easily replace because, as discussed in the results, the accuracy and mean_iou of U-Net++ are greater than those of U-Net, which can minimize data leakage. The authors achieved more than 98% lung segmentation accuracy and a mean_iou of 0.95 using U-Net++, and the efficacy of such comparative analysis is validated.
Comparison of Data Preprocessing Approaches for Applying Deep Learning to Human Activity Recognition in the Context of Industry 4.0
According to the Industry 4.0 paradigm, all objects in a factory, including people, are equipped with communication capabilities and integrated into cyber-physical systems (CPS). Human activity recognition (HAR) based on wearable sensors provides a method to connect people to CPS. Deep learning has shown surpassing performance in HAR. Data preprocessing is an important part of deep learning projects and takes up a large part of the whole analytical pipeline. Data segmentation and data transformation are two critical steps of data preprocessing. This study analyzes the impact of segmentation methods on deep learning model performance, and compares four data transformation approaches. An experiment with HAR based on acceleration data from multiple wearable devices was conducted. The multichannel method, which treats the data for the three axes as three overlapped color channels, produced the best performance. The highest overall recognition accuracy achieved was 97.20% for eight daily activities, based on the data from seven wearable sensors, which outperformed most of the other machine learning techniques. Moreover, the multichannel approach was applied to three public datasets and produced satisfying results for multi-source acceleration data. The proposed method can help better analyze workers’ activities and help to integrate people into CPS.
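The multichannel transformation the study found most effective — sliding-window segmentation with the three acceleration axes stacked as three "color" channels — can be sketched as follows. The window and step sizes are illustrative, not the study's settings.

```python
import numpy as np

def sliding_windows(signal, window, step):
    """Segment a 1-D signal into overlapping fixed-length windows."""
    n = (len(signal) - window) // step + 1
    return np.stack([signal[i * step : i * step + window] for i in range(n)])

def to_multichannel(ax, ay, az, window=64, step=32):
    """Stack the x/y/z acceleration axes as three channels per window,
    mirroring the multichannel (overlapped color channels) approach;
    output shape (n_windows, window, 3), ready for a 1-D CNN input."""
    return np.stack([sliding_windows(s, window, step) for s in (ax, ay, az)], axis=-1)
```

Overlapping windows (step < window) increase the number of training samples and reduce the chance of an activity transition splitting an event across segments.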