Catalogue Search | MBRL
423 result(s) for "data pre-processing"
Sentiment Analysis for Multi-Attribute Data in OSNs Using Hybrid Approach
by Pallavi, P.; Thallapalli, Ravikumar; Narasimha, G.
in Classification; Data analysis; Data mining
2020
Social networks such as LinkedIn and MySpace have become increasingly popular, and communication between their users has grown accordingly. Large amounts of data now move across social media because of increased data outsourcing. Sentiment analysis is an important and interesting problem for online social networks. Various existing methods detect sentiment in the communication between users, categorizing patterns with respect to similar attributes in order to analyze large data. In this paper, we present a hybrid machine learning method (a combination of Balanced Window and Classification based on Parts of Speech) that handles outsourced social network data from Facebook and other blogging services. The data are used for training, and relations in social streams are then classified by emotional aspect, such as positive, negative, or other relations. The proposed approach is closely based on machine learning: it identifies important relevant features (selected randomly) and performs sentiment analysis across different data streams. Our experimental results show comprehensive classification results compared with existing approaches in a real-time environment.
Journal Article
Optimisation of mobile intelligent terminal data pre-processing methods for crowd sensing
by Zeng, Yuefan; Sun, Bo; Chen, Lina
in Algorithms; B6135 Optical, image and video signal processing; B6140B Filtering methods in signal processing
2018
Sensor data pre-processing is an essential phase of crowd-sensing applications. Existing studies do not solve this problem effectively, and various sensor data pre-processing optimisation problems remain at the acquisition end of the crowd-sensing process. This study presents an improved sliding average method that uses a dynamic window to compress data and reduce time complexity, improving processing time. By adopting local sorting and a gradient change of the filter window, the study proposes an improved extremum median filtering method that relieves the time-consuming problem of denoising high-resolution images. An optimised transmission strategy is also proposed. It records only the demarcation points of each group of data and the data points that differ greatly from those demarcation points. This strategy reduces the storage pressure and the amount of data the mobile terminal must transmit, and it improves transmission efficiency. The experimental results show that the proposed methods have higher speed and lower cost, and can therefore run better in a crowd-sensing environment.
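For readers unfamiliar with extremum median filtering, here is a sketch of the classic form of the idea in Python. A pixel is replaced by the median of its neighbourhood only when it is the minimum or maximum of that neighbourhood. Non-extreme pixels are left untouched, which helps preserve detail.

This is an illustrative assumption, not the authors' improved method: their local sorting and gradient-changing window are not reproduced here. The function name `extremum_median_filter` and the `radius` parameter are hypothetical.

```python
from statistics import median

def extremum_median_filter(img, radius=1):
    """Classic extremum median filter on a 2-D list of grey values.

    A pixel is treated as a noise candidate only if it is the minimum or
    maximum of its (2*radius+1)^2 neighbourhood, clipped at the borders;
    only then is it replaced by the neighbourhood median.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # write results to a copy, read from the input
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            v = img[y][x]
            if v == min(window) or v == max(window):
                out[y][x] = median(window)
    return out
```

Only extreme pixels trigger a median computation. Finding the window minimum and maximum is cheaper than sorting, which is one reason this family of filters is attractive for large images.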
Journal Article
AdapterRemoval v2: rapid adapter trimming, identification, and read merging
by Schubert, Mikkel; Lindgreen, Stinus; Orlando, Ludovic
in Algorithms; Base Sequence; Bioinformatics
2016
Background
As high-throughput sequencing platforms produce longer and longer reads, sequences generated from short inserts, such as those obtained from fossil and degraded material, are increasingly expected to contain adapter sequences. Efficient adapter trimming algorithms are also needed to process the growing amount of data generated per sequencing run.
Findings
We introduce AdapterRemoval v2, a major revision of AdapterRemoval v1, which introduces:
(i) striking improvements in throughput, through the use of single instruction, multiple data (SIMD; SSE1 and SSE2) instructions and multi-threading support,
(ii) the ability to handle datasets containing reads or read-pairs with different adapters or adapter pairs,
(iii) simultaneous demultiplexing and adapter trimming,
(iv) the ability to reconstruct adapter sequences from paired-end reads for poorly documented data sets, and
(v) native gzip and bzip2 support.
Conclusions
We show that AdapterRemoval v2 compares favorably with existing tools, while offering superior throughput to most alternatives examined here, both for single and multi-threaded operations.
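A deliberately simple sketch illustrates the core operation of 3' adapter trimming. It scans the read for the leftmost position at which the rest of the read matches the start of the adapter with few mismatches, and cuts the read there.

This is an assumption-laden toy, not AdapterRemoval's algorithm. It ignores base qualities, gaps, paired-end overlap, and the SIMD and multi-threading optimisations described above. `trim_adapter`, `min_overlap`, and `max_mismatch_rate` are hypothetical names.

```python
def trim_adapter(read, adapter, min_overlap=3, max_mismatch_rate=0.1):
    """Return `read` with a 3' adapter (full or partial) removed.

    Start positions are scanned left to right and the read is cut at the
    first position where its remainder matches the adapter's prefix
    (ungapped comparison, counting mismatches), so a full internal adapter
    is found before a shorter partial one at the very end.
    """
    for start in range(len(read) - min_overlap + 1):
        # Compare as many bases as both sequences allow from this position.
        overlap = min(len(read) - start, len(adapter))
        segment = read[start:start + overlap]
        mismatches = sum(a != b for a, b in zip(segment, adapter))
        if mismatches <= max_mismatch_rate * overlap:
            # Everything from here on is treated as adapter sequence.
            return read[:start]
    return read

print(trim_adapter("TTTTTTTTTTAGATCGG", "AGATCGGAAGAGC"))
```

Production tools use scored alignments rather than a plain mismatch count. They also weigh short terminal overlaps more carefully, since a 3-base match can easily occur by chance.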
Journal Article
Classification of stroke disease using machine learning algorithms
by Ramachandran, Manikandan; Soundarapandian Ravichandran Kattur; Gandomi, Amir H
in Algorithms; Artificial neural networks; Classification
2020
This paper presents a prototype for classifying stroke that combines text mining tools and machine learning algorithms. With suitably trained algorithms, machine learning can serve as a significant tracking tool in areas such as surveillance, medicine, and data management. The data mining techniques applied in this work provide an overall view of information tracking from both semantic and syntactic perspectives. The proposed idea is to mine patients’ symptoms from case sheets and train the system with the acquired data. In the data collection phase, the case sheets of 507 patients were collected from Sugam Multispecialty Hospital, Kumbakonam, Tamil Nadu, India. Next, the case sheets were mined using tagging and maximum entropy methodologies, and the proposed stemmer extracts the common and unique set of attributes used to classify the strokes. The processed data were then fed into various machine learning algorithms, such as artificial neural networks, support vector machines, boosting and bagging, and random forests. Among these algorithms, artificial neural networks trained with a stochastic gradient descent algorithm outperformed the others, with a higher classification accuracy of 95% and a smaller standard deviation of 14.69.
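Mining symptom attributes from free-text case sheets can be sketched in three steps:
1. Tokenise the text.
2. Reduce each word to a crude stem.
3. Emit a binary vector over a symptom vocabulary that a classifier can consume.

The suffix list, the vocabulary, and the function names below are hypothetical. This naive suffix stripper is not the authors' proposed stemmer, and it ignores negation (for example "no vomiting").

```python
import re

# Hypothetical, deliberately crude suffix list, checked longest first.
SUFFIXES = ("ness", "ing", "ed", "s")

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_attributes(case_sheet, vocabulary):
    """Binary attribute vector: 1 if the stemmed vocabulary term occurs in the text."""
    tokens = {stem(t) for t in re.findall(r"[a-z]+", case_sheet.lower())}
    return [1 if stem(term) in tokens else 0 for term in vocabulary]

vocab = ["headache", "numbness", "vomiting", "dizziness"]  # hypothetical vocabulary
print(extract_attributes("Patient reports severe headaches, numbness in left arm and vomiting.", vocab))
```

Vectors of this kind, one per patient, are the sort of input the classifiers named in the abstract would be trained on.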
Journal Article
A Deep Learning Ensemble for Network Anomaly and Cyber-Attack Detection
by Choraś, Michał; Dutta, Vibekananda; Pawlicki, Marek
in Accuracy; anomaly detection; Artificial intelligence
2020
Currently, expert systems and applied machine learning algorithms are widely used to automate network intrusion detection. In critical infrastructure applications of communication technologies, the interaction among various industrial control systems and the Internet environment intrinsic to IoT technology makes them susceptible to cyber-attacks. Given the enormous network traffic in critical Cyber-Physical Systems (CPSs), traditional machine learning methods for network anomaly detection are inefficient. Therefore, recently developed machine learning techniques, with an emphasis on deep learning, are being successfully applied to the detection and classification of anomalies at both the network and host levels. This paper presents an ensemble method that leverages deep models, such as the Deep Neural Network (DNN) and Long Short-Term Memory (LSTM), together with a meta-classifier (i.e., logistic regression), following the principle of stacked generalization. To enhance the capabilities of the proposed approach, the method uses a two-step process for detecting network anomalies. In the first stage, data pre-processing, a Deep Sparse AutoEncoder (DSAE) is employed for feature engineering. In the second stage, a stacking ensemble learning approach is used for classification. The efficiency of the method is tested on heterogeneous datasets, including data gathered in the IoT environment, namely IoT-23, LITNET-2020, and NetML-2020. The evaluation results of the proposed approach are discussed, and their statistical significance is tested and compared against state-of-the-art approaches in network anomaly detection.
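Stacked generalization, as used in the ensemble above, trains a meta-classifier on the outputs of the base models. The sketch below fits a logistic-regression meta-classifier with plain batch gradient descent. It assumes the base models (a DNN and an LSTM in the paper) have already produced out-of-fold attack probabilities. The function names, learning rate, epoch count, and example probabilities are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta_lr(base_probs, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on level-1 data.

    Each row of `base_probs` holds the base models' predicted attack
    probabilities for one sample; `labels` holds 0 (normal) or 1 (attack).
    """
    n_feat = len(base_probs[0])
    w, b = [0.0] * n_feat, 0.0
    m = len(labels)
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for x, y in zip(base_probs, labels):
            # Gradient of the log-loss for one sample is (p - y) * x.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(n_feat):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / m for wi, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b

def meta_predict(w, b, x):
    """Combined attack probability for one sample's base-model outputs."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical out-of-fold outputs of two base models (columns) and true labels.
X = [[0.9, 0.4], [0.8, 0.6], [0.2, 0.5], [0.1, 0.3], [0.85, 0.2], [0.15, 0.7]]
y = [1, 1, 0, 0, 1, 0]
w, b = train_meta_lr(X, y)
```

The learned weights show how much the meta-classifier trusts each base model. In this made-up data the first model is informative and the second is near-random. The level-1 features should come from predictions on data the base models were not trained on, otherwise the meta-classifier overfits to their training-set confidence.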
Journal Article
Enhanced Short-Term Load Forecasting Using Artificial Neural Networks
by Daskalopulu, Aspassia; Arvanitidis, Athanasios Ioannis; Tsoukalas, Lefteri H.
in Algorithms; Artificial intelligence; artificial neural networks
2021
The modernization and optimization of current power systems are the objectives of research and development in the energy sector, which is motivated by the ever-increasing electricity demands. The goal of such research and development is to render power electronic equipment more controllable, to ensure maximal use of current circuits, system flexibility and efficiency, as well as the relatively easy integration of renewable energy resources at all voltage levels. The current revolution in communication technologies and the Internet of Things (IoT) offers us an opportunity to supervise and regulate the power grid, in order to achieve more reliable, efficient, and cost-effective services. One of the most critical aspects of efficient power system operation is the ability to predict energy load requirements, i.e., load forecasting. Load forecasting is essential for balancing demand and supply and for determining electricity prices. Typically, load forecasting has been supported through the use of Artificial Neural Networks (ANNs), which, once trained on a set of data, can predict future loads. The accuracy of the ANNs’ prediction depends on the quality and availability of the training data. In this paper, we propose novel data pre-processing strategies, which we apply to the data used to train an ANN, and subsequently evaluate the quality of the predictions it produces, to demonstrate the benefits gained. The proposed strategies and the obtained results are illustrated using consumption data from the Greek interconnected power system.
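The abstract does not detail the proposed pre-processing strategies, so the following is only a generic baseline under stated assumptions. It min-max scales a load series and converts it into lagged input windows with a next-step target, a common way to frame training data for an ANN forecaster. `min_max_scale`, `make_windows`, `n_lags`, and `horizon` are hypothetical names.

```python
def min_max_scale(series):
    """Scale values to [0, 1]; also return (lo, hi) so predictions can be un-scaled."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # guard against a flat series
    return [(v - lo) / span for v in series], (lo, hi)

def make_windows(series, n_lags, horizon=1):
    """Turn a series into (inputs, target) pairs.

    Each input is the `n_lags` most recent values; the target is the value
    `horizon` steps after the last input value.
    """
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return X, y

hourly_load = [100.0, 150.0, 200.0, 180.0, 160.0]  # hypothetical readings
scaled, bounds = min_max_scale(hourly_load)
X, y = make_windows(scaled, n_lags=2)
```

In practice, compute the scaling bounds on the training portion only and reuse them for test data, so that no information from the evaluation period leaks into training. The returned `(lo, hi)` pair makes that possible.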
Journal Article
Electricity Theft Detection Using Supervised Learning Techniques on Smart Meter Data
by Shafiq, Muhammad; Khan, Zahoor Ali; Javaid, Nadeem
in Artificial intelligence; Classification; Datasets
2020
Due to the increasing number of electricity thieves, electric utilities face problems in supplying electricity to their consumers efficiently. Accurate Electricity Theft Detection (ETD) is challenging for three reasons: inaccurate classification on imbalanced electricity consumption data, overfitting, and the high False Positive Rate (FPR) of existing techniques. Therefore, intensified research is needed to detect electricity thieves accurately and to recover the huge revenue losses of utility companies. To address these limitations, this paper presents a new model based on supervised machine learning techniques and real electricity consumption data. Initially, the electricity data are pre-processed using interpolation, the three-sigma rule, and normalization methods. Since the distribution of labels in the electricity consumption data is imbalanced, an Adasyn algorithm is used to address this class imbalance problem. It achieves two objectives: first, it intelligently increases the minority class samples in the data; second, it prevents the model from being biased towards the majority class samples. Afterwards, the balanced data are fed into a Visual Geometry Group (VGG-16) module to detect abnormal patterns in electricity consumption. Finally, a Firefly Algorithm-based Extreme Gradient Boosting (FA-XGBoost) technique is used for classification. Simulations are conducted to evaluate the performance of the proposed model. State-of-the-art methods are also implemented for comparative analysis: Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Logistic Regression (LR). For validation, the study uses precision, recall, F1-score, Matthews Correlation Coefficient (MCC), Receiver Operating Characteristic Area Under Curve (ROC-AUC), and Precision-Recall Area Under Curve (PR-AUC).
First, the simulation results show that the proposed Adasyn method improved the performance of the FA-XGBoost classifier, which achieved an F1-score, precision, and recall of 93.7%, 92.6%, and 97%, respectively. Second, the VGG-16 module generalized better, with accuracies of 87.2% and 83.5% on training and testing data, respectively. Third, the proposed FA-XGBoost correctly identified actual electricity thieves, with a recall of 97%. Moreover, our model is superior to the other state-of-the-art models in handling large time-series data and in classification accuracy. Utility companies can efficiently apply these models to real electricity consumption data to identify electricity thieves and overcome major revenue losses in the power sector.
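The three pre-processing steps named in the abstract (interpolation, the three-sigma rule, and normalization) can be sketched as follows. This sketch makes four assumptions:
- Missing readings are represented as `None`.
- Interpolation is linear between the nearest known readings.
- The three-sigma rule clips values to the mean plus or minus three population standard deviations.
- Normalization is min-max scaling to [0, 1].

The paper's exact variants may differ, and all function names are hypothetical.

```python
from statistics import mean, pstdev

def interpolate_missing(values):
    """Fill None gaps linearly; gaps at either end copy the nearest known reading."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no known readings")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out

def three_sigma_clip(values):
    """Clip readings lying outside mean +/- 3 population standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    return [min(max(v, lo), hi) for v in values]

def min_max_normalize(values):
    """Scale readings to [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat series
    return [(v - lo) / span for v in values]

def preprocess(values):
    """Interpolate, then clip outliers, then normalize."""
    return min_max_normalize(three_sigma_clip(interpolate_missing(values)))
```

Clipping must happen before normalization. A single extreme spike would otherwise set the maximum and compress all ordinary readings into a narrow band near zero.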
Journal Article
Heart Disease Risk Prediction Using Machine Learning Classifiers with Attribute Evaluators
by Paramasivam, Sivajothi; Chua, Hui Na; Pranavanand, S.
in Accuracy; Algorithms; attribute evaluation
2021
Cardiovascular diseases (CVDs) kill about 20.5 million people every year. Early prediction can help people change their lifestyles and ensure proper medical treatment if necessary. In this research, ten machine learning (ML) classifiers from different categories, such as Bayes, functions, lazy, meta, rules, and trees, were trained for efficient heart disease risk prediction. They were trained both on the full set of attributes of the Cleveland heart dataset and on the optimal attribute sets obtained from three attribute evaluators. The performance of the algorithms was appraised using 10-fold cross-validation. Finally, we tuned the hyperparameter for the number of nearest neighbors, ‘k’, in the instance-based (IBk) classifier. The sequential minimal optimization (SMO) classifier achieved an accuracy of 85.148% using the full set of attributes. Its highest accuracy, 86.468%, was obtained with the optimal attribute set from the chi-squared attribute evaluator. Meanwhile, the meta classifier bagging with logistic regression (LR) provided the highest ROC area, 0.91, using both the full attribute set and the optimal attribute set obtained from the ReliefF attribute evaluator. Overall, the SMO classifier emerged as the best prediction method. IBk achieved an 8.25% accuracy improvement when ‘k’ was tuned to 9 with the chi-squared attribute set.
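The abstract describes tuning k with 10-fold cross-validation for the IBk classifier; that procedure can be sketched in plain Python. This is not Weka's IBk. It assumes Euclidean distance on numeric features, majority voting, and interleaved folds that are neither shuffled nor stratified. The function names and candidate values of k are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points nearest to x (Euclidean)."""
    dists = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(X, y, k, n_folds=10):
    """Accuracy of k-NN under n-fold cross-validation.

    Fold f holds out samples f, f + n_folds, f + 2 * n_folds, ...
    (interleaved, no shuffling or stratification).
    """
    n = len(X)
    correct = 0
    for f in range(n_folds):
        test_idx = set(range(f, n, n_folds))
        tr_X = [X[i] for i in range(n) if i not in test_idx]
        tr_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            correct += knn_predict(tr_X, tr_y, X[i], k) == y[i]
    return correct / n

def tune_k(X, y, candidates, n_folds=10):
    """Return the candidate k with the best cross-validated accuracy (first on ties)."""
    return max(candidates, key=lambda k: cross_val_accuracy(X, y, k, n_folds))
```

On data such as the Cleveland dataset, attributes would need encoding and scaling first, because Euclidean distance is sensitive to feature ranges.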
Journal Article
Comparison of data filtering methods effects on smartgrid load forecasting
2024
The integration of advanced metering technology in power systems has enabled real-time data access for every node in a smart grid. As a result, the power system can now access large volumes of data, and this vast amount of data requires an alternative method of analysis. Machine learning-based load forecasting technologies are being applied in this scenario. However, this massive data collection needs appropriate pre-processing to improve the effectiveness of the load forecaster. Such pre-processing includes removing noise, outliers, and erroneous data, detecting missing data, and normalizing widely divergent datasets. Thus, to eliminate the various kinds of errors and outliers present in data obtained directly from smart meters, this study, as its novel contribution, analyses and compares the efficacy of eight distinct smoothing and filtering techniques. Using the processed data, a neural network-based load forecasting model was developed to compare the efficacy of the various pre-processing approaches. This study uses real-time data obtained from a smart meter placed at a node within the NIT Patna campus. According to the load forecasting results obtained, the proposed moving average filter surpasses the other methods for filtering and smoothing the raw data by an average MAPE of 2.66.
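Two pieces named in the abstract can each be written in a few lines: a trailing moving-average filter, and the mean absolute percentage error (MAPE) used to compare the filters. The window length here is an assumption, since the abstract does not state the one used, and the function names are hypothetical. MAPE is undefined when any actual value is zero.

```python
def moving_average(series, window=3):
    """Trailing moving average; the first window-1 points average what is available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

print(moving_average([3, 6, 9, 12], window=3))
print(mape([100, 200], [110, 180]))
```

A trailing window uses only past readings, so the same filter could run online on a smart-meter stream. A centred window smooths with less lag but needs future samples.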
Journal Article
Advances in Data Pre-Processing Methods for Distributed Fiber Optic Strain Sensing
2024
Because of their high spatial resolution over extended lengths, distributed fiber optic sensors (DFOS) enable us to monitor a wide range of structural effects and offer great potential for diverse structural health monitoring (SHM) applications. However, even under controlled conditions, the useful signal in distributed strain sensing (DSS) data can be concealed by different types of measurement principle-related disturbances: strain reading anomalies (SRAs), dropouts, and noise. These disturbances can render the extraction of information for SHM difficult or even impossible. Hence, cleaning the raw measurement data in a pre-processing stage is key for successful subsequent data evaluation and damage detection on engineering structures. To improve the capabilities of pre-processing procedures tailored to DSS data, characteristics and common remediation approaches for SRAs, dropouts, and noise are discussed. Four advanced pre-processing algorithms (geometric threshold method (GTM), outlier-specific correction procedure (OSCP), sliding modified z-score (SMZS), and the cluster filter) are presented. An artificial but realistic benchmark data set simulating different measurement scenarios is used to discuss the features of these algorithms. A flexible and modular pre-processing workflow is implemented and made available with the algorithms. Dedicated algorithms should be used to detect and remove SRAs. GTM, OSCP, and SMZS show promising results, and the sliding average is inappropriate for this purpose. The preservation of crack-induced strain peaks’ tips is imperative for reliable crack monitoring.
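The sliding modified z-score idea works as follows. Within a window around each reading, compute the median and the median absolute deviation (MAD). Form the modified z-score 0.6745 * (x - median) / MAD, and flag readings whose absolute score exceeds a threshold. The constant 0.6745 and the threshold 3.5 follow the common Iglewicz and Hoaglin convention. The window size, the handling of a zero MAD, and the function name are assumptions, and the SMZS algorithm in the paper may differ in detail, for example in which signal it is applied to.

```python
from statistics import median

def sliding_modified_zscore(signal, half_window=5, threshold=3.5):
    """Return one boolean per reading: True where the local modified z-score
    exceeds `threshold`, marking a candidate strain reading anomaly."""
    n = len(signal)
    flags = []
    for i, x in enumerate(signal):
        # Window of up to 2 * half_window + 1 readings, clipped at the ends.
        window = signal[max(0, i - half_window): min(n, i + half_window + 1)]
        med = median(window)
        mad = median([abs(v - med) for v in window])
        if mad == 0:
            # Degenerate window (most readings identical): flag any deviation.
            flags.append(x != med)
            continue
        z = 0.6745 * (x - med) / mad
        flags.append(abs(z) > threshold)
    return flags

strain = [0.0, 0.1, -0.1, 0.05, -0.05, 5.0, 0.0, 0.1, -0.1, 0.05, -0.05]
print([i for i, f in enumerate(sliding_modified_zscore(strain)) if f])
```

Median and MAD are robust to the spike itself, unlike a sliding mean and standard deviation. This is consistent with the abstract's point that a sliding average is inappropriate for removing SRAs. The same property means a genuine narrow crack peak could also be flagged, which is why preserving peak tips needs separate care.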
Journal Article