Catalogue Search | MBRL
12 result(s) for "C6130 Data handling techniques"
Deep learning approach for microarray cancer data classification
by Basavegowda, Hema Shekar; Dagnew, Guesh
in 7-layer deep neural network architecture; Accuracy; adaptive moment estimation
2020
Analysis of microarray data is a highly challenging problem because of the inherent complexity of the data: high dimensionality, small sample size, imbalanced classes, noisy data structure, and high variance of feature values. These properties lead to lower classification accuracy and over-fitting. In this work, the authors aimed to develop a deep feedforward method to classify the given microarray cancer data into a set of classes for subsequent diagnosis purposes. They used a 7-layer deep neural network architecture with different parameters for each dataset. The small sample size and dimensionality problems are addressed with a well-known dimensionality reduction technique, principal component analysis. The feature values are scaled using the Min–Max approach, and the proposed approach is validated on eight standard microarray cancer datasets. Binary cross-entropy is used to measure the loss, and adaptive moment estimation is used for optimisation. The performance of the proposed approach is evaluated using classification accuracy, precision, recall, f-measure, log-loss, receiver operating characteristic curve, and confusion matrix. A comparative analysis with state-of-the-art methods shows that the proposed approach performs better than many of the existing methods.
Journal Article
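As an illustration of two of the building blocks named in the abstract above, the following is a minimal sketch of Min–Max feature scaling and the binary cross-entropy loss. It is not the authors' implementation; the function names, the handling of constant features, and the clipping constant `eps` are assumptions made for this example.

```python
import math

def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1].

    Assumption: a constant feature (max == min) is mapped to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities.

    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

For example, `min_max_scale([2, 4, 6])` gives `[0.0, 0.5, 1.0]`, and a classifier that predicts 0.5 for every sample has a loss of ln 2 regardless of the labels.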
Big data analytics in smart grids: state-of-the-art, challenges, opportunities, and future directions
by Zhao, Power; Bhattarai, Bishnu P.; Luo, Yusheng
in B8110D Power system planning and layout; Big Data; big data analytics
2019
Big data has the potential to unlock novel, groundbreaking opportunities in the power grid that deliver a multitude of technical, social, and economic gains. As power grid technologies evolve in conjunction with measurement and communication technologies, the result is an unprecedented amount of heterogeneous big data. In particular, computational complexity, data security, and operational integration of big data into power system planning and operational frameworks are the key challenges in transforming the heterogeneous large dataset into actionable outcomes. In this context, suitable big data analytics combined with visualization can lead to better situational awareness and predictive decisions. This paper presents a comprehensive state-of-the-art review of big data analytics and its applications in power grids, and also identifies challenges and opportunities from utility, industry, and research perspectives. The paper analyzes research gaps and presents insights on future research directions to integrate big data analytics into power system planning and operational frameworks. It provides detailed information for utilities looking to apply big data analytics and discusses how utilities can enhance revenue streams and bring disruptive innovation. It also provides general guidelines to help utilities make the right investment in big data analytics by unveiling interdependencies among critical infrastructures and operations.
Journal Article
Efficient algorithm for big data clustering on single machine
by Alguliyev, Rasim M.; Sukhostat, Lyudmila V.; Aliguliyev, Ramiz M.
in Accelerometers; Algorithms; Big Data
2020
Big data analysis requires substantial computing power, which is not always available, so new clustering algorithms capable of processing such data are needed. This study proposes a new parallel clustering algorithm based on the k-means algorithm. It significantly reduces the exponential growth of computations. The proposed algorithm splits a dataset into batches while preserving the characteristics of the initial dataset and increasing the clustering speed. The idea is to define cluster centroids for each batch and then to cluster those centroids in turn. Each data point is then assigned to the cluster with the nearest of the resulting centroids. Experiments on real large datasets evaluate the effectiveness of the proposed approach, which is compared with k-means and a modification of it. The experiments show that the proposed algorithm is a promising tool for clustering large datasets in comparison with the k-means algorithm.
Journal Article
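The batch idea in the abstract above can be sketched roughly as follows: run k-means on each batch, cluster the batch centroids, then assign every point to its nearest final centroid. This is a simplified, sequential illustration and not the authors' algorithm. The names `kmeans` and `batch_kmeans`, the evenly spaced initialisation, and the plain sequential split into batches (which does not by itself preserve the dataset's characteristics) are all assumptions of this sketch, as is the requirement that `batch_size` be at least `k`.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic, evenly spaced initialisation."""
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the previous centroid when a cluster receives no points.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def batch_kmeans(points, k, batch_size):
    """Cluster each batch, cluster the batch centroids, then label all points."""
    batch_centroids = []
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        batch_centroids.extend(kmeans(batch, min(k, len(batch))))
    final = kmeans(batch_centroids, k)
    labels = [min(range(k), key=lambda i: sq_dist(p, final[i])) for p in points]
    return final, labels
```

Only the batch centroids, not the full dataset, are passed to the final k-means run, which is where the saving in computation would come from. The per-batch loop is also the natural place to parallelise.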
Imputing missing values using cumulative linear regression
2019
Handling missing data matters when applying statistical methods to a dataset. Statisticians and researchers may draw inaccurate inferences about the data if the missing data are not handled properly. Python and R now provide diverse packages for handling missing data. In this study, an imputation algorithm, cumulative linear regression, is proposed. The proposed algorithm depends on the linear regression technique. It differs from the existing methods in that it cumulates the imputed variables; those variables are incorporated in the linear regression equation to fill in the missing values in the next incomplete variable. The author performed a comparative study of the proposed method and those packages. The performance was measured in terms of imputation time, root-mean-square error, mean absolute error, and coefficient of determination ($R^2$). An analysis of five datasets with different missing values generated from different mechanisms showed that performance varies with the size, the missing percentage, and the missingness mechanism. The results showed that the performance of the proposed method is slightly better.
Journal Article
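The cumulative step described in the abstract above can be illustrated with a small sketch: each incomplete variable is regressed on the variables that are already complete, its gaps are filled with the predictions, and the filled variable then joins the predictors for the next incomplete variable. This is one reading of the abstract, not the author's code. The function names, the column-list data layout, the processing order (the order in which incomplete columns are passed in), and the use of the normal equations are assumptions; the sketch also assumes the predictors are not perfectly collinear.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def cumulative_impute(complete, incomplete):
    """Fill None gaps column by column.

    complete:   list of fully observed columns (lists of numbers)
    incomplete: list of columns containing None where a value is missing
    Returns the incomplete columns with their gaps filled.
    """
    predictors = [col[:] for col in complete]
    filled = []
    for col in incomplete:
        rows = list(zip(*predictors))
        observed = [i for i, v in enumerate(col) if v is not None]
        beta = fit_ols([rows[i] for i in observed], [col[i] for i in observed])
        new_col = [
            v if v is not None
            else beta[0] + sum(b * x for b, x in zip(beta[1:], rows[i]))
            for i, v in enumerate(col)
        ]
        filled.append(new_col)
        predictors.append(new_col)  # cumulate: the filled column becomes a predictor
    return filled
```

Calling `cumulative_impute([x], [y, z])` fits `y` on `x` alone and then fits `z` on both `x` and the filled `y`.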
Influence of kernel clustering on an RBFN
2019
The classical radial basis function network (RBFN) is widely used to process non-linearly separable data sets through the introduction of activation functions. However, the parameters of the activation functions are set randomly and the distribution of patterns is not taken into account. To address this issue, some scholars have introduced kernel clustering into the RBFN so that the clustering results inform the parameters of the activation functions. Building on the original kernel clustering, this study further discusses the influence of kernel clustering on an RBFN when the setting of kernel clustering changes. The changes involve different kernel-clustering methods [bubble sort (BS) and escape nearest outlier (ENO)], multiple kernel-clustering criteria (static and dynamic), etc. Experimental results validate that, when the distribution of patterns and the changes in the kernel-clustering setting are taken into account, the performance of an RBFN improves and the network is more feasible for the corresponding data sets. Moreover, though BS always costs more time than ENO, it still brings more feasible clustering results. Furthermore, the dynamic criterion always costs much more time than the static one, but the number of kernels derived from the dynamic criterion is smaller than that from the static one.
Journal Article
Adapting big data standards, maturity models to smart grid distributed generation: critical review
by Sarwat, Arif I.; Sundararajan, Aditya; Hernandez, Alexander S.
in B8120K Distributed power generation; Big Data; big data standards
2020
Big data standards and capability maturity models (CMMs) help developers build applications with reduced coupling and increased breadth of deployment. In smart grids, stakeholders currently work with data management techniques that are unique and customised to their own goals, thereby posing challenges for grid-wide integration and deployment. Although big data standards and CMMs exist for other domains, no work in the literature considers adapting them to smart grids, which will benefit from both. Further, existing smart grid standards and CMMs do not fully account for big data challenges. This study bridges the gap by analysing the role of big data in smart grids, and explores if and how big data standards and CMMs can be adapted specifically to ten distributed generation (DG) use-cases that use big data. In doing so, this work provides a useful starting point for researchers and industry members developing standards and CMM assessments for smart grid DG.
Journal Article
Slang feature extraction by analysing topic change on social media
by Matsumoto, Kazuyuki; Ren, Fuji; Matsuoka, Masaya
in Accuracy; analysing topic change; automatic information collection
2019
The authors often see words such as youth slang, neologisms and Internet slang on social networking sites (SNSs) that are not registered in dictionaries. Since the documents posted to SNSs include a lot of fresh information, they are thought to be useful for collecting information. It is important to analyse these words (hereinafter referred to as ‘slang’) and capture their features to improve the accuracy of automatic information collection. This study aims to analyse what features can be observed in slang by focusing on the topic. The authors construct topic models by latent Dirichlet allocation from groups of Twitter documents that include the target slang. With the models, they chronologically analyse the change of topics during a certain period of time to find the difference in features between slang and general words. Then, they propose a slang classification method based on the change of features.
Journal Article
Ensemble multi-objective evolutionary algorithm for gene regulatory network reconstruction based on fuzzy cognitive maps
by He, Shan; Chi, Yaxiong; Liu, Jing
in A0210 Algebra, set theory, and graph theory; A0250 Probability theory, stochastic processes, and statistics; A8715B Biomolecular structure, configuration, conformation, and active sites
2019
Many methods aim to use data, especially gene expression data from high-throughput genomic methods, to identify complicated regulatory relationships between genes. The authors employ a simple but powerful tool, called fuzzy cognitive maps (FCMs), to accurately reconstruct gene regulatory networks (GRNs). Many automated methods have been developed for training FCMs from data. These methods focus on simulating the observed time sequence data, but neglect the optimisation of network structure. In fact, the FCM learning problem is multi-objective and involves network structure information; thus, the authors propose a new algorithm combining an ensemble strategy and a multi-objective evolutionary algorithm (MOEA), called EMOEAFCM-GRN, to reconstruct GRNs based on FCMs. In EMOEAFCM-GRN, the MOEA first learns a series of networks with different structures by analysing historical data simultaneously, which is helpful in finding the target network with distinct optimal local information. Then, the networks with small simulation error on the training set are selected from the Pareto front, and an efficient ensemble strategy combines these selected networks into the final network. The experiments on the DREAM4 challenge and synthetic FCMs illustrate that EMOEAFCM-GRN is efficient and able to reconstruct GRNs accurately.
Journal Article
Two-phase clustering algorithm with density exploring distance measure
by Jiang, Xiangming; Ma, Jingjing; Gong, Maoguo
in Algorithms; C1140Z Other topics in statistics; C1160 Combinatorial mathematics
2018
Here, the authors propose a novel two-phase clustering algorithm with a density exploring distance (DED) measure. In the first phase, the fast global K-means clustering algorithm is used to obtain the cluster number and the prototypes. Then, the prototypes of all these clusters and representatives of points belonging to these clusters are regarded as the input data set of the second phase. Afterwards, all the prototypes are clustered according to a DED measure, which gives data points located in the same structure high similarity with each other. In experimental studies, the authors test the proposed algorithm on seven artificial and seven UCI data sets. The results demonstrate that the proposed algorithm is flexible across different data distributions and is better at clustering data sets with complex non-convex distributions than the comparison algorithms.
Journal Article
Outlier detection in neutrosophic sets by using rough entropy based weighted density method
2020
Neutrosophy is the study of neutralities, which is an extension of discussing the truth of opinions. Neutrosophic logic can be applied to any field to provide a solution to the indeterminacy problem. Much real-world data suffers from inconsistency, indeterminacy and incompleteness. Fuzzy sets provide a solution for uncertainties, and intuitionistic fuzzy sets handle incomplete information, but both concepts fail to handle indeterminate information. To handle this complicated situation, researchers require a powerful mathematical tool, namely neutrosophic sets, which generalise fuzzy and intuitionistic fuzzy sets. Neutrosophic sets provide a solution for both incomplete and indeterminate information. They have three degrees of membership: truth, indeterminacy and falsity. Boolean values are obtained from the three degrees of membership by the cut relation method. Outliers are data items whose qualities contrast with those of other objects. The weighted density outlier detection method based on rough entropy calculates the weights of each object and attribute. From the obtained weighted values, a threshold value is fixed to determine outliers. Experimental analysis of the proposed method has been carried out with a neutrosophic movie dataset to detect outliers, and the method has been compared with existing methods to demonstrate its performance.
Journal Article