
MBRL Search Results

53,852 results for "classification accuracy"
Dimensionality Reduction of Hyperspectral Images Based on Improved Spatial–Spectral Weight Manifold Embedding
Due to the spectral complexity and high dimensionality of hyperspectral images (HSIs), HSI processing is susceptible to the curse of dimensionality, and classification results with respect to the ground truth are often not ideal. To overcome the curse of dimensionality and improve classification accuracy, this study proposes an improved spatial–spectral weight manifold embedding (ISS-WME) algorithm, which is based on the hyperspectral data's own manifold structure and local neighbors. The manifold structure is constructed from a structural weight matrix and a distance weight matrix. The structural weight matrix is composed of within-class and between-class coefficient representation matrices, obtained using the collaborative representation method. The distance weight matrix integrates the spatial and spectral information of HSIs. The ISS-WME algorithm thus describes the whole structure of the data through a weight matrix that combines the within-class and between-class matrices with the spatial–spectral information of HSIs, and the nearest-neighbor samples of the data are preserved when embedding into the low-dimensional space. To verify the classification performance of ISS-WME, experiments were conducted on three classical data sets: Indian Pines, Pavia University, and Salinas. Six dimensionality reduction (DR) methods were compared using different classifiers, such as k-nearest neighbor (KNN) and support vector machine (SVM). The experimental results show that the ISS-WME algorithm represents the HSI structure better than the other methods and effectively improves the classification accuracy of HSIs.
The Difference Between the Accuracy of Real and the Corresponding Random Model is a Useful Parameter for Validation of Two-State Classification Model Quality
The simplest and most commonly used measure for assessing classification model quality is the parameter Q2 = 100 (p + n) / N (%), called the classification accuracy, where p and n are the total numbers of correctly predicted compounds in the first and second class, respectively, and N is the total number of elements (compounds) in the data set. Moreover, the most probable accuracy obtainable by a random model is calculated for a two-state model by the formula Q2,rnd = 100 [(p + u) (p + o) + (n + u) (n + o)] / N² (%), where u and o are the total numbers of under-predictions (class 1 predicted by the model as class 2) and over-predictions (class 2 predicted by the model as class 1) in the data set, respectively. Finally, the difference between these two parameters, ΔQ2 = Q2 − Q2,rnd, is introduced, and it is suggested that ΔQ2 be computed and reported for each two-state classification model to assess its contribution over the accuracy of the corresponding random model. When the data set is ideally balanced, with the same number of elements in both classes, the two-state classification problem is at its most difficult, with maximal Q2 = 100 % and Q2,rnd = 50 %, giving the maximal ΔQ2 = 50 %. The usefulness of the ΔQ2 parameter is illustrated in a comparative analysis of two-class classification models from the literature for prediction of the secondary structure of membrane proteins and on several quantitative structure–property models. The real contributions of these models over the random level of accuracy are calculated, and their ΔQ2 values are compared with each other and with the value ΔQ2 = 50 % for the most difficult two-state classification model.
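The parameters above follow directly from the four prediction counts; a minimal sketch in Python (the function names are illustrative):

```python
def q2(p, n, u, o):
    """Classification accuracy Q2 = 100 (p + n) / N (%)."""
    N = p + n + u + o
    return 100.0 * (p + n) / N

def q2_random(p, n, u, o):
    """Most probable accuracy of the corresponding random model."""
    N = p + n + u + o
    return 100.0 * ((p + u) * (p + o) + (n + u) * (n + o)) / N**2

def delta_q2(p, n, u, o):
    """Contribution of the model over the random-model accuracy."""
    return q2(p, n, u, o) - q2_random(p, n, u, o)

# Ideally balanced, perfectly predicted data set: Q2 = 100 %, Q2,rnd = 50 %
print(delta_q2(p=50, n=50, u=0, o=0))  # -> 50.0
```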
Estimating the Water Quality Class of a Major Irrigation Canal in Odisha, India: A Supervised Machine Learning Approach
Contamination of surface water by rapid industrialization and by natural and anthropogenic activities has been of great concern over the last few decades. Nowadays, canal water systems are no exception to this form of contamination, which results in water quality degradation. To classify canal water based on the Bureau of Indian Standards (BIS), a quick and inexpensive approach was sought as an alternative to time-consuming laboratory analysis. With this motivation, the present study explores building a machine learning model for water quality classification of a major canal, the Talaldanda canal, operating in the state of Odisha, India. The water quality class is predicted using supervised machine learning (ML) models for new canal water input parameters. Water quality parameters such as pH, dissolved oxygen (DO), biochemical oxygen demand (BOD), and total coliform (TC) at six strategic locations of the canal from 2013 to 2020 were collected from the Odisha State Pollution Control Board for the training phase. The supervised ML models used in the study are Decision Tree (DT), Neural Network (NN), k-Nearest Neighbor (k-NN), Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF). The predictions of the models are evaluated using the Orange 3.29.3 data analytics tool. When analyzing the performance parameters by splitting the data into training and testing sets using cross-validation, the results show that DT achieves a higher classification accuracy (CA), 96.6 percent, than the other ML models. In addition, the likelihood of DT correctly predicting the water quality class for the testing dataset is higher than that of the other models.
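As a sketch of how training labels might be derived from the measured parameters, the following toy rule-based labeller uses illustrative thresholds (not the actual BIS limits applied in the study):

```python
def water_class(do, bod, tc):
    """Toy water-quality labeller. The thresholds are made up for
    illustration; the study uses the BIS classification criteria."""
    if do >= 6 and bod <= 2 and tc <= 50:
        return "A"
    if do >= 5 and bod <= 3 and tc <= 500:
        return "B"
    if do >= 4 and bod <= 3 and tc <= 5000:
        return "C"
    return "D"

# (DO mg/L, BOD mg/L, total coliform MPN/100 mL)
samples = [(6.5, 1.5, 30), (5.2, 2.8, 300), (3.0, 6.0, 9000)]
print([water_class(*s) for s in samples])  # -> ['A', 'B', 'D']
```

A supervised classifier such as DT is then trained to reproduce these labels from the raw parameters, replacing the laboratory lookup at prediction time.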
Effect of various dimension convolutional layer filters on traffic sign classification accuracy
The paper presents a study of an effective classification method for traffic signs based on convolutional neural networks with filters of various dimensions. Every model has the same architecture but a different filter dimension in the convolutional layer. The studied dimensions of the convolutional layer filters are 3 × 3, 5 × 5, 9 × 9, 13 × 13, 15 × 15, 19 × 19, 23 × 23, 25 × 25 and 31 × 31. In each experiment, the input image is convolved with filters of a certain dimension and with a certain processing depth of the image borders, which depends directly on the filter dimension and varies from 1 to 15 pixels. The performance of the proposed methods is evaluated on the German Traffic Sign Recognition Benchmark (GTSRB). Images from this dataset were reduced to 32 × 32 pixels, and the whole dataset was divided into three subsets: training, validation and testing. The effect of the convolutional layer filter dimension on the extracted feature maps is analyzed with respect to classification accuracy and average processing time. The testing dataset contains 12,000 images that do not participate in training. The experimental results demonstrate that every model achieves a testing accuracy above 82%. The models with filter dimensions of 9 × 9, 15 × 15 and 19 × 19 achieve the top three classification accuracies of 86.4%, 86% and 86.8%, respectively. The models with filter dimensions of 5 × 5, 3 × 3 and 13 × 13 achieve the top three average processing times of 0.001879, 0.002046 and 0.002364 seconds, respectively. Convolutional layer filters of middle dimension thus deliver not only a high classification accuracy of more than 86% but also a fast classification rate, which enables these models to be used in real-time applications.
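The stated border-processing depths of 1 to 15 pixels are consistent with "same" padding of (k − 1)/2 for each odd filter size k, which preserves the 32 × 32 input size; a quick check (assuming stride 1):

```python
# Odd filter sizes studied in the paper and the matching border depth
filter_sizes = [3, 5, 9, 13, 15, 19, 23, 25, 31]

def border_depth(k):
    """'Same' padding for an odd k x k filter."""
    return (k - 1) // 2

def output_size(n, k, pad, stride=1):
    """Spatial output size of a convolutional layer."""
    return (n + 2 * pad - k) // stride + 1

depths = [border_depth(k) for k in filter_sizes]
print(depths)  # -> [1, 2, 4, 6, 7, 9, 11, 12, 15]

# With that padding, every filter size keeps the 32 x 32 image dimension
assert all(output_size(32, k, border_depth(k)) == 32 for k in filter_sizes)
```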
Selection of optimum frequency bands for detection of epileptiform patterns
The significant research effort in the domain of epilepsy has been directed toward the development of automated seizure detection systems. In their use of electrophysiological recordings, most proposals thus far have followed the conventional practice of employing all frequency bands obtained after signal decomposition as input features for a classifier. Although seemingly powerful, this approach may prove counterproductive, since some frequency bins may not carry relevant information about seizure episodes and may instead add noise to the classification process, degrading performance. A key thesis of the work described here is that selecting frequency subsets may enhance seizure classification rates. Additionally, the authors explore whether a conservative selection of frequency bins can reduce the amount of training data needed to achieve good classification performance. They have found compelling evidence that using spectral components below 25 Hz in scalp electroencephalograms can yield state-of-the-art classification accuracy while reducing training data requirements to just a tenth of those employed by current approaches.
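A minimal sketch of restricting a classifier's input to the sub-25 Hz spectral components, assuming a 256 Hz sampling rate (the abstract does not fix one):

```python
import numpy as np

def spectral_features(signal, fs, fmax=25.0):
    """Keep only spectral magnitudes below fmax Hz as classifier features."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return spectrum[freqs < fmax]

fs = 256                      # assumed scalp-EEG sampling rate
t = np.arange(fs) / fs        # one second of signal
# 10 Hz component kept, 40 Hz component discarded by the band selection
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
feats = spectral_features(x, fs)
print(feats.shape)            # -> (25,): only the <25 Hz bins survive
```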
EAGA-MLP—An Enhanced and Adaptive Hybrid Classification Model for Diabetes Diagnosis
Disease diagnosis is a critical task which needs to be done with extreme precision. In recent times, medical data mining has gained popularity for disease datasets arising from complex healthcare problems. Unstructured healthcare data often contains irrelevant information which can affect the prediction ability of classifiers. Therefore, an effective attribute optimization technique must be used to eliminate the less relevant data and optimize the dataset for enhanced accuracy. Type 2 diabetes, commonly studied through the Pima Indian Diabetes dataset, affects millions of people around the world. Optimization techniques can be applied to generate a reliable dataset consisting of symptoms that are useful for more accurate diagnosis of diabetes. This study presents the implementation of a new hybrid attribute optimization algorithm called the Enhanced and Adaptive Genetic Algorithm (EAGA) to obtain an optimized symptoms dataset. Based on readings of the symptoms in the optimized dataset, a possible occurrence of diabetes is forecasted. The EAGA model is further combined with a Multilayer Perceptron (MLP) to determine the presence or absence of type 2 diabetes in patients based on the symptoms detected. The proposed classification approach is named the Enhanced and Adaptive Genetic Algorithm–Multilayer Perceptron (EAGA-MLP). It is also applied to seven different disease datasets to assess its impact and effectiveness. The performance of the proposed model was validated against several vital performance metrics. The results show a maximum accuracy rate of 97.76% and an execution time of 1.12 s. Furthermore, the proposed model achieves an F-score of 86.8% and a precision of 80.2%. The method was compared with many existing studies, and the classification accuracy of the proposed EAGA-MLP model clearly outperformed all previous classification models; its performance was also tested on seven other disease datasets.
The mean accuracy, precision, recall and F-score obtained were 94.7%, 91%, 89.8% and 90.4%, respectively. Thus, the proposed model can assist medical experts in accurately determining the risk factors of type 2 diabetes and thereby help in accurately classifying its presence in patients. Consequently, it can support healthcare experts in the diagnosis of patients affected by diabetes.
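The reported metrics follow the standard definitions from confusion-matrix counts; a minimal sketch (the counts below are made up for illustration):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts:
    tp = true positives, fp = false positives, fn = false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # recall is also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = precision_recall_f1(tp=80, fp=20, fn=10)
print(round(p, 3), round(r, 3), round(f, 3))  # -> 0.8 0.889 0.842
```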
SmoteAdaNL: a learning method for network traffic classification
Machine learning based network traffic classification is a critical technique for network management and has attracted much attention. Recently, most researchers have focused on achieving high flow classification accuracy (FCA). However, because “mice” flows outnumber “elephant” flows on the Internet, these classifiers are better suited to “mice” flows and have low byte classification accuracy (BCA). To address this issue, the notion of byte misclassification is first explored. Based on the observation that most misclassified bytes belong to the minority class, a novel method of network traffic classification is proposed that combines data re-sampling and ensemble learning. To enhance the classification accuracy of the minority class, a data re-sampling algorithm is employed to increase the number of minority class flows. Re-sampling, however, changes the data distribution and can degrade the generalization of a classifier; hence, a boosting-style ensemble learning algorithm that takes ensemble diversity into consideration is employed to improve generalization. Experiments conducted on real-world traffic datasets show that the proposed method achieves over 90% BCA and 96% FCA on average, and improves BCA by about 7.15% compared with existing methods.
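The re-sampling step can be illustrated with its simplest variant, random duplication of minority-class flows (SMOTE-style methods instead interpolate between neighbours; the names here are illustrative):

```python
import random
from collections import Counter

def oversample_minority(flows, labels, seed=0):
    """Randomly duplicate minority-class flows until classes are balanced.
    A simplified stand-in for the re-sampling step; SMOTE additionally
    synthesizes new samples by interpolating between neighbours."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    pool = [f for f, y in zip(flows, labels) if y == minority]
    flows, labels = list(flows), list(labels)
    while Counter(labels)[minority] < counts[majority]:
        flows.append(rng.choice(pool))
        labels.append(minority)
    return flows, labels

X = [[i] for i in range(10)]
y = ["elephant"] * 8 + ["mice"] * 2   # imbalanced flow classes
X2, y2 = oversample_minority(X, y)
print(Counter(y2))                    # classes now balanced
```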
A Two-Step Data Normalization Approach for Improving Classification Accuracy in the Medical Diagnosis Domain
Data normalization is a data preprocessing task and one of the first to be performed during intellectual analysis, particularly in the case of tabular data. The importance of its implementation is determined by the need to reduce the sensitivity of the artificial intelligence model to the values of the features in the dataset to increase the studied model’s adequacy. This paper focuses on the problem of effectively preprocessing data to improve the accuracy of intellectual analysis in the case of performing medical diagnostic tasks. We developed a new two-step method for data normalization of numerical medical datasets. It is based on the possibility of considering both the interdependencies between the features of each observation from the dataset and their absolute values to improve the accuracy when performing medical data mining tasks. We describe and substantiate each step of the algorithmic implementation of the method. We also visualize the results of the proposed method. The proposed method was modeled using six different machine learning methods based on decision trees when performing binary and multiclass classification tasks. We used six real-world, freely available medical datasets with different numbers of vectors, attributes, and classes to conduct experiments. A comparison between the effectiveness of the developed method and that of five existing data normalization methods was carried out. It was experimentally established that the developed method increases the accuracy of the Decision Tree and Extra Trees Classifier by 1–5% in the case of performing the binary classification task and the accuracy of the Bagging, Decision Tree, and Extra Trees Classifier by 1–6% in the case of performing the multiclass classification task. Increasing the accuracy of these classifiers only by using the new data normalization method satisfies all the prerequisites for its application in practice when performing various medical data mining tasks.
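As an illustration of the general idea (not the authors' exact formulas, which the paper defines), a two-step scheme could first scale each observation by its own norm, capturing the interdependencies between features within a vector, and then min-max scale each feature column to restore comparable absolute ranges:

```python
import math

def two_step_normalize(X):
    """Illustrative two-step normalization (hypothetical, not the paper's
    method). Step 1: divide each observation by its Euclidean norm, so
    between-feature ratios within a vector are preserved. Step 2: min-max
    scale each feature column to [0, 1]."""
    step1 = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        step1.append([v / norm for v in row])
    out_cols = []
    for col in zip(*step1):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        out_cols.append([(v - lo) / span for v in col])
    return [list(r) for r in zip(*out_cols)]

# Hypothetical medical measurements (e.g. systolic/diastolic pressure)
X = [[120.0, 80.0], [140.0, 90.0], [100.0, 60.0]]
print(two_step_normalize(X))  # every value lies in [0, 1]
```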
Performance Evaluation of Sentinel-2 and Landsat 8 OLI Data for Land Cover/Use Classification Using a Comparison between Machine Learning Algorithms
With the development of remote sensing algorithms and increased access to satellite data, generating up-to-date, accurate land use/land cover (LULC) maps has become increasingly feasible for evaluating and managing changes in land cover caused by changes to ecosystems and land use. The main objective of our study is to evaluate and compare the performance of the Support Vector Machine (SVM), Artificial Neural Network (ANN), Maximum Likelihood Classification (MLC), Minimum Distance (MD), and Mahalanobis (MH) algorithms in order to generate a LULC map using data from the Sentinel-2 and Landsat 8 satellites. Further, we investigate the effect of the penalty parameter on SVM results. Our study uses different kernel functions and hidden layers for the SVM and ANN algorithms, respectively. We generated the training and validation datasets from Google Earth images and GPS data prior to pre-processing the satellite data. In the next phase, we classified the images using the training data and the algorithms. Finally, to evaluate outcomes, we used the validation data to generate a confusion matrix for the classified images. Our results showed that, with optimal tuning parameters, the SVM classifier yielded the highest overall accuracy (OA) of 94%, performing better on both satellites' data than the other methods. In addition, for our scenes, Sentinel-2 data were slightly more accurate than Landsat 8. The parametric algorithms MD and MLC provided the lowest accuracies of 80.85% and 74.68% for the Sentinel-2 and Landsat 8 data, respectively. Our evaluation of the SVM tuning parameters showed that the linear kernel with a penalty parameter of 150 for Sentinel-2 and of 200 for Landsat 8 yielded the highest accuracies. Further, the ANN classification showed that increasing the number of hidden layers drastically reduces classification accuracy for both datasets, falling to zero with three hidden layers.
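Overall accuracy follows directly from the confusion matrix as the share of correctly classified validation samples; a minimal sketch with a made-up 3-class LULC matrix:

```python
def overall_accuracy(confusion):
    """Overall accuracy (%) = correctly classified samples / all samples,
    i.e. the trace of the confusion matrix over its grand total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total

# Hypothetical 3-class confusion matrix (rows: reference, cols: predicted)
cm = [[40, 3, 2],
      [4, 35, 1],
      [1, 2, 12]]
print(overall_accuracy(cm))  # -> 87.0
```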
Classification Accuracy as a Proxy for Two-Sample Testing
When data analysts train a classifier and check whether its accuracy is significantly different from chance, they are implicitly performing a two-sample test. We investigate the statistical properties of this flexible approach in the high-dimensional setting. We prove two results that hold for all classifiers in any dimension: if the classifier's true error remains ϵ-better than chance for some ϵ > 0 as d, n → ∞, then (a) the permutation-based test is consistent (has power approaching one), and (b) a computationally efficient test based on a Gaussian approximation of the null distribution is also consistent. To get a finer understanding of the rates of consistency, we study the specialized setting of distinguishing Gaussians with mean difference δ and common (known or unknown) covariance Σ, when d/n → c ∈ (0, ∞). We study variants of Fisher's linear discriminant analysis (LDA), such as “naive Bayes”, in a nontrivial regime where ϵ → 0 (the Bayes classifier has true accuracy approaching 1/2), and contrast their power with that of corresponding variants of Hotelling's test. Surprisingly, the expressions for their power match exactly in terms of n, d, δ and Σ, and the LDA approach is only worse by a constant factor, achieving an asymptotic relative efficiency (ARE) of 1/√π for balanced samples. We also extend our results to high-dimensional elliptical distributions with finite kurtosis. Other results of independent interest include minimax lower bounds and the optimality of Hotelling's test when d = o(n). Simulation results validate our theory, and we present practical takeaway messages along with natural open problems.
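The permutation-based version of the test can be sketched with a nearest-centroid classifier and resubstitution accuracy, a deliberately simplified stand-in for the classifiers and sample-splitting schemes analyzed in the paper:

```python
import numpy as np

def classifier_accuracy(X, y):
    """Resubstitution accuracy of a nearest-centroid classifier
    (illustrative; held-out evaluation is preferable in practice)."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1)
            < np.linalg.norm(X - c0, axis=1)).astype(int)
    return (pred == y).mean()

def permutation_pvalue(X, y, n_perm=500, seed=0):
    """Permutation test: refit under shuffled labels to build the null."""
    rng = np.random.default_rng(seed)
    observed = classifier_accuracy(X, y)
    null = [classifier_accuracy(X, rng.permutation(y)) for _ in range(n_perm)]
    return (1 + sum(a >= observed for a in null)) / (1 + n_perm)

# Two Gaussian samples whose means differ by 0.5 in each of 20 dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 20)),
               rng.normal(0.5, 1.0, (50, 20))])
y = np.repeat([0, 1], 50)
print(permutation_pvalue(X, y))  # small p-value: the samples differ
```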