Catalogue Search | MBRL

Detection and classification of pneumonia using the Orange3 data mining tool

by Altayeb, Muneera , Al-Ghraibah, Amani , Arabiat, Areen

2024

A chest X-ray can convey a lot about a patient's condition. However, it requires a specialized and skilled doctor to determine the type of lung disease with high accuracy. Deep learning (DL) and artificial intelligence (AI) techniques can accelerate the detection and classification of lung diseases with high precision, saving time and effort for patient and doctor alike. This work proposes a machine learning (ML) and AI system that analyzes chest X-ray images and categorizes them into four cases: normal, viral pneumonia, bacterial pneumonia, and coronavirus disease 2019 (COVID-19). The system extracts Mel frequency cepstral coefficient (MFCC) features from a dataset of 4,800 chest X-ray images and uses these features to train four classifiers in the data mining tool Orange3: adaptive boosting (AdaBoost), decision trees (DTs), gradient boosting (GB), and random forest (RF). In testing, the AdaBoost classifier performed best with an accuracy of 100%, followed by RF with 99.5%, GB with 98.5%, and DTs with 97.2%.

Journal Article

Heart disease classification using data mining tools and machine learning techniques

by El Mhamdi, Jamal , Tougui, Ilias , Jilbab, Abdelilah in Accuracy , Algorithms , Artificial intelligence

2020

In the healthcare industry, data analysis can save lives by improving medical diagnosis. With the rapid development of software engineering, a variety of data mining tools are available to researchers for conducting studies and experiments. We therefore compare six common data mining tools (Orange, Weka, RapidMiner, Knime, Matlab, and Scikit-Learn) using six machine learning techniques (Logistic Regression, Support Vector Machine, K Nearest Neighbors, Artificial Neural Network, Naïve Bayes, and Random Forest) on a heart disease classification task. The dataset used in this study has 13 features, one target variable, and 303 instances, of which 139 suffer from cardiovascular disease and 164 are healthy subjects. Three performance measures were used to compare the techniques in each tool: accuracy, sensitivity, and specificity. The results showed that Matlab was the best performing tool, and Matlab’s Artificial Neural Network model was the best performing technique. We conclude by plotting the receiver operating characteristic curve for Matlab and by giving several recommendations on which tool to choose, taking into account the user's experience in the field of data mining.
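
As an illustration of the three performance measures named in this abstract, the following minimal sketch computes accuracy, sensitivity, and specificity from binary confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: diseased subjects classified correctly/incorrectly.
    tn/fp: healthy subjects classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total        # share of all subjects classified correctly
    sensitivity = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)        # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts chosen only for illustration, not results from the paper.
acc, sens, spec = classification_metrics(tp=40, fn=10, tn=45, fp=5)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are unbalanced, because accuracy alone can hide poor performance on the smaller class.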

Journal Article

Crack detection based on mel-frequency cepstral coefficients features using multiple classifiers

by Altayeb, Muneera , Arabiat, Areen

2024

Crack detection plays an essential role in evaluating the strength of structures. In recent years, machine learning and deep learning techniques combined with computer vision have emerged as a way to assess the strength of structures and detect cracks. This research uses machine learning (ML) to create a crack detection model based on a dataset of 2432 images of different surfaces, divided into two groups: 70% for training and 30% for testing. The Orange3 data mining tool was used to build the crack detection model, in which support vector machine (SVM), gradient boosting (GB), naive Bayes (NB), and artificial neural network (ANN) classifiers were trained and verified on three sets of features extracted using MATLAB: mel-frequency cepstral coefficients (MFCC), delta MFCC (DMFCC), and delta-delta MFCC (DDMFCC). The experimental results showed the superiority of SVM, with a classification accuracy of 100%, while accuracy reached 93.9%-99.9% for NB, 99.9% for ANN, and 99.8% for GB.

Journal Article

Survey on Data Mining Techniques, Process and Algorithms

by Vijayalakshmi, S. , Nivethithaa, K.K. in algorithm , Algorithms , Classification

2021

The term “Data Mining” refers to the extraction of patterns and knowledge from large amounts of raw data and is often defined as finding hidden information in a database. It entails analyzing data patterns in large volumes of data using one or more software tools. Data mining also involves effective data collection and warehousing as well as computer processing.

Journal Article

Second primary cancers: a retrospective analysis of real world data using the enhanced medical research engine ConSoRe in a French comprehensive cancer center

by Chvetzoff, Gisèle , Bachelot, Thomas , Chassagne-Clement, Catherine in Breast cancer , Data mining , Head & neck cancer

2021

Background: Second primary cancers (SPC) account for 18% of all cancers. We used the enhanced medical/health data mining tool ConSoRe to search aggregated data, analyze electronic patient records (EPR), and better characterize patients with SPC. Methods: This retrospective cohort study used ConSoRe to identify EPRs from patients with SPC referred to the regional cancer center Leon Bérard from 1993 to 2017, and examined characteristics of patients with SPC, frequencies of first primary cancer (FPC) localization in the global population of patients with SPC, and time to SPC. The data set was extracted on January 1, 2018. Results: Among 296,530 EPRs, we identified 157,187 patients with FPC, including 13,002 (8%) patients with SPC. Between 2000 and 2010, the rate of SPC was 34%, and 52% of SPC were identified in the last years (2010–2017). In men, the main cancers were head and neck cancer, lymphoma, and prostate carcinoma, accounting for 15.6%, 12.8%, and 10.5% of FPC, while the three most common SPC were head and neck cancer (13.2%), lung cancer (11.8%), and lymphoma (9.2%). In women, breast cancers, lymphoma, and skin cancers accounted for 48.8%, 8%, and 5.1% of first cancers, and for 31.1%, 7%, and 6% of SPC. Conclusion: The data mining tool ConSoRe facilitates access to real world data and helps to better characterize patients with SPC. Expanding this approach to any comprehensive center will allow a global overview of the follow-up of patients with cancer, and help to improve long-term management and adapt surveillance.

Journal Article

Big Data Application and its Impact on Education

by Alqahtani, Salihah , Khan, Shakir in Algorithms , Big Data , Distance Education

2020

Big data is employed in widely different fields; here we study how education uses big data. We review the research literature on big data in education from 2010 to 2020, then review the process of big educational data mining, its tools, and the applications of big data in education. With the help of these applications, this paper explores ideas for improving the education process. Two methods are applied to validate the education process, and many parameters are discussed to complete the research.

Journal Article

Data mining tools -a case study for network intrusion detection

by Hosseini, Soodeh , Sardo, Saman Rafiee in Computer Communication Networks , Computer Science , Data mining

2021

With the growth of data mining and machine learning approaches in recent years, many efforts have been made to generalize these fields so that researchers from any discipline can easily use them. One of the most important of these efforts is the development of data mining tools that hide the underlying complexity from researchers, so that they can achieve professional output at any level of knowledge. This paper reviews and compares data mining and machine learning tools, including WEKA, KNIME, Keel, Orange, Azure, IBM SPSS Modeler, R, and Scikit-Learn, to show what approach each tool has taken to the complexities and problems that arise in different scenarios of generalizing data mining and machine learning. In addition, for a more detailed review, this paper examines the challenge of network intrusion detection in two tools: Knime, with its graphical interface, and Scikit-Learn, with its coding environment.

Journal Article

Data envelopment analysis and data mining to efficiency estimation and evaluation

by Bou-Hamad, Imad , Anouze, Abdel Latef M in Accuracy , Banking , Classification

2019

Purpose: This paper aims to assess the application of seven statistical and data mining techniques to second-stage data envelopment analysis (DEA) for bank performance. Design/methodology/approach: Different statistical and data mining techniques are applied to second-stage DEA for bank performance as part of an attempt to produce a powerful model for bank performance with effective predictive ability. The data mining tools considered are classification and regression trees (CART), conditional inference trees (CIT), random forest based on CART and CIT, bagging, artificial neural networks, and their statistical counterpart, logistic regression. Findings: The results showed that random forests and bagging outperform other methods in terms of predictive power. Originality/value: This is the first study to assess the impact of environmental factors on banking performance in Middle East and North Africa countries.

Journal Article

Development of a Model Using Data Mining Technique to Test, Predict and Obtain Knowledge from the Academics Results of Information Technology Students

by Abdullaev, Sanjar , Adelaja, Oluwaseun A. , Alkattan, Hussein in Academic achievement , Accuracy , Algorithms

2022

Due to the huge amount of data obtained from students’ academic results in most tertiary institutions, such as colleges, polytechnics, and universities, data mining has become one of the most effective tools for discovering vital knowledge from student datasets. The discovered knowledge can help in understanding numerous challenges in education and in providing possible solutions to them. The main objective of this research is to use the J48 decision tree algorithm to test, classify, and predict the students’ dataset by identifying important attributes and instances. The analysis was conducted on final year students’ academic results in C# programming across five universities, imported into the WEKA environment as a CSV file. These training datasets contained the scores obtained in the examinations, grade remarks, grades, gender, and department. The knowledge extracted for the prediction model will help both tutors and students to anticipate future grade performance. Flow lines, J48 decision trees, confusion matrices, and a program flowchart were generated from the students’ dataset. The kappa value obtained from the prediction in this research ranges from 0.9070 to 0.9582, which is consistent with the standard for an ideal analysis on datasets.
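
For readers unfamiliar with the kappa statistic reported in this abstract, the sketch below shows how Cohen's kappa can be computed from a confusion matrix. It is a minimal illustration only: the function name and the matrix values are hypothetical, and it is not the WEKA implementation used in the study.

```python
def cohen_kappa(matrix):
    """Compute Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of instances of actual class i
    that were predicted as class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(n_classes)) / total
    # Expected chance agreement: sum over classes of (row total * column total).
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(n_classes)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix, not data from the paper.
kappa = cohen_kappa([[45, 5], [5, 45]])
print(f"kappa={kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so values close to 1 indicate that the predictions match the actual classes far more often than chance alone would explain.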

Journal Article

Research on Industry Data Analytics on Processing Procedure of Named 3-4-8-2 Components Combination for the Application Identification in New Chain Convenience Store

by Chen, You-Shyang , Wang, Shang-Wen , Lin, Chien-Ku in Accuracy , Bayesian analysis , Classifiers

2023

With the rapid economic boom of Asian countries, the president of Country-A has made great efforts at reform in recent years. The prospect of economic development is promising, and business opportunities are gradually emerging; accordingly, people’s everyday consumption has also changed significantly. Urban and rural consumers originally shopped mainly at old, traditional grocery stores with poor sanitation, but with economic improvement the quality of consumption has risen, and convenience stores are gradually replacing grocery stores. However, convenience store management involves performance, logistics, competition, and personnel costs. Whether a store can generate a net profit and how a new store is evaluated and selected are both key factors that significantly influence business performance. Therefore, this study uses an industry data analysis method to present a two-stage processing procedure, named the 3-4-8-2 components combination. First, in the data preprocessing stage, this research considers 22 condition attributes and two types of decision factors, net profit and new store selection, and applies both attribute selection and data discretization through the analysis and prediction functions of data mining tools. Next, in the experiment execution stage, three well-known classifiers with good past performance (Bayes net, logistic regression, and J48 decision tree) and four models (without preprocessing, with attribute selection, with data discretization, and with both attribute selection and data discretization) are used in eight different experiments with two data verification methods (percentage split and cross-validation).
In conclusion, three key results are identified from the empirical analysis: (1) The prediction accuracy of the J48 decision tree classifier is relatively high and stable among the three classifiers in this study; at the same time, the J48 decision tree can yield comprehensible knowledge-based rules to instruct interested parties. (2) The important attributes for the net profit decision attribute include the store type, POS number, and cashier number, while the important attributes for new store selection include the store type and cashier number. (3) There is a difference in the selection of important attributes. Furthermore, four key contributions are drawn from the empirical results: academic, enterprise, application, and management contributions. It is expected that the direction of store layout expansion can be identified through this study, but many risks remain hidden behind the considerable business opportunities and need to be carefully managed.

Journal Article
