Catalogue Search | MBRL
57 result(s) for "gradient-boosted decision trees"
Well-Logging-Based Lithology Classification Using Machine Learning Methods for High-Quality Reservoir Identification: A Case Study of Baikouquan Formation in Mahu Area of Junggar Basin, NW China
2022
The identification of underground formation lithology is fundamental in reservoir characterization during petroleum exploration. With the increasing availability and diversity of well-logging data, automated interpretation of well-logging data is in great demand for more efficient and reliable decision making for geologists and geophysicists. This study benchmarked the performances of an array of machine learning models, from linear and nonlinear individual classifiers to ensemble methods, on the task of lithology identification. Cross-validation and Bayesian optimization were utilized to optimize the hyperparameters of different models, and performances were evaluated based on the metrics of accuracy, the area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score. The dataset of the study consists of well-logging data acquired from the Baikouquan formation in the Mahu Sag of the Junggar Basin, China, including 4156 labeled data points with 9 well-logging variables. Results show that ensemble methods (XGBoost and RF) outperform the other two categories of machine learning methods by a material margin. Within the ensemble methods, XGBoost has the best performance, achieving an overall accuracy of 0.882 and AUC of 0.947 in classifying mudstone, sandstone, and sandy conglomerate. Among the three lithology classes, sandy conglomerate, as in the potential reservoirs in the study area, can be best distinguished with accuracy of 97%, precision of 0.888, and recall of 0.969, suggesting the XGBoost model as a strong candidate machine learning model for more efficient and accurate lithology identification and reservoir quantification for geologists.
Journal Article
GIS-based evolutionary optimized Gradient Boosted Decision Trees for forest fire susceptibility mapping
2018
Rampant pasture burning has led to various forest fires taking their toll on the health of many forests. Nanda Devi Biosphere Reserve, located in the northern part of India, witnessed a majority of these incidents in the recent past, though it remains comprehensively untouched by research studies. The scale of these wildfires has led to an immense requirement for preventive measures to be taken for recuperating from such events. This requires an in-depth analysis of the study area, its history of wildfires and their causes. These efforts would assist in laying a blueprint for a contingency plan in the event of a wildfire. This work proposes an evolutionary optimized gradient boosted decision trees model for preparing wildfire susceptibility maps for the study area that would aid in the government’s forest preservation and disaster management activities. The study took 18 ignition factors of elevation, slope, aspect, plan curvature, topographic position index, topographic water index, normalized difference vegetation index, soil texture, temperature, rainfall, aridity index, potential evapotranspiration, relative humidity, wind speed, land cover and distance from roads, rivers and habitations into consideration. The study revealed that approximately 1432.025 km² of area was very highly susceptible to forest fires while 1202.356 km² was highly susceptible to forest fires. The proposed model was compared against various machine learning models such as random forest, neural networks and support vector machines, and it outperformed them by achieving an overall accuracy of 95.5%. The proposed model demonstrated good prospects for application in the field of hazard susceptibility mappings.
Journal Article
Routine Laboratory Blood Tests Predict SARS-CoV-2 Infection Using Machine Learning
2020
Abstract
Background
Accurate diagnostic strategies to identify SARS-CoV-2 positive individuals rapidly for management of patient care and protection of health care personnel are urgently needed. The predominant diagnostic test is viral RNA detection by RT-PCR from nasopharyngeal swab specimens; however, the results are not promptly obtainable in all patient care locations. Routine laboratory testing, in contrast, is readily available with a turn-around time (TAT) usually within 1-2 hours.
Method
We developed a machine learning model incorporating patient demographic features (age, sex, race) with 27 routine laboratory tests to predict an individual’s SARS-CoV-2 infection status. Laboratory testing results obtained within 2 days before the release of SARS-CoV-2 RT-PCR result were used to train a gradient boosting decision tree (GBDT) model from 3,356 SARS-CoV-2 RT-PCR tested patients (1,402 positive and 1,954 negative) evaluated at a metropolitan hospital.
Results
The model achieved an area under the receiver operating characteristic curve (AUC) of 0.854 (95% CI: 0.829-0.878). Application of this model to an independent patient dataset from a separate hospital resulted in a comparable AUC (0.838), validating the generalization of its use. Moreover, our model predicted initial SARS-CoV-2 RT-PCR positivity in 66% of individuals whose RT-PCR result changed from negative to positive within 2 days.
Conclusion
This model employing routine laboratory test results offers opportunities for early and rapid identification of high-risk SARS-CoV-2 infected patients before their RT-PCR results are available. It may play an important role in assisting the identification of SARS-CoV-2 infected patients in areas where RT-PCR testing is not accessible due to financial or supply constraints.
Journal Article
Comparison of gradient boosted decision trees and random forest for groundwater potential mapping in Dholpur (Rajasthan), India
2021
In the drought-prone district of Dholpur in Rajasthan, India, groundwater is a lifeline for its inhabitants. With population explosion and rapid urbanization, the groundwater is being critically over-exploited. Hence the current groundwater potential mapping study was undertaken to ascertain the areas that are more likely to yield a larger volume of groundwater against those areas that have poor groundwater potential and accordingly perpetuate the much needed damage control. Thematic layers for 14 groundwater influencing factors were considered for the study region, including elevation, slope, aspect, plan curvature, profile curvature, topographic wetness index (TWI), geology, soil, land use, normalized difference vegetation index (NDVI), surface temperature, precipitation, distance from roads, and distance from rivers. These were then subjected to an overlay operation with the groundwater inventory, which comprised the locations of observational groundwater wells. The resulting geospatial database was then used to train two decision-tree-based ensemble models: gradient boosted decision trees (GBDT) and random forest (RF). The predictive performance of these models was then compared using various performance metrics such as area under curve (AUC) of receiver operating characteristics (ROC), sensitivity, accuracy, etc. It was found that GBDT (AUC: 0.79) outperformed RF (AUC: 0.71). The validated GBDT model was then used to construct the groundwater potential zonation map. The generated map showed that about 20.2% of the region has very high potential, while 22.6% has high potential to yield groundwater, and approximately 19.9–17.5% of the study region has very low to low groundwater potential.
Journal Article
Automated detection of driver fatigue based on EEG signals using gradient boosting decision tree model
2018
Driver fatigue is increasingly a contributing factor for traffic accidents, so an effective method to automatically detect driver fatigue is urgently needed. In this study, in order to catch the main characteristics of the EEG signals, four types of entropies (based on the EEG signal of a single channel) were calculated as the feature sets, including sample entropy, fuzzy entropy, approximate entropy and spectral entropy. All feature sets were used as the input of a gradient boosting decision tree (GBDT), a fast and highly accurate boosting ensemble method. The output of GBDT determined whether a driver was in a fatigue state or not based on their EEG signals. Three state-of-the-art classifiers, k-nearest neighbor, support vector machine and neural network were also employed. To assess our method, several experiments including parameter setting and classification performance comparison were performed on 22 subjects. The results indicated that it is possible to use only one EEG channel to detect a driver fatigue state. The average highest recognition rate in this work was up to 94.0%, which could meet the needs of daily applications. Our GBDT-based method may assist in the detection of driver fatigue.
Journal Article
Offshore application of landslide susceptibility mapping using gradient-boosted decision trees: a Gulf of Mexico case study
by Dyer, Alec S; Duran, Rodrigo; Mark-Moser, MacKenzie
in Accuracy, Case studies, Decision trees
2024
Among natural hazards occurring offshore, submarine landslides pose a significant risk to offshore infrastructure installations attached to the seafloor. With the offshore being important for current and future energy production, there is a need to anticipate where future landslide events are likely to occur to support planning and development projects. Using the northern Gulf of Mexico (GoM) as a case study, this paper performs Landslide Susceptibility Mapping (LSM) using a gradient-boosted decision tree (GBDT) model to characterize the spatial patterns of submarine landslide probability over the United States Exclusive Economic Zone (EEZ) where water depths are greater than 120 m. With known spatial extents of historic submarine landslides and a Geographic Information System (GIS) database of known topographical, geomorphological, geological, and geochemical factors, the resulting model was capable of accurately forecasting potential locations of sediment instability. Results of a permutation modelling approach indicated that LSM accuracy is sensitive to the number of unique training locations, with model accuracy becoming more stable as the number of training regions was increased. The influence that each input feature had on predicting landslide susceptibility was evaluated using the SHapley Additive exPlanations (SHAP) feature attribution method. Areas of high and very high susceptibility were associated with steep terrain including salt basins and escarpments. This case study serves as an initial assessment of the machine learning (ML) capabilities for producing accurate submarine landslide susceptibility maps given the current state of available natural hazard-related datasets and conveys both successes and limitations.
Journal Article
Machine Learning Methods for Classifying Multiple Sclerosis and Alzheimer’s Disease Using Genomic Data
by Bini, Giorgio; Krithara, Anastasia; Tartaglia, Gian Gaetano
in Advertising executives, Alzheimer Disease - classification, Alzheimer Disease - genetics
2025
Complex diseases pose challenges in prediction due to their multifactorial and polygenic nature. This study employed machine learning (ML) to analyze genomic data from the UK Biobank, aiming to predict the genomic predisposition to complex diseases like multiple sclerosis (MS) and Alzheimer’s disease (AD). We tested logistic regression (LR), ensemble tree methods, and deep learning models for this purpose. LR displayed remarkable stability across various subsets of data, outshining deep learning approaches, which showed greater variability in performance. Additionally, ML methods demonstrated an ability to maintain optimal performance despite correlated genomic features due to linkage disequilibrium. When comparing the performance of polygenic risk score (PRS) with ML methods, PRS consistently performed at an average level. By employing explainability tools in the ML models of MS, we found that the results confirmed the polygenicity of this disease. The highest-prioritized genomic variants in MS were identified as expression or splicing quantitative trait loci located in non-coding regions within or near genes associated with the immune response, with a prevalence of human leukocyte antigen (HLA) gene annotations. Our findings shed light on both the potential and the challenges of employing ML to capture complex genomic patterns, paving the way for improved predictive models.
Journal Article
TNT: An Interpretable Tree-Network-Tree Learning Framework using Knowledge Distillation
by Cai, Yun; Xiang, Xingchun; Li, Yiming
in deep neural networks, distillable gradient boosted decision tree, interpretable machine learning
2020
Deep Neural Networks (DNNs) usually work in an end-to-end manner. This makes the trained DNNs easy to use, but their decision process remains ambiguous for every test case. Unfortunately, the interpretability of decisions is crucial in some scenarios, such as medical or financial data mining and decision-making. In this paper, we propose a Tree-Network-Tree (TNT) learning framework for explainable decision-making, where the knowledge is alternately transferred between the tree model and DNNs. Specifically, the proposed TNT learning framework exerts the advantages of different models at different stages: (1) a novel James–Stein Decision Tree (JSDT) is proposed to generate better knowledge representations for DNNs, especially when the input data are low-frequency or low-quality; (2) the DNNs output high-performing prediction results from the knowledge-embedding inputs and act as a teacher model for the following tree model; and (3) a novel distillable Gradient Boosted Decision Tree (dGBDT) is proposed to learn interpretable trees from the soft labels and make predictions comparable to those of the DNNs. Extensive experiments on various machine learning tasks demonstrated the effectiveness of the proposed method.
Journal Article
Porosity prediction of tight reservoir rock using well logging data and machine learning
2025
The accurate quantification of porosity in tight reservoirs is crucial for optimizing oil and gas exploration and production. Traditional predictive models often face challenges such as high costs, low efficiency, and limited accuracy, hindering effective exploration activities. To address these issues, we apply advanced machine learning algorithms, namely gradient boosting decision tree (GBDT), random forest, XGBoost, and multilayer perceptron, using well logging data, including acoustic time (AC), well logging (CAL), compensating neutrons (CNL), density (DEN), natural gamma (GR), resistivity (RT), and spontaneous potential (SP). These models are further optimized with the particle swarm optimization (PSO) algorithm to enhance their predictive accuracy. Comparative analysis reveals that the PSO-GBDT model outperforms other models, achieving an R² exceeding 0.99. Validation on two additional wells confirms the model’s robustness, showcasing its superior predictive precision and efficiency. These findings suggest that the PSO-GBDT model has strong potential for improving porosity prediction in tight reservoirs, offering significant implications for future exploration and development efforts.
Journal Article
Nonlinear and Synergistic Effects of Built Environment Indicators on Street Vitality: A Case Study of Humid and Hot Urban Cities
2024
Street vitality has become an important indicator for evaluating the attractiveness and potential for the sustainable development of urban neighborhoods. However, research on this topic may overestimate or underestimate the effects of different influencing factors, as most studies overlook the prevalent nonlinear and synergistic effects. This study takes the central urban districts of humid–hot cities in developing countries as an example, utilizing readily available big data sources such as Baidu Heat Map data, Baidu Map data, Baidu Building data, urban road network data, and Amap’s Point of Interest (POI) data to construct a Gradient-Boosting Decision Tree (GBDT) model. This model reveals the nonlinear and synergistic effects of different built environment factors on street vitality. The study finds that (1) construction intensity plays a crucial role in the early stages of urban street development (with a contribution value of 0.71), and as the city matures, the role of diversity gradually becomes apparent (with the contribution value increasing from 0.03 to 0.08); (2) the built environment factors have nonlinear impacts on street vitality; for example, POI density has different thresholds in the three cities (300, 200, and 500); (3) there are significant synergistic effects between different dimensions and indicators of the built environment, such as when the POI density is high and integration exceeds 1.5, a positive synergistic effect is notable, whereas a negative synergistic effect occurs when POI is low. This article further discusses the practical implications of the research findings, providing nuanced and targeted policy suggestions for humid–hot cities at different stages of development.
Journal Article