Catalogue Search | MBRL

A large-scale, high-quality dataset for lithology identification: Construction and applications

by Song, Shun-Yao , Zhao, Xian-Zheng , Zhao, Zheng-Guang in Accuracy , Artificial intelligence , Artificial neural networks

2025

Lithology identification is a critical aspect of geoenergy exploration, including geothermal energy development, gas hydrate extraction, and gas storage. In recent years, artificial intelligence techniques based on drill core images have made significant strides in lithology identification, achieving high accuracy. However, the current demand for advanced lithology identification models remains unmet due to the lack of high-quality drill core image datasets. This study constructs and publicly releases the first open-source Drill Core Image Dataset (DCID), addressing the need for large-scale, high-quality datasets in lithology characterization tasks within geological engineering and establishing a standard dataset for model evaluation. DCID consists of 35 lithology categories and a total of 98,000 high-resolution images (512 × 512 pixels), making it the most comprehensive drill core image dataset in terms of lithology categories, image quantity, and resolution. This study also provides lithology identification accuracy benchmarks on DCID for popular convolutional neural networks (CNNs) such as VGG, ResNet, DenseNet, and MobileNet, as well as for the Vision Transformer (ViT) and MLP-Mixer. Additionally, the sensitivity of model performance to various parameters and to image resolution is evaluated. In response to real-world challenges, we propose a real-world data augmentation (RWDA) method that leverages slightly defective images from DCID to enhance model robustness. The study also explores the impact of real-world lighting conditions on the performance of lithology identification models. Finally, we demonstrate how to rapidly evaluate model performance across multiple dimensions using low-resolution datasets, advancing the application and development of new lithology identification models for geoenergy exploration.

Journal Article

Displacement Prediction of Newly-Established Monitoring Slopes Based on Lithology-Classified Integrated Dataset

by Yuan, Tian , Yanglanduo, Deng , Jianxue, Zhang in Datasets , Landslides , Lithology

2025

In landslide monitoring projects, newly established monitoring slopes have limited monitoring data that inevitably represent their deformation patterns poorly, which makes traditional single-slope modelling infeasible. This paper proposes classifying the integrated multi-slope monitoring dataset by slope lithology and using the classified data to construct pre-trained models, which are then applied to newly established monitoring slopes to improve prediction performance. By integrating the monitoring data, the pre-trained models can learn more deformation characteristics than they could from single-slope data alone. Moreover, by further classifying the integrated dataset by slope lithology, constructing separate pre-trained models, and applying each to newly established slopes of the corresponding lithology, it is feasible to enhance the classified dataset's ability to represent the deformation patterns of the corresponding kind of slope while still ensuring the volume of dataset of eac

Journal Article

A real-time intelligent lithology identification method based on a dynamic felling strategy weighted random forest algorithm

by Hou, Zhao-Kai , Xu, Rui , Yan, Tie in Accuracy , Algorithms , Artificial intelligence

2024

Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment encountered while drilling poses many challenges for lithology identification. This paper studies three problems in intelligent lithology identification: difficult feature extraction, low precision in identifying thin layers, and limited model applicability. It aims to improve the overall performance of the lithology identification model in three respects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model, the dynamic felling strategy weighted random forest algorithm (DFW-RF), is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging-while-drilling (LWD) parameters that significantly influence lithology identification. The overall performance of the DFW-RF model was verified on 3 wells in different areas. Compared with five typical lithology identification algorithms, the DFW-RF model achieves a higher lithology identification accuracy and F1 score. The model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. The DFW-RF model is efficient for real-time intelligent identification of lithology in closed-loop drilling, has broad applicability, and is worth adopting widely in logging interpretation.

Journal Article

Well-Logging-Based Lithology Classification Using Machine Learning Methods for High-Quality Reservoir Identification: A Case Study of Baikouquan Formation in Mahu Area of Junggar Basin, NW China

by Zhang, Yuan , Zhang, Junjie , Zhang, Junlong in Accuracy , Algorithms , Classification

2022

The identification of underground formation lithology is fundamental to reservoir characterization during petroleum exploration. With the increasing availability and diversity of well-logging data, automated interpretation of well-logging data is in great demand to support more efficient and reliable decision making by geologists and geophysicists. This study benchmarked the performance of an array of machine learning models, from linear and nonlinear individual classifiers to ensemble methods, on the task of lithology identification. Cross-validation and Bayesian optimization were used to optimize the hyperparameters of the different models, and performance was evaluated using accuracy, the area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score. The dataset consists of well-logging data acquired from the Baikouquan Formation in the Mahu Sag of the Junggar Basin, China, including 4156 labeled data points with 9 well-logging variables. The results show that ensemble methods (XGBoost and RF) outperform the other two categories of machine learning methods by a substantial margin. Among the ensemble methods, XGBoost performs best, achieving an overall accuracy of 0.882 and an AUC of 0.947 in classifying mudstone, sandstone, and sandy conglomerate. Of the three lithology classes, sandy conglomerate, the potential reservoir rock in the study area, is best distinguished, with an accuracy of 97%, precision of 0.888, and recall of 0.969, suggesting XGBoost as a strong candidate model for more efficient and accurate lithology identification and reservoir quantification by geologists.

Journal Article
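
The evaluation metrics named in the abstract above can be illustrated with a short sketch. The Python function below computes overall accuracy and per-class precision, recall, and F1-score from true and predicted labels. The function name `lithology_metrics` and the six toy labels are hypothetical illustrations, not code or data from the study; AUC is left out because it needs predicted class probabilities rather than hard labels.

```python
def lithology_metrics(y_true, y_pred, classes):
    """Overall accuracy plus per-class (precision, recall, F1).

    Hypothetical helper for illustration only; not the study's code.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # true c predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = (precision, recall, f1)
    return accuracy, per_class


# Toy labels (hypothetical, not from the Baikouquan dataset).
y_true = ["mudstone", "mudstone", "sandstone", "sandstone",
          "sandy conglomerate", "sandy conglomerate"]
y_pred = ["mudstone", "sandstone", "sandstone", "sandstone",
          "sandy conglomerate", "mudstone"]
acc, per_class = lithology_metrics(
    y_true, y_pred, ["mudstone", "sandstone", "sandy conglomerate"])
print(f"accuracy = {acc:.3f}")  # accuracy = 0.667
for name, (p, r, f) in per_class.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

With these toy labels, sandstone has perfect recall but a precision of only 2/3 (F1 = 0.8), because one mudstone sample is misclassified as sandstone; this is why the per-class precision and recall are reported alongside overall accuracy.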

A global temperature control of silicate weathering intensity

by Yang, Shouye , Deng, Kai , Guo, Yulong in 704/106/47/4113 , 704/47/4113 , Catchment scale

2022

Silicate weathering, an important negative feedback, can regulate Earth's climate over time, but much debate concerns the strength of its response to each climatic factor and its evolution with land-surface reorganisation. This discrepancy arises from a lack of weathering-proxy validation and scarce quantitative paleo-constraints on individual forcing factors. Here we examine the catchment-scale link between silicate weathering intensity and various environmental parameters using a global compilation of modern sediment data (n = 3828). We show that temperature is the primary control on silicate weathering, given that feldspar dissolution increases monotonically with temperature (0–30 °C), whereas the controls of precipitation and topographic-lithological factors are regional and subordinate. We interpret the non-linear forcing of temperature on feldspar dissolution as reflecting depletion of the more reactive plagioclase (relative to orthoclase) at higher temperature. Our results hint at a stronger temperature-weathering feedback at lower surface temperature and support the hypothesis of increased land-surface reactivity during late Cenozoic cooling. How silicate weathering responds to and regulates Earth's climate remains controversial. This study suggests that temperature is the primary control on weathering intensity globally and that the temperature-weathering feedback may be stronger on a cold Earth.

Journal Article

SoilGrids250m: Global gridded soil information based on machine learning

by Shangguan, Wei , Batjes, Niels H. , Guevara, Mario Antonio in Accuracy , Algorithms , Artificial intelligence

2017

This paper describes the technical development and accuracy assessment of the most recent and improved version of the SoilGrids system at 250 m resolution (June 2016 update). SoilGrids provides global predictions for standard numeric soil properties (organic carbon, bulk density, cation exchange capacity (CEC), pH, soil texture fractions and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in addition to predictions of depth to bedrock and the distribution of soil classes based on the World Reference Base (WRB) and USDA classification systems (ca. 280 raster layers in total). Predictions were based on ca. 150,000 soil profiles used for training and a stack of 158 remote sensing-based soil covariates (primarily derived from MODIS land products, SRTM DEM derivatives, climatic images and global landform and lithology maps), which were used to fit an ensemble of machine learning methods (random forest and gradient boosting and/or multinomial logistic regression) as implemented in the R packages ranger, xgboost, nnet and caret. The results of 10-fold cross-validation show that the ensemble models explain between 56% (coarse fragments) and 83% (pH) of the variation, with an overall average of 61%. Improvements in relative accuracy, considering the amount of variation explained, range from 60 to 230% in comparison to the previous version of SoilGrids at 1 km spatial resolution. The improvements can be attributed to (1) the use of machine learning instead of linear regression, (2) considerable investment in preparing finer-resolution covariate layers, and (3) the insertion of additional soil profiles. Further development of SoilGrids could include refining methods to incorporate input uncertainties and derive posterior probability distributions (per pixel), and further automating spatial modeling so that soil maps can be generated for potentially hundreds of soil variables.
Another area of future research is the development of methods for multiscale merging of SoilGrids predictions with local and/or national gridded soil products (e.g. up to 50 m spatial resolution) so that increasingly accurate, complete and consistent global soil information can be produced. SoilGrids are available under the Open Database License (ODbL).

Journal Article

Well Logging Based Lithology Identification Model Establishment Under Data Drift: A Transfer Learning Method

by Li, Zerui , Cao, Yingchang , Wu, Yuping in Accuracy , Adaptation , Algorithms

2020

Recent years have witnessed growing application of machine learning technologies to well-logging-based lithology identification. Most existing work assumes that the well logs gathered from different wells share the same probability distribution; however, variations in sedimentary environment and well-logging technique can cause a data drift problem, i.e., data from different wells follow different probability distributions. Consequently, a model trained on old wells does not perform well in predicting the lithologies of new wells, which motivates us to propose a transfer learning method named the data drift joint adaptation extreme learning machine (DDJA-ELM) to increase the accuracy of the old model when it is applied to new wells. In this method, three key components, i.e., the project mean maximum mean discrepancy, joint distribution domain adaptation, and manifold regularization, are incorporated into the extreme learning machine. Experiments on multiple wells in the Jiyang Depression, Bohai Bay Basin, show that DDJA-ELM can significantly increase the accuracy of an old model when identifying the lithologies of new wells.

Journal Article
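
The maximum mean discrepancy (MMD) mentioned in the abstract above measures how far apart two sample distributions are, which is one way to quantify data drift between an old well and a new one. The sketch below computes the standard biased empirical estimate of squared MMD with a Gaussian (RBF) kernel, i.e. the mean within-sample kernel value for each well minus twice the mean cross-well kernel value. It is a generic illustration, not the paper's projected variant or the DDJA-ELM method, and the function names, the kernel width `gamma`, and the two-feature toy log values are all hypothetical.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2) for equal-length feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased empirical estimate of squared MMD between samples xs and ys (hypothetical helper)."""
    m, n = len(xs), len(ys)
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / (m * m)
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / (n * n)
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical standardized log features (e.g. gamma ray, resistivity) from two wells;
# the "new" well is the "old" well shifted by +1 in both features.
old_well = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]]
new_well = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [1.1, 1.2]]

print(mmd_squared(old_well, old_well))  # 0.0: identical samples, no drift
print(mmd_squared(old_well, new_well))  # clearly positive: shifted distribution
```

The estimate averages over all sample pairs, so it costs O((m + n)²) kernel evaluations and long logs would typically be subsampled; the value also depends strongly on the assumed kernel width `gamma`.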

A Data-Driven Approach for Lithology Identification Based on Parameter-Optimized Ensemble Learning

by Jiang, Baosheng , Xiao, Kang , Sun, Zhixue in Accuracy , Artificial intelligence , Bayesian Optimization

2020

The identification of underground formation lithology can serve as a basis for petroleum exploration and development. This study integrates Extreme Gradient Boosting (XGBoost) with Bayesian Optimization (BO) for formation lithology identification and comprehensively evaluates the performance of the proposed classifier using the confusion matrix, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC). The data are derived from the Daniudui gas field and the Hangjinqi gas field and include 2153 samples of known lithology facies class, each with seven measured properties (well log curves) and the corresponding depth. The results show that BO significantly improves parameter optimization efficiency. The AUC values on the test sets of the two gas fields are 0.968 and 0.987, respectively, indicating that the proposed method has very high generalization performance. Additionally, we compare the proposed algorithm with Gradient Tree Boosting-Differential Evolution (GTB-DE) on the same dataset. The results demonstrate that the average precision, recall, and F1 score of the proposed method are 4.85%, 5.7%, and 3.25% higher, respectively, than those of GTB-DE. The proposed XGBoost-BO ensemble model can automate the lithology identification procedure, and it may also be used to predict other reservoir properties.

Journal Article

A Novel Method of Multitype Hybrid Rock Lithology Classification Based on Convolutional Neural Networks

by Liu, Zida , Zhao, Junjie , Li, Diyuan in Accuracy , Algorithms , Analytical chemistry

2022

Rock lithology recognition plays a fundamental role in geological survey research, mineral resource exploration, mining engineering, etc. However, the subjectivity of researchers, the variable nature of rocks, and tedious experimental processes make it difficult to identify rock lithology accurately and effectively. Additionally, multitype hybrid rock lithology identification is challenging, and few studies on this issue are available. In this paper, a novel multitype hybrid rock lithology detection method based on a convolutional neural network (CNN) was proposed, and neural network model compression was adopted to guarantee inference efficiency. Four fundamental single-class rock datasets (sandstone, shale, monzogranite, and tuff) were collected, and multitype hybrid rock lithology datasets were obtained using data augmentation. The proposed model was then trained on the multitype hybrid rock lithology datasets. For comparison, three other algorithms were also trained and evaluated. Experimental results revealed that our method exhibited the best performance in terms of precision, recall, and efficiency compared with the other three algorithms. Furthermore, the proposed model's inference is twice as fast as that of the other three methods. It needs only 11 milliseconds to detect a single image, which makes it possible to apply the algorithm in industry by porting it to an embedded hardware device or the Android platform.

Journal Article

Prediction Success of Machine Learning Methods for Flash Flood Susceptibility Mapping in the Tafresh Watershed, Iran

by Bayat, Mahmoud , Pham, Binh Thai , Ahmadisharaf, Ebrahim in Datasets , Decision making , Floods

2019

Floods are among the most destructive and catastrophic disasters worldwide. Developing management plans requires a deep understanding of the likelihood and magnitude of future flood events. The purpose of this research was to estimate flash flood susceptibility in the Tafresh watershed, Iran, using five machine learning methods: alternating decision tree (ADT), functional tree (FT), kernel logistic regression (KLR), multilayer perceptron (MLP), and quadratic discriminant analysis (QDA). A geospatial database including 320 historical flood events was constructed, and eight geo-environmental variables (elevation, slope, slope aspect, distance from rivers, average annual rainfall, land use, soil type, and lithology) were used as flood-influencing factors. Based on a variety of performance metrics, the ADT method outperformed the other methods; FT ranked second, followed by KLR, MLP, and QDA. Given the small differences between the goodness-of-fit and prediction success of the methods, we conclude that all five machine-learning-based models are applicable to flood susceptibility mapping in other areas to help protect societies from devastating floods.

Journal Article
