Application of automated machine learning and clustering algorithm for data-driven site characterization: Predicting the soil-rock interface

Research output: Contribution to journal › Article › peer-review

Abstract

The development of underground spaces requires detailed insight into subsurface conditions, particularly the soil-rock interfaces, as this information is crucial for the effective design and safe construction of underground infrastructure. Traditional geotechnical site investigations rely mainly on direct drilling and sampling; however, these methods yield data only at specific investigation points and therefore cannot comprehensively capture ground conditions across an entire area. To address this limitation, various studies have aimed to predict unknown subsurface sections using existing borehole data. Conventional methods use geospatial interpolation, while machine learning has emerged as a strong alternative. Selecting an appropriate model and tuning it properly are critical to achieving optimal performance. This study applies automated machine learning to predict soil-rock interfaces in unsampled regions using borehole data. AutoGluon is used as the machine learning framework to automate data preprocessing, model selection, hyperparameter tuning, and model ensembling. For this study, approximately 20,000 boreholes from the Seoul metropolitan area were collected and employed. Additionally, various digital maps were used to extract input variables. To capture non-linearity among input variables, Uniform Manifold Approximation and Projection (UMAP) was employed to reduce the dimensionality of the dataset, while Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) was implemented as the clustering algorithm. When compared to a model tuned using Bayesian optimization, AutoGluon exhibited superior predictive performance and reduced errors. Furthermore, although the focus of this study is on predicting the soil-rock interface, the methodology can be extended to the prediction of other geotechnical parameters.
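
For orientation, the conventional geospatial interpolation baseline mentioned in the abstract can be illustrated with inverse distance weighting (IDW). This is only a minimal sketch under stated assumptions: the abstract does not say which interpolation method the study compares against, and the function name `idw_interface_depth`, the `power` parameter, and the borehole coordinates and depths below are hypothetical values chosen for illustration, not data from the paper.

```python
import math


def idw_interface_depth(boreholes, x, y, power=2.0):
    """Estimate the soil-rock interface depth at (x, y) by inverse distance weighting.

    boreholes: iterable of (x, y, depth) tuples; depth is the interface depth
    observed at that borehole. All names and units here are illustrative.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for bx, by, depth in boreholes:
        dist = math.hypot(x - bx, y - by)
        if dist == 0.0:
            # The query point coincides with a borehole: return the observed depth.
            return depth
        weight = 1.0 / dist ** power
        weighted_sum += weight * depth
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one borehole is required")
    return weighted_sum / weight_total


# Hypothetical boreholes: (easting in m, northing in m, interface depth in m).
boreholes = [(0.0, 0.0, 10.0), (100.0, 0.0, 20.0), (0.0, 100.0, 14.0)]
estimate = idw_interface_depth(boreholes, 50.0, 0.0)
```

An estimate of this kind depends only on distance to nearby boreholes. The learning-based approach described in the abstract can additionally use input variables extracted from digital maps.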

Original language: English
Pages (from-to): 321-332
Number of pages: 12
Journal: Geomechanics and Engineering
Volume: 42
Issue number: 5
State: Published - 10 Sep 2025

Keywords

  • automated ML
  • clustering
  • data-driven
  • soil-rock interface
  • spatial prediction
