
Evaluating the generalizability and transferability of water distribution deterioration models

Abstract

Small utilities often lack enough data to train machine learning models for predicting pipe failures, and so cannot harness the predictive power of machine learning. This study evaluates the generalizability and transferability of a machine learning model to see whether small utilities can benefit from the data and models of other utilities. Using datasets from nine Norwegian utilities, we trained nine global models (by merging multiple datasets) and nine local models (each on a single utility's dataset) using random survival forest. Several pre-processing techniques, including ways to address left-truncated break data and break data scarcity, are also presented. The global models and three of the local models were tested on predicting pipe failures for utilities that were not included in their training datasets. The results indicate that the global models can predict pipe failures at other utilities with sufficient accuracy, while local models have some limitations. However, if a representative utility with a sufficiently large (and information-rich) dataset is selected, its model can predict other utilities' pipe breaks as accurately as the global models. Furthermore, survival curves for defined cohorts, used as proxies for uncertainty, and variable importance show that pipes with and without previous breaks behave very differently. With an understanding of the models' generalizability and transferability, small utilities can benefit from the data and models of other utilities.
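
The global-versus-local evaluation described above can be illustrated with a minimal sketch. The abstract does not state exactly how the nine global models were assembled, so this assumes a leave-one-utility-out scheme: each global training set merges every utility except the one held out for testing. The function name, utility names, and record fields are hypothetical, and model fitting (for instance with a random survival forest implementation) is deliberately left out.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def leave_one_utility_out(
    datasets: Dict[str, List[Record]],
) -> Dict[str, Tuple[List[Record], List[Record]]]:
    """Build one (train, test) split per utility.

    Assumption: a "global" training set is the merge of all utilities
    except the held-out one, which is used only for testing.
    """
    splits = {}
    for held_out, test_rows in datasets.items():
        # Merge the records of every other utility into one training set.
        train_rows = [
            row
            for name, rows in datasets.items()
            if name != held_out
            for row in rows
        ]
        splits[held_out] = (train_rows, list(test_rows))
    return splits


# Hypothetical toy data: three utilities instead of nine.
datasets = {
    "utility_A": [
        {"pipe_id": "A1", "age_years": 40, "broke": True},
        {"pipe_id": "A2", "age_years": 12, "broke": False},
    ],
    "utility_B": [
        {"pipe_id": "B1", "age_years": 55, "broke": True},
    ],
    "utility_C": [
        {"pipe_id": "C1", "age_years": 30, "broke": False},
        {"pipe_id": "C2", "age_years": 70, "broke": True},
        {"pipe_id": "C3", "age_years": 5, "broke": False},
    ],
}

splits = leave_one_utility_out(datasets)
for held_out, (train, test) in splits.items():
    print(held_out, "train size:", len(train), "test size:", len(test))
```

A local model, by contrast, would train on a single utility's records and be tested on another utility's records; the same dictionary of datasets can serve both setups.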

Category

Academic article

Language

English

Author(s)

  • Shamsuddin Daulat
  • Marius Møller Rokstad
  • Stian Bruaset
  • Jeroen Langeveld
  • Franz Tscheikner-Gratl

Affiliation

  • SINTEF Community / Infrastructure
  • Delft University of Technology
  • Norwegian University of Science and Technology

Year

2024

Published in

Reliability Engineering & System Safety

ISSN

0951-8320

Volume

241
