Abstract
Machine learning (ML) has grown extensively in most industries, with learning models driving state-of-the-art performance in a variety of tasks. The in-line inspection (ILI) industry is no exception: applications of machine learning techniques have provided promising results for a wide range of needs. Metal loss anomaly sizing, fitting classification, and identification of interacting threats have all benefited from different forms of learning models. The success of any learning model requires detailed attention at all stages of the process, as small nuances often manifest as misleading results.
The ability of an ILI data analyst to accurately identify pipeline anomalies, most importantly anomalies that affect the integrity of a pipeline, is based on experience. Experience can be sub-divided into two categories, observations and truth, which allow a data analyst to identify patterns and make predictions. Supervised learning models mimic this process by using a mapped set of inputs (observations) to outputs (truth) to develop a mathematical function that can be applied to new examples. The inputs, referred to as training data, are vectors of engineered features relating to the desired output. Like a data analyst's experience, the quantity, quality and representation of the training data directly influence the performance of predictions made by the model.
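As a minimal sketch of this mapping, assuming a hypothetical feature set and synthetic truth data (not the features or model developed in this paper), a supervised regressor can be fit to feature vectors and then applied to new examples:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training set: each row is a vector of engineered
# features for one anomaly; the target is the NDE-measured truth
# (e.g., depth as a percentage of wall thickness). All values here
# are synthetic placeholders.
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(500, 3))                        # observations
y_train = 10 + 60 * X_train[:, 0] + rng.normal(0, 2, 500)   # truth

# Fit a mathematical function mapping observations to truth.
model = GradientBoostingRegressor().fit(X_train, y_train)

# Apply the learned function to new, unseen examples.
X_new = rng.uniform(size=(5, 3))
predicted = model.predict(X_new)
```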
Data curation, a process that includes the collection, analysis and labeling of data used to train the mathematical model, is critical and time-consuming. When curating data for the development of a supervised model, two main obstacles must be overcome. First, data from two measurement methods, for example, magnetic flux leakage (MFL) and in-the-ditch non-destructive examination (NDE), must be precisely correlated. Second, the training data must represent the population of future predictions and minimize coverage error by containing enough appropriate examples. Metal loss anomalies examined by NDE must be correlated to the predicted geometries of interacting ILI signatures, which do not always align one-to-one. The estimated position on the pipe is derived from onboard ILI tool instrumentation and is subject to measurement accuracy. The small errors contained in each measurement method can make precisely matching the NDE measurements to the ILI measurements a tedious effort, and minimizing the compounding errors requires diligence.
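One common way to cope with these compounding positional errors is to pair each NDE measurement with the nearest ILI call inside a tolerance window. The sketch below is a simplified nearest-neighbor pairing on (axial, circumferential) coordinates; the function name `match_nde_to_ili` and the tolerance values are illustrative assumptions, not the correlation procedure used in this work.

```python
import numpy as np

def match_nde_to_ili(nde_pos, ili_pos, axial_tol=0.5, circ_tol=0.1):
    """Pair each NDE anomaly with the nearest ILI call inside a
    tolerance box. Positions are arrays of (axial_m, circumferential_m).
    Tolerances are illustrative; real values depend on tool specs."""
    matches = []
    for i, (ax, circ) in enumerate(nde_pos):
        d_ax = np.abs(ili_pos[:, 0] - ax)
        d_circ = np.abs(ili_pos[:, 1] - circ)
        inside = (d_ax <= axial_tol) & (d_circ <= circ_tol)
        if inside.any():
            # Choose the closest candidate by Euclidean distance;
            # several NDE anomalies may map to one ILI signature.
            dist = np.hypot(d_ax, d_circ)
            dist[~inside] = np.inf
            matches.append((i, int(np.argmin(dist))))
    return matches  # list of (nde_index, ili_index) pairs
```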
The one-to-many and direct-to-indirect measurement relationships between NDE and ILI make generalizing input features and labeling training data a challenging task. In addition to overcoming the difficulties associated with correlating and labeling training data, significant attention must also be given to the distribution of features being represented. This includes metal loss geometry as well as the engineered model input features. With dozens of dimensions included in the input vector of supervised learning models, representing the possible permutations can be overwhelming. Using NDE external laser scans to maximize training data, together with dimensionality reduction techniques, can help but does not remove all underlying contributions.
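As a hedged illustration of how dimensionality reduction can make feature-space coverage easier to assess, the snippet below projects a hypothetical forty-dimensional input vector onto the few principal components that retain most of its variance; the data and dimensions are assumptions for demonstration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the input vectors: 2,000 anomalies described
# by 40 engineered features driven by a handful of latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(2000, 5))
mixing = rng.normal(size=(5, 40))
X = latent @ mixing + 0.1 * rng.normal(size=(2000, 40))

# Keep enough principal components to explain 95% of the variance,
# reducing the number of dimensions that must be checked for coverage.
pca = PCA(n_components=0.95).fit(X)
X_reduced = pca.transform(X)
print(X_reduced.shape, round(pca.explained_variance_ratio_.sum(), 3))
```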
After curating data, then developing and training a model, it is possible to assess its performance, but this should be approached with cautious optimism. To accomplish this, a general machine learning practice is to hold out a portion of the training data to test model predictions while an iterative development cycle refines the engineered features, model architecture and training process. Since no two pipelines or inspections are identical, validation against blind data is crucial. Although the standard training hold-out practice may make data appear to be blind, it often includes highly correlated features, and the iterative development cycle may inadvertently introduce bias. Attention to the distribution of blind hold-out data helps verify the generalization of a model and more accurately represents future performance.
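A purely random hold-out can leave highly correlated examples, such as anomalies from the same inspection, on both sides of the split. One possible remedy, sketched below with a hypothetical `inspection_id` grouping, is to hold out entire inspections together so the test set better approximates blind data; this is an illustrative technique, not necessarily the validation scheme used in this paper.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 10))                 # engineered features
y = rng.normal(size=1000)                       # truth labels
inspection_id = rng.integers(0, 20, size=1000)  # hypothetical grouping

# Hold out ~20% of the data by whole inspection, so no anomaly from
# a held-out inspection ever appears in the training set.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=inspection_id))
X_train, X_blind = X[train_idx], X[test_idx]
y_train, y_blind = y[train_idx], y[test_idx]
```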
This paper describes the development of an ILI machine learning model at each stage of the process and contrasts the performance achieved when data are correctly utilized in training and testing with the performance that results when they are not.