Abstract

Performing reliability assessments for a large asset inventory of pipeline facility equipment, such as compressor station assets, requires a substantial dataset of attributes for a diverse range of equipment types. In many cases, equipment data inventories have gaps, with one or more required attributes unknown, such as diameter, wall thickness, operating pressure, or material properties. Identifying and collecting complete records is typically labor-intensive and time-consuming, so data gaps are often filled with assumptions while data collection is ongoing. A standard approach is to use conservative assumptions for missing attributes, so that records with missing data are assessed at higher risk than comparable complete records. The benefit of this conservative approach is that it appropriately penalizes incomplete records, driving action toward collecting information where it matters. However, the approach is simplistic: it does not leverage all of the information within the dataset, and it can produce a distorted representation of risk that may reduce the credibility of the risk assessment.

This paper describes a process that uses unsupervised machine learning algorithms to organize large asset inventories into groups of similar records and to fill data gaps with reasonable but conservative assumptions. We used a non-hierarchical clustering method to group asset records into clusters. Instead of filling a gap with the most conservative value across all records, each gap is filled with the most conservative value from similar records in the same cluster. This method provides more realistic estimates for data gaps while still maintaining conservatism, striking a balance between prioritizing equipment whose confirmed attributes indicate higher risk and equipment with little information.
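The cluster-then-fill idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the inventory is invented, the clustering features (diameter and operating pressure) are assumed to be complete, features are z-scored before clustering, K-means uses a deterministic farthest-first seeding (libraries such as scikit-learn default to k-means++), and "conservative" for wall thickness is assumed to mean the minimum known value. The helper names `standardize`, `kmeans`, and `fill_by_cluster` are hypothetical.

```python
import math
from typing import Callable, List, Optional

Point = List[float]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(points: List[Point]) -> List[Point]:
    """Z-score each feature so that, e.g., pressure does not swamp diameter."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(p, means, sds)] for p in points]


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's K-means with deterministic farthest-first seeding (assumed)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fill_by_cluster(values: List[Optional[float]], labels: List[int],
                    conservative: Callable = min) -> List[float]:
    """Fill each gap with the most conservative known value in its cluster,
    falling back to the fleet-wide value if a cluster has no known values."""
    fleet_value = conservative(v for v in values if v is not None)
    known = {}
    for v, lab in zip(values, labels):
        if v is not None:
            known.setdefault(lab, []).append(v)
    return [v if v is not None else conservative(known.get(lab, [fleet_value]))
            for v, lab in zip(values, labels)]


# Hypothetical inventory: [diameter (in), operating pressure (psig)] is
# complete; wall thickness (in) has gaps (None).
features = [[2, 1000], [2, 1050], [3, 1000], [3, 1100],
            [24, 800], [24, 850], [30, 800], [30, 820]]
wall_in = [0.154, 0.218, 0.216, None, 0.500, 0.562, 0.625, None]

labels = kmeans(standardize(features), k=2)
# Assumption: a thinner wall is the conservative choice, so use min.
cluster_fill = fill_by_cluster(wall_in, labels, conservative=min)
# Baseline for comparison: fleet-wide conservative value for every gap.
fleet_min = min(v for v in wall_in if v is not None)
fleet_fill = [w if w is not None else fleet_min for w in wall_in]
print(cluster_fill)  # [0.154, 0.218, 0.216, 0.154, 0.5, 0.562, 0.625, 0.5]
print(fleet_fill)    # [0.154, 0.218, 0.216, 0.154, 0.5, 0.562, 0.625, 0.154]
```

In this toy inventory, the gap in the large-diameter group is filled with 0.500 in from its own cluster rather than the fleet-wide minimum of 0.154 in, while the gap in the small-diameter group still receives 0.154 in. Which direction is conservative depends on the attribute (for operating pressure it would typically be the maximum), which is why `conservative` is a parameter.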

The approach described in this study relies on K-means clustering. We discuss the practical uses of dimensionality reduction, heuristic techniques for selecting the number of clusters, and sensitivity analysis.
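One common heuristic for choosing the number of clusters is the elbow method: run K-means for a range of k, record the within-cluster sum of squares (inertia) for each, and pick the k beyond which additional clusters stop reducing inertia much. The sketch below locates the elbow as the point farthest from the straight line joining the first and last points of the normalized curve, similar in spirit to knee-detection methods such as Kneedle. It is only one option among several (silhouette analysis is another), the function name `elbow_k` is hypothetical, and the inertia values are invented for illustration rather than results from this study.

```python
from typing import Sequence


def elbow_k(ks: Sequence[int], inertias: Sequence[float]) -> int:
    """Pick the k whose point on the normalised inertia curve lies farthest
    from the chord joining the first and last points."""
    x0, x1 = ks[0], ks[-1]
    y_first, y_last = inertias[0], inertias[-1]
    # Rescale both axes to [0, 1] so the choice does not depend on units.
    xs = [(k - x0) / (x1 - x0) for k in ks]
    ys = [(y - y_last) / (y_first - y_last) for y in inertias]
    # The chord now runs from (0, 1) to (1, 0), i.e. the line x + y = 1.
    # Distance to it is |x + y - 1| / sqrt(2); the constant factor is dropped.
    dists = [abs(x + y - 1.0) for x, y in zip(xs, ys)]
    best = max(range(len(ks)), key=lambda i: dists[i])
    return ks[best]


# Illustrative (invented) inertia values for k = 1..8.
ks = list(range(1, 9))
inertias = [1000.0, 520.0, 300.0, 240.0, 200.0, 175.0, 160.0, 150.0]
print(elbow_k(ks, inertias))  # 3
```

Because both axes are rescaled to [0, 1], the selected k does not depend on the units of the features or the inertia. A sensitivity analysis could then repeat the gap filling for neighboring values of k and check how much the filled values change.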
