Abstract
Data is the cornerstone of data science. Protecting the sensitive information in oil and gas pipeline operating data, which companies generate as they digitalize, is therefore important. Because real data contain so much sensitive information, open public data sets for professionals in the industry are very scarce. This scarcity has in turn become the greatest obstacle to research progress in the industry. At the same time, sensor deviation limits the reference value of simulation data obtained from traditional physical formulas. Generative deep learning networks can learn directly from data and generate high-fidelity, diverse samples, which makes them suitable for both data set generation and information desensitization. This paper applies two deep generative models, the Variational Auto-Encoder (VAE) and the Generative Adversarial Network (GAN), to pipeline operating data generation. It makes three main contributions: (1) a pipeline operation simulation algorithm, based on deep generative models trained on large-scale historical data, is developed and released; (2) the algorithms are shown to output reference operating data for the pumps given specific pipeline parameters, e.g., the pressure corresponding to a certain flow, and to generate a data set with the same probability distribution as the actual pipeline operation data; (3) data-distribution-based detection of abnormal pipeline operating conditions using the GAN identifier (discriminator) is proposed for the first time. In the experiments, the mean squared error (MSE) between the output of the optimal model and the normalized real data was below 1% (0.4%, to be specific), verifying the algorithm's capability of generating a desensitized data set similar to the real data.
Moreover, a classifier trained on the generated data with the GAN identifier (discriminator) detected up to 69% of abnormal operating conditions, further verifying the effectiveness of the proposed algorithm. The code is publicly available at https://github.com/ShawnHoo7256/BOGC_Project.
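To make the reported metric concrete, the following is a minimal sketch of how an MSE on normalized data could be computed. It is not taken from the released code. It assumes min-max normalization with the range of the real data, and a one-to-one pairing of real and generated samples; the abstract specifies neither. The function names and the pressure values are hypothetical and purely illustrative.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale values with a fixed range so that lo maps to 0 and hi maps to 1."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return [(v - lo) / (hi - lo) for v in values]


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Average of squared element-wise differences between two equal-length series."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical pressure readings (illustrative numbers, not from the paper).
real = [4.0, 5.0, 6.0, 8.0]
generated = [4.2, 4.9, 6.1, 7.8]

# Assumption: both series are scaled with the range of the real data,
# so the error is expressed relative to the real operating range.
lo, hi = min(real), max(real)
real_n = min_max_normalize(real, lo, hi)
gen_n = min_max_normalize(generated, lo, hi)

mse = mean_squared_error(real_n, gen_n)
print(f"MSE on normalized data: {mse:.6f}")
```

Because both series are scaled by the same range, an MSE read as a percentage is directly comparable across variables with different physical units, such as pressure and flow.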