Abstract
The total volume of subsurface data is tremendous and grows exponentially each year. Making full and effective use of subsurface data from multiple sites will advance the knowledge required for the innovations that power the future growth of the energy industry. This research article explores the growing need for, and the challenges of, cross-domain knowledge discovery and sharing in the digital subsurface. It demonstrates the application of federated learning to improve the generalization ability and prediction accuracy of machine learning models, using Rate of Penetration (ROP) prediction as a field case study.
The lack of effective solutions for protecting data privacy and ownership restricts the use of subsurface data across organizations. At the same time, because of the significant geological heterogeneity of reservoir formations and the severely long-tailed distribution of the data, machine learning models trained on data from a single site suffer from overfitting and weak generalization ability. Federated learning offers a possible way out of this dilemma by training machine learning models on multiple client datasets without exposing the underlying data. Following a client-server architecture, each client trains a local model on its own dataset and periodically exchanges model parameters with a central server, which uses them to generate and improve a global model.
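The client-server exchange described above can be illustrated with a minimal sketch. It assumes sample-size-weighted federated averaging (FedAvg) as the aggregation rule, a linear model as a stand-in for the LSTM, and synthetic data in place of field data. None of these choices, nor the function names, are taken from the study itself.

```python
import numpy as np


def local_update(weights, X, y, lr=0.05, epochs=5):
    """Client step: a few gradient-descent passes on private local data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w


def federated_average(client_weights, client_sizes):
    """Server step: average client parameters, weighted by local sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def run_federated_training(clients, n_features, rounds=50):
    """Alternate local training and server aggregation for a number of rounds."""
    global_w = np.zeros(n_features)
    for _ in range(rounds):
        # Only parameters leave each client; the raw (X, y) data never does.
        updates = [local_update(global_w, X, y) for X, y in clients]
        global_w = federated_average(updates, [len(y) for _, y in clients])
    return global_w


# Synthetic, hypothetical "sites" with different feature distributions
# (a crude stand-in for geological heterogeneity between sites).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for n_samples, shift in [(200, 0.0), (50, 1.0), (120, -1.0)]:
    X = rng.normal(loc=shift, scale=1.0, size=(n_samples, 3))
    y = X @ true_w + rng.normal(scale=0.01, size=n_samples)
    clients.append((X, y))

global_w = run_federated_training(clients, n_features=3)
print(global_w)
```

In this sketch the server sees only the parameter vectors returned by `local_update`, which is the property that lets sites collaborate without sharing raw measurements. The number of local epochs per round plays the role of the exchange frequency.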
In this paper, we demonstrate the application of federated learning to knowledge discovery and sharing across multiple sites, using field data from multiple sources to train an advanced Long Short-Term Memory (LSTM) artificial neural network for drilling Rate of Penetration (ROP) prediction. We discuss the learning and predictive performance of the global model, which outperforms the local models in generalization ability and prediction accuracy because the additional datasets help reveal the actual data distribution. Although the results of this study are promising, we also discuss potential limitations of the approach in the energy industry.