This paper describes a selection of Baker Hughes, a GE company (BHGE) activities to support Gas Turbine (GT) design and operation from simple to more elaborate applications of Machine Learning (ML).


In the era of supercomputers, high-speed networks, and reliable sensors, the amount and quality of data on Gas Turbine (GT) performance, operation, and reliability constantly grow. Data come from computer simulations with different degrees of accuracy, from testing, and from real-time engine monitoring. Regardless of their origin, the challenge today is how to take full advantage of the information hidden in terabytes of data. As field data may deviate from predictions of performance, emissions, and life, it is necessary to reconcile field data and high-fidelity computer simulations with the simplified physics-based engineering models routinely used in design and monitoring activities. Field data make it possible to improve the performance, reliability, and life of assets, and can flow down to the design tools to improve accuracy, enabling better performance and reduced risk.

Baker Hughes, a GE company (BHGE), along with General Electric, has embraced the “Digital Twin” (DT) concept. A Digital Twin pairs every asset in the field with a numerical model, built from component and component-to-component interaction sub-models, that is constantly interrogated with machine operation input data. Comparing model predictions with field data determines whether the GT is experiencing a malfunction, whether it could be operated more efficiently, or whether the digital model requires re-calibration. A similar approach is valid for high-fidelity computer simulations, whose results allow design models to be challenged and tuned. The DT predictive capability improves with different Machine Learning (ML) methods that extract physics-based, engineering-relevant information from both field and computer simulation data. Along these lines, this paper describes a selection of BHGE activities to support GT design and operation, from simple to more elaborate applications of ML.

Component Design

ML is used to improve heat transfer prediction accuracy in GT high-pressure turbines (HPT). The design is carried out with proprietary correlations, while verification uses Reynolds-averaged Navier-Stokes (RANS) simulations, which have difficulty predicting film cooling effectiveness. ML improves the heat transfer models in RANS by interrogating large-scale numerical data sets generated by Large Eddy Simulations (LES), which offer very high accuracy but at a computational cost incompatible with design iterations. Figure 1a shows the instantaneous flow field in an HPT nozzle predicted by LES. While these models are not fully applicable yet, Figure 1b shows how the accuracy of RANS film cooling effectiveness predictions in the HPT trailing edge region can improve thanks to ML (Sandberg et al. 2018).


Asset Operation & Maintenance

With the maturation of Industrial Internet of Things (IIoT) technologies, BHGE is able to monitor over 1,000 assets in more than 30 countries, collecting millions of running hours of data through three global centers strategically located in different time zones (Florence, Italy; Houston, USA; and Kuala Lumpur, Malaysia). The monitoring service consists of the continuous acquisition of more than 1,000 parameters per asset, which are automatically processed by ML tools to detect anomalies or deviations from expectations and to calibrate GT models, as described below.

A) Fault Detection and Diagnostics

Analytics must satisfy stringent requirements to ensure proper management of the monitored fleet and must be capable of flagging anomalies when specific fault signatures are detected. ML classifies events into a given set of categories based on past observations. Training the classification algorithm is a supervised process in which data labeled as normal or faulty are fed to the algorithm, which learns to predict the output class from the provided input. GT combustion monitoring is a typical example from BHGE. A multiclass classification algorithm is applied to GT exhaust temperature profiles to determine, in near real time, the conditions in which the combustors may be operating. Here, a logistic regression classifier is trained on labeled patterns extracted from real data corresponding to normal and faulty GT combustor conditions. Figure 2 shows polar plots of the combustor exhaust gas temperature (EGT) in four possible cases that, if promptly detected, allow better performance than conventional diagnostic systems based on monitoring the exhaust spread. The automatic recognition of these specific fault signatures in the EGT profile not only warns operators about possible issues, but also indicates their severity and suggests the root cause of the problem (Allegorico and Mantini, 2014).
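The supervised classification step described above can be illustrated with a minimal sketch of multiclass (softmax) logistic regression on exhaust temperature profiles. Everything specific here is an assumption made for illustration and is not taken from the BHGE system: the three class names, the eight-probe synthetic profiles, the 50 degC spot amplitude, the two hand-picked features (largest positive and largest negative deviation from the profile mean), and the plain gradient-descent training loop.

```python
import math
import random

CLASSES = ["normal", "cold_spot", "hot_spot"]  # hypothetical fault labels
SCALE = 50.0  # hypothetical scaling (degC) to keep features of order one

def features(profile):
    """Bias term plus the scaled largest hot and cold deviations from the mean."""
    mean = sum(profile) / len(profile)
    return [1.0, (max(profile) - mean) / SCALE, (mean - min(profile)) / SCALE]

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(weights, x):
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])

def train(samples, labels, epochs=800, rate=1.0):
    """Batch gradient descent on the multinomial cross-entropy loss."""
    n_feat = len(samples[0])
    weights = [[0.0] * n_feat for _ in CLASSES]
    for _ in range(epochs):
        grads = [[0.0] * n_feat for _ in CLASSES]
        for x, y in zip(samples, labels):
            probs = class_probabilities(weights, x)
            for k in range(len(CLASSES)):
                err = probs[k] - (1.0 if k == y else 0.0)
                for j in range(n_feat):
                    grads[k][j] += err * x[j]
        for k in range(len(CLASSES)):
            for j in range(n_feat):
                weights[k][j] -= rate * grads[k][j] / len(samples)
    return weights

def predict(weights, profile):
    """Return the most probable class and its probability."""
    probs = class_probabilities(weights, features(profile))
    best = max(range(len(CLASSES)), key=lambda k: probs[k])
    return CLASSES[best], probs[best]

def synthetic_profile(rng, kind, n_probes=8):
    """Synthetic labeled profile (degC); stands in for real labeled field data."""
    profile = [500.0 + rng.uniform(-5.0, 5.0) for _ in range(n_probes)]
    spot = rng.randrange(n_probes)
    if kind == "cold_spot":
        profile[spot] -= 50.0
    elif kind == "hot_spot":
        profile[spot] += 50.0
    return profile

rng = random.Random(0)
train_x, train_y = [], []
for label, kind in enumerate(CLASSES):
    for _ in range(30):
        train_x.append(features(synthetic_profile(rng, kind)))
        train_y.append(label)
weights = train(train_x, train_y)

flat = [500.0] * 8
hot = [500.0] * 8
hot[2] = 555.0
cold = [500.0] * 8
cold[5] = 445.0
for name, profile in (("flat", flat), ("hot", hot), ("cold", cold)):
    print(name, predict(weights, profile))
```

The predicted class probability gives a rough confidence measure alongside the label; a production classifier would presumably use richer pattern features and real labeled data rather than the two synthetic features assumed here.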


B) Data-Driven Tools for Early Warning Diagnostics

In real-world applications, classification methods suffer from the scarcity of fault condition data. One solution is to use ML to model only normal conditions, via Auto-Associative Kernel Regression (AAKR): a nonparametric empirical model, built from historical fault-free observations, estimates the expected value of each signal, so that the deviation of the measured values from these estimates (the residuals) can be monitored. Deviations may indicate changes from reference conditions, which can be precursors of incipient failures. Early warning algorithms allow intervention sooner than standard rules based on static thresholds. BHGE routinely uses AAKR to detect deviations from expected operation. Figure 3 shows a case in which monitoring the residuals of a temperature signal flags alarms much earlier than a static threshold.
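As a minimal sketch of the AAKR idea, the example below reconstructs a measured signal vector as a kernel-weighted average of fault-free historical vectors and monitors the residual. The Gaussian kernel on Euclidean distance, the bandwidth, the two normalized signals (load and temperature), the synthetic fault-free history, and both alarm thresholds are assumptions chosen for illustration; they do not describe the BHGE implementation.

```python
import math

def aakr_reconstruct(memory, query, bandwidth):
    """Estimate the expected signal vector as a Gaussian-kernel-weighted
    average of the fault-free memory vectors closest to the query."""
    weights = []
    for row in memory:
        dist2 = sum((q - m) ** 2 for q, m in zip(query, row))
        weights.append(math.exp(-dist2 / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from every memory vector")
    return [sum(w * row[j] for w, row in zip(weights, memory)) / total
            for j in range(len(query))]

def residuals(memory, query, bandwidth):
    """Measured value minus AAKR estimate, per signal."""
    estimate = aakr_reconstruct(memory, query, bandwidth)
    return [q - e for q, e in zip(query, estimate)]

# Hypothetical fault-free history in normalized units:
# (load, temperature), with temperature = 0.2 + 0.6 * load.
memory = [(i / 20.0, 0.2 + 0.6 * (i / 20.0)) for i in range(21)]

BANDWIDTH = 0.05           # assumed kernel bandwidth
RESIDUAL_LIMIT = 0.03      # assumed alarm limit on the temperature residual
STATIC_LIMIT = 0.85        # assumed static limit on the temperature itself

healthy = (0.5, 0.50)      # temperature consistent with the load
drifting = (0.5, 0.58)     # temperature too high for this load, yet below STATIC_LIMIT

healthy_res = residuals(memory, healthy, BANDWIDTH)
drift_res = residuals(memory, drifting, BANDWIDTH)

for name, point, res in (("healthy", healthy, healthy_res),
                         ("drifting", drifting, drift_res)):
    static_alarm = point[1] > STATIC_LIMIT
    residual_alarm = abs(res[1]) > RESIDUAL_LIMIT
    print(name, "static alarm:", static_alarm, "residual alarm:", residual_alarm)
```

In this synthetic case the drifting temperature stays well inside the static limit, so only the residual check raises an alarm, which is the early-warning behavior described for Figure 3. In practice the signals would need to be scaled to comparable ranges before computing distances, and the bandwidth tuned on held-out fault-free data.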


Maintenance Optimization

Analyses of GT part durability aim at enhancing asset availability. Detailed models based on computational fluid dynamics (CFD) and finite element methods (FEM) provide full machine aero-mechanical-thermal analyses, but their computational cost limits their application to design conditions.


CDM predicts the residual life using measured data from the BHGE Remote Monitoring & Diagnostics centers and key GT operating parameters computed by Artificial Neural Networks (ANN), which act as a high-speed surrogate of the full GT thermodynamic model. The full model is used once to generate the training data for the ANN model. CDM is calibrated with field failure data using a Bayesian approach that accounts for data uncertainty. The advantages of CDM models are their computational efficiency, which allows different operational scenarios to be assessed quickly; their ability to incorporate actual field operation, avoiding risky approximations of operational data; and the possibility of combining multiple models for risk assessment together with the tracking of actual failure information on failure modes.
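The Bayesian calibration step can be sketched, under strong simplifying assumptions, as a grid-based posterior over a single life-scaling factor. This is only an illustration of the principle, not the CDM formulation: the scaling factor k, the lognormal scatter sigma representing data uncertainty, the uniform prior on the grid, and all predicted and observed lives below are hypothetical.

```python
import math

def calibrate(predicted, observed, sigma, grid):
    """Posterior over a life-scaling factor k on a grid with a uniform prior.

    Assumed model: log(observed) = log(k * predicted) + noise, where the
    noise is Gaussian with standard deviation sigma (data uncertainty).
    """
    log_post = []
    for k in grid:
        log_like = 0.0
        for life, failure in zip(predicted, observed):
            z = (math.log(failure) - math.log(k * life)) / sigma
            log_like += -0.5 * z * z
        log_post.append(log_like)
    peak = max(log_post)  # subtract the maximum before exponentiating
    unnormalized = [math.exp(v - peak) for v in log_post]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical values: model-predicted lives and field failure times (hours).
predicted_hours = [24000.0, 30000.0, 18000.0]
observed_hours = [21000.0, 27500.0, 16000.0]

grid = [0.5 + 0.005 * i for i in range(201)]  # candidate k values, 0.5 to 1.5
posterior = calibrate(predicted_hours, observed_hours, sigma=0.1, grid=grid)

k_mean = sum(k * p for k, p in zip(grid, posterior))
k_std = math.sqrt(sum((k - k_mean) ** 2 * p for k, p in zip(grid, posterior)))
print("posterior mean of k:", round(k_mean, 3), "std:", round(k_std, 3))
```

The posterior spread, rather than a single best-fit value, is what lets the calibrated model carry data uncertainty into a risk assessment; with more field failures the posterior narrows.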

Challenges and Conclusions

OEMs can improve their physics-based asset models through data-driven ML. Nevertheless, the big challenge is to guarantee high data quality and the proper domain expertise needed to extract relevant insights that can guide immediate actions on asset operation or drive design improvements by means of ML techniques. Such considerations apply equally to components, full assets, and systems.


Sandberg, et al. (2018). Applying Machine Learnt Explicit Algebraic Stress and Scalar Flux Models to a Fundamental Trailing Edge Slot. ASME Turbo Expo. Lillestrøm, Norway: ASME.
Allegorico and Mantini (2014). A data-driven approach for on-line gas turbine combustion monitoring using classification models. European Conference of the Prognostics and Health Management Society, July 8-10, Nantes, France.