## Abstract

In this paper, we propose a sparse modeling method for automatically creating a surrogate model of a nonlinear time-variant system from a very small number of time series datasets with nonconstant time steps. We developed three machine learning methods, namely, (1) a data preprocessing method that considers the correlation between errors, (2) a sequential thresholded non-negative least-squares method based on term size criteria, and (3) a solution space search method involving similarity model classification, in order to apply sparse identification of nonlinear dynamical systems, first proposed in 2016, to temperature prediction simulations. The proposed method has the potential for wide application to fields where the concept of equivalent circuits can be applied. Its effectiveness was verified using time series data obtained by thermofluid analysis of a power module. Two types of cooling systems were verified: forced air cooling and natural air cooling. The model, created from fewer thermofluid analysis results than the number of input parameters, predicted multiple test datasets, including extrapolation, with a mean error of less than 1 K. Because the proposed method can be applied using a very small number of data, has high extrapolation accuracy, and is easy to interpret, it is expected not only that design parameters can be fine-tuned and actual loads can be taken into account, but also that condition-based maintenance can be realized through real-time simulation.

## 1 Introduction

Methods have been developed for creating models that can be easily used in simulations by applying machine learning to data obtained from experiments or detailed numerical analyses. Such models are called surrogate models because they can substitute for detailed numerical analyses, which are computationally expensive. Surrogate models represented in a lower-dimensional space are also called reduced-order models (ROMs). ROMs describe the system of interest as a parametric dynamical system. This approach can be applied in various fields, including the mechanics of materials and electrical engineering, as well as heat transfer engineering, which is the subject of this research. In general, surrogate models are created using a data-driven approach, and they can handle complex shapes directly. With advantages in terms of accuracy and computational efficiency, surrogate models can be used to improve the sophistication and efficiency of the detailed design and operation of products and systems. Accordingly, these models have attracted increasing attention as a candidate for digital twin technology for realizing cyber-physical systems.

Previous studies have used machine learning to create surrogate models that take physical information into account, including devising inductive bias, with the aim of improving generalizability, reducing the amount of learning data, and improving interpretability [1–3]. Taking neural networks as an example, methods for devising loss functions [4–15] and model structures [16–24] have been reported.

As for devising loss functions, Raissi et al. [4] considered physical properties such as partial differential equations, initial conditions, and boundary conditions in the loss function by using automatic differentiation. This method has been widely applied [5–10]. Greydanus et al. [11] defined a loss function that satisfies the canonical equation used in Hamiltonian mechanics; this method was later extended to Lagrangian mechanics [12,13].

As for devising model structures, Chen et al. [21] proposed a method for treating the neural network as an ordinary differential equation by making the layers continuous, applying the idea of the residual network [25]. That method has been widely applied because ordinary differential equations are often used as mathematical models of physical phenomena. In addition, the application of graph neural networks [26–28] has been attracting attention as a general approach for handling complex shapes. For example, Sanchez-Gonzalez et al. [17] applied graph neural networks to particle-based simulation, while Pfaff et al. [18] and Horie et al. [19,20] applied them to the finite element method. A framework incorporating the method proposed by Raissi et al. [4] was recently proposed for solving forward and inverse problems governed by partial differential equations [29].

These methods are superior to neural networks that do not consider physical information in terms of generalization ability, required amount of learning data, and interpretability. However, such surrogate models remain insufficient for solving practical problems. For example, when targeting actual complex products or systems, there are often challenges such as memory usage and the large amount of time required for learning, even with high-performance computers.

In this paper, we propose a nonlinear time-variant reduced-order model, a sparse modeling method for system identification that is superior to conventional methods in terms of generalization ability, required amount of learning data, and interpretability. The proposed method improves the applicability to engineering simulations of sparse identification of nonlinear dynamics (SINDy), which was proposed by Brunton et al. [30] as an extension of symbolic regression [31]. Symbolic regression has evolved in the framework of genetic programming [32–36]. Recently, neural network approaches [37–39] and sparse modeling approaches [30,40] have been increasingly introduced to improve search efficiency. Among them, SINDy is a heuristic method based on simple sparse linear regression that is highly extensible and has been widely applied [41]. In SINDy, the coefficients of a library composed of basis function candidates are estimated using a newly developed sparse regression method, under the assumption that the true model can be represented by a linear combination of basis functions, including nonlinear functions. Using basis function candidates to place constraints on the form of the model has both advantages and disadvantages, and the proposed method maximizes the advantages of this constraint.

Specifically, we constrain the form of the output model by using the thermal network method, which applies the framework of equivalent circuits to predict temperature. The mathematical model of the thermal network method is a system of simultaneous ordinary differential equations, and it is highly compatible with SINDy because it can be written in the form of a linear regression equation, the framework on which SINDy is built. The thermal network model can be interpreted as a graph because the nodes, which are collocation points, are connected by thermal resistances. Each node stores a state quantity (i.e., temperature), and its changeability is represented by its heat capacity. The number of nodes is very small compared with a detailed numerical analysis performed using the finite volume method or the finite element method. In other words, dimensionality reduction methods such as proper orthogonal decomposition and the convolutional neural network-based auto-encoder (CNN-AE) are not necessary. Therefore, the library can be configured in a form that follows physical laws, including theoretical and empirical formulas developed over the long history of heat transfer engineering, and the created model has superior generalization ability and interpretability.

The proposed nonlinear time-variant reduced-order model, based on a sparse modeling method, comprises the following three novel machine learning methods. (1) A data preprocessing method that considers the correlation between errors. This method enables accurate prediction not only a few steps ahead but also very many steps ahead. (2) A sequential thresholded non-negative least-squares method based on term size criteria. This method efficiently performs the selection of basis functions as well as coefficient estimation by suppressing the effect of multicollinearity among basis function candidates, improving upon the sequential thresholded least-squares (STLS) sparse regression method proposed in the SINDy framework. (3) A solution space search method based on similar basis function classification. This method assigns the basis function candidates to groups and controls each group individually to achieve a physically valid combination of basis functions.

The proposed sparse modeling method has the following features. (1) The created ROM satisfies the law of conservation of energy, has good generalization ability, and can be extrapolated to some extent. (2) It is capable of modeling nonlinear time-variant systems. It can handle a wider range of phenomena compared with methods for creating ROMs under time-invariant assumptions [42,43]. In addition, it is widely applicable to fields where the concept of equivalent circuits can be applied. (3) It can be applied to systems where the parameter values used in the governing equations differ according to the area. (4) There are few constraints on the time series data to be learned. It can handle cases in which the time-step $\Delta t$ of the time series data is not fixed. (5) The content of the ROM is easy to interpret. (6) ROM creation does not require a large amount of resources. It is also possible to create a ROM from fewer detailed numerical analysis trials than the number of input parameters. Therefore, less effort is required for data collection. In addition, little knowledge of heat transfer engineering is required for ROM creation. This is because the data reveal whether forced convective, natural convective, or radiative heat transfer is occurring.

In addition, the following should be noted. (1) Unknown phenomena cannot be treated with the proposed method, because phenomena whose basis function form cannot be conceptualized cannot be included in the library. (2) Temperature nodes are selected in advance, although strictness is not required. (3) The proposed method does not derive a governing equation but rather an approximate model that is consistent with physical laws and is easy to interpret.

## 2 Background

### 2.1 Thermal Network Method.

Here $T$ is the temperature, $C$ is the heat capacity, $R$ is the thermal resistance, $Q$ is the caloric value, and $N$ is the number of nodes. Temperatures at the nodes are calculated by solving the simultaneous ordinary differential equations in Eq. (1). Because no discretization error occurs in the thermal network method, it is possible to solve the problem with a dimensionality a few orders of magnitude smaller than that in detailed thermofluid analysis. The thermal network model can be interpreted as a graph, with directed edges for air flow and undirected edges for other variables.
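As a concrete illustration of the nodal heat balance described here (the form of Eq. (1)), the sketch below integrates a hypothetical two-node network: a heated component exchanging heat with a heat sink base, which in turn exchanges heat with the ambient. All parameter values and the topology are invented for the example; they are not taken from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical two-node thermal network (values invented for illustration)
C = np.array([5.0, 50.0])   # heat capacities of nodes 1 and 2 (J/K)
R12, R2a = 0.5, 0.2         # thermal resistances: node1-node2, node2-ambient (K/W)
Ta, Q1 = 300.0, 20.0        # ambient temperature (K) and heat input at node 1 (W)

def rhs(t, T):
    """Nodal heat balance: C_i * dT_i/dt = sum_j (T_j - T_i)/R_ij + Q_i."""
    T1, T2 = T
    dT1 = ((T2 - T1) / R12 + Q1) / C[0]
    dT2 = ((T1 - T2) / R12 + (Ta - T2) / R2a) / C[1]
    return [dT1, dT2]

# Start from ambient temperature and integrate to steady state
sol = solve_ivp(rhs, (0.0, 500.0), [Ta, Ta], rtol=1e-8)
# Steady state follows from the resistances: T2 = Ta + Q1*R2a, T1 = T2 + Q1*R12
print(sol.y[:, -1])   # ≈ [314.0, 304.0]
```

Because only two coupled scalar ODEs are solved, the dimensionality is orders of magnitude smaller than a finite volume model of the same assembly, which is the point made above.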

Constructing Eq. (1) deductively requires a deep understanding of physical phenomena and a large amount of time. The actual phenomena and structures, which are simplified from a physical point of view, as well as parameters such as thermal resistance and heat capacity are set. The heat capacity is determined mainly by the shape and physical properties. Thermal resistance is determined not only by the shape and physical properties but also by state quantities such as temperature and velocity. In cases where thermal resistance cannot be set theoretically, it is necessary to select an empirical formula suitable for the target object from the many candidates. Therefore, the quality of the model is highly dependent on the ability of the engineer.
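To make the dependence of thermal resistance on state quantities concrete, the sketch below evaluates a forced-convection resistance from a turbulent flat-plate correlation, $Nu = 0.037\,Re^{0.8}Pr^{1/3}$. This particular correlation, the property values, and the function name are illustrative assumptions; selecting the appropriate empirical formula for the actual geometry is exactly the engineer-dependent step described above.

```python
def convective_resistance(v, L, A, nu=1.6e-5, k_air=0.026, Pr=0.71):
    """Forced-convection thermal resistance R = 1/(h*A) from a turbulent
    flat-plate correlation (illustrative only; the right correlation
    depends on the geometry and flow regime)."""
    Re = v * L / nu                          # Reynolds number
    Nu = 0.037 * Re**0.8 * Pr**(1.0 / 3.0)   # Nusselt number
    h = Nu * k_air / L                       # heat transfer coefficient (W/m^2/K)
    return 1.0 / (h * A)                     # thermal resistance (K/W)

# R decreases roughly as v^-0.8: doubling the velocity lowers the resistance
r4 = convective_resistance(4.0, 0.08, 0.01)
r8 = convective_resistance(8.0, 0.08, 0.01)
print(r4, r8)
```

Note that the resistance depends on the velocity, a state quantity, which is why the resulting thermal network is time-variant when the flow changes.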

### 2.2 Sparse Identification of Nonlinear Dynamical Systems.

Here $X$ is an $m \times n$ matrix, $m$ is the number of time samples, $n$ is the dimension of $X$, $\Theta(X)$ is a library consisting of basis function candidates, and $\Xi$ is a sparse vector of coefficients, which can be estimated using STLS, a novel sparse regression method. An example STLS flowchart is shown in Fig. 1.

In STLS, after estimating the least-squares solution of $\Xi$, the process of zeroing the coefficients $\xi$ (elements of $\Xi$) that are smaller than the cutoff is repeated. The model selection criteria represent the accuracy and simplicity of the model. The cutoff is a hyperparameter that greatly affects the quality of the model and is modified on each pass of the outer loop. In the later part of the inner loop, the number of coefficients to be modified can be zero, which is an advantage over methods such as the least absolute shrinkage and selection operator, in which the absolute values of the coefficients are shrunk by the thresholding operation. $\dot{X}$ is contaminated with noise due to its numerical approximation; STLS is a robust approach for such noisy data, and it creates equations that balance accuracy and complexity.
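The STLS loop of Fig. 1 can be summarized in a few lines of NumPy. The sketch below is a minimal single-state version with a toy cubic system invented for the example; unlike the full algorithm, it keeps the cutoff fixed rather than tuning it in an outer loop against model selection criteria.

```python
import numpy as np

def stls(theta, dxdt, cutoff, n_iter=10):
    """Sequential thresholded least squares as used in SINDy: alternate
    least-squares fits with zeroing of coefficients below the cutoff."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < cutoff      # coefficients below the cutoff
        xi[small] = 0.0
        big = ~small
        if big.any():                    # refit using only the surviving terms
            xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Toy system: dx/dt = 2*x - 0.5*x**3, library = [1, x, x^2, x^3]
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 200)
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
dxdt = 2 * x - 0.5 * x**3 + 0.01 * rng.standard_normal(200)
xi = stls(theta, dxdt, cutoff=0.1)
print(xi)   # coefficients ≈ [0, 2, 0, -0.5]
```

The hard-thresholding step is what distinguishes STLS from shrinkage-based methods: surviving coefficients are refit without bias toward zero.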

SINDy has been applied in various ways [44–51]. For example, Rudy et al. [44] applied ridge regression as the regression method for STLS in order to identify partial differential equations. In addition, Fukami et al. [47] used thresholded least squares and the adaptive least absolute shrinkage and selection operator as the regression methods for STLS and applied them to a flow field with dimensionality reduced by CNN-AE. Furthermore, Mangan et al. [48] proposed a method to find a parsimonious model for a system by combining information criteria such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) with SINDy. Improvements in noise robustness have also been studied. For example, Cortiella et al. [49] proposed an approach that selects regularization parameters based on the corner point criterion of Pareto curves; Fasel et al. [50] proposed an approach that involves bootstrap aggregation; and Schaeffer et al. [51] proposed an approach based on the integral formulation.

## 3 Proposed Method of Sparse Modeling: Nonlinear Time-Variant Reduced-Order Model for Temperature Prediction

### 3.1 Overview of the Sparse Modeling Method.

Figure 2 shows a schematic flowchart of the proposed new sparse modeling method.

To create a ROM in the form of Eq. (1), rather than reducing the variables to a lower dimension by using proper orthogonal decomposition or CNN-AE, we treated the temperature $T_n$ itself, as represented at the collocation points. Note that the proposed method includes data-driven aspects, so it is not necessary to perform this task strictly. In other words, the temperature node does not have to be the mean temperature in the control volume; local values at arbitrary points are acceptable. Furthermore, the state quantity (e.g., temperature) is sufficient information to be stored at the nodes, and coordinate data are not required.

The vector $X$ of factors that influence the temperature is composed of factors that vary significantly in the time series data. For example, if the size or material of the modeling target does not change, it is removed from $X$. Except in cases where the temperature range in the time series data is very wide, the effects of the temperature dependence on physical properties can be learned properly.

Equation (9) shows the setting that takes into account the effect of the volumetric thermal expansion assuming an ideal gas. As in the case of forced convective heat transfer, a range can be specified for the exponent $\beta $. Thus, by utilizing domain knowledge, it is possible to appropriately set basis function candidates for known phenomena.
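To make this concrete, candidate basis functions for a known phenomenon can be generated over a grid of admissible exponents, letting the sparse regression decide which, if any, the data support. The functional form $v^{\alpha}(T_j - T_i)$ and the exponent values below are illustrative assumptions patterned on the forced-convection terms discussed in this paper, not the exact library.

```python
import numpy as np

# Hypothetical exponent grid for a forced-convection sublibrary; domain
# knowledge restricts alpha to a physically plausible range (illustrative)
alphas = np.array([0.5, 0.6, 0.8])

def forced_convection_candidates(v, Ti, Tj, alphas=alphas):
    """One library column per exponent candidate, v**alpha * (Tj - Ti);
    sparse regression later selects among these columns."""
    return np.column_stack([v**a * (Tj - Ti) for a in alphas])

# Three time samples of velocity and two node temperatures (toy values)
v = np.array([4.0, 6.0, 8.0])
Ti = np.array([320.0, 330.0, 340.0])
Tj = np.array([300.0, 300.0, 300.0])
cands = forced_convection_candidates(v, Ti, Tj)
print(cands.shape)   # (3, 3): 3 samples x 3 exponent candidates
```

Specifying a range for the exponent rather than a single value keeps the library flexible while still excluding physically implausible terms.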

In this research, we first constructed a library for forced convective heat transfer, natural convective heat transfer, radiative heat transfer, and thermal conduction (including thermal contact resistance) as well as a source term to deal with forced and natural air cooling. Then, we performed sparse regression for the selection of basis functions and for coefficient estimation. However, because of the various problems described below, conventional machine learning methods did not work well. Therefore, we developed three new machine learning methods, which are described in Sec. 3.2.

### 3.2 Three Proposed Methods for Machine Learning

#### 3.2.1 Data Preprocessing Method for Considering the Correlation Between Errors.

In the preprocessing step of calculating $\dot{T}$ and $\Theta(T, X)$ in Eq. (3) from the time series data, the time information is lost and the data become independent. The first term of the objective function, which can be the root-mean-squared error (RMSE) or mean absolute error (MAE), does not consider whether the error is positive or negative. In addition, it is difficult to apply seasonal adjustment and high-order autocorrelation in our approach. Therefore, a novel method is needed to create a ROM that can predict very many steps ahead with high accuracy.

$L$ is the number of datasets, and $t_{l,j}$ denotes time sample number $j$ in dataset $l$. Equation (12) shows the definition of $\Delta T_{FI}$.

Equation (14), which represents the long-term component, includes terms for the correlation between errors that are not included in Eq. (16), which represents the short-term component. To take advantage of this feature, the fixed interval $FI$ should be wide. The number of possible combinations of the first term and the second term in the second equation in Eq. (14) is ${}_{FI}\mathrm{C}_{2}$, so the second term dominates when $FI$ is wide. However, a wide $FI$ is problematic because it allows the correlation of errors between samples that are far apart. Hence, we propose combining multiple long-term components with different $FI$. Figure 3 shows the variance–covariance matrix $\Omega$ of the error when the number of datasets $L$ is 3. The same procedure can be used for difference operators other than the forward difference.
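Since Eqs. (12)–(16) are not reproduced here, the following is only a hedged sketch of the idea: short-term targets are built from consecutive samples (which works even with nonconstant $\Delta t$), while long-term targets are built from samples a fixed interval $FI$ apart, and both are stacked so that the fit must also respect many-step-ahead behavior. The function name `build_targets` and the exact form of the two components are assumptions for illustration.

```python
import numpy as np

def build_targets(t, T, FI):
    """Short-term (one-step) and long-term (fixed-interval FI) forward
    differences of a temperature series; the long-term rows couple samples
    FI steps apart, mimicking the long-term component described above."""
    dt1 = np.diff(t)
    short = np.diff(T) / dt1            # one-step difference quotients
    dtF = t[FI:] - t[:-FI]
    long_ = (T[FI:] - T[:-FI]) / dtF    # Delta T over the fixed interval FI
    return short, long_

# Toy monotone heating curve sampled with nonconstant time steps
t = np.array([0.0, 0.5, 1.2, 2.0, 3.1, 4.0])
T = 300.0 + 10.0 * (1.0 - np.exp(-t))
short, long_ = build_targets(t, T, FI=3)
print(short.shape, long_.shape)   # (5,) (3,)
```

Combining several `long_` components computed with different `FI` values, as proposed above, limits how far apart the correlated samples can be.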

#### 3.2.2 Sequential Thresholded Non-Negative Least-Squares Method Based on Term Size Criteria.

Figure 4 is a flowchart of the novel sparse regression method, which is a sequential thresholded non-negative least-squares method based on term size criteria. This method improves upon the STLS. Steps 1 and 2 in Fig. 4, which are improvements, are described below.

In step 1, coefficient estimation is performed by suppressing the effect of multicollinearity among basis function candidates. In cases with more than a few dozen temperature nodes, there are naturally combinations of highly correlated temperature nodes. In these cases, estimates for coefficients with large absolute values are scattered when the least-squares method is used. Terms having large absolute values of the coefficients are not eliminated in step 2, which selects the basis functions, and thus, an unstable ROM is output. Hence, bearing in mind that the sign of the coefficients can be controlled by devising the setting of the basis function candidates, we applied a regression method with the weak constraint that the coefficients be non-negative [53]. The implementation of this method is simple, and it is also advantageous in that the computational efficiency is not significantly reduced compared with the least-squares method.

Here, $\lambda $ is a hyperparameter that takes a value of less than 1. If the value calculated in Eq. (17) is smaller than some threshold value (e.g., Eq. (18)), the corresponding coefficient is modified to 0, similar to a hard-thresholding operator. The max function in Eqs. (17) and (18) can be changed to a form corresponding to normalization methods such as the min–max or Z-score.

It can be seen from Eqs. (17) and (18) that the coefficients and threshold are corrected based on the data of the short-term component. Figure 5 shows the results when the data preprocessing method described in Sec. 3.2.1 is applied to time series data obtained from thermofluid analysis. Figure 5(a) shows the values on the left-hand side of the equation, and Fig. 5(b) shows the values of a basis function candidate on the right-hand side of the equation. In both graphs, the left side illustrates the values of the short-term component, and the right side illustrates the values of the long-term component. It can be seen that the means and standard deviations of the short-term component and the long-term component are significantly different. Because the long-term component is normalized such that the RMSE of the short-term component dominates, the long-term component does not significantly disturb the mean and standard deviation of the short-term component on the left-hand side of the equation. However, as for the right-hand side of the equation, the long-term component significantly disturbs the mean and standard deviation of the short-term component. The strong effect of the long-term component on the selection of basis functions should be avoided. However, it is difficult to solve this problem by using the normalization of basis function candidates and the concept of hypothesis testing.

Here $SS_{ii}$ represents the diagonal elements of the inverse matrix of the mean square deviations and the sums of the products of the deviations, and $V_e$ is the error variance. The data preprocessing method described in Sec. 3.2.1 involves mixed data extracted from different populations, so $SS_{ii}$ is meaningless. Therefore, selection of basis functions by using a $t$-test does not work effectively.

The proposed novel sparse regression method can easily remove the effect of the long-term component. In addition, because the proposed method selects basis functions according to a relative criterion, it is particularly effective for thermal problems in which the time constants differ significantly for each temperature node.
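A minimal sketch of steps 1 and 2 might look as follows, using `scipy.optimize.nnls` for the non-negative fit [53] and judging each term by its size (coefficient times the peak of its basis function on the short-term rows, in the spirit of the max-based normalization of Eqs. (17) and (18)). The normalization details, the `short` row selection, and the toy data are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import nnls

def st_nnls(theta, y, lam=0.05, n_iter=10, short=None):
    """Sequential thresholded non-negative least squares with terms judged
    by their contribution size rather than raw coefficient magnitude."""
    if short is None:
        short = np.arange(theta.shape[0])   # rows of the short-term component
    active = np.ones(theta.shape[1], dtype=bool)
    xi = np.zeros(theta.shape[1])
    for _ in range(n_iter):
        xi[:] = 0.0
        xi[active], _ = nnls(theta[:, active], y)  # non-negative fit, step 1
        # term size: coefficient times the peak of its basis function on the
        # short-term rows (max-based normalization), step 2
        size = xi * np.abs(theta[short]).max(axis=0)
        thresh = lam * size.max()                  # relative threshold
        active = size >= thresh                    # prune relatively small terms
        if active.all():
            break
    return xi

# Toy problem: y = 2*f0 + 0.001*f2 + noise; the tiny third term gets pruned
rng = np.random.default_rng(1)
theta = np.abs(rng.standard_normal((100, 3)))
y = theta @ np.array([2.0, 0.0, 0.001]) + 0.01 * rng.standard_normal(100)
xi = st_nnls(theta, y)
print(xi)
```

Because the threshold is relative to the largest term size, at least one term always survives, and terms are compared on a per-equation basis, which matches the statement above that the criterion is effective when time constants differ greatly between temperature nodes.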

#### 3.2.3 Space Search Method Based on Similar Basis Function Classification.

As mentioned earlier, the library is composed of various physical models. Because of the wide variety of heat transfer phenomena, there are many candidate combinations of basis functions that are highly correlated in the library. However, some combinations of physical models are not physically valid. Therefore, it is difficult to obtain the expected ROM only by devising a sparse regression method. For this reason, we developed a method to probabilistically extract basis function candidates and construct a candidate set of simultaneous ordinary differential equations, as shown in Fig. 6, which illustrates a detail of the upper right part of the novel sparse regression method shown in Fig. 2.

Inside the library, we defined sets for each heat transfer mode, which we named sublibraries, such that the library comprises a family of sets. For sublibraries containing multiple physical models, an extraction probability was assigned to each physical model. To illustrate this idea using Eq. (8), $v^{\alpha_1}(T_j - T_i)$ is physical model 1 ($\theta_{A1}$) and $v^{\alpha_2}(T_j - T_i)$ is physical model 2 ($\theta_{A2}$). Because there are multiple temperature nodes, each physical model consists of multiple basis function candidates. The classification of basis function candidates is similar to that in group lasso [54]. The purpose of the proposed method is to limit the combinations of physical models.

First, Eq. (13) is constructed using the basis function candidates represented by the physical models extracted according to the extraction probabilities, and then the sparse regression is performed. This process is iterated several times to create several ROM candidates. After that, the model selection criteria, including the information criteria for each ROM candidate, are calculated and the extraction probabilities of the physical models in the next iteration are determined based on the values of these criteria. The model selection criteria can simply be the sum of the error term (e.g., RMSE) and the L0 norm. Knowledge of the thermal network method can also be applied to create better ROMs. The number of basis functions is not linearly related to the quality of the ROM, so it is important to devise a way to keep them within an appropriate range, for example, using a nonlinear function with the L0 norm as a parameter. Methods such as AIC and BIC, which involve the logarithm of the residual sum of squares, did not work well with the proposed method.

We also modify the hyperparameter $\lambda$ of the threshold shown in Eq. (18) along with the extraction probabilities. The hyperparameter $\lambda$ does not suffer from complex multimodality, so no special method is required to determine the magnitude and sign of its modification (e.g., comparing the value of the model selection criteria in the current iteration with that in the previous iteration suffices). As the iterations proceed, the probability of extracting a physical model that fits the data becomes higher, and thus, more of the simultaneous equations within the same iteration overlap; that is, they have the same form. These overlapping simultaneous equations can be combined into a single computation, and thus, the time required for each iteration becomes progressively shorter.
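The overall search loop of Sec. 3.2.3 might be sketched as follows. The probability-update rule (exponentially weighting the model selection criterion) and all names are assumptions introduced for illustration; the paper's procedure additionally adjusts $\lambda$ and merges overlapping candidate equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def search(sublibraries, fit_rom, criterion, n_outer=20, n_cand=8, step=0.1):
    """Draw one physical model per sublibrary according to its extraction
    probability, fit a ROM candidate, score it with the model selection
    criterion, and nudge the probabilities toward well-scoring models."""
    probs = [np.full(len(m), 1.0 / len(m)) for m in sublibraries]
    best, best_score = None, np.inf
    for _ in range(n_outer):
        rewards = [np.zeros(len(m)) for m in sublibraries]
        for _ in range(n_cand):
            pick = [rng.choice(len(m), p=p) for m, p in zip(sublibraries, probs)]
            rom = fit_rom([m[i] for m, i in zip(sublibraries, pick)])
            score = criterion(rom)            # e.g., RMSE plus an L0-norm penalty
            if score < best_score:
                best, best_score = rom, score
            for r, i in zip(rewards, pick):   # lower score -> larger reward
                r[i] += np.exp(-score)
        # raise the extraction probability of rewarded models, then renormalize
        probs = [(p + step * r) / (p + step * r).sum()
                 for p, r in zip(probs, rewards)]
    return best, best_score

# Toy usage: two sublibraries of labeled "physical models"; by construction,
# the combination ["A1", "B2"] scores best and should be found
sublibs = [["A1", "A2"], ["B1", "B2"]]
best, score = search(sublibs, fit_rom=lambda models: models,
                     criterion=lambda rom: 0.1 if rom == ["A1", "B2"] else 1.0)
print(best, score)
```

In this sketch, `fit_rom` stands in for the sparse regression of Sec. 3.2.2 and `criterion` for the model selection criteria; as probabilities concentrate, repeated draws of the same combination correspond to the overlapping equations described above.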

## 4 Effectiveness of the Proposed Method

The effectiveness of the proposed sparse modeling method was verified using time series data obtained by thermofluid analysis of a power module mounted on a comb-shaped heat sink.

### 4.1 Power Module Mounted on a Forced Air Cooling Heat Sink

#### 4.1.1 Thermofluid Analysis.

Figure 7 shows the target of the analysis. The heat-generating components were 12 chips. Tables 1 and 2 show the dimensions and physical properties of the analysis target, Table 3 shows the thermofluid analysis conditions for the learning and validation data, and Tables 4 and 5 show the analysis conditions for the test data.

| | Thickness (mm) | Width × depth (mm) | Thermal conductivity (W/m/K) | Specific heat capacity (J/kg/K) | Density (kg/m³) |
|---|---|---|---|---|---|
| Chip | 0.25 | 8 × 8 | 400 | 656 | 3210 |
| Part A | 0.1 | 8 × 8 | 73.2 | 226 | 7300 |
| Part B | 0.3 | 94 × 25 | 390 | 380 | 8960 |
| Part C | 0.3 | 94 × 40 | 70 | 680 | 3200 |
| Part D | 0.3 | 94 × 40 | 390 | 380 | 8960 |
| Part E | 0.3 | 94 × 40 | 73.2 | 226 | 7300 |
| Part F | 3.0 | 120 × 60 | 390 | 380 | 8960 |


| Height (mm) | Length (mm) | Width (mm) | Fin thickness (mm) | Number of fins |
|---|---|---|---|---|
| 80 | 80 | 163 | 2 | 24 |


| Base thickness (mm) | Thermal conductivity (W/m/K) | Specific heat capacity (J/kg/K) | Density (kg/m³) |
|---|---|---|---|
| 3.0 | 225 | 880 | 2700 |


| Dataset no. | | L1 | L2 | L3 | L4 | L5 | $\cdots$ | L11 | L12 | L13 | L14 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ambient temperature (K) | $T_a$ | 283 | 293 | 303 | 313 | 278 | $\cdots$ | 333 | 303 | 313 | 293 |
| Velocity (m/s) | $v$ | 4 | 4 | 8 | 8 | 6 | $\cdots$ | 5.6 | 7.2 | 7 | 5 |
| Thermal contact resistance (K/W) | $R_c$ | 0.11 | 0.16 | 0.04 | 0.1 | 0.15 | $\cdots$ | 0.18 | 0.13 | 0.12 | 0.17 |
| Caloric value (W) | $Q_1$ | 10 | 0 | 60 | 12 | 15 | $\cdots$ | 42 | 65 | 30 | 55 |
| | $Q_2$ | 20 | 4 | 50 | 0 | 0 | $\cdots$ | 35 | 1 | 40 | 7 |
| | $Q_3$ | 0 | 6 | 40 | 8 | 25 | $\cdots$ | 28 | 55 | 50 | 0 |
| | $Q_4$ | 40 | 8 | 30 | 0 | 4 | $\cdots$ | 18 | 2 | 60 | 9 |
| | $Q_5$ | 50 | 10 | 20 | 4 | 35 | $\cdots$ | 9 | 45 | 0 | 5 |
| | $Q_6$ | 60 | 12 | 10 | 0 | 6 | $\cdots$ | 55 | 0 | 20 | 11 |
| | $Q_7$ | 12 | 10 | 0 | 60 | 45 | $\cdots$ | 40 | 35 | 9 | 30 |
| | $Q_8$ | 10 | 20 | 4 | 50 | 0 | $\cdots$ | 37 | 3 | 30 | 40 |
| | $Q_9$ | 8 | 30 | 6 | 40 | 55 | $\cdots$ | 20 | 25 | 1 | 50 |
| | $Q_{10}$ | 6 | 40 | 8 | 0 | 10 | $\cdots$ | 18 | 0 | 11 | 60 |
| | $Q_{11}$ | 4 | 50 | 10 | 20 | 65 | $\cdots$ | 8 | 15 | 5 | 10 |
| | $Q_{12}$ | 0 | 60 | 12 | 10 | 12 | $\cdots$ | 58 | 30 | 0 | 20 |


| Dataset no. | | T1 | T2 | T3 | T4 | T5 | T6 | T7 |
|---|---|---|---|---|---|---|---|---|
| Ambient temperature (K) | $T_a$ | 293 | 283 | 303 | 323 | 293 | 313 | 293 |
| Velocity (m/s) | $v$ | 6 | 10 | 10 | 6 | 5.6 | Condition A | Condition A |
| Thermal contact resistance (K/W) | $R_c$ | 0.10 | 0.05 | 0.10 | 0.20 | 0.25 | 0.05 | 0.20 |
| Caloric value (W) | $Q_1$ | 80 | 50 | 50 | 25 | 18 | Condition A (described in Table 5) | Condition A (described in Table 5) |
| | $Q_2$ | 0 | 0 | 0 | 25 | 59 | | |
| | $Q_3$ | 0 | 0 | 0 | 25 | 55 | | |
| | $Q_4$ | 0 | 0 | 0 | 25 | 1 | | |
| | $Q_5$ | 0 | 0 | 0 | 25 | 55 | | |
| | $Q_6$ | 80 | 50 | 50 | 25 | 41 | | |
| | $Q_7$ | 0 | 0 | 0 | 25 | 13 | | |
| | $Q_8$ | 0 | 0 | 0 | 25 | 42 | | |
| | $Q_9$ | 0 | 0 | 0 | 25 | 65 | | |
| | $Q_{10}$ | 0 | 0 | 0 | 25 | 23 | | |
| | $Q_{11}$ | 0 | 0 | 0 | 25 | 69 | | |
| | $Q_{12}$ | 0 | 0 | 0 | 25 | 50 | | |


| Time (s) | $t$ | 0–30 | 30–60 | 60–90 | 90–120 | 120–150 | 150–180 | 180–210 | 210–240 | 240–270 | 270–300 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Velocity (m/s) | $v$ | 8 | 8 | 10 | 10 | 4 | 4 | 8 | 8 | 6 | 6 |
| Caloric value (W) | $Q_1$ | 40 | 0 | 30 | 30 | 30 | 0 | 0 | 50 | 50 | 0 |
| | $Q_2$ | 10 | 50 | 0 | 0 | 50 | 50 | 50 | 0 | 0 | 50 |
| | $Q_3$, $Q_5$, $Q_8$, $Q_9$ | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| | $Q_4$ | 0 | 50 | 0 | 50 | 0 | 50 | 0 | 50 | 0 | 0 |
| | $Q_6$ | 15 | 25 | 35 | 45 | 35 | 25 | 0 | 0 | 0 | 0 |
| | $Q_7$ | 10 | 50 | 0 | 0 | 50 | 50 | 50 | 0 | 0 | 0 |
| | $Q_{10}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | $Q_{11}$ | 5 | 50 | 50 | 0 | 40 | 40 | 0 | 0 | 60 | 60 |
| | $Q_{12}$ | 20 | 0 | 40 | 0 | 50 | 50 | 0 | 40 | 0 | 50 |


The numbers for the caloric values $Q$ in Tables 3 and 4 correspond to the chip numbers in Fig. 7. The velocity is the bulk value upstream of the heat sink and was set so that the Reynolds number $Re$ corresponds to turbulent flow. Note that there is no bypass, that is, no flow that avoids the heat sink. The thermal contact resistance $Rc$ is the value between part F and the heat sink. Datasets L1 to L14 in Table 3 and datasets T1 to T5 in Table 4 are initial value problems with an analysis time of 1,000 s. The number of input parameters is 15, which exceeds the number of learning datasets (14). The learning and validation data span velocities of 4–8 m/s, thermal contact resistances of 0.04–0.18 K/W, and caloric values of 0–65 W. The test data span velocities of 4–10 m/s, thermal contact resistances of 0.05–0.25 K/W, and caloric values of 0–80 W, which include values outside the domain of definition of the learning and validation data. We did not use any special method to determine the analysis conditions; a formal design-of-experiments approach might therefore have yielded better-quality data.
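The ranges quoted above imply that the test data require extrapolation; a simple range check makes this explicit. The ranges are taken directly from the text, while the dictionary layout and function name are of course only illustrative.

```python
# Parameter ranges as stated in the text: learning/validation vs. test.
train = {"v": (4, 8), "Rc": (0.04, 0.18), "Q": (0, 65)}
test = {"v": (4, 10), "Rc": (0.05, 0.25), "Q": (0, 80)}

def needs_extrapolation(train, test):
    """Return the parameters whose test range exceeds the training range."""
    return [k for k in train
            if test[k][0] < train[k][0] or test[k][1] > train[k][1]]

outside = needs_extrapolation(train, test)
```

For these ranges, all three parameters (velocity, thermal contact resistance, caloric value) require extrapolation on the upper end, which is the setting in which the ROM's accuracy is evaluated below.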

The thermofluid analysis was performed using the finite volume method in computational fluid dynamics software (Icepak; ANSYS, Canonsburg, PA). The mesh comprised several million nodes. For the turbulence model, we selected a Reynolds-averaged Navier–Stokes model that applies linear and logarithmic wall laws depending on the dimensionless wall distance. Natural convective heat transfer and radiative heat transfer were disabled, and the time-step $\Delta t$ was not fixed. Figure 8 shows some of the temperature measurement points. The output time series data comprised temperatures at 63 points: each of the 12 chips; parts B, C, and E at the *x*–*y* coordinates of the chip centers (12 locations each); three locations each on part F and the heat sink base; and nine locations on the heat sink fins.

#### 4.1.2 Created Model.

Compared with the MSE of the learning and validation data, the squared bias was almost the same value, while the variance was only approximately 4% of the MSE. From this, the created ROM is unlikely to be overfit. The bias can be reduced by increasing the number of nodes, that is, by increasing the locations where heat capacity is assigned, or by modifying the information criteria. However, the tradeoff between small bias and model simplicity should be kept in mind.
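The bias–variance comparison used here can be illustrated with a short sketch. The error values below are hypothetical, not the paper's data.

```python
# Decompose the mean-squared error (MSE) of prediction errors into
# squared bias and variance, the quantities compared in Sec. 4.1.2.

def bias_variance(errors):
    """Return (mse, squared bias, variance) for a list of prediction errors (K)."""
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    bias = sum(errors) / n
    variance = sum((e - bias) ** 2 for e in errors) / n
    return mse, bias ** 2, variance

errors = [0.5, 0.7, 0.4, 0.6, 0.55]  # hypothetical node-wise errors in K
mse, bias2, var = bias_variance(errors)
```

Since MSE equals squared bias plus variance, a variance that is only a few percent of the MSE means the residual error is dominated by bias, which is the overfitting check applied above.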

#### 4.1.3 Results.

We used the created ROM to predict the test data. The input was the initial temperature $Tn(0)$ at the 63 nodes and the time series of the factors $X$ that influence the temperature. The output was the predicted temperature at the 63 nodes at all times after $t=0$. We evaluated the effectiveness of the proposed sparse modeling method by comparing the predicted temperature calculated by the ROM with the results of the thermofluid analysis. Table 6 shows the mean error. The mean error over the seven types of test data was 0.8 K, and that over the 14 types of learning data was 0.6 K. Furthermore, as mentioned above, the variance was small for the learning and validation data represented by Eq. (13), so we concluded that there was no overfitting. The created ROM performed a surrogate calculation in only a few seconds on an ordinary notebook computer, whereas the corresponding thermofluid analysis normally takes between half a day and a full day on a high-performance computer.

| Dataset no. | T1 | T2 | T3 | T4 | T5 | T6 | T7 |
|---|---|---|---|---|---|---|---|
| Mean error (K) | 0.7 | 0.3 | 0.3 | 1.0 | 0.9 | 1.4 | 1.2 |

The prediction results for dataset T3 are shown in Fig. 9. We plotted seven representative nodes out of the 63 predicted temperatures; the horizontal axis is time and the vertical axis is temperature. The steady-state temperature was predicted with high accuracy, as were the rapid temperature increases of the chips, which have small time constants. The temperatures of nodes such as the heat sink, whose time constant is much larger than that of the chips, were also predicted with high accuracy. The temperature difference $T_{Chip1} - T_{Chip6}$ between chips 1 and 6, which generate the same calorific value, was predicted with high accuracy; the main cause of this small temperature difference is thermal spreading resistance. Furthermore, the large temperature difference $T_{F(C)} - T_{HSb(C)}$ between part F and the heat sink base due to thermal contact resistance was also predicted with high accuracy. In addition, the temperatures of heat sink fins 1–3 show that the nonlinear temperature distribution in the direction of the fin height was likewise captured. This distribution can be explained by the theory of fin efficiency, and it is noteworthy that these physical phenomena, including the above-mentioned thermal spreading resistance, were predicted with high accuracy from the data alone.
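The fin-height temperature distribution mentioned above follows classical fin theory: for a straight fin with an adiabatic tip, the excess-temperature ratio is $\cosh(m(H-x))/\cosh(mH)$ and the fin efficiency is $\tanh(mH)/(mH)$. The sketch below evaluates these expressions with assumed values for $h$, $k$, and the fin dimensions, not the paper's.

```python
import math

def fin_temperature_ratio(x, H, m):
    """Excess-temperature ratio at height x for an adiabatic-tip straight fin."""
    return math.cosh(m * (H - x)) / math.cosh(m * H)

# assumed values for illustration (not from the paper)
h = 50.0        # heat transfer coefficient, W/m^2/K
k = 150.0       # fin thermal conductivity, W/m/K
t_fin = 1.5e-3  # fin thickness, m
H = 0.04        # fin height, m

m = math.sqrt(2.0 * h / (k * t_fin))  # fin parameter, 1/m
eta = math.tanh(m * H) / (m * H)      # fin efficiency, between 0 and 1
# excess temperature decays nonlinearly from base (x=0) to tip (x=H)
profile = [fin_temperature_ratio(x, H, m) for x in (0.0, H / 2, H)]
```

The monotonic, nonlinear decay of `profile` from base to tip is exactly the distribution the ROM recovered at fins 1–3 without being given fin theory.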

The prediction results for dataset T7 are shown in Fig. 10. The created ROM was able to predict temperature with high accuracy even when the input varied with time. We also confirmed that errors did not accumulate.
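Conceptually, such a prediction amounts to integrating the identified system of ordinary differential equations forward from the initial temperatures, with the time-varying factors supplied as inputs at each step. The following sketch illustrates this with a hypothetical two-node thermal network; all $R$, $C$, and $Q$ values are assumed for illustration, not taken from the paper's ROM.

```python
# Minimal sketch: explicit-Euler integration of a two-node thermal network
# of the equivalent-circuit form C*dT/dt = sum((Tj - Ti)/Rij) + Q.
# Node 1 is heated; node 2 exchanges heat with the ambient.

def predict(T0, X_of_t, t_end, dt=0.1):
    """Integrate node temperatures from T0 given a heat-input function X_of_t."""
    C = [50.0, 200.0]  # heat capacities, J/K (assumed)
    R = 0.5            # node-to-node thermal resistance, K/W (assumed)
    R_amb = 1.0        # node-2-to-ambient thermal resistance, K/W (assumed)
    Ta = 293.0         # ambient temperature, K
    T = list(T0)
    t = 0.0
    while t < t_end:
        Q = X_of_t(t)  # time-varying heat input to node 1, W
        dT1 = (Q - (T[0] - T[1]) / R) / C[0]
        dT2 = ((T[0] - T[1]) / R - (T[1] - Ta) / R_amb) / C[1]
        T[0] += dt * dT1
        T[1] += dt * dT2
        t += dt
    return T

T_final = predict([293.0, 293.0], lambda t: 20.0, t_end=2000.0)
```

For a constant input, the sketch settles to the steady state $T_2 = T_a + Q\,R_{amb}$ and $T_1 = T_2 + Q\,R$ (313 K and 323 K for the assumed values), and because each step consumes $X(t)$ afresh, time-varying inputs are handled in the same loop.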

Figure 11 shows the results of predicting dataset T3 by a ROM created without the data preprocessing method described in Sec. 3.2.1, that is, using only the short-term component $\dot{T}$. Machine learning that considers only the short-term component $\dot{T}$ did not provide a highly accurate prediction for the object of this research. Consequently, we confirmed the importance of incorporating the long-term component $\Delta T_{FI}$.
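The exact preprocessing of Sec. 3.2.1 is not reproduced here, but the idea of pairing a short-term component (the local derivative $\dot{T}$) with a long-term, fixed-interval component $\Delta T_{FI}$ on data with nonconstant time steps can be illustrated generically. The interval $FI$, the lookup rule, and the series below are all hypothetical.

```python
# Generic illustration: build a short-term component (finite-difference
# derivative) and a long-term component (temperature change over at least
# a fixed interval FI) from irregularly sampled time series.

def components(t, T, FI):
    """Return (T_dot, dT_FI) for time stamps t and temperatures T."""
    # short-term: local derivative between consecutive samples
    T_dot = [(T[i + 1] - T[i]) / (t[i + 1] - t[i]) for i in range(len(t) - 1)]
    # long-term: change to the first sample at least FI seconds later
    dT_FI = []
    for i, ti in enumerate(t):
        for j in range(i + 1, len(t)):
            if t[j] - ti >= FI:
                dT_FI.append(T[j] - T[i])
                break
    return T_dot, dT_FI

t = [0.0, 0.5, 1.2, 2.0, 3.1, 4.0]  # nonconstant time steps, s
T = [293.0, 294.0, 295.5, 297.0, 298.5, 299.2]
T_dot, dT_FI = components(t, T, FI=2.0)
```

The long-term component integrates out the step-to-step noise that dominates $\dot{T}$, which is why, as Fig. 11 suggests, fitting the derivative alone is fragile over long horizons.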

### 4.2 Power Modules Mounted on a Natural Air Cooling Heat Sink

#### 4.2.1 Thermofluid Analysis.

Figure 12 shows the target of the analysis. Two power modules aligned in the direction of gravity are cooled by a heat sink. The heat-generating components were 24 chips. Tables 7 and 8 show the dimensions and physical properties of the analysis target. The chips and parts A to F are shown in Table 1. Table 9 shows the thermofluid analysis conditions for the learning and validation data, and Table 10 shows the analysis conditions for the test data.

| | Thickness (mm) | Thermal conductivity (W/m/K) | Specific heat capacity (J/kg/K) | Density (kg/m^{3}) | Emissivity |
|---|---|---|---|---|---|
| Case | 1.0 | 0.27 | 1500 | 1310 | 0.8 |
| Insulating resin | — | 0.30 | 1100 | 1850 | — |

| Height (mm) | Length (mm) | Width (mm) | Fin thickness (mm) | Number of fins |
|---|---|---|---|---|
| 25 | 160 | 166 | 2.0 | 21 |

| Base thickness (mm) | Thermal conductivity (W/m/K) | Specific heat capacity (J/kg/K) | Density (kg/m^{3}) |
|---|---|---|---|
| 5.0 | 225 | 880 | 2,700 |

| Dataset no. | | | L1 | L2 | L3 | $\cdots $ | L7 | L8 | $\cdots $ | L13 | L14 | L15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ambient temperature (K) | | $Ta$ | 5 | 5 | 5 | $\cdots $ | 45 | 45 | $\cdots $ | 35 | 35 | 35 |
| Initial temperature of the parts (K) | | $Tn(0)$ | 5 | 5 | 5 | $\cdots $ | 45 | 45 | $\cdots $ | 35 | 35 | 35 |
| Caloric value (W) | Module 1 | $Q1,1$ | 10 | 1 | 3.7 | $\cdots $ | 9 | 6 | $\cdots $ | 11 | 13.5 | 18 |
| | | $Q1,2$ | 11 | 2 | 4.4 | $\cdots $ | 9 | 5 | $\cdots $ | 13 | 7.5 | 6.0 |
| | | $Q1,3$ | 12 | 3 | 1.8 | $\cdots $ | 9 | 4 | $\cdots $ | 15.1 | 7.0 | 8.5 |
| | | $Q1,4$ | 13 | 4 | 1.9 | $\cdots $ | 9 | 3 | $\cdots $ | 1.0 | 16 | 15.5 |
| | | $Q1,5$ | 14 | 5 | 2.5 | $\cdots $ | 9 | 2 | $\cdots $ | 6.2 | 1.2 | 4.0 |
| | | $Q1,6$ | 15 | 6 | 3.1 | $\cdots $ | 9 | 1 | $\cdots $ | 12.3 | 14.1 | 2.2 |
| | | $Q1,7$ | 1 | 15 | 2.0 | $\cdots $ | 0 | 10 | $\cdots $ | 11.1 | 4.0 | 13.4 |
| | | $Q1,8$ | 2 | 14 | 3.1 | $\cdots $ | 0 | 11 | $\cdots $ | 7.7 | 16.2 | 2.5 |
| | | $Q1,9$ | 3 | 13 | 3.7 | $\cdots $ | 0 | 12 | $\cdots $ | 16.8 | 1.1 | 6.8 |
| | | $Q1,10$ | 4 | 12 | 4.2 | $\cdots $ | 0 | 13 | $\cdots $ | 11.2 | 6.0 | 6.9 |
| | | $Q1,11$ | 5 | 11 | 2.4 | $\cdots $ | 0 | 14 | $\cdots $ | 3.6 | 16.2 | 13.2 |
| | | $Q1,12$ | 6 | 10 | 4.5 | $\cdots $ | 0 | 15 | $\cdots $ | 12.6 | 5.0 | 2.1 |
| | Module 2 | $Q2,1$ | 9 | 15 | 0.6 | $\cdots $ | 10 | 0 | $\cdots $ | 2.4 | 8.0 | 13.2 |
| | | $Q2,2$ | 9 | 14 | 2.1 | $\cdots $ | 11 | 0 | $\cdots $ | 10 | 12.6 | 15.2 |
| | | $Q2,3$ | 9 | 13 | 2.3 | $\cdots $ | 12 | 0 | $\cdots $ | 7.2 | 1.6 | 11.1 |
| | | $Q2,4$ | 9 | 12 | 1.3 | $\cdots $ | 13 | 0 | $\cdots $ | 14.8 | 5.5 | 17.2 |
| | | $Q2,5$ | 9 | 11 | 3.4 | $\cdots $ | 14 | 0 | $\cdots $ | 17.4 | 8.0 | 10.1 |
| | | $Q2,6$ | 9 | 10 | 1.3 | $\cdots $ | 15 | 0 | $\cdots $ | 3.0 | 2.1 | 12 |
| | | $Q2,7$ | 0 | 6 | 3.6 | $\cdots $ | 1 | 9 | $\cdots $ | 13.8 | 2.0 | 4.2 |
| | | $Q2,8$ | 0 | 5 | 2.0 | $\cdots $ | 2 | 9 | $\cdots $ | 7.7 | 13.3 | 7.0 |
| | | $Q2,9$ | 0 | 4 | 3.7 | $\cdots $ | 3 | 9 | $\cdots $ | 5.5 | 1.8 | 0.9 |
| | | $Q2,10$ | 0 | 3 | 2.7 | $\cdots $ | 4 | 9 | $\cdots $ | 6.1 | 15.1 | 1.2 |
| | | $Q2,11$ | 0 | 2 | 1.7 | $\cdots $ | 5 | 9 | $\cdots $ | 3.0 | 13.1 | 8.0 |
| | | $Q2,12$ | 0 | 1 | 0.6 | $\cdots $ | 6 | 9 | $\cdots $ | 12 | 13.6 | 2.5 |

| Dataset no. | | | T1 | T2 | T3 | T4 | T5 | T6 | T7 | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Time (s) | | $t$ | — | — | — | — | — | — | 0–100 | 100–200 | 200–300 | 300–400 | 400–500 | 500–600 |
| Ambient temperature (K) | | $Ta$ | 20 | 30 | 60 | 25 | 20 | 30 | 20 | | | | | |
| Initial temperature of the parts (K) | | $Tn(0)$ | 20 | 30 | 60 | 25 | 20 | 30 | 45 | | | | | |
| Caloric value (W) | Module 1 | $Q1,1$ | 30 | 4.0 | 12 | 12 | 20 | 0 | 7 | 0 | 10 | 15 | 0 | 20 |
| | | $Q1,2$ | 0 | 9.4 | 11 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |
| | | $Q1,3$ | 0 | 2.3 | 10 | 0 | 0 | 0 | 7 | 0 | 10 | 0 | 0 | 0 |
| | | $Q1,4$ | 0 | 4.5 | 8 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |
| | | $Q1,5$ | 0 | 2.6 | 7 | 0 | 0 | 0 | 7 | 0 | 10 | 0 | 0 | 0 |
| | | $Q1,6$ | 0 | 7.8 | 6 | 8 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |
| | | $Q1,7$ | 0 | 6.6 | 5 | 0 | 0 | 0 | 7 | 0 | 10 | 0 | 0 | 0 |
| | | $Q1,8$ | 0 | 1.6 | 4 | 12 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |
| | | $Q1,9$ | 0 | 7.5 | 3 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |
| | | $Q1,10$ | 0 | 9.8 | 2 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |
| | | $Q1,11$ | 0 | 0.8 | 1 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |
| | | $Q1,12$ | 0 | 7.5 | 0 | 8 | 15 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |
| | Module 2 | $Q2,1$ | 0 | 9.9 | 12 | 12 | 20 | 10 | 0 | 7 | 0 | 20 | 0 | 15 |
| | | $Q2,2$ | 0 | 4.5 | 11 | 0 | 0 | 10 | 0 | 7 | 10 | 0 | 0 | 0 |
| | | $Q2,3$ | 0 | 6.5 | 10 | 0 | 0 | 10 | 0 | 7 | 0 | 0 | 0 | 0 |
| | | $Q2,4$ | 20 | 6.8 | 8 | 0 | 0 | 10 | 0 | 7 | 10 | 0 | 0 | 0 |
| | | $Q2,5$ | 0 | 1.2 | 7 | 0 | 0 | 10 | 0 | 7 | 0 | 0 | 0 | 0 |
| | | $Q2,6$ | 0 | 0.5 | 6 | 8 | 0 | 10 | 0 | 7 | 10 | 0 | 0 | 0 |
| | | $Q2,7$ | 0 | 6.0 | 5 | 0 | 0 | 10 | 0 | 7 | 0 | 0 | 0 | 0 |
| | | $Q2,8$ | 0 | 2.9 | 4 | 12 | 0 | 10 | 0 | 7 | 0 | 0 | 0 | 0 |
| | | $Q2,9$ | 0 | 9.5 | 3 | 0 | 0 | 10 | 0 | 7 | 0 | 0 | 0 | 0 |
| | | $Q2,10$ | 20 | 8.5 | 2 | 0 | 0 | 10 | 0 | 7 | 0 | 0 | 0 | 0 |
| | | $Q2,11$ | 0 | 5.7 | 1 | 0 | 0 | 10 | 0 | 7 | 0 | 0 | 0 | 0 |
| | | $Q2,12$ | 0 | 6.1 | 0 | 8 | 15 | 10 | 0 | 7 | 0 | 0 | 0 | 0 |

The two numbers in the subscripts of the caloric value $Q$ in Tables 9 and 10 correspond to the module identification number in the former and the chip identification number in the latter. Datasets L1 to L15 in Table 9 and datasets T1 to T6 in Table 10 are the initial value problems with an analysis time of 10,000 s. The number of input parameters is 26, which is more than the number of learning datasets (15). The learning and validation data are in the range of 278–318 K for ambient temperature and 0–18 W for caloric value. The test data are in the range of 293–333 K for ambient temperature and 0–30 W for caloric value, which includes values outside the domain of definition for the learning and validation data.

The thermofluid analysis was performed with the same computational fluid dynamics software (Icepak; ANSYS) as in Sec. 4.1. The mesh comprised several million nodes. Natural convection was modeled using the Boussinesq approximation, and radiation was modeled by calculating the view factors. The time-step $\Delta t$ was not fixed. Figure 13 shows some of the temperature measurement points of the heat sink, for which time series temperatures were output at the base ($T_{HSb}$), fin base ($T_{Fin1}$), and fin tip ($T_{Fin3}$), at nine locations each. The temperature measurement points of the power modules were the same as in the forced air cooling case shown in Fig. 8, except for part F and the case. For each power module, the output time series data comprised temperatures at 51 points: each of the 12 chips; parts B, C, and E at the *x*–*y* coordinates of the chip centers (12 locations each); and three locations on the case. In total, there were 129 temperature measurement points: 102 for the two power modules and 27 for the heat sink.
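The natural convection physics modeled here is commonly expressed through correlations of the form $Nu = f(Gr, Pr)$, consistent with the $Gr$, $Nu$, and $Pr$ entries in the Nomenclature. As an illustration only, not the paper's basis functions, the sketch below evaluates the Churchill–Chu correlation for a vertical plate with assumed air properties and geometry.

```python
import math

def churchill_chu(Ra, Pr):
    """Average Nusselt number for a vertical plate (Churchill-Chu correlation)."""
    return (0.825 + 0.387 * Ra ** (1 / 6)
            / (1 + (0.492 / Pr) ** (9 / 16)) ** (8 / 27)) ** 2

# assumed air properties near 300 K and an assumed geometry (not the paper's)
g = 9.81          # gravitational acceleration, m/s^2
beta = 1 / 300.0  # thermal expansion coefficient of air, 1/K
nu = 1.6e-5       # kinematic viscosity, m^2/s
alpha = 2.2e-5    # thermal diffusivity, m^2/s
k_air = 0.026     # thermal conductivity of air, W/m/K
L = 0.16          # characteristic vertical length, m (assumed)
dT = 20.0         # surface-to-ambient temperature difference, K (assumed)

Gr = g * beta * dT * L ** 3 / nu ** 2  # Grashof number
Pr = nu / alpha                        # Prandtl number
Ra = Gr * Pr                           # Rayleigh number
Nu = churchill_chu(Ra, Pr)
h = Nu * k_air / L                     # heat transfer coefficient, W/m^2/K
```

The resulting $h$ of a few W/m²/K is typical of natural convection in air; the temperature dependence of $Gr$ (through $\Delta T$) is what makes the governing equations nonlinear, which the sparse identification must capture.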

#### 4.2.2 Created Model.

The number of nonzero coefficients in the ROM we created was 586, i.e., about five per equation, which is about 3% of the 21,414 basis function candidates. The basis functions for natural convective heat transfer and radiative heat transfer were properly chosen. The model also properly captured that there are two thermal paths: one through the case via the insulating resin, and the other through the heat sink. As in Sec. 4.1, compared with the MSE of the learning and validation data, the squared bias was almost the same value, and the variance was approximately 2%.
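The route to such a sparse, non-negative coefficient set can be sketched as follows. This is a simplified illustration of the general idea behind the method of Sec. 3.2.2: it applies a plain coefficient-magnitude threshold rather than the paper's term size criterion, and the toy data are synthetic.

```python
# Simplified sequential thresholded non-negative least squares:
# repeatedly solve an NNLS problem, zero out coefficients below a
# threshold, and refit on the surviving basis functions.
import numpy as np
from scipy.optimize import nnls

def stnnls(Theta, dTdt, threshold=0.05, n_iter=10):
    """Sparse non-negative coefficients xi such that Theta @ xi ~ dTdt."""
    n_basis = Theta.shape[1]
    active = np.ones(n_basis, dtype=bool)
    xi = np.zeros(n_basis)
    for _ in range(n_iter):
        xi[:] = 0.0
        xi[active], _ = nnls(Theta[:, active], dTdt)
        small = (xi < threshold) & active
        if not small.any():
            break                 # no more coefficients to prune
        active &= ~small          # drop small terms and refit
    return xi

# toy problem: the true model uses only 2 of 5 candidate basis functions
rng = np.random.default_rng(0)
Theta = rng.random((200, 5))
xi_true = np.array([0.0, 1.5, 0.0, 0.8, 0.0])
dTdt = Theta @ xi_true + 0.001 * rng.standard_normal(200)
xi = stnnls(Theta, dTdt)
```

The non-negativity constraint is what keeps the identified thermal resistances and capacitances physically meaningful, and the iterative pruning is what yields a ratio of nonzero coefficients to candidates as low as the 3% reported above.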

#### 4.2.3 Results.

As in Sec. 4.1.3, the effectiveness of the proposed method was verified using the created ROM to predict the test data. Table 11 shows the mean error. The mean error for the seven types of test data was 0.6 K. The mean error for the 15 types of learning data was 0.4 K. Furthermore, as mentioned above, the variance was small for the learning and validation data, so we concluded that there was no overfitting.

| Dataset no. | T1 | T2 | T3 | T4 | T5 | T6 | T7 |
|---|---|---|---|---|---|---|---|
| Mean error (K) | 0.4 | 0.8 | 0.9 | 0.8 | 0.4 | 0.5 | 0.6 |

The prediction results for dataset T3 are shown in Fig. 14. We plotted five representative nodes out of the 129 predicted temperatures. One of the input parameters, the ambient temperature, was 15 K outside the domain of definition of the learning data. The phenomenon whereby the temperature $T1,Chip12$ of chip 12 in module 1, with a caloric value of 0 W, increases due to the heat generated by the surrounding chips was predicted with high accuracy. Furthermore, the physical relationship between the nodes was properly captured: the temperature $T1,Chip1$ of chip 1 in module 1, which exchanges heat with air already warmed by natural convection, is approximately 2.1 K higher than the temperature $T2,Chip1$ of chip 1 in module 2.

The prediction results for dataset T7 are shown in Fig. 15. The prediction was highly accurate even when the initial temperature of the parts differed from the ambient temperature.

## 5 Discussion

We proposed a sparse modeling method for creating an interpretable ROM from the time series data of multiple temperature nodes and of the factors influencing temperature, and showed that this method achieves a balance between accuracy and computational efficiency. The proposed sparse modeling method comprises three novel machine learning methods. The most important of these is the sequential thresholded non-negative least-squares method based on term size criteria, described in Sec. 3.2.2, which reduces the effects of multicollinearity and enables the creation of robust ROMs. The other two methods are also important: the data preprocessing method for considering the correlation between errors, described in Sec. 3.2.1, enables accurate prediction many steps ahead, while the solution space search method based on similar basis function classification, described in Sec. 3.2.3, yields physically valid combinations of physical models. Before the effectiveness verification described in this paper, various trials on a simple problem led us to conclude that all three methods are necessary. Because the proposed method can also model nonlinear time-variant systems, it can be applied to various products and systems. For example, it could enable the following applications.

- A surrogate model for data assimilation.

- Product-lifetime estimation based on long-term temperature prediction results assuming actual loads. Design is expected to be advanced by combining this with the surrogate model for life prediction by Hirohata et al. [55,56].

- Assistance for human understanding of various phenomena. Because the proposed method is a batch learning framework, it is recommended to remove noise from the original time series data using general smoothing methods and to create ROMs from the smoothed data.

- Rapid understanding of the influence of design parameters, even during a detailed design process.

- A model for condition monitoring to realize condition-based maintenance. The proposed method is also compatible with causal inference, so the relationship between real sensor signals and virtual simulation parameters can be modeled efficiently. It can also be combined with the method of Suzuki et al. [57], in which parameters that degrade over time are modeled and the ROM is used as a system model for data assimilation.

## 6 Conclusion

In this paper, we proposed a sparse modeling method for automatically creating a thermal model for temperature prediction, in the form of simultaneous ordinary differential equations, from a very small number of time series data with nonconstant time steps. The form of the thermal model is constrained by the physical model, and the model parameters are efficiently estimated using three novel machine-learning methods. The method shows promise for wide application in fields where the concept of equivalent circuits can be applied. Furthermore, because it can model nonlinear time-variant systems, it might also be applicable to various other products and systems. The effectiveness of the proposed method was verified using time series data obtained by thermofluid analysis of a power module mounted on a comb-shaped heat sink. Two types of cooling systems were targeted: forced air cooling with variable thermal contact resistance, and natural air cooling. The computational efficiency of the created ROM was very high, and unknown data, including extrapolation, were predicted with a mean error of less than 1 K.

## Data Availability Statement

The authors attest that all data for this study are included in the paper.

## Nomenclature

- $C$ =
heat capacity

- $FI$ =
fixed interval

- $Gr$ =
Grashof number

- $h$ =
heat transfer coefficient

- $L$ =
number of datasets

- $Ml$ =
number of data in the dataset $l$

- $N$ =
number of temperature nodes

- $Nu$ =
Nusselt number

- $P$ =
number of basis function candidates

- $Pr$ =
Prandtl number

- $Q$ =
caloric value

- $R$ =
thermal resistance

- $Rc$ =
thermal contact resistance

- $Re$ =
Reynolds number

- $S$ =
surface area

- $T$ =
temperatures

- $tl,j$ =
time sample number $j$ in the dataset $l$

- $Ta$ =
ambient temperature

- $Tn$ =
temperature at the node $n$

- $Tn(0)$ =
initial temperature at the node $n$

- $v$ =
velocity

- $X$ =
factors that influence the temperature

- $\gamma $ =
correction factor for normalization

- $\Delta t$ =
time step

- $\theta $ =
element of the library

- $\Theta (T,X)$ =
library consisting of basis function candidates

- $\xi $ =
element of the sparse vector of coefficients

- $\Xi $ =
sparse vector of coefficients

- $\Omega $ =
variance–covariance matrix of the error