Energy prediction of machine tools can deliver many advantages to a manufacturing enterprise, ranging from energy-efficient process planning to machine tool monitoring. Physics-based energy prediction models have been proposed in the past to understand the energy usage pattern of a machine tool. However, uncertainties in both the machine and the operating environment make it difficult to predict the energy consumption of the target machine reliably. Taking advantage of the opportunity to collect extensive, contextual, energy-consumption data, we discuss a data-driven approach to develop an energy prediction model of a machine tool in this paper. First, we present a methodology that can efficiently and effectively collect and process data extracted from a machine tool and its sensors. We then present a data-driven model that can be used to predict the energy consumption of the machine tool for machining a generic part. Specifically, we use Gaussian process (GP) regression, a nonparametric machine-learning technique, to develop the prediction model. The energy prediction model is then generalized over multiple process parameters and operations. Finally, we apply this generalized model with a method to assess uncertainty intervals to predict the energy consumed by any part of the machine using a Mori Seiki NVD1500 machine tool. Furthermore, the same model can be used during process planning to optimize the energy-efficiency of a machining process.

## Introduction

Over 22% of greenhouse gas emissions in the U.S. come from the industrial sector, which is also the highest consumer of electrical power in the U.S. [1]. To reduce energy use in the manufacturing sector, manufacturers need energy prediction models that can estimate electricity costs and peak power demand for their equipment based on a production plan. These models can help manufacturers reduce their energy costs and environmental footprint and respond to new regulations and business drivers. For example, the Smart Grid and carbon cap-and-trade may incentivize manufacturers to adjust their operations to respond to load adjustments in the grid or take advantage of lower energy or carbon prices during specific time windows. These models can also improve process monitoring since deviations in the power demand and energy consumption can be related to component wear, tool breakage, or collisions [2,3]. The first step toward building such models is to understand the energy consumption patterns of machine tools and manufacturing operations. In this paper, we use the data collected from a machine tool to determine how different operational strategies influence the energy consumption pattern of a machine tool and to derive the most energy-efficient strategy to machine a part.

Models for predicting energy consumption and optimizing manufacturing processes have been a subject of research interest for over 50 years. Most of these efforts are physics-based, which means that the models are built upon the physical laws that govern manufacturing operations. Based on the energy transfer from an electrical system to a mechanical system, Neugebauer et al. [4] formulated a mechatronic representation for computing the total energy consumption of a metal-cutting machine tool. Using energy conservation, Dietmair and Verl [5] categorized and derived the energy consumption of a metal-cutting machine tool using its two basic operations: moving axes and removing material. Although these methods are based on the physics of a machine tool, they are difficult to implement because they often require a large number of physical parameters that are often hard to compute or estimate. It is also difficult to properly incorporate the stochastic nature of a manufacturing process into a physics-based model. These difficulties challenge the construction of physics-based models that account for different mechanical characteristics of different machine tools.

To address the challenges presented by physics-based models, a number of studies have explored characterizing the energy consumption of a machine tool using experimental data. Draganescu et al. [6] used experimental data to construct statistical regression models based on machining parameters, such as the feed rate, spindle speed, and depth of cut. Diaz et al. [7] used experimental data on face-milling operations to show that the material removal rate is one key indicator of energy consumption in a machine tool. Gutowski et al. [2] developed machine-tool characterization techniques by studying the effects of different process parameters on the total energy consumption.

Both the physics-based and experimental approaches in the literature have been developed for specific machining operations, parameter spaces, and tool-workpiece material combinations, which limits their broad applicability. Furthermore, they may be insufficient for machine tools with relatively high tare power demand (tare power refers to the power required for noncutting operations and auxiliary equipment). Most modern machine tools fit this description since the energy needed for material removal is only a fraction of the overall energy consumed. Most of the limitations in the literature are due to limited access to data, a lack of standardized data-collection systems, and inadequate postprocessing techniques.

Advances in machine automation and sensing have begun to address such limitations by allowing continuous measurements of the operating conditions and energy consumption of a machine tool. Such advances provide new opportunities to build data-driven models to characterize a machine-tool and its performance. Teti et al. [8] gave an extensive survey of sensor technologies, signal processing, and decision-making methodologies for machine-tool monitoring. One recent advancement is MTConnect, which is an XML-based standard that has been developed to facilitate archiving, accessing, and retrieving operational data from various manufacturing equipment [9,10]. MTConnect enables aggregation of raw power data and machining operational information, which provides a means to track variations in energy consumption by different machining operations [3]. MTConnect has also been used to study the effects of different process parameters on the energy consumption of a machine tool and to construct statistical regression models for energy consumption [11]. Although these studies have clearly illustrated the possibility of collecting real-time operational and energy consumption data for future data analysis, they have so far dealt primarily with data collected from slotting operations. In practice, machining a part requires a variety of machining operations with many different combinations of operational parameters. However, the principles of data-driven process planning and machine tool monitoring based on energy consumption that come across from these studies are the key motivators of this work.

In this paper, we construct a generalized, energy prediction model for different machining operations using various combinations of process parameters. The constructed energy prediction model can be used for several tasks. First, by establishing the correlation among machining parameters and the resultant energy consumption, the energy prediction model can help gain a better understanding of the energy consumption pattern of a target machine tool. Furthermore, the energy prediction model can be used to facilitate operating a target machine more efficiently, such as reducing the total energy consumption of the target machine by selecting an energy-efficient toolpath. Finally, an energy prediction model allows monitoring of a target machine by observing sudden, unexpected events that may show deviations between the predicted and the actual energy consumption, which could be an indication of failure or deterioration of a certain machine component.

This paper is organized as follows: We first describe a data-processing methodology that uses MTConnect to extract data from a machine tool controller and add-on sensors efficiently and effectively. We used this methodology to collect data from an automated milling machine tool (Mori Seiki NVD1500DCG), which allowed us to contextualize energy-consumption data with the corresponding machining operation and its process (control) parameters. To explore all possible combinations of process parameters for different machining operations, we collected data from 18 machined parts with different machining strategies (numerical control (NC) codes). Using this data, we developed a generalized, data-driven, energy prediction model that can determine the energy consumption of the machine tool for machining a generic part. We applied the Gaussian Process (GP) regression model, a nonparametric regression model, to model the complex input and output relationship. Finally, we illustrate the use of the energy prediction model to evaluate the optimal strategy for machining a generic part.

## Data Collection and Postprocessing

The first step to construct an accurate energy prediction model of a machine tool is to collect and process data with minimum noise from the target machine. This data includes the process parameters collected from a wide range of machining operations, which are inputs to the model, and the corresponding energy-consumption measurements, which are the outputs of the model. However, collecting such extensive data through experimentation requires significant time and effort, which has been one key barrier for constructing data-driven energy prediction models. Recent advances in sensing and data management, such as MTConnect, have started to address these barriers by enabling the real-time collection and remote retrieval and processing of manufacturing data. This section discusses how we designed the experiments to collect the data used to construct a data-driven energy prediction model.

### Data Acquisition System.

Figure 1 shows the overall data acquisition system used to collect energy-consumption data contextualized with machining data, such as process parameters, NC blocks, and tool positions. The machining data were collected from a FANUC controller, and the power time-series data were collected using a high-speed power meter (HSPM) from System Insights. The power consumption data represent the power consumed by the entire machine tool, including auxiliary components such as the cooling system and the controller. Both types of data were collected using MTConnect and synchronized and organized using an MTConnect agent. Bhinge et al. [12] and Helu et al. [13] describe the hardware platform and data acquisition system in greater detail.

### Data Processing.

To extract insights about a machine tool, the data collected from a target machine needs to be properly processed and contextualized. When constructing our energy prediction model, we used feature extraction to identify the process parameters that possibly influence energy consumption from the raw data. Figure 2 shows how we classified the data into three groups based on the level of postprocessing applied: direct, derived, and simulated data.

Direct data was the raw data collected from the machine tool controller and added sensors using MTConnect. This data included the NC code block, timestamp, instantaneous feed rate, instantaneous spindle speed, instantaneous loads on each axis, instantaneous tool position, and instantaneous power. Instantaneous power was measured using an externally installed power meter. The MTConnect agent synchronized the direct data using a common time stamp.

Derived data, which is data corresponding to cutting operations, was generated by applying simple calculations to sets of direct data. For example, the machining process in a conventional automated machine tool is composed of sequences of cutting operations that can be described using a set of control parameters represented by an NC code block. To construct the energy prediction model, we needed to determine the relationship between the control parameters and the corresponding energy consumption in every NC code block. Specifically, we computed the total energy, average feed rate, average spindle speed, and length of cut in the *x*- and *y*-directions over the duration of a block of NC code corresponding to a single cutting operation. We then used the lengths of cut in the *x*- and *y*-directions to determine the length and direction of the cut. A sequence of such cuts constituted the complete toolpath.

Simulated data was generated by simulating the tool movements associated with the sequence of cutting operations for the machining process. Such data was needed because the block-averaged data for each NC code block did not provide enough details to distinguish actual material removal operations from other tool movements (e.g., air cut above workpiece). To determine the actual amount of material removed, we applied a reverse simulation of the entire cutting process using the instantaneous position data retrieved as direct data. This simulation required knowledge of the workpiece dimensions and the tool diameter. To simulate the cutting operation, we constructed a two-dimensional mesh on the surface of the workpiece, tracked the material removed during each cut, and redefined the elements in the mesh after every block of NC code. From the positional displacement obtained in the derived data, the toolpath of the tool for each NC code block was tracked and the material removed was calculated. The data extracted from this simulation included the depth of cut, volume of material removed, cutting strategy (i.e., climb or conventional milling), and classification of cut (e.g., air cutting, rapid motion without cutting, and feed with cutting). The cutting strategy was determined from the cutting simulation by tracking the direction of angular rotation of the tool and the number of elements being cut on either side of the centerline of the tool.
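The mesh-based cutting simulation described above can be illustrated with a minimal sketch. It assumes a height-map representation (each mesh element stores the current top-surface height), a flat-end tool, and straight-line moves in the *x-y* plane; the function names, the cell size, and the sweep step are hypothetical choices for illustration and are not taken from the original implementation.

```python
import math

def make_mesh(width_mm, height_mm, cell_mm, stock_top_mm):
    """Height map: each element stores the current top-surface z of the workpiece."""
    nx = int(round(width_mm / cell_mm))
    ny = int(round(height_mm / cell_mm))
    return [[stock_top_mm] * nx for _ in range(ny)]

def cut_block(mesh, cell_mm, start, end, tool_z_mm, tool_diameter_mm):
    """Sweep a flat-end tool along one straight x-y move and update the mesh.

    Returns (volume_removed_mm3, max_depth_of_cut_mm) for this NC code block.
    """
    radius = tool_diameter_mm / 2.0
    (x0, y0), (x1, y1) = start, end
    seg_len = math.hypot(x1 - x0, y1 - y0)
    # Assumed sweep resolution: half a cell per step.
    steps = max(1, int(math.ceil(seg_len / (cell_mm / 2.0))))
    volume, max_depth = 0.0, 0.0
    for s in range(steps + 1):
        cx = x0 + (x1 - x0) * s / steps
        cy = y0 + (y1 - y0) * s / steps
        # Only visit elements inside the bounding box of the tool footprint.
        j_lo = max(0, int((cx - radius) / cell_mm))
        j_hi = min(len(mesh[0]) - 1, int((cx + radius) / cell_mm))
        i_lo = max(0, int((cy - radius) / cell_mm))
        i_hi = min(len(mesh) - 1, int((cy + radius) / cell_mm))
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                px = (j + 0.5) * cell_mm  # element center
                py = (i + 0.5) * cell_mm
                inside = math.hypot(px - cx, py - cy) <= radius
                if inside and mesh[i][j] > tool_z_mm:
                    depth = mesh[i][j] - tool_z_mm
                    volume += depth * cell_mm * cell_mm
                    max_depth = max(max_depth, depth)
                    mesh[i][j] = tool_z_mm  # material is gone after this pass
    return volume, max_depth

def classify_block(volume_mm3):
    """Label a feed move by whether it removed any material."""
    return "feed with cutting" if volume_mm3 > 0.0 else "air cut"
```

Because the mesh is updated after every block, a second pass over the same path removes no material and is labeled as an air cut, which is the distinction the block-averaged data alone cannot make.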

### Experimental Design.

The training output for our model was energy consumption data corresponding to each block of the NC code and its corresponding machining parameters, all of which can be collected in near-real time and efficiently processed and retrieved remotely. The experimental design and the data processing technique used for generating the training data for this study have been described in the previous work [12–14]. In this section, we briefly present the basic setup and data processing steps used in the experiments. Figure 3 shows the sample part designed to collect training data for the data-driven energy prediction model. Table 1 shows the specific details of the workpiece, machine tool, and cutting tool used in this experimental study.

As shown in Fig. 3, there were five basic cutting operations—face milling, contouring, pocketing, slotting, and plunge—that were involved in machining a part. In addition, there were three noncutting operations—air cut in the *x-y* plane, air cut in the $z$ direction, and rapid motion—that were also included in the experiments. Because process parameters, such as feed rate, spindle speed, and depth of cut, could have affected energy consumption, the test parts were produced using different combinations of process parameters to investigate this relationship. For the objective of this paper, a single tool and workpiece were chosen, but an expansion of this experimental setup could also involve variations in cutting tool geometry and workpiece materials. A Taguchi technique [15] was employed to design the experiments to ensure a fractional-factorial combination for each set of process parameters in each operation. Table 2 shows the levels chosen for the depth of cut, chip load (feed or thickness of chip removed by one cutting edge of the tool), and spindle speed used to machine the parts. The levels were chosen to cover the entire range of prescribed milling parameters for the tool-workpiece material combination. The feed rate *f* (mm/min) is the product of the spindle speed (RPM), the number of tool teeth, and the chip load (mm/tooth).
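As a small worked example of the feed rate relation stated above, the following sketch computes the feed rate from the spindle speed, the number of teeth, and the chip load. The numerical values in the usage line are illustrative assumptions and are not the levels listed in Table 2.

```python
def feed_rate_mm_per_min(spindle_speed_rpm, num_teeth, chip_load_mm_per_tooth):
    """Feed rate f = spindle speed x number of teeth x chip load."""
    return spindle_speed_rpm * num_teeth * chip_load_mm_per_tooth

def chip_load_mm_per_tooth(feed_rate, spindle_speed_rpm, num_teeth):
    """Inverse relation: recover the chip load from a commanded feed rate."""
    return feed_rate / (spindle_speed_rpm * num_teeth)

# Illustrative values only: 3000 RPM, 2 teeth, 0.05 mm/tooth.
example_feed = feed_rate_mm_per_min(3000, 2, 0.05)
```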

As noted in Sec. 1, 18 parts were machined for this study, which provided a total of 196 face-milling, 108 contouring, 54 slotting and pocketing, and 32 plunge experiments. Each line of NC code corresponded to a cutting operation and tool motion and was combined with the corresponding process parameters and output energy consumption. Unlike traditional data-collection procedures, each line of NC code for a part was treated as a separate experiment. This allowed us to conduct a large number of experiments by machining a modest number of parts. The face milling operations on the first nine parts were carried out in the $y$ direction, and the remaining nine parts were milled in the $x$ direction. The separation of milling operations in the *x-* and *y-*directions was necessary to measure the energy consumption accurately for the target machine. The datasets collected from machining all 18 parts were then used to construct the energy prediction model for each (cutting or noncutting) operation.

### Data Used for Energy Prediction Model.

The direct, derived, and simulated data were used to construct the energy prediction model of the milling machine. There were five basic input (predictor) variables based on the fundamental parameters of a milling machine tool that affect energy consumption (see Fig. 4): feed rate, spindle speed, depth of cut, cutting direction, and cutting strategy. Three of the input variables—feed rate, spindle speed and depth of cut—were quantifiable measurements defined as follows:

$x_1 \in \mathbb{R}$ (**Feed rate**): the average velocity at which the tool is fed, which can be retrieved from the controller data

$x_2 \in \mathbb{R}$ (**Spindle speed**): the average rotational speed of the tool, which can be retrieved from the controller data

$x_3 \in \mathbb{R}$ (**Depth of cut**): the actual depth of material that the tool is cutting, which can be obtained from the cutting simulation

The remaining input variables, cutting direction and cutting strategy, were qualitatively (or categorically) labeled. Qualitative variables were represented numerically by codes to construct a regression model. For example, a vector of $K$ binary bits represented a qualitative variable with $K$ independent categorical features; only a single bit was nonzero to indicate the associated category among $K$ possible categories (Hastie et al. 2009). This approach was used to represent the cutting direction and strategy as coded variables to convert qualitative features into quantitative features:

$(x_4, x_5, x_6, x_7) \in \{(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)\}$ (*Cutting directions*): $x$-cut, $y$-cut, $z$-cut, or $xy$-cut, which were represented as coded variables (1,0,0,0), (0,1,0,0), (0,0,1,0), and (0,0,0,1), respectively.

$(x_8, x_9, x_{10}) \in \{(1,0,0), (0,1,0), (0,0,1)\}$ (*Cutting strategies*): conventional milling, climb milling, or a combination of both (as in slotting), which were represented as coded variables (1,0,0), (0,1,0), and (0,0,1), respectively.

Using categorical or coded variables, the prediction model was able to represent any combination of cutting direction and cutting strategy. Furthermore, the use of coded variables allowed the prediction model to be constructed using the entire training data. Otherwise, we would have needed to partition the dataset into subsets with few data points according to each combination of cutting direction and cutting strategy if we had to construct an individual prediction function for each combination of features.
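The coded-variable representation can be sketched as follows. The category orderings follow the text; the function and constant names are hypothetical and are introduced only for illustration.

```python
# Category order matches the coded variables described in the text.
DIRECTIONS = ("x", "y", "z", "xy")                # (x4, x5, x6, x7)
STRATEGIES = ("conventional", "climb", "both")    # (x8, x9, x10)

def one_hot(value, categories):
    """Return a list with a single 1.0 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]

def feature_vector(feed_rate, spindle_speed, depth_of_cut, direction, strategy):
    """Assemble the 10-dimensional input x = (x1, ..., x10) for one NC code block."""
    numeric = [float(feed_rate), float(spindle_speed), float(depth_of_cut)]
    return numeric + one_hot(direction, DIRECTIONS) + one_hot(strategy, STRATEGIES)
```

With this encoding, every NC code block maps to a vector of the same length regardless of its cutting direction or strategy, which is what allows a single prediction function to be trained on the entire dataset for an operation type.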

The power demand $P_k$, measured over a time duration of $t_k$, was retrieved as a time-series dataset using MTConnect. The total energy consumption $E^{(i)}$ for NC code block $i$ was computed as

$$E^{(i)} = \sum_{k=1}^{N_P} P_k \, t_k$$

The number of data points, $N_P$, in each NC code block $i$ depended on the duration of the corresponding operation and the sampling rate for the power measurement, which was 100 Hz in this study. We generalized the energy consumption by using the energy density $y^{(i)} = E^{(i)}/l^{(i)}$ (i.e., the energy per unit length of cut, where $l^{(i)}$ is the length of cut in block $i$) as the output response feature. That is, the predicted energy consumption scales with the length of cut. Predicting the energy density implicitly included the dependence of the duration of cut on the feed rate and the length of cut. It also allowed us to predict the energy consumption of a part with different (unseen) dimensions, which made the model spatially scalable. Note that we modeled the relationship between the averaged values of the process parameters and the average power (or energy density) across the duration of the block.
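A minimal sketch of the block-level energy and energy density computation follows. It assumes uniformly sampled power at the 100 Hz rate stated above, so that every sample spans the same interval; the units (watts, millimeters, joules) and the function names are assumptions made for illustration.

```python
import math

SAMPLE_RATE_HZ = 100.0  # sampling rate of the power measurement stated in the text

def block_energy_j(power_samples_w, sample_rate_hz=SAMPLE_RATE_HZ):
    """Sum power x sample interval over all samples in one NC code block."""
    dt = 1.0 / sample_rate_hz
    return sum(p * dt for p in power_samples_w)

def energy_density_j_per_mm(power_samples_w, dx_mm, dy_mm,
                            sample_rate_hz=SAMPLE_RATE_HZ):
    """Energy per unit length of cut, y = E / l, for one NC code block."""
    length_mm = math.hypot(dx_mm, dy_mm)  # length of cut from x and y displacements
    if length_mm == 0.0:
        raise ValueError("length of cut is zero; energy density is undefined")
    return block_energy_j(power_samples_w, sample_rate_hz) / length_mm
```

For example, a block with 300 samples at a constant 1500 W lasts 3 s; over a cut with displacements of 30 mm in *x* and 40 mm in *y*, the length of cut is 50 mm.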

For the 18 parts machined for the experiments in this study, a total of 12,299 data points, each consisting of an input feature vector $x$ and an output feature $y$, were generated after postprocessing and cutting simulation; each data point corresponded to an individual NC code block. We filtered out those data points that corresponded to NC code blocks with a duration shorter than 2 s, except for those blocks corresponding to rapid motion. This process prevented statistically low-quality data from biasing the prediction model since data from blocks of longer duration are more stable. The filtered dataset $D = \{(x^i, y^i) \mid i = 1, \ldots, m\}$, where $m = 3214$ (i.e., data from 3214 NC code blocks remained after filtering), was further categorized into seven different datasets $\{D_1, \ldots, D_q, \ldots, D_7\}$ that corresponded to the seven machining operation types described in Fig. 3 and Sec. 2.3; each dataset $D_q = \{(x^i, y^i) \mid i = 1, \ldots, m_q\}$ contained $m_q$ NC code blocks for the machining operation type $q$.
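The filtering and grouping step can be sketched as below. The record layout (a dictionary with `operation` and `duration_s` keys) and the operation label `"rapid motion"` are hypothetical; only the 2 s threshold and the rapid-motion exception come from the text.

```python
def keep_block(block, min_duration_s=2.0):
    """Keep rapid-motion blocks unconditionally; drop other blocks shorter than 2 s."""
    return (block["operation"] == "rapid motion"
            or block["duration_s"] >= min_duration_s)

def filter_blocks(blocks, min_duration_s=2.0):
    """Return only the NC code blocks that pass the duration filter."""
    return [b for b in blocks if keep_block(b, min_duration_s)]

def group_by_operation(blocks):
    """Split the filtered data into one dataset per machining operation type."""
    groups = {}
    for b in blocks:
        groups.setdefault(b["operation"], []).append(b)
    return groups
```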

## Data-Driven Approach for Energy Prediction

To construct a data-driven energy prediction model for a machine tool using the data described in Sec. 2, we apply Gaussian Process (GP) regression because it can construct a nonlinear regression model with high-dimensional input features using a relatively small amount of training data. As a nonparametric regression technique, GP regression can model the input and output relationship without using a set of predefined basis functions. Instead, it uses bases formed from the training data. Due to this flexibility, GP regression is able to model complex relationships among input variables and a target response with a minimal number of hyper-parameters. Additional benefits of GP regression are its ability to quantify uncertainties in the predicted values and its ability to update the regression model incrementally. GP regression has been applied to many fields, including the modeling of robotics [16], human motions [17], and traffic flow [18]. Sections 3.1 to 3.3 describe the procedure we applied to construct the energy prediction model using GP regression.

### Gaussian Process.

GP regression is employed to approximate the unknown energy prediction function $f(x)$ using historical data on the machining process parameters and corresponding energy consumption. A GP is a collection of random variables (stochastic process), any finite set of which has a joint Gaussian distribution [19]. By treating the values of the unknown function $f(\cdot) \sim \mathcal{GP}(m(\cdot), k(\cdot,\cdot))$ as a collection of random variables, a GP describes the function probabilistically as a multivariate Gaussian distribution specified by its mean function $m(\cdot)$ and its covariance function $k(\cdot,\cdot)$. The mean function $m(\cdot)$ captures the prior mean of the target function, which is usually assumed to be zero. The covariance function $k(\cdot,\cdot)$ quantifies the correlation between input data in terms of their function values.

The matrix $\mathbf{K}$ is the covariance matrix (kernel matrix) whose $(i, j)$th entry is $K_{ij} = k(x^i, x^j)$. The value of the covariance function $k(x^i, x^j)$ quantifies how strongly the function values at the two input feature vectors $x^i$ and $x^j$ vary together. Note that the more the two vectors $x^i$ and $x^j$ differ, the closer the covariance value is to zero, which implies that the two input vectors are not correlated in terms of their function values. An effective kernel function can be chosen by considering the characteristics of the target function. Noting that energy consumption varies smoothly with changes in the machining parameters [11], we use a squared exponential kernel function, which can effectively describe a continuously varying function. The squared exponential kernel function evaluates the covariance between the two input feature vectors $x^i$ and $x^j$ as [20]:

$$k(x^i, x^j) = \sigma_s^2 \exp\left(-\frac{1}{2} \sum_{r=1}^{n} \frac{(x_r^i - x_r^j)^2}{\lambda_r^2}\right) + \sigma_\epsilon^2 \delta_{ij}$$

The kernel function is described by the hyper-parameters $\theta = \{\sigma_s, \sigma_\epsilon, \lambda\}$. The term $\sigma_s^2$ is referred to as the signal variance, which quantifies the overall magnitude of the covariance value. The term $\sigma_\epsilon^2$ is referred to as the noise variance, which quantifies the level of noise assumed to exist in the observed output response. The Kronecker delta function $\delta_{ij}$ adds the noise variance $\sigma_\epsilon^2$ to the covariance value $k(x^i, x^j)$ only when $i = j$; that is, the noise signals added to different measurements are assumed to be independent. The vector $\lambda = (\lambda_1, \ldots, \lambda_r, \ldots, \lambda_n)$ contains the characteristic length scales, which quantify the relevance of the input features in $x = (x_1, \ldots, x_r, \ldots, x_n)$ for predicting the response $y$. Note that we used a total of $n = 10$ input features in this study. A large length scale $\lambda_r$ indicates weak relevance, while a small length scale $\lambda_r$ implies strong relevance of the corresponding input feature $x_r$.
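The role of each hyper-parameter can be illustrated with a short sketch of a squared exponential kernel with one length scale per input feature. The sketch assumes the common form in which the signal variance multiplies an exponential of the scaled squared distance and the noise variance is added only when both arguments refer to the same observation; the function name and the `same_observation` flag (standing in for the Kronecker delta) are hypothetical.

```python
import math

def se_ard_kernel(xi, xj, signal_std, length_scales,
                  noise_std=0.0, same_observation=False):
    """Squared exponential covariance with per-feature length scales.

    signal_std**2 sets the overall magnitude, each length scale controls how
    quickly the covariance decays along its feature, and noise_std**2 is added
    only on the diagonal (same_observation=True plays the role of delta_ij).
    """
    scaled_sq_dist = sum(((a - b) / l) ** 2
                         for a, b, l in zip(xi, xj, length_scales))
    k = signal_std ** 2 * math.exp(-0.5 * scaled_sq_dist)
    if same_observation:
        k += noise_std ** 2
    return k
```

A feature with a very large length scale barely changes the covariance when it varies, which is why a large length scale signals weak relevance of that feature.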

With the gradient $\nabla \log L(\theta; D_q)$ of the log-likelihood function $\log L(\theta; D_q)$ available, Eq. (5) can be solved using a mathematical optimization algorithm. We use Gaussian Processes for Machine Learning (GPML), a GP package implemented in MATLAB®, to optimize the hyper-parameters [21].
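For reference, the log marginal likelihood that is typically maximized when fitting GP hyper-parameters has the textbook closed form below for a zero-mean GP with Gaussian noise. Here $\mathbf{y}$ collects the $m_q$ observed energy densities, $X$ collects the corresponding input feature vectors, and $\mathbf{K}$ is assumed to include the noise variance on its diagonal; the notation is a sketch and may differ from that of Eq. (5).

$$\log p(\mathbf{y} \mid X, \theta) = -\frac{1}{2}\,\mathbf{y}^{\top}\mathbf{K}^{-1}\mathbf{y} - \frac{1}{2}\log\lvert\mathbf{K}\rvert - \frac{m_q}{2}\log 2\pi, \qquad \theta^{*} = \arg\max_{\theta}\, \log p(\mathbf{y} \mid X, \theta)$$

The first term rewards fitting the data, the second penalizes overly flexible covariance structures, and the third is a constant.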

That is, we can obtain the mean function $\mu(x^{new} \mid D_q)$ from the GP regression to predict the most probable energy density $y^{new} = f_q(x^{new}) + \epsilon^{new}$ for a given input feature vector $x^{new}$ and the standard deviation function $\sigma(x^{new} \mid D_q)$ to quantify the uncertainty in the predicted value of $y^{new}$ at $x^{new}$. The energy consumption for each machining operation is then aggregated to predict the total energy consumption (with an estimated uncertainty bound) for machining a part.
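The predictive mean and standard deviation can be sketched in plain Python using the standard GP regression equations for a zero-mean prior. This is not the GPML implementation used in the study: the hyper-parameters are assumed to be already optimized, the noise variance is included in the predictive variance because the prediction targets the noisy observation $y^{new}$, and all function names are hypothetical.

```python
import math

def kernel(a, b, signal_std, length_scales):
    """Noise-free squared exponential covariance with per-feature length scales."""
    sq = sum(((u - v) / l) ** 2 for u, v, l in zip(a, b, length_scales))
    return signal_std ** 2 * math.exp(-0.5 * sq)

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_lower_transposed(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / L[i][i]
    return x

def gp_predict(X, y, x_new, signal_std, noise_std, length_scales):
    """Return the predictive mean and standard deviation of y_new at x_new."""
    m = len(X)
    K = [[kernel(X[i], X[j], signal_std, length_scales)
          + (noise_std ** 2 if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y))   # K^{-1} y
    k_star = [kernel(x, x_new, signal_std, length_scales) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = (kernel(x_new, x_new, signal_std, length_scales)
           + noise_std ** 2 - sum(t * t for t in v))
    return mean, math.sqrt(max(var, 0.0))
```

Near the training inputs the predicted standard deviation shrinks toward the noise level, while far from them the prediction falls back to the zero prior mean with a standard deviation close to the signal standard deviation.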

### Estimating Test Error.

Selecting the type of basis function and choosing the optimum feature set precede fitting a prediction model to the training dataset. For GP regression, once the type of kernel function is specified, the optimum feature selection is implicitly carried out by optimizing the hyper-parameters of the kernel function. For example, the optimized length scales $\lambda = (\lambda_1, \ldots, \lambda_r, \ldots, \lambda_n)$ for the squared exponential kernel function automatically weigh the importance of the corresponding features in predicting the output response since a smaller $\lambda_r$ implies a larger influence of the corresponding input feature $x_r$ on the output response $y$. This property of feature weighting, generally known as automatic relevance determination (ARD) [20], simplifies the construction of the energy prediction model since all features are included when constructing the energy prediction functions, without an explicit feature-selection procedure.

Depending on the machining operation, the parameters in the input feature vector $x$ affected the energy density value $y$ differently. For each machining operation type $q$ with the dataset $D_q = \{(x^i, y^i) \mid i = 1, \ldots, m_q\}$, we constructed the individual energy-prediction function for that operation using GP regression. We then estimated (generalization) errors for each prediction function using the holdout cross-validation technique [22]. Note that here the (generalization) error was estimated to provide insight into how well each individual energy prediction function would perform with unseen test data.

For each machining operation type $q$ with the dataset $D_q = \{(x^i, y^i) \mid i = 1, \ldots, m_q\}$, we trained the model and computed the error rates as follows:

- (1) Randomly divide the dataset $D_q$ into the training dataset $D_q^{tr}$ with $m_q^{tr}$ training data points and the test dataset $D_q^{te}$ with $m_q^{te}$ test data points. In this study, we set the ratio $m_q^{tr} : m_q^{te} = 7 : 3$, which is a common ratio used to estimate the accuracy (i.e., test error) of predictions for supervised learning algorithms [22].

- (2) Construct the energy density prediction function $f_q(x)$ by computing $\mu(x \mid D_q^{tr})$ and $\sigma(x \mid D_q^{tr})$ using the training dataset $D_q^{tr}$.

- (3) Predict the energy densities corresponding to the input features in the test dataset $D_q^{te}$ and compute the error by comparing them to the true energy densities in the test dataset $D_q^{te}$. The error was measured in terms of the mean absolute error (MAE), which is less sensitive to outliers than the root mean square error (RMSE) [23]:

$$MAE_q = \frac{1}{m_q^{te}} \sum_{i=1}^{m_q^{te}} \left| y^i - \mu(x^i \mid D_q^{tr}) \right|$$

To compare the errors across machining operation types $q$, we use the normalized mean absolute error (NMAE) [24]:

$$NMAE_q = \frac{MAE_q}{\bar{y}_q}$$

Note that the average deviation between the predicted and measured densities, i.e., $MAE_q$, can be recovered by simply multiplying $NMAE_q$ by the measured mean density $\bar{y}_q$ (for the machining operation type *q*).

The value of $NMAE_q$ can fluctuate depending on the selected training and test datasets. To quantify the test error reliably, 100 values of $NMAE_q$ were computed by repeating the procedure above. The averaged value $\mu_{NMAE}$ was then determined and used as an error measure in this study. Note that the number of repetitions was chosen empirically so that a stable, representative mean value could be determined irrespective of the selected training and test datasets.
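The repeated holdout procedure can be sketched as follows. The 7:3 split and the 100 repetitions come from the text; everything else is an assumption for illustration. In particular, `fit` is a hypothetical callable that trains on a list of `(x, y)` pairs and returns a prediction function, `fit_mean_model` is a trivial stand-in for GP training, and the normalizing mean density is taken over the test split.

```python
import random

def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def holdout_nmae(data, fit, rng, train_fraction=0.7):
    """One random 7:3 split: train, predict on the test split, return NMAE."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    train, test = shuffled[:n_train], shuffled[n_train:]
    predict = fit(train)
    y_true = [y for _, y in test]
    y_pred = [predict(x) for x, _ in test]
    y_bar = sum(y_true) / len(y_true)  # assumed: mean measured density of the test split
    return mae(y_true, y_pred) / y_bar

def repeated_holdout(data, fit, repeats=100, seed=0):
    """Mean and sample standard deviation of NMAE over repeated random splits."""
    rng = random.Random(seed)
    scores = [holdout_nmae(data, fit, rng) for _ in range(repeats)]
    mean = sum(scores) / repeats
    std = (sum((s - mean) ** 2 for s in scores) / (repeats - 1)) ** 0.5
    return mean, std

def fit_mean_model(train):
    """Hypothetical stand-in for GP training: always predicts the training mean."""
    y_mean = sum(y for _, y in train) / len(train)
    return lambda x: y_mean
```

Replacing `fit_mean_model` with a function that trains a GP on the training split and returns its predictive mean would yield the error statistics for one machining operation type.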

Table 3 compares the estimated (generalization) errors for the energy density prediction function for each machining operation type. The averages of the normalized mean absolute error $\mu_{NMAE}$ (computed using 100 $NMAE$ values from 100 test experiments) differ among the cutting operations due to the different cutting mechanisms and the different numbers of training data points used for constructing the models. Overall, the $\mu_{NMAE}$ values range between 8% and 45%; the smallest values occur for the feed with cut operations. The standard deviation $\sigma_{NMAE}$ of the normalized mean absolute error quantifies the variability in the estimated $\mu_{NMAE}$.

The values of $\mu_{NMAE}$ obtained by the energy prediction model are much lower than the value of CV, which implies that the energy prediction model captures the variations in the energy density induced by different machine operations and parameters well.

### Uncertainty Quantification in the Prediction Model.

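With the energy prediction function for each machining operation type $q$ represented by the mean energy density function $\mu_q(x \mid D_q)$ and the associated standard deviation function $\sigma_q(x \mid D_q)$, the total energy consumption for machining a part can be estimated from the NC codes. First, we can estimate the energy consumption $\hat{E}^i$ and the standard deviation $S^i$ from the input feature $x^i$ of the NC code block $i$ performing the machining operation type $q$ by scaling the predicted energy density and its standard deviation by the length of cut of that block.

Then, for each machining operation type $q$, the predicted total energy consumption $\hat{E}_q$ and the associated standard deviation $S_q$ can be computed for that operation type.

Finally, the total energy consumption $\hat{E}$ and its standard deviation $S$ are obtained by combining the $Q = 7$ operation types (including all cutting and noncutting operations). Because the energy consumed in each machining operation is considered independent of the energy consumed by other machining operations, $\hat{E}$ and $S$ are expressed as:

$$\hat{E} = \sum_{q=1}^{Q} \hat{E}_q$$

$$S = \sqrt{\sum_{q=1}^{Q} S_q^2}$$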

Note that the energy density $\hat{y}$ is represented as a Gaussian random variable in the framework of GP regression. Because a linear combination of Gaussian random variables is also Gaussian, the predicted total energy $E$, which is computed as a linear combination of the energy densities, is also Gaussian. The probability distribution on the total energy $E$ then can be expressed as $E \sim \mathcal{N}(\hat{E}, S^2)$ with the mean $\hat{E}$ and the standard deviation $S$ given in Eqs. (16) and (17), respectively.
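The aggregation of block-level predictions into a total energy estimate with an uncertainty bound can be sketched as follows. The sketch assumes that the predictions for all NC code blocks are treated as independent, so that means add and variances add both within and across operation types; the within-type independence is an assumption made here for simplicity. The function names and the tuple layout are hypothetical.

```python
import math

def block_energy(mu_density, sigma_density, length_of_cut):
    """Scale the predicted energy density and its std by the length of cut."""
    return mu_density * length_of_cut, sigma_density * length_of_cut

def total_energy(blocks):
    """Aggregate block predictions into a total mean and standard deviation.

    `blocks` is an iterable of (operation, mu_density, sigma_density, length).
    Returns (E_total, S_total, per_operation) where per_operation maps each
    operation type to its (mean, variance) contribution.
    """
    per_operation = {}
    for operation, mu, sigma, length in blocks:
        e, s = block_energy(mu, sigma, length)
        mean_sum, var_sum = per_operation.get(operation, (0.0, 0.0))
        per_operation[operation] = (mean_sum + e, var_sum + s * s)
    e_total = sum(mean for mean, _ in per_operation.values())
    s_total = math.sqrt(sum(var for _, var in per_operation.values()))
    return e_total, s_total, per_operation

def interval_95(e_total, s_total):
    """Two-sided 95% bound for a Gaussian total: mean +/- 1.96 standard deviations."""
    return e_total - 1.96 * s_total, e_total + 1.96 * s_total
```

Since the total is Gaussian under these assumptions, the 95% bound follows directly from the aggregated mean and standard deviation.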

## Validation Tests

The energy prediction model constructed based on GP regression was used to predict the energy consumption for machining a generic part. This section discusses the validation of the trained energy prediction function using unseen test data.

### Data Collection From a Blind Test.

Figure 5 shows a generic part, the geometry of which is quite different from the part used in the training process (see Fig. 3). The cutting and noncutting operations used to produce the generic test part are face milling, pocketing, plunge, air cut, and rapid motion.

The accuracy of the energy prediction model depended on how the machining parameters for a test part were distributed relative to the machining parameters used in the training dataset. If the machining parameters for the test part were completely different from the machining parameters used to collect the training dataset, the accuracy of the prediction fell. To study how the energy prediction model generalized over unobserved test data, we validated the energy prediction model by machining three test parts with the geometry shown in Fig. 5, but we intentionally varied the spindle speeds in these experiments as shown in Table 4. Comparing the spindle speeds used to machine the 18 training parts to those in Table 4, the first test part uses the same spindle speed while the second and third use different spindle speeds. We chose these spindle speeds to evaluate the model's capability of predicting the energy density values in incrementally more unexplored parameter space. For all test parts, the depth of cut was set to 1 mm.

### Prediction Result.

Figure 6 shows the measured energy density values *y* and the predicted energy density function $\hat{y}$ for the face milling operations with different spindle speeds and different feed rates. To visualize the high-dimensional prediction function for the energy density, we fix the other machining parameters to the *y*-direction cut and the conventional cutting strategy and set the depth of cut to 1 mm. For each plot, the curve shows how the energy density varies with the feed rate for a fixed spindle speed. The influence of the spindle speed on the energy density can be studied by comparing the curves shown in the figure. In each plot, the dashed line represents the predicted mean $\mu_1(x \mid D_1)$ and the shaded band represents the 95% confidence bound on the predicted energy density, i.e., $\mu_1(x \mid D_1) \pm 1.96\,\sigma_1(x \mid D_1)$.

As Fig. 6 shows, the energy density measurements for the face milling operations in test parts 1, 2, and 3 are well captured by the energy density prediction function for each spindle-speed/feed-rate combination. The overall trend of the energy density is well predicted by the mean function $\mu_1(x \mid D_1)$. In addition, most measurements are within the 95% confidence bound on the predicted energy density. The width of the confidence bound changes depending on the distribution of the training data used to build the model. In general, the confidence bound for high feed rates is wider because fewer data points were collected in this region to build the model.
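The dependence of the band width on the density of training data can be illustrated with a minimal one-dimensional GP regression sketch. It is not the model used in this study: the squared-exponential kernel, the hyperparameter values, the constant prior mean, and the normalized feed-rate data below are all assumptions made for illustration.

```python
import math

def rbf(a, b, length, signal):
    """Squared-exponential covariance between two scalar inputs."""
    return signal ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def gp_predict(x_train, y_train, x_new, length=0.5, signal=1.0, noise=0.1):
    """Posterior mean and standard deviation of the latent function at x_new."""
    n = len(x_train)
    y_mean = sum(y_train) / n  # constant prior mean taken from the data (assumption)
    K = [[rbf(x_train[i], x_train[j], length, signal)
          + (noise ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x, x_new, length, signal) for x in x_train]
    alpha = solve(K, [y - y_mean for y in y_train])  # K^-1 (y - mean)
    v = solve(K, k_star)                              # K^-1 k*
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    var = signal ** 2 - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))

def band95(mean, std):
    """95% band: mean +/- 1.96 standard deviations."""
    return mean - 1.96 * std, mean + 1.96 * std

# Hypothetical, normalized feed rates: dense at low values, sparse at high ones.
x_train = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 4.0]
y_train = [3.1, 2.8, 2.6, 2.4, 2.3, 2.2, 1.8]

for x_new in (1.5, 3.0):
    mean, std = gp_predict(x_train, y_train, x_new)
    low, high = band95(mean, std)
    print(f"x={x_new}: mean={mean:.2f}, 95% band=({low:.2f}, {high:.2f})")
```

The query at 1.5 sits among closely spaced training points, whereas the query at 3.0 lies in a gap, so the second band comes out wider. A production implementation would more likely use a Cholesky factorization and fitted hyperparameters.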

Figure 7 compares the predicted and the measured energy consumption for each individual NC code block. To predict the energy consumption $\hat{E}_i$ for block $i$, the type of machine operation $q$ is first identified and the energy density prediction function $\mu_q(x_i \mid D_q)$ corresponding to that operation $q$ is used. The predicted (mean) energy for block $i$ is then computed using Eq. (12). In general, the predicted energy consumption values match well with the measurements. The deviation of the mean energy prediction from the measured energy consumption increases from test parts 1 to 3. This is because the machining parameters in test part 3 are the furthest away from the observed values in the training data.
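The block-wise procedure can be sketched in a few lines: identify the operation type of each NC code block, look up the matching energy density predictor, and scale by the length of cut. The predictors below are hypothetical closed-form stand-ins, not trained GP models, and the field names, units, and numbers are assumptions for illustration. Combining the block standard deviations as a root sum of squares assumes independent block predictions.

```python
import math
from dataclasses import dataclass

@dataclass
class Block:
    operation: str    # machining operation type q, e.g. "face_milling"
    feed_rate: float  # hypothetical input feature (mm/min)
    length: float     # length of cut l_i (mm)

# Hypothetical predictors returning (mean, std) of the energy density (J/mm).
def face_milling_density(block):
    return 2.0 + 100.0 / block.feed_rate, 0.3

def air_cut_density(block):
    return 0.5 + 50.0 / block.feed_rate, 0.1

PREDICTORS = {
    "face_milling": face_milling_density,
    "air_cut": air_cut_density,
}

def predict_block(block):
    """Predicted energy and standard deviation for one NC code block."""
    mu, sigma = PREDICTORS[block.operation](block)
    return mu * block.length, sigma * block.length

def predict_part(blocks):
    """Total predicted energy and standard deviation over all blocks."""
    total_energy = 0.0
    total_variance = 0.0
    for block in blocks:
        energy, std = predict_block(block)
        total_energy += energy
        total_variance += std ** 2  # variances add under independence
    return total_energy, math.sqrt(total_variance)

blocks = [
    Block("face_milling", feed_rate=100.0, length=10.0),
    Block("air_cut", feed_rate=100.0, length=20.0),
]
energy, std = predict_part(blocks)
print(f"predicted total: {energy:.1f} J, std: {std:.2f} J")
```

Keeping one predictor per operation type mirrors the way the model is organized in the text, and swapping a stand-in for a trained GP would only require returning the same mean and standard deviation pair.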

Finally, Table 5 compares the predicted and measured energy consumption using the normalized mean absolute error (NMAE) and relative total error (RTE) defined as:

Note that the $NMAE$ in Eq. (18) is defined using the predicted energy $\hat{E}_i$ and the measured energy $E_i$ for each NC code block $i$, whereas the $NMAE$ in Eq. (10) is defined using the predicted energy density $\hat{y}_i$ and the measured energy density $y_i$. Thus, the energy prediction with the longer length of cut $l_i$ will contribute more to the value of $NMAE$ in Eq. (18). In spite of this dependence on the geometry, the measure can still quantify the mean absolute errors of the three test cases in a relative manner. As Table 5 shows, the $NMAE$ values for the three test parts are less than 15%, which is consistent with the error estimated from the training dataset using the hold-out cross-validation method. In other words, the energy prediction model generalizes quite well to the unseen test dataset, which validates the effectiveness of the model in predicting the energy consumed to machine a generic part.

While the $NMAE$ quantifies the error in the predicted energy for a single cut, the RTE quantifies the error in the predicted total energy consumption for producing a whole part. Table 5 shows that for all test cases, the RTE is less than 6%. In addition, the measured total energy falls within the 95% confidence bound $\hat{E} \pm 1.96S$ on the predicted total energy. The RTEs for the energy prediction are small for all three test parts because the errors $\hat{E}_i - E_i$ are centered at zero, with an almost equal chance of over- or underestimating the energy, as shown in Fig. 8. The overestimations and the underestimations of the block-wise energy consumption cancel out when they are summed to compute the total energy consumption. Therefore, the block-wise energy prediction results in an accurate estimate of the total energy consumption for machining a whole part.
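The difference between the two error measures can be illustrated numerically. The definitions below are assumed forms, not a reproduction of Eqs. (18) and (19): the $NMAE$ is taken as the sum of absolute block errors divided by the sum of measured block energies, and the RTE as the absolute error of the total divided by the measured total. The energy values are hypothetical.

```python
def nmae(predicted, measured):
    """Assumed form: sum of absolute block errors over total measured energy."""
    abs_error = sum(abs(p - m) for p, m in zip(predicted, measured))
    return abs_error / sum(measured)

def rte(predicted, measured):
    """Assumed form: absolute error of the total over total measured energy."""
    return abs(sum(predicted) - sum(measured)) / sum(measured)

# Hypothetical block energies (J): errors of mixed sign, none of them tiny.
measured = [100.0, 200.0, 150.0, 50.0]
predicted = [110.0, 185.0, 160.0, 47.0]

print(f"NMAE = {nmae(predicted, measured):.3f}")
print(f"RTE  = {rte(predicted, measured):.3f}")
```

In this toy case the block errors are +10, -15, +10, and -3 J, so the absolute errors sum to 38 J while the signed errors sum to only 2 J. The RTE is therefore much smaller than the $NMAE$, which is the cancellation effect described above.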

## Selection of Machining Strategy

In addition to predicting the energy consumption, the energy prediction functions can also be used to determine an energy-efficient toolpath to machine a part or to enable novel monitoring strategies by highlighting abnormal behavior. In this section, we discuss the use of energy prediction functions to select the toolpath that uses the least amount of energy to machine a part.

### Experiments for Toolpath Planning.

The machine-tool coordinates (*x, y, z*), with respect to the global reference, represent the location of the cutting tool. The toolpath is then described by the temporal sequences of these coordinates. The tool's sequential moves with respect to the geometry of a workpiece determine the cutting direction and the cutting strategy. Figure 9 shows four different toolpaths that were explored to machine the pocket shown in Fig. 10. Table 6 shows the process parameters used to execute these four different toolpaths. Each toolpath is composed of cuts in different directions and with different cutting strategies. The goal is to select the toolpath that minimizes the predicted energy consumption before actually machining the part. This prediction can then be compared to the true energy consumption measured during the experiment.

### Energy-Efficient Toolpath Selection.

Table 7 summarizes the results of the experiments to machine the part shown in Fig. 10 using the four different toolpaths in Fig. 9. The required energy varies depending on the toolpath used, and the energy prediction function predicts the total energy consumption with good accuracy. Figure 11 compares the measured and predicted energies for each toolpath. The error bar on the predicted energy usage represents the 95% interval, i.e., $\mu \pm 1.96\sigma$, for the predicted total energy consumption. Note that the measured energy values all fall within the 95% confidence bound on the predicted total energy. With the predicted energies, the toolpaths can be ordered in terms of their energy consumption, and the toolpath with the minimum energy consumption can be selected accordingly.
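The selection step amounts to ordering the candidate toolpaths by predicted total energy. The sketch below uses hypothetical predicted means and standard deviations, not the values in Table 7, and it adds an interval-overlap check as one possible way to judge how decisive a ranking is. That check is an illustration, not part of the method described here.

```python
def interval95(mean, std):
    """95% interval: mean +/- 1.96 standard deviations."""
    return mean - 1.96 * std, mean + 1.96 * std

def rank_toolpaths(predictions):
    """Toolpath names sorted by predicted mean energy, lowest first."""
    return sorted(predictions, key=lambda name: predictions[name][0])

def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical predictions: name -> (mean energy in J, standard deviation in J).
predictions = {
    "toolpath_1": (5200.0, 150.0),
    "toolpath_2": (4800.0, 140.0),
    "toolpath_3": (5600.0, 160.0),
    "toolpath_4": (4950.0, 145.0),
}

ranking = rank_toolpaths(predictions)
best, runner_up = ranking[0], ranking[1]
overlap = intervals_overlap(interval95(*predictions[best]),
                            interval95(*predictions[runner_up]))
print("ranking:", ranking)
print(f"best: {best}, interval overlaps runner-up: {overlap}")
```

When the intervals of the two lowest-energy candidates overlap, the predicted ordering between them is less certain, and a secondary criterion such as machining time could reasonably break the tie.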

## Conclusions

This study demonstrates the use of a nonparametric regression model, namely the Gaussian process (GP), to predict the energy consumption of a machine tool. The GP models the complex relationships between the input machining parameters and the output energy consumption and constructs a prediction function for the energy consumption with confidence bounds. Even though the training datasets in this study include only 18 experimental parts, the models constructed using the machine-learning approach are able to reliably predict the energy consumption for machining a generic test part with the milling machine tool. This is primarily because of the block-wise experimentation and data analysis, which treated each block of NC code as an experiment in itself. In addition, the energy prediction function is used to select the optimum toolpath that uses the least amount of energy to machine the same part.

There are other parameters that can possibly affect the energy consumption pattern of a target machine. For example, the workpiece material or cutting tool geometry and material can affect the energy consumption pattern of the target machine. By including these parameters as input features, the energy prediction model can be further improved and generalized. In the future, we plan to conduct additional experiments to collect datasets that include these features to improve the robustness and generalizability of the energy prediction model.

To effectively establish the energy consumption pattern of a machine tool over time, the energy prediction model would need to be updated continuously with new measurement data to account for the time-varying characteristics of the machine tool (e.g., due to tool wear and machine tool deterioration). Incorporating these characteristics, particularly tool wear, into the modeling approach is one area of future study. Another area of future work is constructing a near-real-time energy prediction model for a machine tool by combining a near-real-time data collection framework with an adaptive GP regression model. We are currently developing a near-real-time data collection framework to retrieve raw data from a milling machine tool and its sensors and convert the data into relevant input features. In addition, we are currently investigating the use of sparse representation of the covariance matrix to reduce the computational and storage demands of GP regression, which can help to update the GP regression model with near-real-time streaming data. We expect that the energy prediction function can be constructed using a fraction of training data points (perhaps as few as 10%), which can reduce the training time without sacrificing accuracy significantly.

As alluded to in Sec. 5, an energy prediction model that is continually updated can be used to monitor the condition of machine components. One area where we can apply energy prediction for machine tool monitoring is anomaly detection. For example, a sudden, unexpected event (such as tool breakage or machine collision due to incorrect tool offsets) may cause a deviation between the predicted and the actual energy consumption or power demand, which can trigger an immediate alarm. Developing monitoring strategies based on deviations between the predicted and the measured energy consumption or power demand represents a potentially impactful area of future study.
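A deviation-based alarm of this kind could be sketched as follows. The rule shown, flagging a block when the measured energy lies more than $k$ predicted standard deviations from the predicted mean, is an assumption for illustration. The threshold $k = 3$ and the data are hypothetical, and no monitoring scheme was implemented or evaluated in this study.

```python
def flag_anomalies(measured, predicted_mean, predicted_std, k=3.0):
    """Indices of blocks whose measured energy lies outside mean +/- k * std."""
    flagged = []
    for i, (energy, mu, sigma) in enumerate(
            zip(measured, predicted_mean, predicted_std)):
        if abs(energy - mu) > k * sigma:
            flagged.append(i)
    return flagged

# Hypothetical block energies (J); the third block deviates strongly.
measured = [101.0, 98.0, 140.0, 102.0]
predicted_mean = [100.0, 100.0, 100.0, 100.0]
predicted_std = [5.0, 5.0, 5.0, 5.0]

print("flagged blocks:", flag_anomalies(measured, predicted_mean, predicted_std))
```

Using the predicted standard deviation as the scale means the alarm is automatically less sensitive in regions where the model itself is uncertain, at the cost of possibly missing faults there.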

Finally, in addition to energy consumption, toolpath selection can be based on other criteria, which may include minimum machine operating time, minimum impact to a machine tool, and optimized surface roughness. Given data-driven predictive models for different performance features, such as time, tool wear, or surface roughness, an efficient toolpath can be chosen by considering these impacts individually or simultaneously. This study suggests one possible scenario of integrating the energy consumption prediction function into a computer-aided manufacturing (CAM) system. The integration of the energy prediction model into the CAM system would allow smart and well-informed decisions regarding toolpath selection.

In conclusion, this study shows that with advanced data collection and processing techniques, prediction models can be constructed to predict energy consumption of a machine tool with multiple operations and multiple process parameters. The specific energy prediction model that was generated in this study would work for generic parts machined on a Mori Seiki NVD1500. The methodology that was described, though, could be used to create prediction models for other machine tools to enable improved planning and operations in various shop-floor environments.

## Acknowledgment

The authors acknowledge the support by the Smart Manufacturing Systems Design and Analysis Program at the National Institute of Standards and Technology (NIST), Grant Nos. 70NANB12H225 and 70NANB12H273 awarded to University of California, Berkeley, and to Stanford University, respectively. In addition, the authors appreciate the support of the Machine Tool Technologies Research Foundation (MTTRF) and System Insights for the equipment used in this research.