About 41% of the total energy consumed in the United States in 2014, roughly 40 quadrillion (40 × 10¹⁵) British thermal units (BTUs), was used for heating and air conditioning. Although fault detection and diagnosis (FDD) for heating, ventilation, and air conditioning (HVAC) systems has been studied for a long time, very few publications have focused on scalability and low cost. To address this challenge, we propose an approach centered on control system data. Several machine learning algorithms are introduced for data exploration (e.g., DBSCAN, k-means) and analysis; rank-ordered weather data are used to define comparable days; a model-free approach focused on control system data is presented; and finally, fault detection is carried out with anomaly detection algorithms (local outlier factor and isolation forest). Hidden faults were detected at rates above 90%. The threshold parameter can be determined by selecting an acceptable pair of true positive and false positive rates, which can be visualized with receiver operating characteristic (ROC) curves, as demonstrated in this article. A simulation model that generates about 250 GB of data is used to evaluate the performance of the various algorithms.
Ideally, every building would have its heating, ventilation, and air conditioning (HVAC) system designed and built specifically for it, with every component installed properly. In practice, systems are rarely perfect and often have faults from the very first day of installation. In addition, after years of use and a lack of maintenance, many HVAC systems that appear to work properly have internal faults, which waste a large amount of energy. Because HVAC systems are deployed almost everywhere, even a small percentage of waste adds up to a large amount over time.
Under normal conditions, people rarely notice the quietly working HVAC system. They become aware of it only when something feels wrong, which usually indicates a serious problem and an expensive repair. This can be avoided by scheduled routine checks and maintenance. The best strategy is to prevent failures by spotting faults while they are minor and easy to fix. However, routine manual maintenance can be expensive, as it requires a lot of labor.
Our work is based on HVAC system simulations. To achieve scalability and lower cost, we adopt a data-driven approach focused on control system variables. To our knowledge, HVAC FDD is mostly carried out on thermodynamic/physical variables, neglecting the control system (or at least not treating it explicitly). Because a working HVAC system depends on its controls, we deduce that control system variables carry crucial information about the overall system. Scalability has often been left out of HVAC FDD. Many FDD methods are model specific and need to be tuned and set up for a particular system. They also more or less assume that the faults are known, along with their patterns and characteristics. The system-specific setup and the required prior knowledge make them less scalable.
Because we focus on scalability and control system data, other benefits such as easy installation and low cost follow. As mentioned earlier, the approach is not system specific; therefore, the setup process does not require much knowledge of the HVAC system. We only need access to the data. Because no tuning is involved, and the method is not designed for a specific HVAC system, the deployment cost should be lower.
In the following sections, we first describe our approach so that readers have a general idea of how our HVAC FDD is carried out. We then briefly review other approaches in the literature. The importance of control system data, the focus of our work, is discussed next, followed by an introduction to the anomaly detection algorithms we use. After the experiment models and data are introduced, we describe how the experiments are conducted. Finally, we discuss results for two different HVAC models and how algorithm parameters are chosen.
2 Our Fault Detection Approach
From our literature review, we have found that, owing to the popularity and potential of statistical learning tools, research on data-driven methods has become a trend. However, most studies implicitly assume some prior knowledge of the HVAC system being analyzed. Some approaches using supervised learning tools even assume prior knowledge of the kinds of faults that exist and what their patterns may look like. Research on how faults impact HVAC performance has been conducted as well. These studies help us learn more about the connection between faults and the targeted HVAC system; nonetheless, they do not account for scalability, which is an important objective of our work.
On the other hand, in the field of HVAC FDD, unsupervised approaches seem to rely heavily on principal component analysis (PCA) [2–8]. A very large portion of publications apply PCA both as a dimensionality reduction tool and as a fault detection method. Very few articles have applied newer anomaly detection tools such as local outlier factor, isolation forest, or other tools invented in the past two decades. PCA does not perform well when multiple operating modes exist; PCA is also not robust to anomalies, as it relies on the covariance matrix. In practice, false alarms often occur during transient states between operating modes.
Because of the complexity of HVAC systems, dimensionality reduction is needed for data analysis. PCA and partial least squares (PLS) are often chosen for this task. We should remember that correlation does not imply causation, although correlations are usually indicators of some hidden relation. Even if statistical tools based on correlations, such as PCA, are able to find meaningful connections among the vast number of variables, the lack of a meaningful explanation for these connections leads to questionable results. Hence, we approach HVAC fault detection by focusing on the control system. To our knowledge, HVAC FDD is mostly carried out on thermodynamic/physical variables, neglecting the control system. In addition to the reasons and advantages discussed in the control system data section, focusing on control system data also performs the dimensionality reduction for us. This not only reduces the number of variables significantly but also delivers results with much more meaningful explanations.1
2.1 Our Proposed Workflow.
The goal of our fault detection approach is scalability; low cost and easy setup are the requirements that come with it. The approach is not system specific; therefore, the setup process does not require much knowledge of the HVAC system. We only need access to the data. With this in mind, our workflow is shown in Fig. 1.
3 Heating, Ventilation, and Air Conditioning Fault Detection
Modern commercial buildings with modern HVAC systems are equipped with sensors to monitor how the system is functioning. Their control systems use this sensor feedback to make decisions and control the components. If we have access to the systems, we have access to setpoints and sensor data. In the past, FDD was often approached with model-based methods; that is, people built a physical or simulation model of the system and compared how the model differed from the system's behavior.
3.1 Data-Driven Fault Detection and Diagnosis Approach.
The idea is, given some input data, we build some model2 to generate some measure. In parallel, we use the input data to establish a threshold or detection criterion. We then compare the measure with the threshold to determine whether the system is faulty.
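This measure-versus-threshold scheme can be sketched generically. The quantile-based threshold and the toy scores below are illustrative assumptions, not the specific models discussed in this article:

```python
import numpy as np

def fit_threshold(train_scores, quantile=0.99):
    """Set the detection threshold from scores of known-normal training data."""
    return np.quantile(train_scores, quantile)

def is_faulty(measure, threshold):
    """Flag a sample as faulty when its measure exceeds the threshold."""
    return measure > threshold

# Toy example: scores produced by some hypothetical scoring model.
rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, size=1000)  # scores under normal operation
threshold = fit_threshold(train_scores)
print(is_faulty(5.0, threshold), is_faulty(0.0, threshold))  # an extreme measure is flagged
```

Sweeping the quantile (and thus the threshold) trades true positives against false positives, which is exactly what an ROC curve visualizes.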
Usually, people try to model the input data with some mathematical or statistical model, such as hidden Markov models, Bayesian networks [16,17], autoregressive–moving-average (ARMA) models, Gaussian processes (GPs), fractal correlation dimension, etc., along with a threshold/detection criterion whose setup depends on the model formulation. Common choices are PCA [2–8], support vector machines (SVMs) [21–24], one-class SVMs [19,25], artificial neural networks with fuzzy logic [26–30], etc. Moreover, to avoid the "data rich, information poor" situation, dimensionality reduction methods such as PCA, PLS, minimal-redundancy-maximal-relevance [21,31,32], and so on are adopted. Dimensionality reduction not only shrinks our datasets to prevent them from bogging down our computing machines but also exposes latent features hidden or masked by redundant dimensions.
3.2 Control System Data.
To lower the cost of FDD, we want it to be scalable; that is, it should not require a model of a specific HVAC system (although such a model can be very accurate, it is very expensive as well). We would like something general to most HVAC systems. What all HVAC systems have in common is their control systems. Although control systems vary across HVAC systems, the basic control components are generally the same; either on/off switches or PID controllers are used for the control loops. Other types of controllers are rare; even when they are used, on/off switches and PID controllers are likely to be used in the system as well. Moreover, for a working HVAC system, we suppose the design and setup have been carefully engineered with a lot of work and effort. Hence, a working HVAC system implies that its control system works as well. With this in mind, and since the control system collects all the input data and outputs all the control signals (manipulated variables) after computing over some rules, we claim that the control system data hold important information about the overall HVAC system. Figure 2 shows a simple block diagram of a control loop.
In this figure, there are basically only five variables: SP (setpoint), ε (error), MV (manipulated variable), PV (process variable), and sensor data. If we set noise aside for the moment, there are only four variables: SP, ε, MV, and PV (the same as the sensor data). From the controller's point of view, these four variables are the only information needed to make the HVAC system work. No matter how large the system grows or how complicated it becomes, the control system works the same way.
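The interplay of these four variables can be sketched with a minimal discrete PI loop. The first-order plant model, gains, and numbers below are illustrative assumptions, not a real HVAC model:

```python
# SP (setpoint), eps (error), MV (manipulated variable), PV (process variable)
def run_loop(sp=21.0, pv0=15.0, kp=0.8, ki=0.2, steps=200, dt=1.0):
    pv, integral, trace = pv0, 0.0, []
    for _ in range(steps):
        eps = sp - pv                        # error: setpoint minus process variable
        integral += eps * dt
        mv = kp * eps + ki * integral        # controller output (e.g., heater command)
        pv += 0.1 * (mv - (pv - 10.0)) * dt  # toy first-order room response
        trace.append((eps, mv, pv))
    return trace

trace = run_loop()
print(round(trace[-1][2], 2))  # PV settles near the 21.0 setpoint
```

However complex the plant, the controller still sees only SP, ε, MV, and PV, which is why these few signals summarize so much of the system's behavior.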
An HVAC system is not closed; it is not isolated from the outside environment. External data (including weather, noise, and heat load) come in and interact with the system. Most HVAC FDD approaches in the literature closely monitor the external variables and sensor data, trying to find their relationships and patterns with all kinds of mathematical/statistical models. We suggest that instead of inspecting external variables and sensor data alone, we should focus more on the manipulated variable data. There are a few reasons to focus on the control data. First, every HVAC system has controls; although they may have intricate designs and/or complicated connections, they are basically composed of on/off switches and PID controllers.
Second, relying solely on weather and sensor data, one could encounter sensor failure and/or inaccuracy problems. That is, an anomalous data point could be caused by a faulty component, a failing sensor, or both. If control data are not included, it is difficult to tell whether the datasets we collected are reliable. Research on sensor failures, such as Ref. , requires additional system-specific information, which we avoid due to the scalability trade-off.
Another reason to focus on control data is that we are targeting hidden faults in an HVAC system. It has been mentioned a few times in the literature that hidden faults are hard to deal with, but to our knowledge, so far no one has explicitly stated how hidden faults are handled.3 We believe this is because control systems hide faults. This is the nature of control systems, for they are designed to meet the requirements of user settings by making sure the process variables are controlled and maintained at setpoint values. For example, a heater is controlled by a thermostat to keep a room warm during winter. Suppose the fan has been clogged by dust, decreasing the air flow and thus the heating performance. The thermostat then senses that the temperature is lower than expected and keeps the heater on for longer periods to compensate for the lost heating performance. In other words, the control system makes the overall system work harder to keep the process variables in the desired range. HVAC systems are usually designed with some reserve capacity to handle different loads. When minor faults occur, they are hidden by the control system, and users do not notice any difference; hence, no one reports or complains about them. However, energy is still being wasted. When it comes to hidden faults, monitoring external and sensor data alone is not as effective as monitoring MVs (control data), for the controllers by nature hide faults, and inspecting their behavior directly4 makes more sense. Sensor data may catch some anomalous behavior for a short period before the controllers compensate for it. MVs, on the other hand, continuously reflect the controllers' attempts to hide faults, making them a better indicator of hidden faults.
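The clogged-fan example can be illustrated with a toy simulation: when heating effectiveness drops, the controller compensates, so the PV stays at the setpoint while the MV rises. The plant model and all numbers here are illustrative assumptions only:

```python
def simulate(effectiveness, sp=21.0, steps=400, dt=1.0):
    """PI-controlled toy room; `effectiveness` scales how well MV heats the room."""
    pv, integral = 15.0, 0.0
    for _ in range(steps):
        eps = sp - pv
        integral += eps * dt
        mv = 0.8 * eps + 0.2 * integral                     # heater command
        pv += 0.1 * (effectiveness * mv - (pv - 10.0)) * dt  # toy room response
    return pv, mv

pv_ok, mv_ok = simulate(effectiveness=1.0)    # healthy fan
pv_bad, mv_bad = simulate(effectiveness=0.7)  # clogged fan (hidden fault)
# Both reach the setpoint, but the faulty system needs a larger MV:
print(round(pv_ok, 1), round(pv_bad, 1), mv_bad > mv_ok)
```

The sensor data (PV) end up identical in both runs; only the MV betrays the fault, which is the core argument for monitoring control data.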
Finally, MV data should help us differentiate anomalous behavior from system transient states. Take a simple HVAC system, for example, and imagine two different scenarios for the exact same building. The first scenario is a hot, sunny summer day; without a doubt, the building gets hot inside, and the HVAC system tries to cool the interior by turning on the cooling subsystem. The second scenario is a big event held in the building during fall; the outdoor weather is cool and comfortable, but the indoors get hot and stuffy due to the large number of people attending, generating a lot of heat and raising CO2 levels. The HVAC system responds by turning on both the cooling and ventilation (fans and dampers) subsystems. In this example, the HVAC system cools the building in both cases but increases ventilation only in the second case. In the first case, the CO2 level is normal and further ventilation is not needed; increasing the intake of hot outside air would force the cooling system to work harder and increase power usage. In the second case, the outside air is cool and the indoor CO2 level is relatively high; letting more outside air in not only helps ventilation but also reduces the workload of the cooling system. From this example, we see that although the HVAC system is cooling the building because the indoor temperature is above the setpoint, it may be working in different operating modes. If we only monitor external and sensor data, one major challenge is determining which operating mode the system is in. We could guess from historical training data; still, the answer may be ambiguous at times, and we could guess wrong, especially when there are multiple operating modes, thus reporting false positives. Another problem is the transient states between two operating modes. These data points belong to neither operating mode and can look very similar to faulty behavior. If not handled with special care, they are often marked as anomalies, leading to false positives, or even causing trouble when training the FDD model in the first place.
For the reasons discussed above, we propose that including and focusing on control data is beneficial to the fault detection task.
3.3 Anomaly Detection.
An anomaly is a process that behaves unusually; its pattern deviates from normal behavior. Anomalies go by many alternative names: in statistics, the term outliers is common, and abnormalities, deviants, and discordants are also used interchangeably. Although not necessarily true in general, in a working physical system with given inputs, an anomaly usually indicates the existence of fault(s); exceptions happen due to uncertainties in the system. Since HVAC systems are physical systems and should be deterministic, anomalies found in the system hint that faults are likely to be found, too. Therefore, our fault detection task relies mainly on anomaly detection methods.
Anomaly detection is a big topic with applications in many different areas [34–36]. The challenge is how to properly define what is normal. This is still an open question, but the probability and statistics community has developed the concept of hypothesis testing, which has been adopted for anomaly detection. The challenge becomes tougher when we are dealing with unknown anomalies, that is, unsupervised learning. Compared with supervised learning methods such as SVM [37,38] and GP, our training data contain no known faults; hence, it is harder to determine the boundaries of normal behavior. With prior knowledge of what faults look like, we would know their patterns and could clearly exclude them from the set of normal patterns.
PCA, a popular technique for examining the components of a dataset, is also often used for dimensionality reduction and as an anomaly detection tool. Research using PCA for HVAC FDD has been conducted [3,25] with reasonable results; however, PCA struggles to tell faults apart from changes of operating mode in a system. This could probably be improved by setting up a more detailed model of the HVAC system, but that would move in the opposite direction from scalability.
Here, we introduce two anomaly detection methods. One is the local outlier factor (LOF), a density-based approach similar to density-based spatial clustering of applications with noise (DBSCAN) [42,43] and ordering points to identify the clustering structure (OPTICS). The other, isolation forest (iForest), is a depth-based approach; it is called a forest because its algorithm uses tree data structures.
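Both detectors are available in scikit-learn, which we use for a minimal sketch on synthetic 2-D data (the data and parameters here are illustrative, not HVAC data):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 2))     # normal-operation cluster
outliers = np.array([[8.0, 8.0], [-9.0, 7.0]])   # two planted anomalies
X = np.vstack([normal, outliers])

lof = LocalOutlierFactor(n_neighbors=20)         # density-based
labels_lof = lof.fit_predict(X)                  # -1 marks outliers

iforest = IsolationForest(random_state=0).fit(X)  # isolation/depth-based
labels_if = iforest.predict(X)                    # -1 marks outliers

print(labels_lof[-2:], labels_if[-2:])            # both flag the planted points
```

LOF scores each point by how sparse its neighborhood is relative to its neighbors' neighborhoods, while iForest scores by how few random splits are needed to isolate a point; both need no labeled faults, which matches our unsupervised setting.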
4 Building Model and Data
In this article, we work with Modelica because its features, such as being equation oriented, solving differential algebraic equations easily, supporting both causal and acausal models, and delivering transient state values, are suitable for our system modeling.
4.1 Data Source.
Our HVAC system data are collected from simulation results of our HVAC system model built in Modelica. The models are built using the Modelica Standard Library5 (v3.2.3) and the Buildings Library6 (Buildings 4.0.0) in the OpenModelica environment (v1.11.0-64bit). Simulations are run with JModelica.org (v2.1). All software packages are executed on an Intel Core i5-2400 3.10 GHz CPU with 8 GB RAM running Windows 10 64-bit.
As input data, we use weather datasets based on Typical Meteorological Year 3 (TMY3) from the National Oceanic and Atmospheric Administration (NOAA).7 Our simulations are based on San Francisco Bay Area and Boston area weather. Two datasets are chosen to show results for different weather inputs. The locations are chosen for their different weather trends: while the Bay Area has rather mild weather changes throughout a year, the east coast of the United States has an overall wider range of weather change. We use heat load data from the US Department of Energy (DOE) dataset "Commercial and Residential Hourly Load Profiles for all TMY3 Locations in the United States,"8 which is generated by EnergyPlus9 using TMY310 data. Again, we choose two different heat load datasets as inputs for comparison. Since our work mainly targets commercial office buildings, we use office heat load datasets for the experiments here.
We start with a single room model: an HVAC system with a single room, an air handling unit (AHU), a cooling loop, and a heating loop. This setup is shown in Fig. 3.
4.2 Data Cleaning.
Generally speaking, people doing data analysis spend most of their time on data cleaning. That is, we have to make sure our data are in a usable, informative, and meaningful form. Most datasets are either collected manually or recorded by sensors in the field, so it is not uncommon to find errors, missing values, incomplete data, incompatible formats, sensor failures, misplaced sensors, wrong installations, noisy data due to the environment, etc.; all kinds of things can happen.
Since the initial conditions of the HVAC model's parameters will not exactly match the input datasets, the system needs an initializing period to settle into its "normal" state. Thus, we remove the first couple of hours to avoid noisy and unstable data. Furthermore, while working on the datasets, we handle many tables. Conversions between parameters, formats, pivot tables, data dimensions, and units must be done with care, or we may end up with strange and unrealistic results.
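Trimming the initialization transient is a one-liner with pandas; the column name and the two-hour warm-up window below are assumptions for illustration:

```python
import pandas as pd

# Toy simulation output: hourly room temperature, with a warm-up transient.
df = pd.DataFrame(
    {"room_temp": [5.0, 12.0, 18.0, 21.0, 21.1, 20.9]},
    index=pd.date_range("2014-01-01 00:00", periods=6, freq="h"),
)

warm_up = pd.Timedelta(hours=2)
clean = df[df.index >= df.index[0] + warm_up]  # drop the first two hours
print(len(clean))  # 4 rows remain
```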
4.3 Clustering and Exploration of Input Data.
Starting with the temperature profiles, we run the DBSCAN clustering algorithm on the weather temperature data. We did not adopt k-means here because the number of clusters is difficult to determine in advance.
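The clustering step can be sketched as follows, with each row holding one day's 24 hourly temperatures; the synthetic profiles and DBSCAN parameters are assumptions for illustration, not the actual weather data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
base = 10 + 8 * np.sin(np.linspace(0, np.pi, 24))       # a generic daily shape
winter = base + rng.normal(0, 0.5, size=(50, 24))        # cool days
summer = base + 15 + rng.normal(0, 0.5, size=(50, 24))   # warm days
profiles = np.vstack([winter, summer])                   # one row per day

labels = DBSCAN(eps=5.0, min_samples=5).fit_predict(profiles)
print(len(set(labels) - {-1}))  # two clusters of similar-shaped days
```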
At first glance, the results look reasonable. However, we soon realize that the results shown in Fig. 4 do not tell us much. This is because the "shapes" of the temperature profile curves are very similar throughout a year for a fixed regional location. We could arbitrarily group neighboring temperature profiles together and end up with very similar, reasonable-looking results. Temperature data are the input to the HVAC system; thus, how to compare the temperature sequences of two different days is important. From the clustering results alone, we do not gain much information about comparable weather data. Nonetheless, we did learn that clustering algorithms are not the tools we are looking for to make two sequential datasets comparable; what we need is a pattern matching technique instead of clustering. Also, the "shape" is an important feature of the temperature profiles. Collecting more data samples will not help, since new samples will be absorbed into their nearest neighboring clusters, making the clusters grow larger and the boundaries expand. This does not help us define the pattern of a data sequence.
HVAC systems are built on physics; they work because of physical laws. That is, for given inputs, the outputs should be deterministic; with the same inputs, we should always end up with the same outputs. In real cases, however, the measurable inputs are the weather data; heat loads generated inside and outside the HVAC system are, in general, unlikely to be measurable. These inputs, or external disturbances from a controller's point of view (Fig. 2), are considered noise to the system.
Before this exploration,11 we had no prior knowledge of the heat load dataset. Blindly feeding all of these datasets into our HVAC model and fault detection algorithms led to very poor results. After learning that there are multiple patterns in the heat load data, we were able to narrow down the uncertainty by differentiating workdays from non-workdays, and we were thereby able to improve our results significantly.
4.4 Time-Series Rank Ordering.
Given a day of weather data in sequential form, to find comparables and make comparisons, we have to put "similar"12 days together. The reason clustering is not a good choice for grouping similar daily data in our case is that, for a data sample in a dataset, the distance between it and other samples in its cluster can be very large, depending on the size and density of that cluster. For example, in Fig. 4, weather data profiles are clustered into five clusters. It is obvious from the plot that cluster 2 (orange) has the largest range/variation. Looking for the largest distance between two data samples within the same cluster, we see that in cluster 2, the farthest pair of samples is much farther apart than in the other clusters. If we pick the top and bottom data samples from cluster 2, the distance between them is much larger than the distance between neighboring samples that happen to fall in different clusters.
Depending on the data structure, the clustering algorithm used, and how many data points are sampled, we end up with different results; the "similar" group a data sample belongs to varies. More troubling, as a dataset grows, the clusters are likely to grow as well, causing dissimilar data samples to be put into the same cluster. That is, members of a cluster become less and less comparable, even though they are supposed to be similar.
We propose using rank-ordered time-series data to compare input weather temperature profiles. First, rank ordering is relatively simple and straightforward, and simple means fast in computation. Second, as mentioned, using sophisticated tools for a simple task is not only overkill but also introduces more work. Take SAX [50–52], for example; SAX does not really help us here, for temperature profiles are similar in general. If we adopt a discretization with too low a resolution, all profiles are classified as the same symbolic pattern. On the other hand, with a high-resolution discretization we can tell the profiles apart, but then we have no comparables, for they are all considered different. Moreover, the temperature is usually low during early mornings and after sunset, so symbols reserved for high temperatures are basically never used in those periods; the opposite holds for low-temperature symbols around noon. Third, rank ordering effectively tailors a "cluster" to every time-series sample, putting each data sample automatically at the center of its own rank-ordered group. Therefore, we think a simple rank ordering method is the better choice.
We compare five different distance measures used for rank ordering. We arbitrarily select a day as a reference and compare the annual daily temperature profiles with it. For each distance measure, a list of days is sorted by the distance scores: the closest day is put at the top of the list and the most different one at the end. We then compare how similar these lists are. Two comparison methods are used, and a visualized order similarity plot is shown in Fig. 5. One way to compare the ordered lists is to find matches: we take two lists and go through each member; if the members at the same index match, we add 1 to an accumulated score. After all members are processed, we divide the accumulated score by the total number of members to get a similarity ratio. All five distance measures are compared with each other, and the similarity ratio matrix is presented in Table 1. A similar method computes the distance between members at the same indices in the two lists; summing over all members gives an overall distance. Again, we do this comparison for the five distance measures, and the results are listed in Table 2. In both tables, the last column is the row sum, which gives an idea of how similar or dissimilar each distance measure is compared with the others.
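The two list-comparison scores can be sketched as follows; the toy day orderings are placeholders, and the positional-distance sum is one plausible reading of the second comparison:

```python
def match_ratio(list_a, list_b):
    """Fraction of positions where both rank-ordered lists agree exactly."""
    hits = sum(a == b for a, b in zip(list_a, list_b))
    return hits / len(list_a)

def rank_distance(list_a, list_b):
    """Sum over members of how far apart their positions are in the two lists."""
    pos_b = {day: i for i, day in enumerate(list_b)}
    return sum(abs(i - pos_b[day]) for i, day in enumerate(list_a))

order_euclid = ["d3", "d1", "d4", "d2"]  # ordering under one distance measure
order_sax    = ["d3", "d4", "d1", "d2"]  # ordering under another

print(match_ratio(order_euclid, order_sax))    # 0.5
print(rank_distance(order_euclid, order_sax))  # 2
```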
Figure 5 demonstrates the ordering under these five distance measures. The absolute, Euclidean, and dynamic time warping (DTW) distances look closer to each other than the other two, which is consistent with the results presented in Tables 1 and 2.
From the results, we find that for rank ordering temperature profiles, these distance measures all deliver similar results. The Euclidean distance seems to work best here, and others have noted that it is a competitive distance measure in many applications [50,54]. The SAX and high-low13 methods give slightly different orderings, which is reasonable since SAX uses a discretized approximation for its calculations and high-low uses only two scalar values to sort the order. It is surprising that the high-low method works so well using so little information and computation. We believe this is because temperature profiles are rather stable and similar in shape. The DTW distance, on the other hand, gives a result very similar to the Euclidean distance. The advantage of DTW is that it is good at measuring shape-wise dissimilarity, at the cost of a much higher computational requirement. However, as mentioned, temperature profiles for a local region are generally similar in shape, so DTW does not provide better results. Furthermore, for long sequential data, the DTW distance degenerates to the Euclidean distance [50,55]. Because of our limited computational resources and the very small differences in results, we leave out DTW in later data analyses.
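For reference, the DTW distance used above follows the textbook O(nm) dynamic-programming recurrence; this sketch is a plain implementation, not the optimized variant one would use at scale:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# A time-shifted copy of a curve stays close under DTW even though a
# pointwise (Euclidean) comparison would penalize the misalignment:
x = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
print(dtw_distance(x, y))  # 1.0
```

This alignment flexibility is exactly the "shape-wise" advantage mentioned above; for profiles that are already aligned and similar in shape, it buys little.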
4.5 A Fault Detection Approach Based on Raw Time-Series Data.
We demonstrate an approach to conduct HVAC fault detection based on raw control variable time-series data of the model shown in Fig. 3.
4.5.1 Simulations With Modelica.
In this section, we discuss how we set up our model in Modelica for simulations. This model simulates a rooftop unit serving a single room with a constant air volume (CAV) setup.
The input datasets for our models are cleaned and restructured. Weather data enter the system as outside environment conditions, which can be monitored by sensors. Heat load data are also fed into the HVAC model, but without sensor monitoring. Heat load data simulate plausible heat load patterns, which should be similar from day to day and change seasonally with the weather. Therefore, heat load day data are selected based on the date of the weather data (with a random ±2-day offset to introduce some randomness) to reflect seasonal changes, but we separate non-workdays from the heat load data to avoid large variance and uncertainty.
A normal operating condition dataset is generated by simulating this model for 362 days. This dataset is used as the baseline reference for the HVAC system; new datasets are compared against it by the anomaly detection algorithms. Another normal condition dataset with randomly selected dates is generated for true positive tests. We have designed six different faults and built variant models to run simulations; the faults are listed in Table 3. Each testing dataset contains at least 100 data samples with randomly selected dates for the weather data and corresponding heat load data with a random ±5-day offset, excluding non-workdays. Note that all faults we introduce to the HVAC system are hidden faults; that is, they are minor faults to the overall system, and their effects are compensated by the control system, so occupants of the building will not notice any difference.
4.6 A Centralized Multiple Room HVAC System Model.
In contrast to the previous model, we now have three rooms with individual variable air volume (VAV) boxes and a central AHU providing a constant supply of cool air. While the previous model used a CAV control mechanism to maintain the room environment, in this section we apply a VAV control strategy for our centralized multiple room model. Each room has a different size and its own temperature setpoint. In addition, each room has its own heat load data added to simulate different usages. A schematic diagram of this model setup is shown in Fig. 6.
4.6.1 Simulations With Modelica.
Two sets of weather data, for the San Francisco and Boston areas, are reused. We classify heat load data into workdays and non-workdays to avoid large variances and uncertainties. Since we have three rooms this time, each room has hourly heat load added based on the input weather date, with a ±2-day range of randomness for the training data generation. Testing datasets are generated in a similar way, but the heat load data are drawn from a wider ±5-day range to introduce randomness while keeping the seasonal dependency of the weather data.
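The comparable-day heat load selection described above can be sketched as follows. This is a minimal illustration, not the article's code; the function name and the weekday test for non-workdays are our own assumptions:

```python
import datetime
import random

def pick_heat_load_day(weather_date, day_range=2, rng=None):
    """Select a heat load day near the given weather date (within
    +/- day_range days), skipping non-workdays to avoid their larger
    variance and uncertainty."""
    rng = rng or random.Random()
    candidates = [
        weather_date + datetime.timedelta(days=offset)
        for offset in range(-day_range, day_range + 1)
    ]
    # Keep Monday-Friday only; weekday() returns 5 or 6 on weekends.
    workdays = [d for d in candidates if d.weekday() < 5]
    return rng.choice(workdays)
```

For training data `day_range=2` preserves the seasonal dependency; for testing data the same function would be called with `day_range=5`.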
A training dataset of the normal operating model comprising 362 days is generated; it is used to train our algorithm as a baseline. Another normal operating testing dataset is generated to measure true positives and false negatives, i.e., the true positive rate. Five faulty HVAC system model variants are built and simulated to generate our faulty datasets; these faults are listed in Table 4. At least 100 data samples are generated for each testing dataset, with randomly selected dates for the weather data and corresponding heat load data drawn within a random ±5-day range, excluding non-workdays.
5 Fault Detection Results and Discussion
For each time-series data sample in the testing dataset, we construct a rank order list of the training data; the testing data sample is then added to this group to form a testing group sample. On the basis of this testing group, we look up the corresponding manipulated variables, and the two anomaly detection algorithms are then run on them. We have run these tests with four different distance measures: absolute distance, Euclidean distance, high-low distance, and SAX. Fault detection tests are also run on four different HVAC datasets using different input settings.
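A minimal sketch of this procedure follows. The rank ordering uses Euclidean distance on daily weather profiles; a plain z-score stands in for the LOF/iForest scoring, and all names are illustrative assumptions rather than the article's implementation:

```python
import numpy as np

def comparable_days(train_weather, test_weather, k=20):
    """Rank training days by Euclidean distance between their daily
    weather profiles and the test day's profile; return the k nearest."""
    dists = np.linalg.norm(train_weather - test_weather, axis=1)
    return np.argsort(dists)[:k]

def group_anomaly_score(train_mv, test_mv, nearest_idx):
    """Score the test day's manipulated variables against its
    comparable-day group. A simple z-score is used here as a stand-in
    for the LOF / isolation forest scoring in the article."""
    group = train_mv[nearest_idx]
    mu = group.mean(axis=0)
    sigma = group.std(axis=0) + 1e-9  # avoid division by zero
    return float(np.abs((test_mv - mu) / sigma).mean())
```

A day whose manipulated variables deviate from those of its comparable-day group receives a high score even when the room temperature itself looks normal, which is the key to catching hidden faults.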
The results are shown in Figs. 8–11. From these plots, we first notice that all four distance measures deliver very similar results, which is not surprising given how little their rank orderings of the time series differ. The high-low distance measure gives slightly lower fault detection rates in all of our experimental setups, probably because representing a whole day's temperature profile with only two values carries less discriminating information. Even so, with such little information, the results of the high-low distance are already impressive. On the other hand, comparing the figures, the overall fault detection rates are higher for office type 1 than for type 2, especially for less detectable faults such as fault 3. Inspecting the heat load datasets, we found a much larger variance in the type 2 heat load dataset, which may explain the lower fault detection rates.
5.1 Parameter Selection for Anomaly Detection Algorithms.
To get a better picture of how our approach performs, we introduce the area under the receiver operating characteristic curve (AUROC).
To combine these two numbers into a single curve that expresses TPR as a function of FPR, one can sweep the parameter settings and draw the receiver operating characteristic (ROC) curve. This plot shows how much "gain" in TPR we get in return for a given "cost" in FPR. One can use the AUROC to optimize14 a classifier's parameter setting and illustrate its performance simultaneously. An ideal classifier has a curve rising to the upper left corner (TPR = 1 at FPR = 0), while a poor classifier stays close to the diagonal line.
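The ROC construction can be sketched from scratch as follows (for illustration only; in practice one would typically use a library routine such as scikit-learn's `roc_curve`):

```python
import numpy as np

def roc_curve_points(scores, labels):
    """Sweep the decision threshold over all anomaly scores and return
    arrays of (FPR, TPR) points; labels are 1 = faulty, 0 = normal."""
    order = np.argsort(-np.asarray(scores))  # highest score first
    labels = np.asarray(labels)[order]
    tpr = np.cumsum(labels) / labels.sum()
    fpr = np.cumsum(1 - labels) / (1 - labels).sum()
    # Prepend the (0, 0) corner so the curve starts at the origin.
    return np.r_[0.0, fpr], np.r_[0.0, tpr]

def auroc(scores, labels):
    """Area under the ROC curve via the trapezoid rule."""
    fpr, tpr = roc_curve_points(scores, labels)
    return float(np.sum((tpr[1:] + tpr[:-1]) / 2 * np.diff(fpr)))
```

A perfect ranking of faulty over normal days yields an AUROC of 1, while a ranking no better than chance yields about 0.5, matching the diagonal-line interpretation above.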
ROC curves of our HVAC fault detection approach for all six kinds of faults, two anomaly detection algorithms (iForest and LOF), and two distance measures (Euclidean and high-low) have been drawn using 18 different parameter settings, which are shown in Fig. 7.
By inspecting the ROC curves, we see that our fault detection approach performs differently for different kinds of faults. The ROC curves for fault 6 lie essentially on the diagonal line, which simply means that fault 6 is not detectable; this is consistent with the fault detection rates shown in Figs. 8 and 9 (and Figs. 10 and 11 as well). The diagonal line means that each unit of FPR buys exactly one unit of TPR, no different from random guessing; such a classifier is still a classifier, but it does not benefit us. In addition, for faults 2 and 5, which are the same type of fault, a fault detection rate drop of about 10% is noticed, and this performance drop is also visible in their ROC curves. The same holds for faults 4 and 6: fault 4 is detected successfully by our method, whereas fault 6 is not detectable. Isolation forest performs slightly better for faults 2 and 5, while LOF works slightly better for fault 4; overall, the two methods are comparable. Fault 3 is a cooling system outputting 8 °C cool water instead of the normal 6 °C; this minor change in a subsystem is detected, though with reduced sensitivity.
From these results, we conclude that the fault detection rate depends on the nature of the HVAC system itself, anomaly detection algorithm and parameters used, fault types, and severity of the faults. Strictly speaking, AUROC is used to depict the performances of classification (supervised learning) methods, while our fault detection work is considered as an unsupervised learning method. However, in this section, we are discussing and testing how our approach works; hence, for testing purposes, we have revealed the fault labels to check the results after running fault detections.
To conclude, ROC curves are used here only to visualize the performance of our approach and to guide parameter selection.
5.2 Results and Discussion: A Centralized Multiple Room Heating, Ventilation, and Air Conditioning System Model.
Since this section is designed to test the robustness of our fault detection approach across different HVAC models, we follow the same procedure as for the one-room model: the environmental weather is the same, the heat load datasets are reused, and only the HVAC system model is changed. Some minor differences exist due to the different HVAC model used, but the rest of the simulation setup is essentially the same. The manipulated variables, however, are different because the model has changed. Therefore, we again build a rank order list of the training data for each testing data sample, add the data sample to the ordered list to form a testing group, locate the corresponding manipulated variable data, and run the anomaly algorithms on them. Four different distance measures are used for rank ordering, along with two weather datasets and two types of office heat load datasets.
From the results demonstrated here and previously, we can say that our fault detection approach using raw time series works and delivers reasonable detection rates. The overall performance is affected by many factors, and we assume that heat load patterns are in general very similar among workdays for a specific building. Nevertheless, our results suggest that a data-driven fault detection approach using manipulated control variables can be scalable, reliable, automatic, and economical.
5.2.1 Parameter Selection According to Experiment Results.
The parameter discussed here is the threshold parameter, whose value represents the assumed proportion of outliers in the dataset. Both anomaly detection algorithms we have used, LOF and iForest, return a score for a new data sample after comparing it with the training dataset, so a threshold still needs to be assigned to decide at what score data points are considered outliers. The decision function uses the distribution percentile of the training dataset. Hence, the trade-off between a higher fault detection rate and a lower false alarm rate is set by adjusting this parameter.
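A sketch of this percentile-based thresholding follows (the function names are our own; scikit-learn exposes the same idea through the `contamination` parameter of `LocalOutlierFactor` and `IsolationForest`):

```python
import numpy as np

def fit_threshold(train_scores, contamination=0.007):
    """Place the decision threshold at the (1 - contamination)
    percentile of the training (normal-day) anomaly scores, so that
    roughly contamination * 100% of normal days are flagged. This
    percentile is the knob controlling the false positive rate."""
    return float(np.percentile(train_scores, 100 * (1 - contamination)))

def is_faulty(score, threshold):
    """Flag a day as faulty when its anomaly score exceeds the threshold."""
    return score > threshold
```

Raising `contamination` trades a higher TPR for a higher FPR, which is exactly the movement along the ROC curves discussed above.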
Figure 16 shows ROC curves with different threshold parameter values labeled, for two different weather datasets and two different office heat load datasets, using LOF as the anomaly detection algorithm. We see that larger variances, e.g., Boston weather with office type 1 heat load data, lead to poorer overall performance in Fig. 16(a). The general trend is the same: lowering the threshold parameter yields lower false positive rates at the expense of true positive rates. Again, the choice depends on the system, the inputs, the desired sensitivity of the fault detection system, etc. From our experimental results, however, a value of 0.007 (99.3rd percentile) seems to be a reasonable choice, since we prefer a low FPR for HVAC systems; for more conservative settings, a value of 0.002 (99.8th percentile) is probably the choice to go with. Note that the results in the figures shown used the same default parameter settings and were not optimized using AUROC; this is to show how performance varies with weather environment, HVAC system, heat load, etc. FPR and TPR trade off against each other when choosing a threshold parameter; one can always suppress the FPR by selecting a high threshold if a low FPR is more important than a high TPR. The window size we have used is one day, since one day is a natural time length for weather and heat load data; thus, our approach gives a verdict about the HVAC system daily. Other window sizes are reserved for future work. Exact processing time was not recorded; however, once data are collected and in the required format, the computation time for making a verdict for one data point (one day) is close to one second per variable, which is fast enough for our purpose, using an Intel i5-2400 CPU with a traditional hard drive running Python code.
If faster computation is needed, one can implement the approach on a faster machine, use multiprocessing, port the code to C, or use approximations to downsample the time-series data.
To achieve our goal of scalability, we have introduced the notion of focusing on control systems, as they are nearly universal across systems, including HVAC systems. Moreover, because controllers tend to hide faults, exploiting control data gives us the benefit of discovering hidden faults of a system, which would be a much harder task if we relied only on sensor data. Sticking to our goal of being scalable and low cost, we have chosen to adopt machine learning and data analysis techniques using raw time-series data.
We have introduced the use of Modelica as our source of simulated data. This choice was made for two reasons: first, our limited access to real building data; and second, the ability to conduct experiments on multiple HVAC models and to run tests on the exact same building under identical conditions. This would not be possible, or at least would be extremely expensive, if we worked on real buildings.
It is demonstrated that a fault detection task can be simplified significantly by focusing on control variables alone. This not only saves a great amount of effort on feature selection but also helps us avoid a dauntingly high-dimensional space. We have also learned that solving a problem with an insightful approach is easier and more efficient than throwing every data analysis tool at it.
Our experimental results show that data-driven fault detection approaches have great potential. Exploratory tools such as clustering algorithms helped us identify the similarities among different days in the weather datasets, and anomaly algorithms learned the rules by comparing raw data without an expert explicitly listing them. These techniques become more and more important as datasets grow larger.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.
The word model here means a mathematical or statistical model; we are not referring to the HVAC model.
Most papers do not even mention hidden faults; that is, the distinction between hidden and non-hidden faults is probably not made. In addition, some approaches implicitly assume that some knowledge of the faults is known (supervised learning) while others do not (unsupervised learning); this can usually be distinguished by looking at the machine learning algorithms used.
Suppose some component (the plant in Fig. 2) has a glitch and is not functioning normally. External data should be independent of the HVAC system, and sensor data are the result of external data interacting with the HVAC system, making them only indirectly related to, and less representative of, hidden faults.
The Modelica Standard Library is the standard, basic library that is free and comes with all Modelica simulation environment packages.
A library that is built on top of the Modelica Standard Library. It includes models for HVAC systems, controls, heat transfer, etc. See http://simulationresearch.lbl.gov/modelica/ for more details.
A number of clustering methods were used including DBSCAN, k-means, mean-shift, etc., which all give similar results.
High-low distance is calculated by taking only the highest and lowest temperature data points in the time-series data.
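One plausible formulation of this measure (the article does not give the exact formula, so this is our assumption):

```python
def high_low_distance(day_a, day_b):
    """Compare two daily temperature time series using only their
    extremes: the absolute differences of the daily highs and lows."""
    return abs(max(day_a) - max(day_b)) + abs(min(day_a) - min(day_b))
```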
Depending on the task one is dealing with, optimize here does not necessarily mean to maximize the TPR/FPR ratio. For example, one may want to minimize FPR even with a loss of TPR in some cases to avoid system interruptions.