## Abstract

This paper develops a means to analyze and cluster residential households into homogeneous groups based on their electricity load. Classifying customers by electricity load profile is a top priority for retail electric providers (REPs), so that they can plan and conduct demand response (DR) effectively. We present a practical method to identify the most DR-profitable customer groups, as opposed to tailoring DR programs for each separate household, which may be computationally prohibitive. Electricity load data from 2017 for 10,000 residential households located in Texas were used. The study proposes the clustered load-profile method (CLPM) to classify residential customers based on their electricity load profiles, in combination with a dynamic program for DR scheduling to optimize DR profits. The main conclusions are that the proposed approach yields an average 2.3% profitability improvement over a business-as-usual heuristic and is, on average, approximately 70 times faster than running the DR dynamic program separately for each household. Thus, our method not only provides computational business insights for REPs and other power market participants but also enhances the resilience of the power grid with an advanced DR scheduling tool.

## 1 Introduction

Progress has been made toward energy management systems for the residential sector that improve the energy efficiency of buildings through smart grid and innovative technologies integrating control systems with information and communication technologies (ICT) [1–3]. According to the U.S. Energy Information Administration [4], demand response (DR) saved 12,248 MW of actual peak power demand in 2017, and the residential sector alone accounted for 3960 MW of actual peak demand savings. Almost 87% of US residents possess air conditioning equipment, which contributed up to 12% of total home energy expenditures in 2015.^{2} Heating, ventilation, and air conditioning (HVAC) power load was used in this study because of data availability and because HVAC is the largest load among household appliances and typically coincides temporally with high energy prices [5].^{3} Other smart appliances, in theory, could also be used.

Improvement in building energy management relies heavily on ICT [6]. One of the most promising tools to save energy and energy costs during peak power demand in the residential sector is the thermostat.^{4} Remotely controlling thermostats (e.g., by changing set points or limiting the equipment duty cycle) brings many advantages to power utilities and retail electric providers (REPs) [7]. Connected thermostats provide household data to power utilities and REPs, such as indoor temperature and HVAC set point, and they help reduce stress on the electricity grid during peak electricity demand through DR programs [6]. For example, thermostats allow utilities and REPs to adjust residential HVAC (e.g., air conditioning) scheduling during times of peak power demand [8]. They also provide a means to improve electricity management for the distribution system, energy efficiency, and customer-specific services [9,10]. The extension of Internet connectivity into the network infrastructure and cloud computing have begun to appear in the literature related to DR [3,11–13]. However, current work in this area, especially for the residential sector, remains limited, although it is essential to the development and implementation of residential DR programs.

Demand response is the management of demand-side resources to alter the electricity consumption of end-users in response to high wholesale electricity prices or when system reliability is threatened [14]. DR in this paper is defined as an electricity program between consumers and REPs, i.e., power providers, which permits load shifting over time. The main goals for REPs in using DR are to reduce demand at peak prices and to hedge against price volatility [6]. In turn, REPs utilizing DR in this way will enhance grid resilience for parties such as the independent system operator [15]. In Fig. 1, hourly locational marginal prices (LMPs) in summer 2017 for Texas averaged $31.07/MWh, with the highest and lowest at $1742.30/MWh and $0.62/MWh, respectively. This wide range of prices makes planning difficult for REPs. REPs seek to hedge the risk associated with volatile real-time pricing of electricity, which can be financially very challenging [17]. Currently, they purchase a large portion of their power portfolio in advance but leave some part for the spot market with real-time prices to balance power supply and demand and minimize costs. In the residential sector, REPs primarily charge their customers for electricity via fixed-price contracts, so they are always looking for methods to reduce the cost of providing energy during expensive periods [16]. DR can be one of the most effective tools for minimizing their exposure to high prices by serving as a physical hedge against that vulnerability [13]. In this study, we focus on how to help REPs mitigate financial risk through customer classification for residential DR programs.

Classification of residential customers can improve DR programs considerably [18]. It is useful for the REPs to group users with similar characteristics into specific groups [19]. Having accurate assignment of customers into appropriate classes helps utilities in many areas including improving management efficiency, setting distribution load profiles, and charging service fees differently according to customer classes [20,21]. Having different customer profiles helps REPs better mitigate the risk of unsuccessful DR, reduce customers’ opt-out, and increase savings on both physical energy demand and electricity bills for customers [17].

This paper extends the current practices by developing a process for the classification of residential customers using electricity load data. While the study in this paper was conducted for the Texas power market using HVAC load, the methodology easily extends to other regions and different types of loads. For a different region or season, the LMP profile may vary, but the clustering techniques are expected to be equally effective. Also given the scale of load under control by a single REP, there is the assumption that the load being shifted is not large enough to influence the LMPs. If this were not the case, then this altering of LMPs would need to be taken into account in the decision process. Finally, it may be advantageous for retail power providers to periodically update these customer clusters for increased profitability in the face of changing market conditions.

There are many studies in the area of residential customer classification [5,22,23]. However, few works have focused on grouping customers, based on their electricity loads, specifically for the purpose of load shifting for DR programs [12,24]. Furthermore, the distinct effect between the magnitude of household electricity load and the pattern of residential electricity load behaviors deserves more attention. A number of previous works estimated the potential amount of residential electricity load that could be shifted over time using DR [10,13,25]. However, none have assessed the financial impacts of optimal DR scheduling with respect to different clustered load profiles and variable electricity prices as incorporated in this paper. The use of dynamic programming (DP^{5}) as a sensitivity analysis tool to determine the number of clusters is also new and introduced in this paper.

Theoretically, REPs can sort individual customers by potential profit calculated using optimal scheduling and selectively conduct DR only on the top-ranked portion to obtain high returns. This is a brute-force method. However, it is computationally expensive since REPs have to determine the optimal DR schedule (i.e., using DP) for each individual customer. Implementing such an approach, or other per-customer methods, with a large number of customers in real time is not practical. By applying our proposed clustered load-profile method (CLPM), REPs can group customers and then prioritize the groups by their potential profits using DP. They benefit from having only *k* tailored DR programs (i.e., running a DP *k* times) for *k* groups of customers, where *k << N* and *N* is the total number of customers. Thus, the number of DP model runs can be significantly decreased for near real-time processing, while still accurately and effectively representing the original set of residential customers. Consequently, contributions of this paper include the following:

CLPM and sensitivity analysis to classify customer electricity loads by magnitude and pattern of residential electricity load.

Analysis based on real electricity load data of 10,000 fully anonymized residential customers from Texas between May 1, 2017, and September 30, 2017.

Novel use of clustering approaches for customer classification using a DP to assess REPs’ DR profitability [26,27].

A case study to compare the effectiveness of selecting households for DR between using the greedy method, the brute-force method, and the CLPM.

## 2 Clustering Algorithm and Evaluation

### 2.1 Data.

This study obtained fully anonymized electricity load data from Whisker Labs [28], our industrial partner specializing in residential DR, and online real-time hourly LMP data from the Electric Reliability Council of Texas (ERCOT) [16] as presented in Table 1. In all, there were 10,000 residential customers (households) in Texas from May 1, 2017, to September 30, 2017. The data set has 3552 sequential hourly periods (3552 = 24 h/day times 148 days) that contain LMP and household electricity load data. We transformed the data set in Table 1 into three data sets:

*Data*_{1}: Hourly electricity load from May 1, 2017, to September 30, 2017 (dimension: 10,000 × 3552).

*Data*_{2}: Average hourly electricity load (dimension: 10,000 × 24).

*Data*_{3}: Hourly LMPs (dimension: 1 × 3552).
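The step from the full hourly series to the average hourly profile can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `average_by_hour_of_day` is hypothetical, the series is assumed to start at hour 0 of a day, and the toy row is synthetic rather than real load data.

```python
from statistics import fmean

HOURS_PER_DAY = 24

def average_by_hour_of_day(hourly_load):
    """Collapse one household's hourly series into a 24-value average daily profile.

    Assumes the series starts at hour 0 of a day and its length is a multiple of 24.
    """
    if len(hourly_load) % HOURS_PER_DAY != 0:
        raise ValueError("series length must be a multiple of 24")
    # hourly_load[h::24] picks hour h of every day in the series.
    return [fmean(hourly_load[h::HOURS_PER_DAY]) for h in range(HOURS_PER_DAY)]

# Synthetic stand-in for one row of Data_1: 148 days x 24 h = 3552 hourly values.
toy_row = [float(t % HOURS_PER_DAY) for t in range(24 * 148)]
profile = average_by_hour_of_day(toy_row)
print(len(toy_row), len(profile))  # 3552 24
```

Applying the function to every household row would give a matrix with one 24-value profile per household.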

### 2.2 Estimating Demand Response Profits With the Dynamic Programming Using Historical Locational Marginal Prices.

Grouping customers into homogeneous clusters achieves two goals. First, it allows REPs more profitable control of existing customers. Second, it permits more informed selection of which customers to enroll in DR programs. With respect to the first goal, different customer load profiles can generate different profits (i.e., savings) in DR events. We aim to enter each customer into as few DR events as possible to reduce their discomfort. With respect to the second goal, enrolling a customer in thermostat-driven DR requires purchasing a smart thermostat for that customer. Therefore, it is not financially advantageous to enroll customers who do not perform well in DR events, generally true of poorly insulated houses that even during DR require significant HVAC load. For both goals, after the important step of clustering customers into homogeneous groups, we require a metric to determine the financial benefits of these groups.

To compute the profits generated by grouping customers, we use DP as formulated in Refs. [26,27] and described in the Appendix. The DP schedules DR events for a customer cluster throughout the day. The customer’s schedule is determined by the times of the day in which a customer’s HVAC load can be shifted to maximize profits for the REP. For example, a customer with a peak load at 12:00 p.m. is likely to generate more profits in a mid-day DR event if there is a price spike than a customer with a peak load at 6:00 p.m.

Demand response can take many different forms such as peak shaving and load shifting. In this paper, DR is used to describe a program initiated between REPs and their customers. Variations of this program have already been implemented in industry. Customers are enrolled in the DR program and are issued a smart thermostat. In return, they give REPs permission to remotely control their thermostat within a predetermined contract that specifies terms such as the number of events per season, the maximum temperature setback, or other restrictions to maintain customer comfort. When REPs wish to initiate a DR event, they send a command to their customer’s thermostat to increase the set point during summer events or decrease the set point during winter events. When the event is over, usually 1–4 h later, REPs send a second command to return the thermostat set point to its original value. If, at any time, customers are uncomfortable during a DR event, all they need to do is go to their thermostat and adjust their set point, and they will have opted out of the DR event.

To optimally schedule DR events using DP, we divide the time horizon of interest into stages *t* (e.g., hours). Then, we compute the amount of load that is shiftable in each stage for a given customer group. The data used in this study were the hourly household total electricity load. We determine a customer’s shiftable load as a percentage of their total load in each hour. The percent load shifted values presented in Table 2 were fitted to the pseudo data output from a gray box thermodynamic model [29]. The percentages in Table 2 determine the amount of shiftable HVAC load from the total electricity load of a household (i.e., *Data*_{1}). Illustrative percentages of load removed *α*_{t} and load recovered *β*_{t} are given in Table 2, which are calibrated for our region of interest (i.e., Texas). These empirically derived factors were used to approximate the complex thermodynamics of heating and cooling. The scope of this study is only HVAC load, as previously mentioned. The percentages presented in Table 2 were calibrated for determining the shiftable HVAC load, and hence, direct application of these specific factors to other shiftable loads would not be appropriate. However, they could be recalibrated for potential broader applications using similar methodology. A DR event with an HVAC system set to cooling mode (i.e., air conditioning) involves temporarily increasing a customer’s thermostat set point, shifting load that would otherwise be used for cooling until after the DR event. More importantly, the net effect of the set point manipulation is not only a shift of the load to a lower price interval but also a reduction in total load use [29].

The quantity *γ*_{t}, in kilowatt-hours, associated with those hours is shown only for illustrative purposes for this particular household. The removed and recovered percentages apply for an electricity load profile or a representative electricity load profile that can be one household or one cluster, respectively.

*F*_{t}(*S*_{t}) is the savings (i.e., profits) at time *t*. *δ*_{t} is an energy price in dollars per kilowatt-hour. It is assumed that the customer pays a flat electricity price *p*_{c} for this time period at 0.048 $/kWh. For the implementation of the DP used in this paper, it is assumed that prices are deterministic and perfectly known. Stochastic versions of the DP are examined in Ref. [26].

Thus, for the example presented in Eq. (3), a 3-h DR event would save an average of $0.235 per customer. Then, by using the DP formulation presented in the Appendix, an optimal DR schedule would be found for this example scenario, along with the profits generated by that DR schedule. A similar process would be repeated for all other customers or customer clusters, generating optimal DR schedules for each. Since the DP is applied to individual clustered electricity load profiles, it has no effect on the classification approach.

To illustrate the scheduling outcomes of DP, Fig. 2 shows three randomly selected households from the same region, graphed by DP event duration, potential DR profit, and electricity load profile from May 1 to September 30, 2017. All three households experienced the same LMPs during this period. Since they differ in electricity loads, the resulting optimal strategies and profits are different. Shifting load may be more profitable or less profitable in small margins because customers (or groups of customers) consume electricity differently. For example, a customer may have a valley (i.e., low load) during the price spike in the afternoon. This customer does not use much energy during the price spike, so moving his load may be disadvantageous. Furthermore, it would perhaps preclude him from having another DR during his evening peak load, because the evening DR event would coincide with the afternoon DR event. It is advised not to schedule a DR event (i.e., do nothing as the zero-hour event duration) if the price spike is not temporally correlated with a customer’s electricity load profile.

The proposed DP can alleviate stress to the power grid in cases of excessive power demand or shortage in power supply that cause price spikes. Price spikes are not purely demand based. They can be caused by renewable energy intermittence (shortfalls in supply forecasts) or by generator outages, among other causes. It can be difficult to determine the cause of a price spike, but a significant portion of them can be caused by supply shortfalls, not demand surpluses. In either case, DR can help reduce grid stress. The elevated DR set point is more energy efficient, reducing the total amount of energy used across the DR event and recovery period, relative to the do-nothing case. Thus, if a second peak is created, it is of smaller magnitude than the original peak.

The profits generated by individual clustered load profiles (i.e., by separating customers into clusters) can be compared with those from treating all customers with the same DR schedule. The difference between clustered and unclustered DR profits gives the value of grouping the heterogeneous array of DR-enrolled customers into individually controlled homogeneous clusters. In addition, the characteristics of clusters that generate more profits indicate ideal candidates to enroll in future DR programs.

### 2.3 Clustering Algorithm.

In the classification literature, the majority of clustering methods are either hierarchical clustering [30] or partition clustering [22,31]. Hierarchical clustering involves setting up a tree structure (i.e., a dendrogram^{6}). Partition clustering, by contrast, subdivides the data in a number of possible ways. We chose partition clustering because it facilitates determining the best number of clusters. There are several partition-based algorithms such as *k*-means, *k*-medoids, and fuzzy C-means clustering [24,32]. Among the partitioning methods, *k*-means clustering is most commonly used to analyze different load profiles because of its simplicity [22,33,34]. Nevertheless, *k*-means is sensitive to outliers in the data as the algorithm searches for centers of clusters based on the average distance between centers and their points [35]. The method of partitioning around medoids (PAM), also known as *k*-medoids, is more robust to outliers than methods that are based on the error sum of squares (e.g., *k*-means) [35]. A medoid is an element of the set that best represents its cluster (e.g., the element that is most centrally located in the cluster). This is different from *k*-means, which uses an average element of the cluster, which may or may not exist [31]. The PAM algorithm has been employed in Refs. [36,37]. We used the PAM algorithm with the *L*_{1} distance function to discover unknown subgroups of customers based on similarities of the residential electricity load [38]. After identifying a set of *k* medoids, the algorithm constructs *k* clusters by assigning each individual object to the nearest medoid. The PAM algorithm is summarized in Algorithm 1 [31,38].

#### Partitioning Around Medoids (PAM)

**Algorithm 1**

1: **Input**: *k, D*

2: Arbitrarily select *k* medoids from *D*

3: Compute the distances associated with each data point to all medoids

4: **for** each medoid *m* **do**

5: **for** each non-medoid data point *o* **do**

6: Swap *m* and *o*

7: Compute the total distance of the configuration

8: **end for**

9: **end for**

10: Select the configuration with the lowest total distance

11: Repeat Steps 3 to 10 until there is no change in the medoid

12: **Return**: *k* set of clusters
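The steps of Algorithm 1 can be sketched in plain Python as below. This is a naive illustration under assumptions, not the implementation used in the study: the function names, the fixed random seed, and the six toy points are all hypothetical, and the swap search recomputes the total distance from scratch, so it is only suitable for small inputs.

```python
import random

def l1(a, b):
    """Manhattan (L1) distance between two equal-length profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(l1(p, points[m]) for m in medoids) for p in points)

def pam(points, k, seed=0):
    """Naive PAM: keep applying the best swap until no swap lowers the total distance."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)    # step 2: arbitrary start
    best = total_cost(points, medoids)             # step 3: distances to medoids
    improved = True
    while improved:                                # step 11: repeat until stable
        improved = False
        best_swap = None
        for pos in range(k):                       # steps 4-9: try every swap
            for o in range(len(points)):
                if o in medoids:
                    continue
                candidate = medoids[:pos] + [o] + medoids[pos + 1:]
                cost = total_cost(points, candidate)
                if cost < best:
                    best, best_swap = cost, candidate
        if best_swap is not None:                  # step 10: keep the best configuration
            medoids, improved = best_swap, True
    # Assign each point to the index of its nearest medoid.
    labels = [min(range(k), key=lambda c: l1(p, points[medoids[c]])) for p in points]
    return medoids, labels, best

# Toy data: two well-separated groups of three points each.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [5.1, 5.3]]
medoids, labels, cost = pam(points, k=2)
print(medoids, labels, cost)
```

On this toy input the first three points end up sharing one label and the last three the other. Which label number each group receives depends on the random start.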

### 2.4 Clustering Evaluation.

Determining the best number of clusters, *k*, in the data set is relatively subjective; it depends on the parameters employed for partitioning and functions applied for measuring similarities [39]. The number of cluster segments can be determined either from internal evaluating methods and/or external evaluating methods [35]. Internal evaluating methods look at the intrinsic information and geometric structure of the data. Choosing evaluating indices is fairly data dependent because each index is more or less applicable to a different type of data set [40].

This study initially used three internal evaluation methods to assess the quality of a clustering algorithm’s results (i.e., the optimal number of cluster segments): the elbow method [39,41], the silhouette method [31,39,42], and the gap statistic method [39,43]. There is no single best approach but multiple, competing ones. We found no clear structure in the data set, as these indices did not agree on the optimal number of clusters [44]. Therefore, an additional measure such as external evaluation is needed.

The purpose of external evaluation in this section is to find an optimal number of clustered segments for a clustering method. It considers prior knowledge about the data set in conjunction with numerical evaluation results to select the number of clusters [35]. This study applied sensitivity analysis on the CLPM, described in Sec. 2.4.1, to determine the optimal number of clusters as the external evaluating method. Also, the external evaluation in Secs. 3.2 and 3.3 is meant to compare the effects of the household selection process between the CLPM (after the number of clustered segments has been specified by a user as part of the sensitivity analysis) and two sorting methods (i.e., the greedy method and the brute-force method). All evaluation processes are performed without using any other validation data set.

#### 2.4.1 Clustered Load-Profile Method.

The study defines *k*_{m} as the maximum number of clusters allowed for clustered categories for magnitudes of household electricity consumption (i.e., *C*_{m}). *C*_{m} distinguishes households into groups of similar electricity load use. For example, a *k*_{m} of five means up to five clusters of households allowed based on the level of magnitude, *C*_{m} = 1, …, 5. In some instances, it is possible to have *C*_{m} less than *k*_{m} depending on the structure of that particular dataset. *C*_{m} is important to capture outliers and noisy data that result from customers with abnormal electricity-use behaviors [45]. Therefore, the data set has to be in an original form to preserve differences in the order of magnitude. Similarly, *k*_{p} and *C*_{p} refer to the maximum number of clusters allowed and clustered categories of household electricity consumption patterns, respectively, i.e., *C*_{p} ≤ *k*_{p}. It is recommended to take standardized data for *C*_{p} using a z-score method (subtracting the mean and dividing by the standard deviation to get a standardized z-score). The z-score standardization has been found in Refs. [5,35].

The CLPM in Algorithm 2 selects *C*.*case*_{i} (i.e., clusters of households) based on their highest potential DR profits from the DP. Because the data set is very large (10,000 observations), our case study requires multiple iterations of Algorithm 2 with small subsets of size *n* (i.e., number of randomly selected households in an iteration). In step 1, we choose *n* equal to 3000 for each variation of *k*_{m}, *k*_{p}, and *P*_{cl.}. The numbers for *k*_{m} and *k*_{p} are varied as 2, 4, 6, 8, and 10 for simplicity. In fact, *k*_{m} and *k*_{p} can be any positive integer number and do not need to be the same. *P*_{cl.} is the percent of households selected that we alter between 10% and 90% in increments of 10%. There are five levels of cluster groups (i.e., *k*_{m} and *k*_{p}) and nine levels of *P*_{cl.}, so the total number of combinations is 45. In the experimental case study, we repeated Algorithm 2 and averaged output over 50 iterations for each of 45 combinations. In step 4, the Algorithm determines *C*_{m} categories that are differences in magnitude of electricity consumption. Then, the standardized data are clustered again to determine *C*_{p} in step 5 that distinguish residential electricity-use behaviors. Steps 4 and 5 use *L*_{1} distance in the PAM Algorithm as defined previously in Sec. 2.3. To determine the *L*_{1} distance between the various load profiles, a matrix 10,000 × 3552 (i.e., *Data*_{1}) is converted into a matrix of 10,000 × 24 (i.e., *Data*_{2}), so the *Data*_{2} is the averaged hourly electricity load. Then, we chose to cluster *Data*_{2} on selected hours that are between 12:00 pm and 5:00 pm based on high LMPs and probability of price spikes as shown in Fig. 3. The selected hours are 6 h total, so *Data*_{2} becomes a matrix of 10,000 × 6. For this particular setting, *L*_{1} distance between electricity load profile *i* and electricity load profile *j* is defined by,
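A standard way to write this *L*_{1} (Manhattan) distance is sketched below. The notation is an assumption rather than the paper's own: *x*_{i,h} denotes the averaged load of profile *i* in selected hour *h*, and *H* is the set of six selected afternoon hours.

```latex
d_{L_1}(i, j) = \sum_{h \in H} \left| x_{i,h} - x_{j,h} \right|, \qquad |H| = 6
```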

In step 6, *C*.*case*_{i} is a combination of *C*_{m} and *C*_{p} for each observation. For example, a particular household might have *C*_{m} equal to 2 and *C*_{p} equal to 3, so *C*.*case*_{i} would equal m2_p3. Households are selected if they belong to *C*.*case*_{i}. *Profit*_{i} in step 8 is calculated by using DP on *Load*_{i} in step 7. Steps 9 to 13 select households by *C*.*case*_{i} to satisfy the minimum number of households set in step 11. The result of steps 2 to 13 is finalized in step 14 to calculate the expected DR profits. Step 16 returns *Profit*_{cl.}, that is, the total demand response profit from households selected by the clustered load-profile method.

##### Clustered Load-Profile Method (CLPM)

**Algorithm 2**

1: **procedure** CLPM(*Data*_{1}, *k*_{m}, *k*_{p}, *n*, *P*_{cl.})

2: Randomly select *n* observations out of total *N*

3: *Data*_{2} ← Average *Data*_{1} into hourly periods

4: *C*_{m} ← Determine magnitude category using Algorithm 1 on selected hours of *Data*_{2} varied by *k*_{m}

5: *C*_{p} ← Determine pattern category using Algorithm 1 on selected hours of standardized *Data*_{2} varied by *k*_{p}

6: *C*.*case*_{I} ← Assign *C*.*case*_{i} for *observation*_{n} by combining *C*_{m} and *C*_{p}

7: *Load*_{I} ← Determine representative load profile, *Load*_{i}, by averaging *C*.*case*_{i}

8: *Profit*_{I} ← Calculate DR profit, *Profit*_{i}, for *Load*_{i} using DP and then sort *Profit*_{i} descendingly

9: *Count*_{I} ← Count number of observations, *Count*_{i}, in each *C*.*case*_{i} and sort by *Profit*_{i} descendingly

10: *k* ← 1

11: **while** $\sum_{i=1}^{k} \mathit{Count}_i$ < *P*_{cl.} ⋅ *n* **do**

12: *k* ← *k* + 1

13: **end while**

14: $\mathit{Profit}_{cl.} \leftarrow \sum_{i=1}^{k} DP(\mathit{Load}_i) \cdot \mathit{Count}_i$

15: **end procedure**

16: **Return**: *Profit*_{cl.}
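Steps 6 to 14 of Algorithm 2 can be sketched as follows, assuming the magnitude and pattern labels have already been produced by a clustering step. The function `clpm_profit`, its argument names, and `toy_profit` are hypothetical. In particular, `toy_profit` is only a placeholder for the DP described in the Appendix, which is not reproduced here.

```python
from collections import defaultdict
from statistics import fmean

def clpm_profit(loads, c_m, c_p, p_cl, dp_profit):
    """Sketch of steps 6-14 of Algorithm 2 on already-clustered households.

    loads:     per-household load profiles (equal-length lists)
    c_m, c_p:  magnitude and pattern labels, one per household
    p_cl:      fraction of households to select (0 < p_cl <= 1)
    dp_profit: callable mapping a load profile to a DR profit (stand-in for the DP)
    """
    groups = defaultdict(list)
    for load, m, p in zip(loads, c_m, c_p):
        groups[f"m{m}_p{p}"].append(load)             # step 6: combined case label

    ranked = []
    for case, members in groups.items():
        rep = [fmean(col) for col in zip(*members)]   # step 7: representative profile
        ranked.append((dp_profit(rep), len(members), case))   # steps 8-9
    ranked.sort(key=lambda r: r[0], reverse=True)     # sort by profit, descending

    target = p_cl * len(loads)
    selected, covered = [], 0
    for profit, count, case in ranked:                # steps 10-13: add groups until
        selected.append(case)                         # the household target is met
        covered += count
        if covered >= target:
            break
    # Step 14: representative profit times the number of households it represents.
    total = sum(profit * count for profit, count, _ in ranked[:len(selected)])
    return total, selected

def toy_profit(profile):
    """Placeholder only: NOT the paper's DP, just a monotone function of load."""
    return sum(profile)

loads = [[1, 2], [1, 2], [3, 4], [5, 6]]
c_m = [1, 1, 2, 2]
c_p = [1, 1, 1, 2]
total, selected = clpm_profit(loads, c_m, c_p, 0.5, toy_profit)
print(total, selected)  # 18.0 ['m2_p2', 'm2_p1']
```

Because whole groups are added until the target is reached, the number of households actually selected can exceed *P*_{cl.} ⋅ *n* when the last group added is large.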

#### 2.4.2 Sorting Methods.

Because REPs execute DR on only a subset of customers, selecting customers with potentially high profit is necessary for program success. We compare the CLPM with nonclustering techniques based on their performance in selecting potentially high-profit customers. There are two nonclustering techniques in this study: the greedy sorting method and the brute-force sorting method. The greedy method sorts households by individual average electricity load from highest to lowest. Then, it selects the sorted households in decreasing order of electricity load, without optimal scheduling using DP. The greedy method assumes that a household with a high electricity load yields a high DR profit. This method is very fast and the least computationally expensive. The brute-force method, conversely, determines the optimal schedule using DP for each individual household and makes selections based on estimated DR profits from highest to lowest. Therefore, this method is slow and very computationally expensive. However, it yields the most accurate results for DR profits because it computes the DP for all households individually. Household selection by the CLPM has the advantages of both the greedy method and the brute-force method, although to a somewhat lesser extent. The CLPM clusters households by their electricity load and then estimates DR profits by running the DP on the representative electricity load profile of each group. It then chooses groups of customers from highest to lowest estimated DR profit. The CLPM is faster than the brute-force method but slower than the greedy method. Furthermore, the CLPM generates higher profits than the greedy method, although slightly less than the brute-force method.
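The contrast between the two sorting methods can be illustrated with a small sketch. All values here are made up for illustration: the `profit` field stands in for a per-household DP result that the sketch does not compute, and the helper `select_top` is a hypothetical name.

```python
from statistics import fmean

def select_top(households, score, fraction):
    """Return ids of the top `fraction` of households ranked by `score`, descending."""
    n_select = round(fraction * len(households))
    ranked = sorted(households, key=score, reverse=True)
    return [h["id"] for h in ranked[:n_select]]

# Toy data: "profit" is a hypothetical stand-in for each household's DP profit.
households = [
    {"id": "A", "load": [4.0, 4.0, 4.0], "profit": 1.0},
    {"id": "B", "load": [1.0, 5.0, 1.0], "profit": 3.0},
    {"id": "C", "load": [2.0, 2.0, 2.0], "profit": 0.5},
    {"id": "D", "load": [0.5, 3.0, 0.5], "profit": 2.0},
]

# Greedy: rank by average load, no DP needed to make the selection.
greedy = select_top(households, lambda h: fmean(h["load"]), 0.5)
# Brute force: rank by the per-household DP profit.
brute = select_top(households, lambda h: h["profit"], 0.5)

profit_of = {h["id"]: h["profit"] for h in households}
gap = sum(profit_of[i] for i in brute) - sum(profit_of[i] for i in greedy)
print(greedy, brute, gap)  # ['A', 'B'] ['B', 'D'] 1.0
```

In this toy case the greedy method picks household A for its high average load even though D, with a lower average but a sharper peak, has the higher profit. The gap is the profit forgone by ranking on load alone.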

## 3 Experimental Case Study and Results

### 3.1 Demand Response and Volatility of Locational Marginal Prices.

With DR programs, REPs can hedge price risks by removing load during periods of high energy prices and recovering the load during low-price periods. Therefore, it is important for REPs to target groups of households that will most efficiently allow electricity load to be shifted. Figure 3 is a box plot that presents the distribution of average hourly LMPs, with a solid line connecting the hourly means. LMPs in the afternoon were volatile, with a maximum of $1742.30/MWh at 2:00 p.m. (This figure only shows LMPs between $0/MWh and $150/MWh for ease of presentation.) An overview of LMPs for the entire period is depicted in Fig. 1. High LMPs were observed approximately between 12:00 p.m. and 5:00 p.m. Shifting the same amount of load during high-LMP hours results in greater profits (i.e., savings) than during low-LMP hours. Thus, REPs are likely to make more profits from DR programs on households having consumption patterns similar to the solid line shown in Fig. 3.

While the study in this paper was conducted for the Texas power market using HVAC load, the methodology easily extends to other regions and different types of loads. For a different region, the LMP temporal profile may vary, but the clustering techniques will still be the same. The change in the LMP profile might also yield different customer clusterings. Periodically updating the clustering results might be advantageous for REPs, as customer load profiles may change over the course of a year or multiple years.

### 3.2 Opportunities to Improve Demand Response Profitability.

This study compared impacts on potential profitability among the greedy method, the brute-force method, and the CLPM. We determined optimal profits from DR events for all households using DP. As mentioned earlier, this process is called the brute-force method as it calculates individual DP schedules with individual electricity loads for 10,000 households. This method is computationally expensive and requires perfect information on both electricity loads and LMPs. The estimated DR profits from the brute-force method are plotted in Fig. 4. The vertical dashed line (e.g., split by average electricity load) was established by the greedy method with *P*_{gr.} of 0.3 (top percent of households selected). Similarly, the horizontal dashed line (e.g., separated by DR profit) was established by the brute-force method with the same percentage of households selected.

The greedy method, described in Sec. 2.4.2, presents the easiest way REPs can do DR. It assumes that high electricity-use households usually bring higher DR profits. However, this is not always true. Figure 4 illustrates the drawbacks of the greedy method compared with the brute-force method at *P*_{gr.} of 0.3. The greedy method selected all households in the first quadrant (*Q*) and fourth quadrant (highest loads), while the brute-force method chose all households in the first and second quadrants (highest DR profits).

The study conducted a sensitivity analysis on the percentage of households selected, from 1% to 99% in increments of 1%. The differences in estimated DR profits between the greedy method and the brute-force method are graphed in Fig. 5. The total estimated DR profit between May 1 and September 30, 2017, for all 10,000 households is approximately $204,312. The largest gap between the two methods occurred at 56% household selection, with an opportunity cost ratio of 3.59%, or $5702. This 56% figure matters to REPs when deciding what portion of households to target for DR.
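A sweep of this kind can be sketched as follows. The 200 households are synthetic (profit only loosely tied to load), and the ratio's denominator, the brute-force profit at the same selection level, is an assumption, since this section does not spell out how the opportunity cost ratio is normalized.

```python
import random

random.seed(0)

# Synthetic households: (average load in kWh, DR profit in $).
households = []
for _ in range(200):
    load = random.uniform(0.5, 4.0)
    profit = max(0.0, 8.0 * load + random.gauss(0.0, 6.0))
    households.append((load, profit))

# Profits ordered the way each method would pick households.
by_load = [p for _, p in sorted(households, key=lambda h: h[0], reverse=True)]
by_profit = sorted((p for _, p in households), reverse=True)

results = []
for pct in range(1, 100):  # 1% to 99% in 1% steps
    n_select = max(1, round(pct / 100 * len(households)))
    greedy = sum(by_load[:n_select])
    brute = sum(by_profit[:n_select])
    gap = brute - greedy                        # opportunity cost ($)
    ratio = gap / brute if brute > 0 else 0.0   # assumed normalization
    results.append((pct, gap, ratio))

# Selection level where the greedy method forgoes the most profit.
worst_pct, worst_gap, worst_ratio = max(results, key=lambda r: r[1])
```

The gap is never negative, because the brute-force ranking picks the most profitable households by construction; the sweep only locates where the greedy shortcut is most costly.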

The brute-force method is arguably the best, but it is computationally prohibitive. Because it is very slow and therefore impractical to implement in real time, we opted not to compare the brute-force method and the CLPM. We proposed the CLPM in Algorithm 2 to bridge the gap between the greedy method (i.e., naïve but fast) and the brute-force method (i.e., most accurate but slow). The CLPM demonstrated higher DR profitability than the greedy method and a significant computational advantage over the brute-force method.

### 3.3 Improvement of the Clustered Load-Profile Method Over the Greedy Method.

In this section, we demonstrate how our proposed CLPM can improve profitability over the greedy method. There are two concepts discussed in the paper related to the CLPM. The first idea is described in Sec. 2.4.1. To put the CLPM into practice, we would multiply the DP profit of the representative load profile by the number of households it represents, as stated in step 14 of Algorithm 2. As a result, REPs are able to judge how much profit they can expect from each of the clustered groups and decide to conduct DR only on selected groups accordingly.
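The group-level estimate can be sketched in a few lines. The group names follow the M/P labels used for Fig. 6, but the DP profits and household counts are invented, and the selection rule in `select_groups` (take whole groups in descending order of representative DP profit until the household budget is reached) is an assumption about how groups would be chosen, not a restatement of Algorithm 2.

```python
# Hypothetical cluster cases: DP profit of the representative load profile ($)
# and the number of households the profile represents.
cases = {
    "M1_P1": {"dp_profit": 14.0, "count": 900},
    "M1_P2": {"dp_profit": 9.5, "count": 1100},
    "M2_P1": {"dp_profit": 27.0, "count": 600},
    "M2_P2": {"dp_profit": 21.0, "count": 400},
}


def group_estimates(cases):
    """Estimated DR profit per group: DP(Load_i) * Count_i."""
    return {name: c["dp_profit"] * c["count"] for name, c in cases.items()}


def select_groups(cases, max_households):
    """Take whole groups, most profitable representative first,
    stopping before the household budget would be exceeded."""
    ranked = sorted(cases.items(), key=lambda kv: kv[1]["dp_profit"], reverse=True)
    chosen, total = [], 0
    for name, case in ranked:
        if total + case["count"] > max_households:
            break
        chosen.append(name)
        total += case["count"]
    return chosen, total


estimates = group_estimates(cases)
chosen, n_households = select_groups(cases, max_households=1000)
expected_profit = sum(estimates[name] for name in chosen)
```

Only one DP run per group is needed here, which is where the computational saving over running the DP for every household comes from.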

Since the first concept estimates DP profit for each of the clustered groups, we cannot directly compare DP profit from the CLPM with DP profit from the greedy method, as the latter is basically a sorting method. The greedy method selects households based on high values of average electricity load and then runs the DP for each individual household afterward (i.e., using the brute-force method) to estimate profits. The CLPM first clusters households and generates a representative load profile for each household group. Then, it runs the DP to estimate profits using the representative load profiles. The CLPM only selects groups with high estimated DP profits. For example, if REPs want to select 3000 of 10,000 households for DR, the CLPM selects 3000 households, called *Set*_{A}, by clustered groups based on individual estimated DP profits. The greedy method, however, takes the first 3000 households, called *Set*_{B}, from the highest average electricity load without estimating DP profits. A household can be in *Set*_{A}, *Set*_{B}, or both. To compare overall DP profits between households belonging to *Set*_{A} and *Set*_{B}, the study runs the brute-force method to estimate DP profits on the individual sets.

We modified step 14 of Algorithm 2 only for illustrative purposes of the experimental case study because we want to compare the estimated DP profit with the same number of selected households between the CLPM and the greedy method. In step 14, we use the brute-force method to calculate the actual profits (from households belonging to *Case*_{i}; *i* = 1, …, *k*) instead of the estimated profits $\left(\sum_{i=1}^{k} DP(Load_{i}) \cdot Count_{i}\right)$. For the greedy method, we selected the same number of households to match step 14 of Algorithm 2, that is, $\sum_{i=1}^{k} Count_{i}$. The numbers *k*_{m} and *k*_{p} are based on the expected DR profits and computational time for each scenario of step 1 in Algorithm 2.

Figure 7 indicates a positive improvement of the CLPM over the greedy method. The improvement (%) is the percentage increase in DR profits that the CLPM attained over the greedy method. REPs can choose the appropriate number of clusters based on their chosen *P*_{cl.} value. For example, two clusters (i.e., two *k*_{m} and two *k*_{p}) yield the highest percent improvement, approximately 3.18%, at 0.4 *P*_{cl.}. Below 0.4 *P*_{cl.}, the optimal number of clusters varies with the value of *P*_{cl.}. Figure 6 shows four clustered load profiles based on two *k*_{m} and two *k*_{p} from 1 of 50 instances of 3000 randomly selected observations. The clustered groups plotted in Fig. 6 are based on the hourly electricity loads between 12:00 p.m. and 5:00 p.m., clustered by the CLPM as discussed in Sec. 2.4.1. On the right of Fig. 6, the thin lines in each of the four panels represent residential hourly load profiles for each household, and the thick lines are the representative load profiles of each clustered group. M1_P1 and M1_P2 share a similar magnitude of household electricity consumption, but they differ from M2_P1 and M2_P2. M1_P1 and M2_P1 are in the same clustered category of household electricity consumption pattern. Likewise, M1_P2 is considered to have a consumption pattern in common with M2_P2.

When looking at the ratio of improvement percentage to computational time, two clusters are still optimal up to 0.4 *P*_{cl.} (see Fig. 8). The CLPM is always better than or equal to the greedy method (i.e., the easiest way to conduct DR), as shown in Figs. 7 and 8. Therefore, implementing the CLPM benefits REPs in terms of potential success in near real-time DR programs, with improved optimization accuracy (over the greedy method) and computational efficiency (over the brute-force method).
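The comparison between the two selected sets, and the improvement-to-time ratio used to pick a cluster configuration, can be sketched as below. All profits, set memberships, improvement values, and runtimes are hypothetical placeholders rather than results from the case study.

```python
def total_profit(selected_ids, actual_profit):
    """Sum of per-household (brute-force) DP profits over a selected set."""
    return sum(actual_profit[h] for h in selected_ids)


def improvement_pct(set_a, set_b, actual_profit):
    """Percentage increase of Set_A's actual profit over Set_B's."""
    profit_a = total_profit(set_a, actual_profit)
    profit_b = total_profit(set_b, actual_profit)
    return 100.0 * (profit_a - profit_b) / profit_b


# Hypothetical per-household DP profits ($) and selections of equal size.
actual_profit = {1: 31.0, 2: 12.0, 3: 28.0, 4: 30.0, 5: 9.0, 6: 14.0}
set_a = {1, 3, 4}  # households picked via clustered groups (CLPM)
set_b = {1, 2, 3}  # households picked by highest average load (greedy)

overlap = set_a & set_b  # a household may belong to both sets
improvement = improvement_pct(set_a, set_b, actual_profit)

# Hypothetical cluster configurations (k_m x k_p): improvement (%) and runtime (s).
candidates = {
    "2x2": {"improvement_pct": 3.0, "seconds": 10.0},
    "3x3": {"improvement_pct": 3.3, "seconds": 25.0},
}
best_config = max(
    candidates,
    key=lambda c: candidates[c]["improvement_pct"] / candidates[c]["seconds"],
)
```

With these placeholder numbers, the larger configuration gains slightly more profit but costs far more time, so the smaller one wins on the ratio; the same trade-off is what the ratio in Fig. 8 is meant to expose.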

## 4 Conclusions

This paper studies the impact of using clustered residential electricity load profiles on REPs’ DR profitability. It applies the PAM algorithm to cluster households based on their magnitude and pattern differences in electricity use. The results are evaluated with DP to determine the optimal number of clusters and the quality of the clustering. The study performs a sensitivity analysis and analyzes financial impacts on different customer profiles, utilizing DP to estimate potential DR profits. The case study uses a large-scale data set of 10,000 Texas residential households in the summer of 2017.

The proposed CLPM groups customers based on thermostat electricity load data in a way that minimizes both computation time and the effort required to properly gather the set of customers (i.e., those who maximize profit per customer). We demonstrate how little information is needed to attain a reliable DR implementation, while improving profitability and reducing the computational burden. For example, two clusters of magnitudes of household electricity consumption and two clusters of household electricity consumption patterns are the best configuration of the proposed CLPM, in terms of the highest ratio of percent improvement (CLPM over the greedy method) to calculation time, when the percent of households selected is between 10% and 40%. Four clusters are good for 50–70% of the households selected. As a result, the residential customers' classification and electricity load profiles obtained from this study are important for decision support to conduct DR on the segmented homogeneous groups of customers. REPs can improve the accuracy and reliability of their DR program with approximately 2.3% profitability improvement and between 35 and 350 times faster computational performance over their business-as-usual approach (based on 3000 households). The optimized residential electricity load scheduling tool, built on cluster analysis and DP, demonstrates strong potential to reduce peak demand and minimize the risk from the volatility of LMPs, thus improving the overall resilience of the power grid.

The challenge of the CLPM (and DR in general) in the residential sector is the variability and small size of the individual loads, and the reduced DR effectiveness this may cause. It is difficult to retrieve data from different regions to build load profiles. Doing so requires not only an industrial partnership but also the customers’ permission to monitor and control their thermostats. For the application of the DP used in this paper, it is assumed that prices are deterministic and perfectly known. Different regions have different loads that are suitable for DR. For example, there is little need for heating DR in Texas. Also, laws and regulations of different regions can affect what DR programs can and cannot do. Finally, DR requires the customer’s participation through enrollment in the DR program.

## Footnotes

Residential customer hardware used to manage their HVAC system. Utilities (and in our case retail electric providers) have no authority over customer thermostats unless they sign up for a specific program with a qualified product.

## Acknowledgment

The work presented in this paper was supported by the Maryland Industrial Partnerships (MIPS) program and Whisker Labs under Grant MIPS No. 5905. The authors would like to thank the anonymous referees for their valuable comments.

## Nomenclature

### Indices and Sets

### Functions and Variables

*D* = data set of *n* objects

*k*_{m} = maximum number of clusters allowed for *C*_{m}

*k*_{p} = maximum number of clusters allowed for *C*_{p}

*p*_{c} = cost to the REPs of providing electricity to their customers ($/kWh)

*p*_{1,t} = electricity price during each stage *t* during the DR event ($/kWh)

*p*_{2,t} = electricity price during the recovery period ($/kWh)

*A*_{t} = duration of the DR event selected (hours)

*C*_{m} = clustered category for magnitudes of household electricity consumption for 1,…, *k*_{m}

*C*_{p} = clustered category for household electricity consumption patterns for 1,…, *k*_{p}

*L*_{S1,t} = load removed (kWh)

*L*_{S2,t} = load recovered (kWh)

*P*_{cl.} = percent of household portion selected by the clustered load-profile method (%)

*P*_{gr.} = percent of household portion selected by the greedy method (%)

*S*_{t} = number of stages (e.g., hours) *t* remaining in the current DR event

*House.ID* = household identity

*C.case*_{i} = cluster case from combining *C*_{m} and *C*_{p}

*Count*_{i} = number of households in *C.case*_{i}

*DP*(*Load*_{i}) = dynamic programming function using representative load profile ($)

*F*_{t}(*S*_{t}) = saving function at time *t* ($)

*Load*_{i} = representative load profile of *C.case*_{i} (kWh)

*Profit*_{cl.} = total demand response profit from households selected by the clustered load-profile method ($)

*Profit*_{hou.} = demand response profit per individual household ($)

*Profit*_{i.} = demand response profit per clustered load profile ($)

*V*_{t}(*S*_{t}) = optimal amount of savings generated by shifting a customer group’s load ($)

*α*_{t} = percentages of shiftable load removed

*β*_{t} = percentages of shiftable load recovered

*γ*_{t} = base load at time *t* (kWh)

*δ*_{t} = energy price ($/kWh)

### Appendix: Dynamic Program Formulation

The DP is defined over stages *t* that discretize the time horizon. They are assumed to be hours in our implementation. The state of the system at any time *t* is given by *S*_{t}, representing the number of stages remaining in the current DR event. At each stage, an action *A*_{t} is selected from the set of feasible actions, describing a DR event of 1, 2, 3, or 4 h duration, or the base case of doing nothing. The savings (i.e., profits) generated by selecting action *A*_{t} are computed using Eq. (A1). The savings at time *t*, given by *F*_{t}(*S*_{t}), can be interpreted as the difference in profit between calling a DR event and the do-nothing case. Thus, when *A*_{t} = 0, *F*_{t}(*S*_{t}) = 0.

*p*_{1,t} is the electricity price during each stage *t* during the DR event, and *p*_{2,t} is the price during the recovery period. *p*_{c} is the cost to the REP of providing electricity to their customer. *L*_{S1,t} and *L*_{S2,t} are the amount of load that can be removed at time *t* and the amount of energy required to return the house to its original set point as a result of the load shifting at time *t*, respectively.

The optimal action accounts for both the savings at time *t* and the decision’s value to future time periods, as shown in Eq. (A2). *V*_{t+1}(*S*_{t+1}) recursively calculates the future value of calling a DR event *A*_{t} given the starting state *S*_{t}. *V*_{t}(*S*_{t}) therefore defines the optimal amount of savings generated by shifting a customer group’s load.
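A backward-induction recursion of the kind described here can be sketched as follows. This is a simplified stand-in, not the paper's formulation: the savings rule inside `event_savings` (removed load valued at the event-hour price, with `beta` times that energy bought back in the single hour right after the event) is an assumption, as are the parameter names `alpha` and `beta`, the example prices, and the helper `best_schedule`. The state is also collapsed to "no event in progress at hour t" instead of tracking *S*_{t} explicitly.

```python
MAX_DURATION = 4  # DR events last 1 to 4 h; choosing 0 means doing nothing


def best_schedule(price, base_load, alpha=0.5, beta=1.0):
    """Return (optimal savings in $, list of (start_hour, duration)).

    price[t] is the energy price ($/kWh) and base_load[t] the load (kWh).
    Assumed savings rule: a share alpha of the base load is removed in each
    event hour, and beta times the removed energy is recovered in the hour
    immediately after the event.
    """
    horizon = len(price)

    def event_savings(start, duration):
        hours = range(start, start + duration)
        removed = [alpha * base_load[h] for h in hours]
        avoided = sum(price[h] * r for h, r in zip(hours, removed))
        recovery_cost = price[start + duration] * beta * sum(removed)
        return avoided - recovery_cost

    # value[t]: best savings from hour t onward when no event is running.
    value = [0.0] * (horizon + 1)
    choice = [0] * (horizon + 1)
    for t in range(horizon - 1, -1, -1):
        value[t], choice[t] = value[t + 1], 0  # action 0: do nothing
        for duration in range(1, MAX_DURATION + 1):
            recovery_hour = t + duration
            if recovery_hour >= horizon:
                break  # no room left for the recovery hour
            candidate = event_savings(t, duration) + value[recovery_hour + 1]
            if candidate > value[t]:
                value[t], choice[t] = candidate, duration

    # Walk forward through the stored choices to recover the schedule.
    schedule, t = [], 0
    while t < horizon:
        if choice[t] == 0:
            t += 1
        else:
            schedule.append((t, choice[t]))
            t += choice[t] + 1  # skip the event hours and the recovery hour
    return value[0], schedule


# Hypothetical hourly prices ($/kWh) with an afternoon-style peak, flat 2 kWh load.
price = [0.02, 0.03, 0.12, 0.15, 0.10, 0.04, 0.05]
base_load = [2.0] * len(price)
savings, schedule = best_schedule(price, base_load)
```

For these assumed inputs the recursion calls a single 3 h event covering the three most expensive hours and recovers the load in the cheap hour that follows.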