Forecasting Electricity Load Demand: A Power System Planning

Forecasting electricity load demand on moving holidays is one of the most challenging topics in the forecasting area. Forecasting electricity load demand is essential because it involves projecting the peak demand level. Overestimating future load results in excess supply, and the wasted load is not welcomed by the international energy network. Underestimating load leads to a failure to provide adequate reserve, which implies high costs. Many factors can influence electricity load demand, such as previous load demand, the type of day, coincidence with other holidays and the impact of major events. Hence, 12 independent variables were considered in constructing a regression model to forecast moving holiday electricity load demand. This study applies multiple linear regression to Malaysia's daily electricity load demand data to forecast demand on moving holidays, namely Hari Raya AidilFitri, Chinese New Year, Hari Raya AidilAdha and Deepavali, from September 2016 to October 2017. The results show that six independent variables are significant across several variable selection methods. Overall, the constructed model gives promising results and can forecast the next year's moving holiday electricity load demand, with an out-of-sample forecasting error of 3.7% on the day of the moving holiday.


I. INTRODUCTION
Forecasting electricity load demand (ELD) is essential to support the future system and operation of the electric utility business. It involves projecting peak demand levels and overall energy consumption patterns. ELD forecasting is a main task in planning electricity production because resources need to be determined in advance, especially for operating power plants, for example daily fuel consumption. According to [1], ELD forecasting is a central and integral process for planning periodical operations and facility expansion in the electricity sector. The study of ELD forecasting is therefore important in helping utility companies operate the power system efficiently and balance generation against load demand [2], [3]. Moreover, it may reduce the operational cost of the power system [4]. Hence, modeling ELD with minimum forecast error is very important for optimizing cost and maximizing profit. Consequently, many previous studies have addressed ELD forecasting, for example [5].
However, ELD forecasting error is affected by the moving holiday effect. In fact, a 20% increase in forecasting error has been noted on moving holidays compared with a normal day [6]. If the moving holiday falls on a Saturday or Monday rather than on another day of the week, a significant load forecasting error is likely [7]. Moreover, according to [8], a one percent (1%) reduction in mean absolute percentage error (MAPE) in load forecasting saves 10,000 MW in electricity load, which may lead to savings of approximately £1.6 million (around RM9 million at current exchange rates) per year. Since the price of electricity has increased annually in real terms, a 1% reduction in MAPE in 2020 would result in savings of more than £1.6 million per year.
Therefore, this study focuses on forecasting ELD in Malaysia, where the consumption pattern is relatively unique because of multi-festival holidays. Malaysia is ethnically diverse: most of the population is Malay, followed by Chinese, Indian and other groups. Each ethnic group has a variety of festivals and festive holidays. The main festivals in Malaysia are usually related to religious activities involving Muslims, Chinese, Hindus and others. In addition, the dates of many festivals are determined by lunar calendars.
The dates of moving holidays are set by three different lunar calendars (Chinese, Hindu and Hijri) rather than by the Gregorian calendar. Therefore, these holidays do not fall on a fixed Gregorian date each year but shift from one period to another over the years. Since the ELD patterns on holidays are often idiosyncratic, this leads to significant predictive errors [8]. Irregular holidays such as Hari Raya AidilFitri, Hari Raya AidilAdha, Chinese New Year and Deepavali, which move from one year to the next, may therefore influence the results of predicting time series data. In addition, some of these festival holidays overlap with other holidays, which makes predicting ELD more difficult. Other studies have been conducted to understand electricity demand during special days or holidays.
Studies of ELD involving moving holidays include, for example, [13]. Research by [14] considered moving holiday effects and therefore achieved better forecasting accuracy for Malaysia's peak daily load. Meanwhile, [15] used dynamic regression intervention modeling for the Malaysian daily load based on moving holiday data. On the other hand, Kim [8] introduced special days as a dummy variable in ELD forecasting models, while other work forecast ELD for the moving holiday week using fuzzy time series with a specific weighting mechanism.
Moving holidays should be considered in seasonal adjustment to avoid misleading interpretations of seasonally adjusted and trend estimates. The motivation for this study is to forecast ELD during moving holidays such as Hari Raya, Deepavali and Chinese New Year. Such forecasts indicate how much power is needed during the holiday and help prevent a power shortage during that period. Moreover, ELD forecasting errors increase operational costs. Overestimating future load results in surplus supply, which is not welcomed by the international energy network.
On the other hand, underestimating load causes a failure to provide adequate reserve and implies high costs. Therefore, this study aims first to identify the significant variables that affect ELD, second to construct a multiple linear regression model for moving holiday ELD, and finally to forecast ELD for the three days before the holiday, the holiday itself and the three days after the holiday, for big moving holiday events (Hari Raya AidilFitri, Chinese New Year) and smaller events (Hari Raya AidilAdha, Deepavali). This study focuses on data for the weeks of Hari Raya AidilFitri, Chinese New Year, Deepavali and Hari Raya AidilAdha, in which the moving holiday falls on the fourth day of the week [16].
II. METHODOLOGY

Data Collection and Variables
For data collection, secondary data were obtained from Tenaga Nasional Berhad (TNB) via the Grid System Operator's website. The scope of this study is limited to daily ELD in Malaysia for weeks containing a moving holiday, recorded from 1st September 2016 to 31st October 2017. The daily data were partitioned into two parts. Data from 1st September 2016 until 30th September 2017 (35 observations) were used to formulate a prediction model, and data from October 2017 (7 observations) were used to validate it [17]. This study used electricity load demand, Y, as the dependent variable, measured in kilowatts (kW). A total of 12 variables were used as independent variables, as listed in Table 1.

To capture the type-of-day effect, a dummy variable (Z1) was introduced into the model to indicate a day that is not a holiday, with a holiday as the base. That is,

\[ Z_1 = \begin{cases} 1, & \text{if the day is not a holiday} \\ 0, & \text{otherwise} \end{cases} \]

To capture the coincidence effect, three dummy variables were introduced into the model: before coincidence (Z2), after coincidence (Z3) and no coincidence (Z4), with coincidence as the base [18].

For before coincidence,

\[ Z_2 = \begin{cases} 1, & \text{if the day is before a coincidence} \\ 0, & \text{otherwise} \end{cases} \]

For after coincidence,

\[ Z_3 = \begin{cases} 1, & \text{if the day is after a coincidence} \\ 0, & \text{otherwise} \end{cases} \]

For no coincidence,

\[ Z_4 = \begin{cases} 1, & \text{if there is no coincidence} \\ 0, & \text{otherwise} \end{cases} \]

To capture the big event effect, a dummy variable (Z5) was introduced into the model to indicate a week containing a big event, with a week without a big event as the base, given by [19]

\[ Z_5 = \begin{cases} 1, & \text{if the week contains a big event} \\ 0, & \text{otherwise} \end{cases} \]

Table 1. Independent variables
Variable    Description
X1          The first day of ELD before the holiday
X2          The second day of ELD before the holiday
X3          The third day of ELD before the holiday
X4          The fourth day of ELD before the holiday
X5          The fifth day of ELD before the holiday
X6          The sixth day of ELD before the holiday
X7          The seventh day of ELD before the holiday
Z1          Dummy variable not a holiday
Z2          Dummy variable before coincidence
Z3          Dummy variable after coincidence
Z4          Dummy variable no coincidence
Z5          Dummy variable big event
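The dummy coding described above can be sketched in a few lines of Python. This is only an illustration: the function name `encode_day` and the category labels ("before", "after", "none", "coincidence") are hypothetical, since the study does not state how the raw data were coded.

```python
# Hypothetical labels for the coincidence variable; "coincidence" is the base level.
COINCIDENCE_LEVELS = ("coincidence", "before", "after", "none")


def encode_day(is_holiday, coincidence, big_event_week):
    """Return the dummy variables Z1..Z5 for one day as a dict."""
    if coincidence not in COINCIDENCE_LEVELS:
        raise ValueError(f"unknown coincidence level: {coincidence!r}")
    return {
        "Z1": 0 if is_holiday else 1,        # 1 = not a holiday (base: holiday)
        "Z2": int(coincidence == "before"),  # 1 = before coincidence
        "Z3": int(coincidence == "after"),   # 1 = after coincidence
        "Z4": int(coincidence == "none"),    # 1 = no coincidence
        "Z5": int(bool(big_event_week)),     # 1 = week contains a big event
    }


# A holiday that coincides with another holiday, in a week without a big event,
# sits at the base level of every dummy set.
base_day = encode_day(is_holiday=True, coincidence="coincidence", big_event_week=False)
```

Because the base level of each set is coded as all zeros, the intercept of the regression represents a holiday with a coincidence in a week without a big event.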

Method of Data Analysis
The objective of this research is to predict moving holiday ELD using multiple linear regression. The developed models are used for out-of-sample forecasts. The estimated model can be written [20] as:

\[ Y = \beta_0 + \beta_1 X_1 + \dots + \beta_7 X_7 + \beta_8 Z_1 + \dots + \beta_{12} Z_5 + \varepsilon \]

where Y is the ELD, \beta_0, \dots, \beta_{12} are the regression coefficients and \varepsilon is the random error.

Since this study involves many independent variables, a variable selection process was used to construct the best model, that is, one that predicts well or explains the relationships in the data. Several selection methods are available, namely stepwise selection, backward elimination and forward selection.
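To make the fitting step concrete, the following is a minimal pure-Python sketch of ordinary least squares via the normal equations. It is not the software used in the study (the text does not name it); the helper names `solve_linear_system`, `fit_ols` and `predict` and the toy numbers are assumptions for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ...] minimizing the squared error of y ~ 1 + predictors."""
    design = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Evaluate the fitted equation for one observation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Toy data (not from the study): one lagged-load column and one dummy column,
# generated exactly from y = 2 + 3*x - 1*z.
rows = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 0)]
y = [2 + 3 * x - 1 * z for x, z in rows]
coefs = fit_ols(rows, y)
```

For real data a statistics package would normally be used, since it also reports the p-values that the variable selection procedures rely on.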
A few assumptions have to be tested in multiple linear regression analysis, because the result is invalid if the assumptions are not met. These assumptions are:
• For any specific value of any of the independent variables, the values of the dependent variable are normally distributed.
• There is a linear relationship between the dependent variable and each of the independent variables.
• The observations of the dependent variable are independent of each other.
• The variance of the normal distribution of possible values of the dependent variable is the same for each value of the independent variables.
In summary, the assumptions describe the probability distribution of the random error \varepsilon in the model, where

\[ \varepsilon \sim N(0, \sigma^2) \]

independently for each observation.

III. ERROR MEASUREMENT
Forecasting performance is usually compared using a variety of error measures. The forecasting error in time period t is defined as the actual value minus the predicted value, $e_t = y_t - \hat{y}_t$, where $y_t$ is the actual value at time t and $\hat{y}_t$ is the fitted value at time t. This study employs the Mean Square Error (MSE), Absolute Percentage Error (APE) and Mean Absolute Percentage Error (MAPE) as error measures. MSE is useful for identifying models that avoid large errors, because squaring gives large forecast errors more weight. The MSE is given as

\[ \mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} e_t^2 \]

MAPE, on the other hand, is a relative (percentage) error measure used for comparisons. To compute MAPE, the APE is first computed for each forecast. MAPE is computed as follows:

\[ \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \times 100 \right| \]

where n is the number of observations and $\left| (e_t / y_t) \times 100 \right|$ is the APE of the forecast at time t.

Figure 1 shows the methodology framework of this study. The methodology starts with data pre-processing, in which the data were collected and cleaned. In the next stage, the data were partitioned into two parts: estimation and evaluation. Variable selection was then conducted on the estimation part of the data using three methods: stepwise, backward and forward selection. Next, the regression assumptions were checked. Once all the assumption tests were satisfied, the significant regression model was constructed. The constructed model was then used to forecast moving holiday ELD, and the forecasting error was calculated using the evaluation part of the data. Lastly, the constructed regression model was used to forecast moving holiday ELD for the year 2018. The error cannot be calculated at this stage because the actual data had not yet been published on the TNB website.
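The three error measures defined in this section (MSE, APE and MAPE) can be computed with a short function. The sketch below follows the definitions given above; the function name `forecast_errors` and the sample numbers are illustrative and are not the study's data.

```python
def forecast_errors(actual, forecast):
    """Return (MSE, list of APE values in percent, MAPE in percent)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    n = len(actual)
    errors = [y - f for y, f in zip(actual, forecast)]  # e_t = y_t - yhat_t
    mse = sum(e * e for e in errors) / n
    # APE is undefined when an actual value is zero; load data are assumed positive.
    ape = [abs(e / y) * 100 for e, y in zip(errors, actual)]
    mape = sum(ape) / n
    return mse, ape, mape


# Illustrative numbers only.
mse, ape, mape = forecast_errors([100.0, 200.0], [110.0, 190.0])
```

Note that MSE is expressed in squared load units, whereas APE and MAPE are percentages, so only the latter two can be compared across series with different scales.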
Table 2 gives the coefficients and p-values for all 12 independent variables. The p-values in the coefficient table indicate the significance of each individual variable; an independent variable is significant if its p-value is less than 0.05. From the result, the independent variables X1, X4, X7, Z1 and Z5 were found to be significant. Based on Table 2, this model has a multiple correlation coefficient of R = 0.947, indicating a strong correlation between the observed ELD and the ELD predicted by the regression model. The R Square value of 0.897 implies that the independent variables together explain 89.7% of the variation in the dependent variable. The model formed from these variables is significant, with a p-value of less than 0.05.
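As a quick consistency check on the two fit statistics reported for the full model, the R Square value is the square of the multiple correlation coefficient:

```latex
\[ R^2 = (0.947)^2 = 0.896809 \approx 0.897 \]
```

The reported values therefore agree with each other to three decimal places.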

Variable Selection Process
Stepwise selection
For stepwise selection, the model starts empty. Variables are then added one at a time, based on the smallest p-value in Table 2. After each addition, a variable is removed again if it is not significant in the updated model. The process stops when no remaining variable satisfies the entry criterion. The resulting model has a multiple correlation coefficient of R = 0.933, indicating a strong correlation between the observed electricity demand and the demand predicted by X1, X4, X7, Z1, Z2 and Z5. Moreover, the R Square value is 0.87, which implies that the independent variables X1, X4, X7, Z1, Z2 and Z5 explain 87% of the variation in the dependent variable. The model formed from these variables is significant, with a p-value of less than 0.05.

IV. BACKWARD ELIMINATION
For the backward elimination process, the model starts with the full model. Variables with the largest p-values, as listed in Table 2, are eliminated one at a time. The process stops when no variable remaining in the model satisfies the removal criterion.

In the forward selection process, variables are added one at a time based on the p-values in Table 2. The process stops when no remaining variable satisfies the entry criterion. Unlike stepwise selection, forward selection does not apply the removal criterion after a variable has been added. As shown in Table 5, only six independent variables, X1, X4, X7, Z1, Z2 and Z5, are significant. The multiple correlation coefficient for this model, R = 0.933, indicates a strong correlation between the observed electricity demand and the demand predicted by X1, X4, X7, Z1, Z2 and Z5. The R Square is 0.87, which means that the independent variables X1, X4, X7, Z1, Z2 and Z5 explain 87% of the variation in the dependent variable.

Table 6 summarizes the results of the different variable selection techniques. The results of stepwise and forward selection are the same, because no variable met the removal criterion in the stepwise selection process. In a nutshell, all variable selection processes chose X1, X4, X7, Z1, Z2 and Z5 as significant variables, and backward elimination additionally retained Z3. Therefore, the model with X1, X4, X7, Z1, Z2 and Z5 was selected as the best significant model. However, when a dummy variable is selected, its whole dummy set must be included, because Z2, Z3 and Z4 together represent the coincidence variable. Thus, the variables for this model are X1, X4, X7, Z1, Z2, Z3, Z4 and Z5, and the multiple regression analysis proceeds with these eight variables.
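The rule that a selected dummy variable brings in its whole dummy set can be expressed as a small helper. This is a sketch under assumptions: the function name `complete_dummy_sets` is hypothetical, and the only dummy set treated as a group is the coincidence set Z2, Z3 and Z4 named in the text.

```python
def complete_dummy_sets(selected, variable_order, dummy_sets):
    """Add every member of a dummy set as soon as one member is selected."""
    chosen = set(selected)
    for group in dummy_sets:
        if chosen.intersection(group):
            chosen.update(group)
    # Return the variables in their original order for readability.
    return [name for name in variable_order if name in chosen]


VARIABLES = ["X1", "X2", "X3", "X4", "X5", "X6", "X7", "Z1", "Z2", "Z3", "Z4", "Z5"]
COINCIDENCE_SET = ("Z2", "Z3", "Z4")  # three dummies for one four-level variable

final_variables = complete_dummy_sets(
    ["X1", "X4", "X7", "Z1", "Z2", "Z5"],  # variables chosen by stepwise and forward selection
    VARIABLES,
    [COINCIDENCE_SET],
)
```

Applied to the six selected variables, the helper returns the eight variables used in the final regression.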
The result of the model is given by

The histogram in Figure 2 is bell-shaped, so the residuals appear to fulfill the normality assumption. In the residual normality tests in Table 7, the Kolmogorov-Smirnov and Shapiro-Wilk tests for the unstandardized and standardized residuals give p-values of 0.2 and 0.138. Both are greater than 0.05, so the residuals can be assumed to be normally distributed.

Figure 3 shows the normal probability plot for the standardized regression residuals. The points more or less follow the straight line; there is some deviation towards the center, but in general the points follow the line. Thus, the linearity assumption is taken to be met.

Table 8 shows that the Durbin-Watson statistic is 2.093. Since this value is close to 2, the independence assumption is fulfilled.

In Figure 4, no point lies outside the range from -3 to 3 on either the x-axis or the y-axis. The points are scattered with no apparent pattern, so the residuals fulfill the constant variance assumption.
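The Durbin-Watson statistic used for the independence check is the sum of squared successive residual differences divided by the sum of squared residuals. A minimal sketch follows; the function name `durbin_watson` and the residual values are illustrative and are not the study's residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Illustrative residuals only.
dw_mixed = durbin_watson([1.0, -1.0, -1.0, 1.0])        # no clear pattern
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # negative autocorrelation
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])       # positive autocorrelation
```

The statistic ranges from 0 to 4, with values well below 2 pointing to positive autocorrelation and values well above 2 to negative autocorrelation.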

Constant Assumptions
After all the multiple regression assumptions were met, the next step is to forecast moving holiday ELD using the model constructed in the previous section. Table 9 shows the actual and forecast values for a moving holiday week in 2017, the Deepavali event (18th October 2017). From the calculation, the MSE is 1384742, the MAPE is 6.62%, and the APE on the Deepavali moving holiday is 3.71%.

Table 10.

V. CONCLUSION
ELD forecasting plays an important role in capacity planning, scheduling and operation of power systems. This study employs multiple linear regression to predict ELD. This method is valuable in economic and business research and helps establish functional relationships between two or more variables. It predicts the value of the dependent variable from the values of the independent variables and also describes the nature of the relationship. The regression method can also help determine the model, the dependent variable and the potential independent variables. Using the regression method, this study achieved its objectives: to identify the significant variables that affect ELD, to construct a multiple linear regression model for moving holiday ELD, and to forecast ELD three days before the holiday, on the holiday and three days after the holiday for big moving holiday events (Hari Raya AidilFitri and Chinese New Year) and smaller events (Hari Raya AidilAdha and Deepavali). Interestingly, the model forecasts ELD on the day of the moving holiday with a forecast error of 3.7%, the smallest among the days of the week. In general, the model may also be considered good, with an MSE of 1384742 and a MAPE of 6.62% for the moving holiday week. Future research could improve the model by considering more factors, such as temperature and school breaks.
In addition, work on combining and hybridizing the model with other forecasting methods is in progress, for example using a fuzzy time series forecasting approach and combining statistical forecasting models.