
Appl. Math. J. Chinese Univ. 2022, 37(4): 598-614

Mathematical methods for maintenance and operation cost prediction based on transfer learning in State Grid

GUO Yun-peng1, WANG Dong-fa2,*, ZHENG Ying1, DING Wei-bin2

Abstract. The electric power enterprise is an important basic energy industry for national development and a primary basic industry of the national economy. With the continuous expansion of State Grid, progressively complex operating conditions, and the increasing scope and frequency of data collection, how to make reasonable use of electrical big data, improve its utilization, and provide a theoretical basis for the reliable operation of State Grid has become a new research hotspot. Since electrical data are large in volume, diverse in type, low in value density, and demand fast processing, it is a challenge to mine and analyze them deeply, extract valuable information efficiently, and apply it to actual problems. According to the features of these data, this paper uses artificial intelligence methods such as time series analysis and support vector regression to establish a data mining network model for standard cost prediction through transfer learning. The experimental results show that the model obtains better prediction results on a small sample data set, which verifies the feasibility of the deep transfer model. Compared with activity-based costing and the traditional prediction method, the average absolute error of the proposed method is reduced by 10%, demonstrating its effectiveness and superiority.

Received: 2020-11-07. Revised: 2020-12-04.
MR Subject Classification: 91B82.
Keywords: transfer learning, LSTM, support vector regression, activity-based costing, State Grid.
Digital Object Identifier (DOI): https://doi.org/10.1007/s11766-022-4319-7.
Supported by the science and technology program of State Grid Zhejiang Electric Power Co., Ltd.: Research and application project of standard cost activity based on machine learning (5211JH1900LZ).
* Corresponding author.

§1 Introduction

Smart grid has become a new global energy strategy in the 21st century. At present, many scientific research institutions and enterprises in China have actively carried out research and pilot projects on smart grid technology. Since the market-oriented reform of China's electric power system, electric power enterprises have actively adapted to the reform and explored the construction of cost control systems in line with their own realities, which has, to a certain extent, promoted the positive development of the electric power industry. In 2006, the State Grid of China put forward the demand to implement cost standardization and lean management. After continuous exploration and in-depth reform, it has gradually built up a cost management system covering production, sales, management and other operation standards, and applies these standards in the annual budget preparation process.

The social responsibility of State Grid is to provide high-quality and low-cost power services. Its core business mainly includes power grid construction, grid maintenance and operation, and procurement and sales, while it also pursues the maximization of benefits. At present, however, three problems remain: first, the level of refinement needs to be improved; second, the degree of intensification is not high, as development depends mainly on high investment and lacks a scientific standard system; third, the operation mechanism needs to be perfected. Therefore, the implementation of standard cost management is of great significance. It can promote lean cost management, predict cost demand and budgets, control expenditure, analyze cost differences, and optimize operation activities. It also contributes to the organic integration of business and finance, so as to control the entire production and operation management activities, cost arrangement and capital demand. Finally, it improves the accuracy of the budget, so that enterprises can comprehensively grasp their development trend and understand all kinds of business information in detail. In this way, unreasonable factors in historical cost can be eliminated through standardized management, ensuring that the level of cost consumption is fair, scientific and reasonable.

In recent years, with the rapid development of computer information and storage technology, mining technology based on big data has matured, and the value of data is gradually being discovered and utilized. Many industries have entered the age of big data. The State Grid has the characteristics of large scale, many devices, wide coverage area, long running time and complex operating conditions, spanning power generation, transmission, substation, distribution and consumption, which produce a large volume of rich data that fully conforms to the characteristics of big data. It is of great research significance to extract the hidden information from such data and use it to solve key problems in real life. Based on the 2014-2019 historical standard cost data of State Grid Jinhua Power Supply Company in Zhejiang Province, the research goal is to master the cost law of the system by mining and analyzing the data over the years, determine the cause of each cost and its influencing factors, establish and improve the functional model between cost drivers and costs, and predict the cost of each module in 2021. The data are annual, do not involve quarterly or monthly records, and some regional data are difficult to obtain and therefore missing. The correlation of the target data is not obvious, and it is hard to fit with simple mathematical models. Therefore, it is necessary to extract the underlying hidden features and learn more accurate non-linear models.

For non-conventional power data, network training is difficult to converge and the results are unsatisfactory. Facing the prediction task of cross-domain data, this paper therefore introduces transfer learning. Based on the theoretical study of time series prediction and support vector regression, it proposes a non-conventional prediction model that makes the neural network model reusable. Compared with direct training on the small sample data set, the transferred features have higher discrimination and robustness, which greatly alleviates the over-fitting caused by too few samples. The approach can efficiently use the historical data of State Grid, mine its useful information, serve the target area, and establish a generalized model. This paper mainly focuses on maintenance and operation costs. The experimental results show that the proposed model has a good prediction effect on the small sample data set and is effective and superior compared with traditional prediction models.
The rest of the paper is organized as follows. Section 2 briefly reviews previous work on transfer learning. Section 3 formulates the ideas of two algorithms, the long short-term memory network and support vector regression, and then proposes our model architecture. Simulations and experiments are carried out in Section 4, where the corresponding experimental results are reported and discussed. Finally, a summary of our work is given in Section 5.

§2 Related work

Machine learning [1] has made dramatic advances in theory and practice and has become one of the major technical cornerstones of big data analysis. Its core idea is to train machines, by simulating the human brain, to learn and to discriminate. Traditional machine learning methods usually assume that training data and test data follow the same distribution. However, this hypothesis is often too strict to be satisfied in computer science and natural information processing. 'Big data' and 'small data' coexist: some domains have abundant data while others have little, which requires the ability to analyze both, and this is not always easy. With the development of deep learning, people expect machine learning to no longer be limited to large samples and supervised learning, and hope to realize unsupervised, few-shot, or even zero-shot learning. How to analyze and mine large-scale data in a non-stationary environment is one of the most challenging frontiers of modern machine learning.

2.1 Transfer Learning


Compared with existing machine learning mechanisms, human learning is quite different. Humans not only learn well from a large number of training samples, but can even master specific goals with few or no samples by drawing on auxiliary information related to the learning goal. Human learning has the ability to transfer and transform knowledge between different fields and problems, which machine learning lacks. To solve the problem of data and knowledge scarcity, researchers have proposed transfer learning [2-4], also known as inductive transfer or domain adaptation, an important research problem in machine learning. The goal is to apply knowledge or structure learned in a certain field or task to different but related fields or problems. Transfer learning attempts to realize the human capability of learning by analogy, relaxing the constraint that training data and test data must be independent and identically distributed. It can thus mine domain-invariant essential features and structures between two different but interrelated domains, enabling labeled data and other supervised information to be transferred and reused between domains.
In 2010, Pan and Yang [2] organized and summarized transfer learning research and published the most widely acknowledged and representative survey of transfer learning, which gives a specific definition of transfer learning and its basic principles. The schematic is shown in Fig. 1.

Figure 1. Difference between (a) traditional machine learning and (b) transfer learning.

Definition (Transfer learning). Given a source domain $D_s = \{(x_1^s, y_1^s), \ldots, (x_n^s, y_n^s)\}$ with learning task $T_s$ and a target domain $D_t = \{(x_1^t, y_1^t), \ldots, (x_m^t, y_m^t)\}$ with learning task $T_t$, the goal of transfer learning is to reduce the generalization error on the target domain and improve the learning of the target predictive function $f_t(\cdot)$ under $D_s \ne D_t$ or $T_s \ne T_t$.
In simple terms, transfer learning applies the knowledge acquired in solving one problem to a different but related problem, with the aim of obtaining better learning results on the new task. It is like standing on the shoulders of giants: models trained on big data with computationally powerful devices can be used to solve one's own tasks and improve network generalization. At the same time, a generalized model can be built as a foundation for the individual requirements of multi-tasking, responding flexibly to different tasks to meet the needs of practical applications.

2.2 A Categorization of Transfer Learning


From the perspective of 'what to transfer', transfer learning can be divided into four categories: instance-based, feature-based, model-based and relation-based transfer learning. Table 1 lists these four categories with brief descriptions.

Table 1. Different methods of transfer learning.

Transfer method      Brief description
Instance Transfer    Re-weight the data from the source domain and apply it to the target domain.
Feature Transfer     Find a feature representation that reduces the difference between source-domain and target-domain data and reduces the errors of the classification or regression model.
Model Transfer       Find the parameters or prior knowledge shared between the models of the source domain and the target domain.
Relation Transfer    Build a mapping between the related knowledge of the source domain and the target domain. It does not require the data of the two domains to be independent and identically distributed.

Instance-based transfer learning assumes that part of the data in the source domain can be reused in the target domain through weight reuse. That is, some measurement is used to judge which source-domain data are similar to the target domain, and those data are given high weights [5-7]. Representative work includes the LP-SVM method proposed by Wu et al. [8], which improves the classification performance of learning machines by training on auxiliary data; the TrAdaBoost algorithm proposed by Dai et al. [9] based on the similarity of samples between domains; and the LMPROJ algorithm of Quanz et al. [10], who introduced a strategy to constrain the mean difference of the sample distributions between the source and target domains. In the recently emerged multi-perspective integrated transfer learning, Xu et al. [11] proposed the multi-perspective Adaboost transfer learning algorithm and Chen et al. [12] proposed a strategy for multi-source learning; Xu et al. [13] further incorporated both multi-source and multi-perspective mechanisms into the transfer learning process and proposed the multi-source, multi-perspective Adaboost transfer learning algorithm, which effectively avoids negative transfer and steadily improves the learning effect of transfer learning; Jiang et al. [14] proposed a transfer algorithm based on a source-domain-knowledge fuzzy system to address missing information in the target domain, which effectively improved the performance of transfer learning.
The feature-based transfer learning approach establishes connections between domain data from the feature structure, exploiting the intersection at the feature level by transforming and reconstructing features to discover potential common feature spaces [15-16]. Representative works include Argyriou et al. [17], who proposed a regularization-based spectral framework for learning multi-task structures; Pan et al. [18-19], who proposed the MMDE algorithm based on manifold structures, enhancing the mean-center consistency of source- and target-domain data in a low-dimensional mapping space to reduce the differences between the two domains; Tu et al. [20], who used a domain-adaptive algorithm to implement a user-transfer-based dimensionality reduction method, improving learning performance to some extent; and Gao et al. [21-22], who applied a latent-variable kernel space model to the pedestrian detection problem and showed that feature-based transfer strategies can effectively improve learning performance. Wei et al. [23] proposed a new transfer learning framework called Learning to Transfer. They analyzed previous transfer learning studies and found that, for a given pair of domains, different transfer learning algorithms result in different knowledge transfer.
Model-based and relation-based transfer learning have received less attention. Model-based transfer learning assumes that the target domain and the source domain share some parameters of the algorithm model or some prior knowledge; by searching for such parameters or prior knowledge, it completes the transfer of information and knowledge. In multi-task learning, knowledge sharing between tasks exists in the form of parameter sharing. Lawrence et al. [24] used a Gaussian model to model the prior knowledge between tasks and learned the Gaussian parameters shared among tasks, realizing the transfer of knowledge between tasks. There are also algorithms [25] that transfer support vector machine parameters across multiple tasks. In addition, Zhang et al. [26] designed the TELM-SDA and TELM-TDA algorithms through research on the extreme learning machine; they transfer the knowledge and parameters acquired from training in the source domain to the target domain, which helps to optimize the target-domain model and realize knowledge transfer. Relation-based transfer learning assumes that there is a correlation between source-domain knowledge and target-domain knowledge. For example, people's evaluations of books and their interest in movies share common information through which knowledge transfer can be completed, even when the source domain and target domain are quite different, such as molecular biology and web pages. Davis et al. [27] applied source-domain structure rules represented by second-order Markov logic to the target domain to achieve knowledge transfer.

§3 Guiding Philosophy and Methods

In view of the problems existing in power grid data, this paper uses feature-based transfer learning to transfer knowledge from the source domain to the target domain, and builds a model for the target domain using source-domain information and the original network parameters. Considering the temporal characteristics of power data, a time series model is used to predict the future cost, and support vector regression is also selected as a prediction algorithm. This section describes the basic principles of the two algorithms and establishes a hybrid prediction model for small sample data sets based on transfer learning. The prediction framework is shown in Table 2.

Table 2. Framework of transfer learning.

The framework of prediction based on transfer learning:
Step 1: Data preprocessing in both the source domain and the target domain
Step 2: Model pre-training in the source domain
Step 3: Model adjustment
Step 4: Parameter transfer
Step 5: Model retraining in the target domain
Step 6: Fine-tuning
Step 7: Target-domain model
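To make the correspondence between these steps and an implementation concrete, the following is a minimal sketch in Python with TensorFlow/Keras (the framework used in Section 4); build_model, load_source_data and load_target_data are hypothetical helpers, with the loaders standing in for Step 1 and build_model for Step 3.

```python
# A minimal sketch of the seven steps, assuming TensorFlow/Keras.
# build_model, load_source_data and load_target_data are hypothetical
# helpers: the loaders cover Step 1, build_model covers Step 3.
import tensorflow as tf

x_src, y_src = load_source_data()   # stock windows (source domain)
x_tgt, y_tgt = load_target_data()   # standard cost windows (target domain)

model = build_model()
model.compile(optimizer="adam", loss="mse")
model.fit(x_src, y_src, epochs=100, validation_split=0.1)  # Step 2: pre-training

model.save_weights("source.weights.h5")   # Step 4: parameter transfer
model.load_weights("source.weights.h5")

# Steps 5-6: retrain and fine-tune on the small target set at a lower learning rate
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
model.fit(x_tgt, y_tgt, epochs=50)        # Step 7: the target-domain model
```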

3.1 Long Short-Term Memory

Based on the traditional artificial neural network, the RNN [28] adds directed cycles that connect the hidden nodes into a ring. This internal structure helps the network carry information across time and exhibit dynamic time-series behavior, so that it can mine the characteristics of time series and finally perform classification or prediction. However, when dealing with long time series there is a problem of vanishing gradients: as information is transmitted along the time dimension, the later neurons' perception of the earlier neurons decreases and information is gradually lost. To solve this problem, cell units and three control gates are added to the hidden layer. This structure, called Long Short-Term Memory (LSTM), was proposed in 1997 [29] and has been gradually improved to the current classic version. It is suitable for long time series problems.
As a special kind of RNN, LSTM effectively avoids gradient vanishing and explosion while retaining a good ability to handle time series data. In an LSTM cell at time step t, the cell state $c_t$ denotes the long-term memory transmitted into the cell, while the hidden state $h_t$ denotes the short-term memory. $c_t$ and $h_t$ forget some old memories and save new ones under the control of four gating signals. Specifically, f is the forget gate, which controls how much of the long-term memory $c_{t-1}$ is forgotten; i is the input gate, which determines whether the incoming data $x_t$ are kept; the third signal g determines the content of $x_t$ that is kept; and the output gate o determines the size of the output. The exact parameter-update formulas are given in (3.1)-(3.3), where W, b and $\sigma$ are the parameter matrix, bias term and sigmoid function, respectively. The architecture of an LSTM cell is shown in Fig. 2.
   
$$\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix}\left( W \begin{pmatrix} h_{t-1} \\ x_t \end{pmatrix} + b \right) \tag{3.1}$$

$$c_t = f \cdot c_{t-1} + i \cdot g \tag{3.2}$$

$$h_t = o \cdot \tanh(c_t) \tag{3.3}$$
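As a worked illustration of (3.1)-(3.3), the following NumPy sketch computes one LSTM cell step; the dimensions and random weights are illustrative only, not taken from the paper's model.

```python
# A worked NumPy sketch of one LSTM cell step per (3.1)-(3.3);
# sizes and random weights are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_h, d_x = 4, 3                             # hidden size, input size (assumed)
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * d_h, d_h + d_x))   # stacked weights for i, f, o, g
b = np.zeros(4 * d_h)

def lstm_step(x_t, h_prev, c_prev):
    z = W @ np.concatenate([h_prev, x_t]) + b    # (3.1): one affine map, four slices
    i = sigmoid(z[0*d_h:1*d_h])                  # input gate
    f = sigmoid(z[1*d_h:2*d_h])                  # forget gate
    o = sigmoid(z[2*d_h:3*d_h])                  # output gate
    g = np.tanh(z[3*d_h:4*d_h])                  # candidate memory content
    c_t = f * c_prev + i * g                     # (3.2): update long-term memory
    h_t = o * np.tanh(c_t)                       # (3.3): short-term memory / output
    return h_t, c_t

h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_x), h, c)
```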


Figure 2. Structure of an LSTM cell.

Considering the significant advantages of LSTM for long-term time series prediction, the LSTM model is used to predict the fluctuation of the standard cost. The training process of the LSTM model is shown in Fig. 3. During training, the required data are taken from the training set according to the size of the time window, and the output values are obtained. The loss value and weight gradients are calculated from the loss function to optimize the network parameters. These calculation and update steps are repeated until all training data have been processed.

Figure 3. Training process of LSTM network.



3.2 Support Vector Regression


For the regression problem, given training samples $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, one hopes to obtain a predictive model of the form $f(x) = W^T x + b$ such that $f(x)$ is as close as possible to y. Unlike other statistical models, support vector regression (SVR) [30-31] sets an insensitive zone Z of width $2\epsilon$ centered on f. Only for points outside Z is the distance to the edge of Z counted as loss. The objective function J of SVR can be defined by

$$J = \min_{W,b} \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{n} l_\epsilon\big(f(x_i) - y_i\big) \tag{3.4}$$
where C is the regularization constant and $l_\epsilon$ is the function shown in Fig. 4, formulated as
$$l_\epsilon(z) = \begin{cases} 0, & |z| < \epsilon \\ |z| - \epsilon, & \text{otherwise} \end{cases} \tag{3.5}$$

Figure 4. The $\epsilon$-insensitive function.

For the SVR model, we use the radial basis function (RBF) as the kernel function. Letting $z = f(x) - y$ and introducing slack variables $\xi_i, \hat\xi_i$, (3.4) can be rewritten as

$$\begin{aligned} J = \min_{W,b,\xi_i,\hat\xi_i}\ & \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{n}(\xi_i + \hat\xi_i) \\ \text{s.t.}\ & f(x_i) - y_i \le \epsilon + \xi_i, \\ & y_i - f(x_i) \le \epsilon + \hat\xi_i, \\ & \xi_i \ge 0,\ \hat\xi_i \ge 0,\ i = 1, \ldots, n \end{aligned} \tag{3.6}$$
By introducing Lagrange multipliers $\mu_i, \hat\mu_i, \alpha_i, \hat\alpha_i$, the Lagrange function can be formulated as

$$\begin{aligned} J(W, b, \alpha_i, \hat\alpha_i, \xi_i, \hat\xi_i, \mu_i, \hat\mu_i) = {} & \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{n}(\xi_i + \hat\xi_i) - \sum_{i=1}^{n}\mu_i \xi_i - \sum_{i=1}^{n}\hat\mu_i \hat\xi_i \\ & + \sum_{i=1}^{n}\alpha_i\big(f(x_i) - y_i - \epsilon - \xi_i\big) + \sum_{i=1}^{n}\hat\alpha_i\big(y_i - f(x_i) - \epsilon - \hat\xi_i\big) \end{aligned} \tag{3.7}$$
Substituting $f(x) = W^T x + b$ and setting the gradient of J to zero gives

$$W = \sum_{i=1}^{n}(\hat\alpha_i - \alpha_i)x_i, \qquad 0 = \sum_{i=1}^{n}(\hat\alpha_i - \alpha_i), \qquad C = \alpha_i + \mu_i, \qquad C = \hat\alpha_i + \hat\mu_i \tag{3.8}$$
Then we have the dual form of the SVR problem (which must satisfy the KKT conditions):

$$\begin{aligned} \min_{\alpha,\hat\alpha}\ & \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \hat\alpha_i)(\alpha_j - \hat\alpha_j)x_i^T x_j + \sum_{i=1}^{n}\big[(\alpha_i - \hat\alpha_i)y_i + \epsilon(\alpha_i + \hat\alpha_i)\big] \\ \text{s.t.}\ & \sum_{i=1}^{n}(\alpha_i - \hat\alpha_i) = 0, \qquad 0 \le \alpha_i, \hat\alpha_i \le C \end{aligned} \tag{3.9}$$
And the solution of (3.9) is

$$f(x) = \sum_{i=1}^{n}(\hat\alpha_i - \alpha_i)x_i^T x + b \tag{3.10}$$

where, for any sample $(x_i, y_i)$ with $0 < \alpha_i < C$,

$$b = y_i + \epsilon - \sum_{j=1}^{n}(\hat\alpha_j - \alpha_j)x_j^T x_i \tag{3.11}$$
Also, considering the feature-mapping form, the SVR solution can be expressed as

$$f(x) = \sum_{i=1}^{n}(\hat\alpha_i - \alpha_i)k(x, x_i) + b \tag{3.12}$$

where $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ is the kernel function.
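As a toy illustration of the kernelized solution (3.12), the following sketch fits an SVR with an RBF kernel using scikit-learn, which solves the dual (3.9) internally; the data and parameter values are illustrative assumptions, not the paper's.

```python
# A toy sketch of kernelized SVR as in (3.12), using scikit-learn's SVR,
# which solves the dual (3.9) internally; the data are illustrative only.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)

# C is the regularization constant of (3.4); epsilon is the half-width
# of the insensitive zone Z in (3.5); the kernel k(x, x_i) is the RBF.
model = SVR(kernel="rbf", C=5.0, epsilon=0.02)
model.fit(X, y)
y_hat = model.predict(X)   # evaluates f(x) = sum (a^_i - a_i) k(x, x_i) + b
```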

§4 Experiment

The LSTM model in this paper is based on the TensorFlow deep learning framework; the SVR model is trained with the SVR function in scikit-learn, and GPU acceleration is used in training. Experiment configuration: the experiments are carried out under Ubuntu 18.04, on a host with an Intel i7-7829HK CPU, a GTX 1080 graphics card, and 32 GB of memory.

4.1 Data

The standard cost data for marketing, overhaul, operations and maintenance are provided by State Grid Jinhua Power Supply Company. They include the annual marketing costs for 76 municipalities in Zhejiang Province from 2014 to 2019, together with the official predicted costs for 2019 based on experience and the data from 2014 to 2018. For the source domain, we select an open-source stock data set with daily open price, close price, highest price and volume from 1990 to 2015, whose distribution is similar to that of the target data.
There are two main reasons for choosing stock data as the source domain, called the leverage effect and heteroscedasticity. The leverage effect refers to the phenomenon that when a certain financial variable changes over a small range, another related variable changes over a larger range. In the stock and other financial markets, it refers to the asymmetric influence of information on the variation: generally, good news has less impact on the market than bad news. When the stock price falls, the company's net shareholders' equity decreases but its debt does not change, which triggers a further decline in the company's stock price.
Heteroscedasticity is a statistical concept meaning that the dispersion of the random error of a random variable is itself variable, where the dispersion can be the variance or another measure of spread. If the stock index were independent of time, data from any point in history could be used to predict its future value, but this is not the actual situation. The future trend of a stock is related to its recent values, but it is affected even more by the performance of the enterprise, good news and relevant policies, which are often divorced from the data itself. In this case the variance changes with time, and may even be completely random and uncontrollable. The annual change of the marketing cost data we deal with depends largely on policy changes, and its distribution also changes over time, so it is likewise heteroscedastic.

4.2 Training Procedures

4.2.1 Data Preprocessing

We group every six consecutive days of the stock's closing prices, treating the first five days as x and the sixth day as y. The last 20 sets of data are used as the test set, and the remaining data are normalized and divided into a training set and a validation set at a ratio of 9:1. Finally, the test set and the power cost data are normalized with the mean and variance of the previous normalization.
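A sketch of this windowing and normalization might look as follows, assuming a sliding window and a one-dimensional array close of daily closing prices; the variable names are illustrative.

```python
# A sketch of the windowing and normalization described above, assuming
# `close` is a 1-D NumPy array of daily closing prices (name is illustrative).
import numpy as np

def make_windows(series, window=5):
    # first `window` days as x, the following day as y (sliding window assumed)
    x = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array([series[i + window] for i in range(len(series) - window)])
    return x, y

x, y = make_windows(close, window=5)
x_test, y_test = x[-20:], y[-20:]           # last 20 sets as the test set
x_rest, y_rest = x[:-20], y[:-20]

mean, std = x_rest.mean(), x_rest.std()     # statistics from the remaining data
x_rest = (x_rest - mean) / std              # normalize (y handled the same way)
split = int(0.9 * len(x_rest))              # 9:1 training/validation split
x_train, x_val = x_rest[:split], x_rest[split:]
x_test = (x_test - mean) / std              # test set and cost data reuse the
                                            # same mean and std
# for the LSTM, reshape windows to (samples, 5, 1) before training
```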

Figure 5. Proposed LSTM Model.

4.2.2 Model Architecture

For the LSTM model, there are four layers in total: one input layer, two LSTM layers and one dense layer with dimension 1, as shown in Fig. 5. The time steps are set to 5, and mean squared error (MSE) is selected as the loss function, with an Adam optimizer. Besides, we use a self-decaying learning rate, which halves after every 20 epochs in which there is no improvement in the validation loss.
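A sketch of this architecture in TensorFlow/Keras might look as follows; the choice of 64 units per LSTM layer follows Table 3 and is otherwise an assumption.

```python
# A sketch of the described architecture, assuming TensorFlow/Keras;
# 64 LSTM units per layer follow Table 3 and are otherwise assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5, 1)),            # time steps = 5, one feature
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),                       # dense layer with dimension 1
])
model.compile(optimizer="adam", loss="mse")         # MSE loss with Adam

# Self-decaying learning rate: halve when the validation loss fails to
# improve for 20 epochs; pass callbacks=[lr_callback] to model.fit(...).
lr_callback = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=20)
```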
For the SVR model, we use the radial basis function (RBF) as the kernel function, and grid search is used to set the hyperparameters.
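A sketch of this grid search might look as follows; the parameter grids are assumptions bracketing the C and epsilon values reported in Table 3, and x_train, y_train are the windows from the preprocessing step.

```python
# A sketch of setting the SVR hyperparameters by grid search; the grids
# shown are assumptions bracketing the values reported in Table 3.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

param_grid = {"C": [1, 5, 10], "epsilon": [0.01, 0.011, 0.02]}
search = GridSearchCV(
    SVR(kernel="rbf"), param_grid,
    scoring="neg_mean_absolute_percentage_error", cv=5)
search.fit(x_train, y_train)       # five-day windows as features, sixth day as target
svr_best = search.best_estimator_
```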

4.2.3 Training

We use the preprocessed training set to train the LSTM model and save all the parameters. Then the cost data are fed in to fine-tune the network.
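A sketch of this step, continuing the Keras model above: freezing the first LSTM layer during fine-tuning is one common choice, assumed here rather than stated by the authors; x_cost and y_cost are hypothetical names for the preprocessed cost windows and targets.

```python
# A sketch of the fine-tuning step, continuing the Keras model above.
# Freezing the first LSTM layer is an assumption, not the authors' stated
# procedure; x_cost and y_cost are hypothetical names for the cost data.
model.load_weights("source.weights.h5")    # restore the saved parameters

for layer in model.layers:
    if isinstance(layer, tf.keras.layers.LSTM):
        layer.trainable = False            # reuse low-level source-domain features
        break                              # freeze only the first LSTM layer

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
model.fit(x_cost, y_cost, epochs=50, validation_split=0.1)
```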

4.3 Result
To reflect the performance of the prediction models more intuitively, each method is evaluated by calculating the mean absolute percentage error (MAPE). The ABC and LR prediction models are compared with the methods in this paper. The results are shown in Table 3, and the corresponding stock forecast results are shown in Fig. 6 and Fig. 7. The comparison of index values shows that the proposed method performs relatively well on all indicators and greatly reduces the budget error of the measurement method used in the past. The unconventional power cost prediction model based on transfer learning proposed in this paper provides a useful reference for the future financial budget planning of power companies.
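For reference, the MAPE used in this comparison can be computed as in the following sketch; the function name and arguments are illustrative.

```python
# A sketch of the MAPE metric used in Table 3; names are illustrative.
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```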

Figure 6. Stock forecast results based on (a) LSTM and (b) SVR.

Table 3. Comparison of algorithm prediction results.

Method   LSTM units   Reduced factor   MAPE (%)   Total MAPE (%)
ABC      --           --               39.52      38
LR       --           --               32.83      31.96
LSTM     128          0.2              21.63      22.4
LSTM     64           0.2              23.3       17.27
LSTM     64           0.2              22.46      18.24

Method   C            Epsilon          MAPE (%)   Total MAPE (%)
SVR      5            0.02             26.37      13.82
SVR      5            0.01             23.3       20.65
SVR      5            0.011            23.45      19.64

Figure 7. Stock loss function.

4.4 Analysis

According to the calculation logic of the 'four typicals' (typical assets, typical equipment, typical projects and typical operations) of the State Grid headquarters, the calculation is operationalized and detailed for the actual situation of the various regions in Zhejiang Province. The number of activity types is larger than that of the State Grid standard, and the calculation results for Zhejiang Province are generally higher than those of State Grid. At the same time, the State Grid standards change every year, and with the emergence of new technologies and new business in power system operation, part of the original quota is no longer applicable, while the cost standard quota for new business and new activities has not yet been determined. In addition, owing to policy changes, the industry environment and other factors, the power cost of the Zhejiang power grid fluctuates greatly. This is an important source of abnormal data and makes the cost difficult to predict. In the past, the State Grid issued the next year's calculation by a measurement method, which not only has large errors but also lacks a theoretical basis. Building on traditional prediction methods, this paper attempts to use a deep neural network to complete the prediction task. The results are good but still need further improvement, and the ideas of this paper point out the direction for future research.

§5 Conclusion

This paper proposes a deep neural network prediction model based on transfer learning for small samples in State Grid, which can effectively learn the source-domain feature space and transfer it to the prediction model. Through fine-tuning, the prediction accuracy of the model is significantly improved. The experimental results verify the effectiveness and superiority of the power cost prediction. How to optimize the prediction model according to the actual business and policy factors of State Grid is our future research direction.
For transfer learning, there are several possible research directions in the future. Firstly, there is no in-depth research on the measurement of domain similarity and commonality, so it is particularly important to study accurate measurement methods. Secondly, in terms of algorithm research, different applications have different requirements. At present, much research focuses on classification algorithms for transfer learning; other application algorithms need further study, such as sentiment classification, reinforcement learning, ranking learning and metric learning. Moreover, theoretical research on the effectiveness of transfer learning algorithms is still scarce; studying the conditions for transferability, so as to obtain positive transfer and avoid negative transfer, is also one of the directions. Finally, in the big data environment, it is particularly important to study efficient transfer learning algorithms aimed at practical application data, in keeping with the current research wave of big data mining.

References
[1] T M Mitchell. Machine Learning, McGraw-Hill, 2003.

[2] S J Pan, Q Yang. A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data
Engineering, 2010, 22(10): 1345-1359.

[3] L X Duan, I W Tsang, D Xu. Domain Transfer Multiple Kernel Learning, IEEE Transactions
on Pattern Analysis Machine Intelligence, 2012, 34(99): 465-479.

[4] W T Tu , S L Sun. A subject transfer framework for EEG classification, Neurocomputing, 2012,
82: 109-116.

[5] H Daumé, D Marcu. Domain adaptation for statistical classifiers, Journal of Artificial Intelligence
Research, 2006, 26(1): 101-126.

[6] S Bickel, M Brückner, T Scheffer. Discriminative learning for differing training and test distributions, In: Proceedings of the 24th International Conference on Machine Learning, New York, USA: ACM, 2007, 81-88.

[7] S Bickel, C Sawade, T Scheffer. Transfer learning by distribution matching for targeted advertising, In: Proceedings of the 21st Annual Conference on Neural Information Processing Systems, Cambridge: MIT Press, 2009, 145-152.

[8] P C Wu, T G Dietterich. Improving SVM accuracy by training on auxiliary data sources, In:
Proceedings of the 21st International Conference on Machine Learning (ICML), New York, USA:
ACM, 2004, 110-117.

[9] W Y Dai, Q Yang, G R Xue, et al. Boosting for transfer learning, In: Proceedings of the 24th
International Conference on Machine Learning (ICML), New York, USA: ACM, 2007, 193-200.

[10] B Quanz, J Huan. Large margin transductive transfer learning, In: Proceedings of the 18th ACM
Conference on Information and Knowledge Management (CIKM), New York, USA: ACM, 2009,
1327-1336.

[11] Z J Xu, S L Sun. Multi-view transfer learning with Adaboost, In: Proceedings of the 23rd
Conference on Tools with Artificial Intelligence, Boca Raton, FL: IEEE, 2011, 399-402.

[12] Z J Xu, S L Sun. Multi-source transfer learning with Multi-view Adaboost, Neural Information
Processing, 2012, 7665: 332-339.

[13] M M Chen, K Q Weinberger, J Blitzer. Co-training for domain adaptation, In: Proceedings of
the 25th Conference on Neural Information Processing Systems (NIPS), 2011, 2456-2464.

[14] Y Z Jiang, Z H Deng, S T Wang. Mamdani-Larsen Type Transfer Learning Fuzzy System, Acta
Automatica Sinica, 2012, 38(9): 1393-1409.
[15] M Q Zhu, Y H Cheng, M Li, et al. A Hybrid Transfer Algorithm for Reinforcement Learning
Based on Spectral Method, Acta Automatica Sinica, 2012, 38(11): 1765-1776.
[16] W H Jiang, F L Chung. Transfer spectral clustering, In: Proceedings of the 2012 European Con-
ference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
(ECML PKDD), Springer, Berlin, Heidelberg, 2012, 789-803.
[17] A Argyriou, C Micchelli, M Pontil, et al. A spectral regularization framework for multi-task structure learning, In: Proceedings of Advances in Neural Information Processing Systems (NIPS 2008), Cambridge, MA: MIT Press, 2007, 25-32.
[18] S J Pan, J Kwok, Q Yang. Transfer learning via dimensionality reduction, In: Proceedings of
the 23rd International Conference on Artificial Intelligence, California, USA: AAAI Press, 2008,
677-682.
[19] S J Pan, X C Ni, J T Sun, et al. Cross-domain sentiment classification via spectral feature
alignment, In: Proceedings of the 19th International Conference on World Wide Web, New York,
USA: ACM, 2010, 751-760.
[20] W T Tu, S L Sun. Transferable discriminative dimensionality reduction, In: Proceedings of the
23rd IEEE International Conference on Tools with Artificial Intelligence (CTAI), Boca Raton,
FL: IEEE, 2011, 865-868.
[21] X Gao, X M Wang, X L Li, et al. Transfer latent variable model based on divergence analysis,
Pattern Recognition, 2011, 44(10): 2358-2366.
[22] X B Cao, Z Wang, P K Yan, et al. Transfer learning for pedestrian detection, Neurocomputing,
2013, 100: 51-57.
[23] N D Lawrence, J C Platt. Learning to learn with the informative vector machine, In: Proceedings of the 21st International Conference on Machine Learning, 2004, 65-73.
[24] T Evgeniou, M Pontil. Regularized multi-task learning, In Proceedings of the 10th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, 2004, 109-117.
[25] L Zhang, D Zhang. Domain adaptation transfer extreme learning machines, In: J Cao, K Mao, E
Cambria, Z Man, KA Toh, (eds), Proceedings of ELM-2014 Volume 1, Proceedings in Adaptation,
Learning and Optimization, Springer, Cham, 2015, 3: 103-119.
[26] J Davis, P M Domingos. Deep transfer via second-order markov logic, In Proceedings of the 26th
Annual International Conference on Machine Learning, 2009, 217-224.
[27] J Schmidhuber. Deep learning in neural networks: An overview, Neural Networks, 2015, 61:
85-117.
[28] S Hochreiter, J Schmidhuber. Long Short-Term Memory, Neural Computation, 1997, 9(8): 1735-
1780.
[29] Y T Wu, M Yuan, S P Dong, et al. Remaining useful life estimation of engineered systems using
vanilla LSTM neural networks, Neurocomputing, 2018, 275: 167-179.

[30] A Smola, B Schölkopf. A tutorial on support vector regression, Statistics and Computing, 2004, 14(3): 199-222.
[31] X J Zhou, T Jiang. Enhancing Least Square Support Vector Regression with Gradient Informa-
tion, Neural Processing Letters, 2016, 43: 65-83.

1 State Grid Zhejiang Electric Power Company Jinhua Power Supply Company, Jinhua 321000, China.
2 State Grid Zhejiang Electric Power Company, Hangzhou 310018, China.
Email: [email protected]
