
Appl. Math. J. Chinese Univ. 2022, 37(4): 598-614

Mathematical methods for maintenance and operation cost prediction based on transfer learning in State Grid

GUO Yun-peng1, WANG Dong-fa2,*, ZHENG Ying1, DING Wei-bin2

Abstract. The electric power enterprise is an important basic energy industry for national development and a primary basic industry of the national economy. With the continuous expansion of State Grid, progressively complex operating conditions, and the increasing scope and frequency of data collection, how to make reasonable use of electrical big data, improve its utilization, and provide a theoretical basis for the reliable operation of State Grid has become a new research hotspot. Since electrical data are large in volume, diverse in type, low in value density, and demand fast processing, it is a challenge to mine and analyze them deeply, extract valuable information efficiently, and apply it to actual problems. According to the features of these data, this paper uses artificial intelligence methods such as time series analysis and support vector regression to establish a data mining network model for standard cost prediction through transfer learning. The experimental results show that the model obtains better prediction results on a small sample data set, which verifies the feasibility of the deep transfer model. Compared with activity-based costing and the traditional prediction method, the average absolute error of the proposed method is reduced by 10%, demonstrating its effectiveness and superiority.

Received: 2020-11-07. Revised: 2020-12-04.
MR Subject Classification: 91B82.
Keywords: transfer learning, LSTM, support vector regression, activity-based costing, State Grid.
Digital Object Identifier (DOI): https://doi.org/10.1007/s11766-022-4319-7.
Supported by the science and technology program of State Grid Zhejiang Electric Power Co., Ltd.: Research and application project of standard cost activity based on machine learning (5211JH1900LZ).
* Corresponding author.

§1 Introduction

Smart grid has become a new global energy strategy in the 21st century. At present, many scientific research institutions and enterprises in China have actively carried out research and pilot projects on smart grid technology. Since the market-oriented reform of China's electric power system, electric power enterprises have actively adapted to the reform and explored the construction of cost control systems in line with their own realities, which has, to a certain extent, promoted the positive development of the electric power industry. In 2006, the State Grid of China put forward the demand to implement cost standardization and lean management. After continuous exploration and in-depth reform, it has gradually built up a cost management system covering production, sales, management and other operation standards, and applies these standards in the annual budget preparation process.

The social responsibility of State Grid is to provide high-quality and low-cost power services. Its core business mainly includes power grid construction, grid maintenance and operation, and procurement and sales, while it also pursues the maximization of benefits. At present, however, three problems remain: first, the level of refinement needs to be improved; second, the degree of intensification is not high, as development depends mainly on high investment and lacks a scientific standard system; third, the operation mechanism needs to be perfected. Therefore, the implementation of standard cost management is of great significance. It can promote lean cost management, predict cost demand and budgets, control expenditure, analyze cost differences, and optimize operation activities. It also contributes to the organic integration of business and finance, so as to control the entire production and operation management activities, cost arrangement and capital demand. Finally, it improves the accuracy of the budget, so that enterprises can comprehensively grasp their development trend and understand all kinds of business information in detail. In this way, unreasonable factors in historical cost can be eliminated through standardized management, ensuring that the level of cost consumption is fair, scientific and reasonable.

In recent years, with the rapid development of computer information and storage technology, mining technology based on big data has matured, and the value of data is gradually being discovered and utilized. Many industries have entered the age of big data. The State Grid has the characteristics of large scale, many devices, wide coverage area, long running time and complex operating conditions, spanning power generation, transmission, substation, distribution and consumption, which produce a large volume of rich data that fully conforms to the characteristics of big data. It is of great research significance to extract the hidden information from such data and use it to solve key problems in real life. Based on the 2014-2019 historical standard cost data of State Grid Jinhua Power Supply Company in Zhejiang Province, the research goal is to master the cost law of the system by mining and analyzing the data over the years, determine the cause of each cost and its influencing factors, establish and improve the functional model between cost drivers and costs, and predict the cost of each module in 2021. The data are annual, do not involve quarterly or monthly records, and some regional data are difficult to obtain and therefore missing. The correlation of the target data is not obvious, and it is hard to fit with simple mathematical models. Therefore, it is necessary to extract the underlying hidden features and learn more accurate non-linear models.

For non-conventional power data, network training is difficult to converge and the results are unsatisfactory. Facing the prediction task of cross-domain data, this paper therefore introduces transfer learning. Based on the theoretical study of time series prediction and support vector regression, it proposes a non-conventional prediction model that makes the neural network model reusable. Compared with direct training on the small sample data set, the transferred features have higher discrimination and robustness, which greatly alleviates the over-fitting caused by too few samples. The approach can efficiently use the historical data of State Grid, mine its useful information, serve the target area, and establish a generalized model. This paper mainly focuses on maintenance and operation costs. The experimental results show that the proposed model has a good prediction effect on the small sample data set and is effective and superior compared with traditional prediction models.
The rest of the paper is organized as follows. Section 2 briefly reviews previous work on transfer learning. Section 3 formulates the ideas of two algorithms, the long short-term memory network and support vector regression, and then proposes our model architecture. Simulations and experiments are carried out in Section 4, where the corresponding experimental results are reported and discussed. Finally, a summary of our work is given in Section 5.

§2 Related work

Machine learning [1] has made dramatic advances in theory and practice and has become one of the major technical cornerstones of big data analysis. Its core idea is to train machines, by simulating the human brain, to learn and to discriminate. Traditional machine learning methods usually assume that training data and test data follow the same distribution. However, this hypothesis is often too strict to be satisfied in computer science and natural information processing. 'Big data' and 'small data' coexist: some domains have abundant data while others have little, which requires the ability to analyze both, and this is not always easy. With the development of deep learning, people expect machine learning to no longer be limited to large samples and supervised learning, and hope to realize unsupervised, few-shot, or even zero-shot learning. How to analyze and mine large-scale data in a non-stationary environment is one of the most challenging frontiers of modern machine learning.

2.1 Transfer Learning


Compared with existing machine learning mechanisms, human learning is quite different. Humans not only learn well from a large number of training samples, but can even master specific goals with few or no samples by drawing on auxiliary information related to the learning goal. Human learning has the ability to transfer and transform knowledge between different fields and problems, which machine learning lacks. To solve the problem of data and knowledge scarcity, researchers have proposed transfer learning [2-4], also known as inductive transfer or domain adaptation, an important research problem in machine learning. The goal is to apply knowledge or structure learned in a certain field or task to different but related fields or problems. Transfer learning attempts to realize the human capability of learning by analogy, relaxing the constraint that training data and test data must be independent and identically distributed. It can thus mine domain-invariant essential features and structures between two different but interrelated domains, enabling labeled data and other supervised information to be transferred and reused between domains.
In 2010, Pan and Yang [2] organized and summarized transfer learning research and published the most widely acknowledged and representative survey of transfer learning, which gives a specific definition of transfer learning and its basic principles. The schematic is shown in Fig. 1.

Figure 1. Difference between (a) traditional machine learning and (b) transfer learning.

Definition (Transfer learning). Given a source domain $D_s = \{(x_1^s, y_1^s), \ldots, (x_n^s, y_n^s)\}$ with learning task $T_s$ and a target domain $D_t = \{(x_1^t, y_1^t), \ldots, (x_m^t, y_m^t)\}$ with learning task $T_t$, the goal of transfer learning is to reduce the generalization error on the target domain and improve the learning of the target predictive function $f_t(\cdot)$ under $D_s \ne D_t$ or $T_s \ne T_t$.
In simple terms, transfer learning applies the knowledge acquired in solving one problem to a different but related problem, with the aim of obtaining better learning results on the new task. It is like standing on the shoulders of giants: models trained on big data with computationally powerful devices can be used to solve one's own tasks and improve network generalization. At the same time, a generalized model can be built as a foundation for the individual requirements of multi-tasking, responding flexibly to different tasks to meet the needs of practical applications.

2.2 A Categorization of Transfer Learning


From the perspective of 'what to transfer', transfer learning can be divided into four categories: instance-based, feature-based, model-based and relation-based transfer learning. Table 1 lists these four categories with brief descriptions.

Table 1. Different methods of transfer learning.

Transfer method      Brief description
Instance Transfer    Re-weight the data from the source domain and apply it to the target domain.
Feature Transfer     Find a feature representation that reduces the difference between source-domain and target-domain data and reduces the errors of the classification or regression model.
Model Transfer       Find the parameters or prior knowledge shared between the models of the source domain and the target domain.
Relation Transfer    Build a mapping between the related knowledge of the source domain and the target domain. It does not require the data of the two domains to be independent and identically distributed.

Instance-based transfer learning assumes that part of the data in the source domain can be reused in the target domain through weight reuse. That is, some measurement is used to judge which source-domain data are similar to the target domain, and those data are given high weights [5-7]. Representative work includes the LP-SVM method proposed by Wu et al. [8], which improves the classification performance of learning machines by training on auxiliary data; the TrAdaBoost algorithm proposed by Dai et al. [9] based on the similarity of samples between domains; and the LMPROJ algorithm of Quanz et al. [10], who introduced a strategy to constrain the mean difference of the sample distributions between the source and target domains. In the recently emerged multi-perspective integrated transfer learning, Xu et al. [11] proposed the multi-perspective Adaboost transfer learning algorithm and Chen et al. [12] proposed a strategy for multi-source learning; Xu et al. [13] further incorporated both multi-source and multi-perspective mechanisms into the transfer learning process and proposed the multi-source, multi-perspective Adaboost transfer learning algorithm, which effectively avoids negative transfer and steadily improves the learning effect of transfer learning; Jiang et al. [14] proposed a transfer algorithm based on a source-domain-knowledge fuzzy system to address missing information in the target domain, which effectively improved the performance of transfer learning.
The feature-based transfer learning approach establishes connections between domain data from the feature structure, exploiting the intersection at the feature level by transforming and reconstructing features to discover potential common feature spaces [15-16]. Representative works include Argyriou et al. [17], who proposed a regularization-based spectral framework for learning multi-task structures; Pan et al. [18-19], who proposed the MMDE algorithm based on manifold structures, enhancing the mean-center consistency of source- and target-domain data in a low-dimensional mapping space to reduce the differences between the two domains; Tu et al. [20], who used a domain-adaptive algorithm to implement a user-transfer-based dimensionality reduction method, improving learning performance to some extent; and Gao et al. [21-22], who applied a latent-variable kernel space model to the pedestrian detection problem and showed that feature-based transfer strategies can effectively improve learning performance. Wei et al. [23] proposed a new transfer learning framework called Learning to Transfer. They analyzed previous transfer learning studies and found that, for a given pair of domains, different transfer learning algorithms result in different knowledge transfer.
Model-based and relation-based transfer learning have received less attention. Model-based transfer learning assumes that the target domain and the source domain share some parameters of the algorithm model or some prior knowledge; by searching for such parameters or prior knowledge, it completes the transfer of information and knowledge. In multi-task learning, knowledge sharing between tasks exists in the form of parameter sharing. Lawrence et al. [24] used a Gaussian model to model the prior knowledge between tasks and learned the Gaussian parameters shared among tasks, realizing the transfer of knowledge between tasks. There are also algorithms [25] that transfer support vector machine parameters across multiple tasks. In addition, Zhang et al. [26] designed the TELM-SDA and TELM-TDA algorithms through research on the extreme learning machine; they transfer the knowledge and parameters acquired from training in the source domain to the target domain, which helps to optimize the target-domain model and realize knowledge transfer. Relation-based transfer learning assumes that there is a correlation between source-domain knowledge and target-domain knowledge. For example, people's evaluations of books and their interest in movies share common information through which knowledge transfer can be completed, even when the source domain and target domain are quite different, such as molecular biology and web pages. Davis et al. [27] applied source-domain structure rules represented by second-order Markov logic to the target domain to achieve knowledge transfer.

§3 Guiding Philosophy and Methods

In view of the problems existing in power grid data, this paper uses feature-based transfer learning to transfer knowledge from the source domain to the target domain, and builds a model for the target domain using source-domain information and the original network parameters. Considering the temporal characteristics of power data, a time series model is used to predict the future cost, and support vector regression is also selected as a prediction algorithm. This section describes the basic principles of the two algorithms and establishes a hybrid prediction model for small sample data sets based on transfer learning. The prediction framework is shown in Table 2.

Table 2. Framework of transfer learning.

The framework of prediction based on transfer learning:
Step 1: Data preprocessing in both the source domain and the target domain
Step 2: Model pre-training in the source domain
Step 3: Model adjustment
Step 4: Parameter transfer
Step 5: Model retraining in the target domain
Step 6: Fine-tuning
Step 7: Target-domain model
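To make the correspondence between these steps and an implementation concrete, the following is a minimal sketch in Python with TensorFlow/Keras (the framework used in Section 4); build_model, load_source_data and load_target_data are hypothetical helpers, with the loaders standing in for Step 1 and build_model for Step 3.

```python
# A minimal sketch of the seven steps, assuming TensorFlow/Keras.
# build_model, load_source_data and load_target_data are hypothetical
# helpers: the loaders cover Step 1, build_model covers Step 3.
import tensorflow as tf

x_src, y_src = load_source_data()   # stock windows (source domain)
x_tgt, y_tgt = load_target_data()   # standard cost windows (target domain)

model = build_model()
model.compile(optimizer="adam", loss="mse")
model.fit(x_src, y_src, epochs=100, validation_split=0.1)  # Step 2: pre-training

model.save_weights("source.weights.h5")   # Step 4: parameter transfer
model.load_weights("source.weights.h5")

# Steps 5-6: retrain and fine-tune on the small target set at a lower learning rate
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
model.fit(x_tgt, y_tgt, epochs=50)        # Step 7: the target-domain model
```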

3.1 Long Short-Term Memory

Based on the traditional artificial neural network, the RNN [28] adds directed cycles that connect the hidden nodes into a ring. This internal structure helps the network carry information across time and exhibit dynamic time-series behavior, so that it can mine the characteristics of time series and finally perform classification or prediction. However, when dealing with long time series there is a problem of vanishing gradients: as information is transmitted along the time dimension, the later neurons' perception of the earlier neurons decreases and information is gradually lost. To solve this problem, cell units and three control gates are added to the hidden layer. This structure, called Long Short-Term Memory (LSTM), was proposed in 1997 [29] and has been gradually improved to the current classic version. It is suitable for long time series problems.
As a special kind of RNN, LSTM effectively avoids gradient vanishing and explosion while retaining a good ability to handle time series data. In an LSTM cell at time step t, the cell state $c_t$ denotes the long-term memory transmitted into the cell, while the hidden state $h_t$ denotes the short-term memory. $c_t$ and $h_t$ forget some old memories and save new ones under the control of four gating signals. Specifically, f is the forget gate, which controls how much of the long-term memory $c_{t-1}$ is forgotten; i is the input gate, which determines whether the incoming data $x_t$ are kept; the third signal g determines the content of $x_t$ that is kept; and the output gate o determines the size of the output. The exact parameter-update formulas are given in (3.1)-(3.3), where W, b and $\sigma$ are the parameter matrix, bias term and sigmoid function, respectively. The architecture of an LSTM cell is shown in Fig. 2.
   
$$\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix}\left( W \begin{pmatrix} h_{t-1} \\ x_t \end{pmatrix} + b \right) \tag{3.1}$$

$$c_t = f \cdot c_{t-1} + i \cdot g \tag{3.2}$$

$$h_t = o \cdot \tanh(c_t) \tag{3.3}$$
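As a worked illustration of (3.1)-(3.3), the following NumPy sketch computes one LSTM cell step; the dimensions and random weights are illustrative only, not taken from the paper's model.

```python
# A worked NumPy sketch of one LSTM cell step per (3.1)-(3.3);
# sizes and random weights are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_h, d_x = 4, 3                             # hidden size, input size (assumed)
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * d_h, d_h + d_x))   # stacked weights for i, f, o, g
b = np.zeros(4 * d_h)

def lstm_step(x_t, h_prev, c_prev):
    z = W @ np.concatenate([h_prev, x_t]) + b    # (3.1): one affine map, four slices
    i = sigmoid(z[0*d_h:1*d_h])                  # input gate
    f = sigmoid(z[1*d_h:2*d_h])                  # forget gate
    o = sigmoid(z[2*d_h:3*d_h])                  # output gate
    g = np.tanh(z[3*d_h:4*d_h])                  # candidate memory content
    c_t = f * c_prev + i * g                     # (3.2): update long-term memory
    h_t = o * np.tanh(c_t)                       # (3.3): short-term memory / output
    return h_t, c_t

h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_x), h, c)
```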


Figure 2. Structure of an LSTM cell.

Considering the significant advantages of LSTM for long-term time series prediction, the LSTM model is used to predict the fluctuation of the standard cost. The training process of the LSTM model is shown in Fig. 3. During training, the required data are taken from the training set according to the size of the time window, and the output values are obtained. The loss value and weight gradients are calculated from the loss function to optimize the network parameters. These calculation and update steps are repeated until all training data have been processed.

Figure 3. Training process of LSTM network.



3.2 Support Vector Regression


For the regression problem, given training samples $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, one hopes to obtain a predictive model of the form $f(x) = W^T x + b$ such that $f(x)$ is as close as possible to y. Unlike other statistical models, support vector regression (SVR) [30-31] sets an insensitive zone Z of width $2\epsilon$ centered on f. Only for points outside Z is the distance to the edge of Z counted as loss. The objective function J of SVR can be defined by

$$J = \min_{W,b} \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{n} l_\epsilon\big(f(x_i) - y_i\big) \tag{3.4}$$
where C is the regularization constant and $l_\epsilon$ is the function shown in Fig. 4, formulated as
$$l_\epsilon(z) = \begin{cases} 0, & |z| < \epsilon \\ |z| - \epsilon, & \text{otherwise} \end{cases} \tag{3.5}$$

Figure 4. The $\epsilon$-insensitive function.

For the SVR model, we use the radial basis function (RBF) as the kernel function. Letting $z = f(x) - y$ and introducing slack variables $\xi_i, \hat\xi_i$, (3.4) can be rewritten as

$$\begin{aligned} J = \min_{W,b,\xi_i,\hat\xi_i}\ & \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{n}(\xi_i + \hat\xi_i) \\ \text{s.t.}\ & f(x_i) - y_i \le \epsilon + \xi_i, \\ & y_i - f(x_i) \le \epsilon + \hat\xi_i, \\ & \xi_i \ge 0,\ \hat\xi_i \ge 0,\ i = 1, \ldots, n \end{aligned} \tag{3.6}$$
By introducing Lagrange multipliers $\mu_i, \hat\mu_i, \alpha_i, \hat\alpha_i$, the Lagrange function can be formulated as

$$\begin{aligned} J(W, b, \alpha_i, \hat\alpha_i, \xi_i, \hat\xi_i, \mu_i, \hat\mu_i) = {} & \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{n}(\xi_i + \hat\xi_i) - \sum_{i=1}^{n}\mu_i \xi_i - \sum_{i=1}^{n}\hat\mu_i \hat\xi_i \\ & + \sum_{i=1}^{n}\alpha_i\big(f(x_i) - y_i - \epsilon - \xi_i\big) + \sum_{i=1}^{n}\hat\alpha_i\big(y_i - f(x_i) - \epsilon - \hat\xi_i\big) \end{aligned} \tag{3.7}$$
Substituting $f(x) = W^T x + b$ and setting the gradient of J to zero gives

$$W = \sum_{i=1}^{n}(\hat\alpha_i - \alpha_i)x_i, \qquad 0 = \sum_{i=1}^{n}(\hat\alpha_i - \alpha_i), \qquad C = \alpha_i + \mu_i, \qquad C = \hat\alpha_i + \hat\mu_i \tag{3.8}$$
Then we have the dual form of the SVR problem (which must satisfy the KKT conditions):

$$\begin{aligned} \min_{\alpha,\hat\alpha}\ & \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \hat\alpha_i)(\alpha_j - \hat\alpha_j)x_i^T x_j + \sum_{i=1}^{n}\big[(\alpha_i - \hat\alpha_i)y_i + \epsilon(\alpha_i + \hat\alpha_i)\big] \\ \text{s.t.}\ & \sum_{i=1}^{n}(\alpha_i - \hat\alpha_i) = 0, \qquad 0 \le \alpha_i, \hat\alpha_i \le C \end{aligned} \tag{3.9}$$
And the solution of (3.9) is

$$f(x) = \sum_{i=1}^{n}(\hat\alpha_i - \alpha_i)x_i^T x + b \tag{3.10}$$

where, for any sample $(x_i, y_i)$ with $0 < \alpha_i < C$,

$$b = y_i + \epsilon - \sum_{j=1}^{n}(\hat\alpha_j - \alpha_j)x_j^T x_i \tag{3.11}$$
Also, considering the feature-mapping form, the SVR solution can be expressed as

$$f(x) = \sum_{i=1}^{n}(\hat\alpha_i - \alpha_i)k(x, x_i) + b \tag{3.12}$$

where $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ is the kernel function.
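As a toy illustration of the kernelized solution (3.12), the following sketch fits an SVR with an RBF kernel using scikit-learn, which solves the dual (3.9) internally; the data and parameter values are illustrative assumptions, not the paper's.

```python
# A toy sketch of kernelized SVR as in (3.12), using scikit-learn's SVR,
# which solves the dual (3.9) internally; the data are illustrative only.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)

# C is the regularization constant of (3.4); epsilon is the half-width
# of the insensitive zone Z in (3.5); the kernel k(x, x_i) is the RBF.
model = SVR(kernel="rbf", C=5.0, epsilon=0.02)
model.fit(X, y)
y_hat = model.predict(X)   # evaluates f(x) = sum (a^_i - a_i) k(x, x_i) + b
```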

§4 Experiment

The LSTM model in this paper is based on the TensorFlow deep learning framework; the SVR model is trained with the SVR function in scikit-learn, and GPU acceleration is used in training. Experiment configuration: the experiments are carried out under Ubuntu 18.04, on a host with an Intel i7-7829HK CPU, a GTX 1080 graphics card, and 32 GB of memory.

4.1 Data

The standard cost data for marketing, overhaul, operations and maintenance are provided by State Grid Jinhua Power Supply Company. They include the annual marketing costs for 76 municipalities in Zhejiang Province from 2014 to 2019, together with the official predicted costs for 2019 based on experience and the data from 2014 to 2018. For the source domain, we select an open-source stock data set with daily open price, close price, highest price and volume from 1990 to 2015, whose distribution is similar to that of the target data.
There are two main reasons for choosing stock data as the source domain, called the leverage effect and heteroscedasticity. The leverage effect refers to the phenomenon that when a certain financial variable changes over a small range, another related variable changes over a larger range. In the stock and other financial markets, it refers to the asymmetric influence of information on the variation: generally, good news has less impact on the market than bad news. When the stock price falls, the company's net shareholders' equity decreases but its debt does not change, which triggers a further decline in the company's stock price.
Heteroscedasticity is a statistical concept meaning that the dispersion of the random error of a random variable is itself variable, where the dispersion can be the variance or another measure of spread. If the stock index were independent of time, data from any point in history could be used to predict its future value, but this is not the actual situation. The future trend of a stock is related to its recent values, but it is affected even more by the performance of the enterprise, good news and relevant policies, which are often divorced from the data itself. In this case the variance changes with time, and may even be completely random and uncontrollable. The annual change of the marketing cost data we deal with depends largely on policy changes, and its distribution also changes over time, so it is likewise heteroscedastic.

4.2 Training Procedures

4.2.1 Data Preprocessing

We group every six consecutive days of the stock's closing prices, treating the first five days as x and the sixth day as y. The last 20 sets of data are used as the test set, and the remaining data are normalized and divided into a training set and a validation set at a ratio of 9:1. Finally, the test set and the power cost data are normalized with the mean and variance of the previous normalization.
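A sketch of this windowing and normalization might look as follows, assuming a sliding window and a one-dimensional array close of daily closing prices; the variable names are illustrative.

```python
# A sketch of the windowing and normalization described above, assuming
# `close` is a 1-D NumPy array of daily closing prices (name is illustrative).
import numpy as np

def make_windows(series, window=5):
    # first `window` days as x, the following day as y (sliding window assumed)
    x = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array([series[i + window] for i in range(len(series) - window)])
    return x, y

x, y = make_windows(close, window=5)
x_test, y_test = x[-20:], y[-20:]           # last 20 sets as the test set
x_rest, y_rest = x[:-20], y[:-20]

mean, std = x_rest.mean(), x_rest.std()     # statistics from the remaining data
x_rest = (x_rest - mean) / std              # normalize (y handled the same way)
split = int(0.9 * len(x_rest))              # 9:1 training/validation split
x_train, x_val = x_rest[:split], x_rest[split:]
x_test = (x_test - mean) / std              # test set and cost data reuse the
                                            # same mean and std
# for the LSTM, reshape windows to (samples, 5, 1) before training
```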

Figure 5. Proposed LSTM Model.

4.2.2 Model Architecture

For the LSTM model, there are four layers in total: one input layer, two LSTM layers and one dense layer with dimension 1, as shown in Fig. 5. The time steps are set to 5, and mean squared error (MSE) is selected as the loss function, with an Adam optimizer. Besides, we use a self-decaying learning rate, which halves after every 20 epochs in which there is no improvement in the validation loss.
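A sketch of this architecture in TensorFlow/Keras might look as follows; the choice of 64 units per LSTM layer follows Table 3 and is otherwise an assumption.

```python
# A sketch of the described architecture, assuming TensorFlow/Keras;
# 64 LSTM units per layer follow Table 3 and are otherwise assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5, 1)),            # time steps = 5, one feature
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),                       # dense layer with dimension 1
])
model.compile(optimizer="adam", loss="mse")         # MSE loss with Adam

# Self-decaying learning rate: halve when the validation loss fails to
# improve for 20 epochs; pass callbacks=[lr_callback] to model.fit(...).
lr_callback = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=20)
```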
For the SVR model, we use the radial basis function (RBF) as the kernel function, and grid search is used to set the hyperparameters.
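A sketch of this grid search might look as follows; the parameter grids are assumptions bracketing the C and epsilon values reported in Table 3, and x_train, y_train are the windows from the preprocessing step.

```python
# A sketch of setting the SVR hyperparameters by grid search; the grids
# shown are assumptions bracketing the values reported in Table 3.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

param_grid = {"C": [1, 5, 10], "epsilon": [0.01, 0.011, 0.02]}
search = GridSearchCV(
    SVR(kernel="rbf"), param_grid,
    scoring="neg_mean_absolute_percentage_error", cv=5)
search.fit(x_train, y_train)       # five-day windows as features, sixth day as target
svr_best = search.best_estimator_
```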

4.2.3 Training

We use the preprocessed training set to train the LSTM model and save all the parameters. Then the cost data are fed in to fine-tune the network.
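A sketch of this step, continuing the Keras model above: freezing the first LSTM layer during fine-tuning is one common choice, assumed here rather than stated by the authors; x_cost and y_cost are hypothetical names for the preprocessed cost windows and targets.

```python
# A sketch of the fine-tuning step, continuing the Keras model above.
# Freezing the first LSTM layer is an assumption, not the authors' stated
# procedure; x_cost and y_cost are hypothetical names for the cost data.
model.load_weights("source.weights.h5")    # restore the saved parameters

for layer in model.layers:
    if isinstance(layer, tf.keras.layers.LSTM):
        layer.trainable = False            # reuse low-level source-domain features
        break                              # freeze only the first LSTM layer

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
model.fit(x_cost, y_cost, epochs=50, validation_split=0.1)
```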

4.3 Result
To reflect the performance of the prediction models more intuitively, each method is evaluated by calculating the mean absolute percentage error (MAPE). The ABC and LR prediction models are compared with the methods in this paper. The results are shown in Table 3, and the corresponding stock forecast results are shown in Fig. 6 and Fig. 7. The comparison of index values shows that the proposed method performs relatively well on all indicators and greatly reduces the budget error of the measurement method used in the past. The unconventional power cost prediction model based on transfer learning proposed in this paper provides a useful reference for the future financial budget planning of power companies.
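For reference, the MAPE used in this comparison can be computed as in the following sketch; the function name and arguments are illustrative.

```python
# A sketch of the MAPE metric used in Table 3; names are illustrative.
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```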

Figure 6. Stock forecast results based on (a) LSTM and (b) SVR.

Table 3. Comparison of algorithm prediction results.

Method   LSTM units   Reduced factor   MAPE (%)   Total MAPE (%)
ABC      --           --               39.52      38
LR       --           --               32.83      31.96
LSTM     128          0.2              21.63      22.4
LSTM     64           0.2              23.3       17.27
LSTM     64           0.2              22.46      18.24

Method   C            Epsilon          MAPE (%)   Total MAPE (%)
SVR      5            0.02             26.37      13.82
SVR      5            0.01             23.3       20.65
SVR      5            0.011            23.45      19.64

Figure 7. Stock loss function.

4.4 Analysis

According to the calculation logic of the 'four typicals' (typical assets, typical equipment, typical projects and typical operations) of the State Grid headquarters, the calculation is operationalized and detailed for the actual situation of the various regions in Zhejiang Province. The number of activity types is larger than that of the State Grid standard, and the calculation results for Zhejiang Province are generally higher than those of State Grid. At the same time, the State Grid standards change every year, and with the emergence of new technologies and new business in power system operation, part of the original quota is no longer applicable, while the cost standard quota for new business and new activities has not yet been determined. In addition, owing to policy changes, the industry environment and other factors, the power cost of the Zhejiang power grid fluctuates greatly. This is an important source of abnormal data and makes the cost difficult to predict. In the past, the State Grid issued the next year's calculation by a measurement method, which not only has large errors but also lacks a theoretical basis. Building on traditional prediction methods, this paper attempts to use a deep neural network to complete the prediction task. The results are good but still need further improvement, and the ideas of this paper point out the direction for future research.

§5 Conclusion

This paper proposes a deep neural network prediction model based on transfer learning for small samples in State Grid, which can effectively learn the source-domain feature space and transfer it to the prediction model. Through fine-tuning, the prediction accuracy of the model is significantly improved. The experimental results verify the effectiveness and superiority of the power cost prediction. How to optimize the prediction model according to the actual business and policy factors of State Grid is our future research direction.
For transfer learning, there are several possible research directions in the future. Firstly, there is no in-depth research on the measurement of domain similarity and commonality, so it is particularly important to study accurate measurement methods. Secondly, in terms of algorithm research, different applications have different requirements. At present, much research focuses on classification algorithms for transfer learning; other application algorithms need further study, such as sentiment classification, reinforcement learning, ranking learning and metric learning. Moreover, theoretical research on the effectiveness of transfer learning algorithms is still scarce; studying the conditions for transferability, so as to obtain positive transfer and avoid negative transfer, is also one of the directions. Finally, in the big data environment, it is particularly important to study efficient transfer learning algorithms aimed at practical application data, in keeping with the current research wave of big data mining.

References
[1] T M Mitchell. Machine Learning, McGraw-Hill, 2003.

[2] S J Pan, Q Yang. A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data
Engineering, 2010, 22(10): 1345-1359.

[3] L X Duan, I W Tsang, D Xu. Domain Transfer Multiple Kernel Learning, IEEE Transactions
on Pattern Analysis Machine Intelligence, 2012, 34(99): 465-479.

[4] W T Tu , S L Sun. A subject transfer framework for EEG classification, Neurocomputing, 2012,
82: 109-116.

[5] H Daumé, D Marcu. Domain adaptation for statistical classifiers, Journal of Artificial Intelligence
Research, 2006, 26(1): 101-126.

[6] S Bickel, M Brückner, T Scheffer. Discriminative learning for differing training and test distributions, In: Proceedings of the 24th International Conference on Machine Learning, New York, USA: ACM, 2007, 81-88.

[7] S Bickel, C Sawade, T Scheffer. Transfer learning by distribution matching for targeted advertising, In: Proceedings of the 21st Annual Conference on Neural Information Processing Systems, Cambridge: MIT Press, 2009, 145-152.

[8] P C Wu, T G Dietterich. Improving SVM accuracy by training on auxiliary data sources, In:
Proceedings of the 21st International Conference on Machine Learning (ICML), New York, USA:
ACM, 2004, 110-117.

[9] W Y Dai, Q Yang, G R Xue, et al. Boosting for transfer learning, In: Proceedings of the 24th
International Conference on Machine Learning (ICML), New York, USA: ACM, 2007, 193-200.

[10] B Quanz, J Huan. Large margin transductive transfer learning, In: Proceedings of the 18th ACM
Conference on Information and Knowledge Management (CIKM), New York, USA: ACM, 2009,
1327-1336.

[11] Z J Xu, S L Sun. Multi-view transfer learning with Adaboost, In: Proceedings of the 23rd
Conference on Tools with Artificial Intelligence, Boca Raton, FL: IEEE, 2011, 399-402.

[12] Z J Xu, S L Sun. Multi-source transfer learning with Multi-view Adaboost, Neural Information
Processing, 2012, 7665: 332-339.

[13] M M Chen, K Q Weinberger, J Blitzer. Co-training for domain adaptation, In: Proceedings of
the 25th Conference on Neural Information Processing Systems (NIPS), 2011, 2456-2464.

[14] Y Z Jiang, Z H Deng, S T Wang. Mamdani-Larsen Type Transfer Learning Fuzzy System, Acta
Automatica Sinica, 2012, 38(9): 1393-1409.
[15] M Q Zhu, Y H Cheng, M Li, et al. A Hybrid Transfer Algorithm for Reinforcement Learning
Based on Spectral Method, Acta Automatica Sinica, 2012, 38(11): 1765-1776.
[16] W H Jiang, F L Chung. Transfer spectral clustering, In: Proceedings of the 2012 European Con-
ference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
(ECML PKDD), Springer, Berlin, Heidelberg, 2012, 789-803.
[17] A Argyriou, C Micchelli, M Pontil, et al. A spectral regularization framework for multi-task structure learning, In: Proceedings of Advances in Neural Information Processing Systems (NIPS 2008), Cambridge, MA: MIT Press, 2007, 25-32.
[18] S J Pan, J Kwok, Q Yang. Transfer learning via dimensionality reduction, In: Proceedings of
the 23rd International Conference on Artificial Intelligence, California, USA: AAAI Press, 2008,
677-682.
[19] S J Pan, X C Ni, J T Sun, et al. Cross-domain sentiment classification via spectral feature
alignment, In: Proceedings of the 19th International Conference on World Wide Web, New York,
USA: ACM, 2010, 751-760.
[20] W T Tu, S L Sun. Transferable discriminative dimensionality reduction, In: Proceedings of the
23rd IEEE International Conference on Tools with Artificial Intelligence (CTAI), Boca Raton,
FL: IEEE, 2011, 865-868.
[21] X Gao, X M Wang, X L Li, et al. Transfer latent variable model based on divergence analysis,
Pattern Recognition, 2011, 44(10): 2358-2366.
[22] X B Cao, Z Wang, P K Yan, et al. Transfer learning for pedestrian detection, Neurocomputing,
2013, 100: 51-57.
[23] N D Lawrence, J C Platt. Learning to learn with the informative vector machine, In: Proceedings of the 21st International Conference on Machine Learning, 2004, 65-73.
[24] T Evgeniou, M Pontil. Regularized multi-task learning, In Proceedings of the 10th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, 2004, 109-117.
[25] L Zhang, D Zhang. Domain adaptation transfer extreme learning machines, In: J Cao, K Mao, E
Cambria, Z Man, KA Toh, (eds), Proceedings of ELM-2014 Volume 1, Proceedings in Adaptation,
Learning and Optimization, Springer, Cham, 2015, 3: 103-119.
[26] J Davis, P M Domingos. Deep transfer via second-order markov logic, In Proceedings of the 26th
Annual International Conference on Machine Learning, 2009, 217-224.
[27] J Schmidhuber. Deep learning in neural networks: An overview, Neural Networks, 2015, 61:
85-117.
[28] S Hochreiter, J Schmidhuber. Long Short-Term Memory, Neural Computation, 1997, 9(8): 1735-
1780.
[29] Y T Wu, M Yuan, S P Dong, et al. Remaining useful life estimation of engineered systems using
vanilla LSTM neural networks, Neurocomputing, 2018, 275: 167-179.

[30] A Smola, B Schölkopf. A tutorial on support vector regression, Statistics and Computing, 2004, 14(3): 199-222.
[31] X J Zhou, T Jiang. Enhancing Least Square Support Vector Regression with Gradient Informa-
tion, Neural Processing Letters, 2016, 43: 65-83.

1 State Grid Zhejiang Electric Power Company Jinhua Power Supply Company, Jinhua 321000, China.
2 State Grid Zhejiang Electric Power Company, Hangzhou 310018, China.
Email: [email protected]
