Parallel Computing Based Distribution Network - Reliability Evaluation Technology Research

This document proposes using Apache Spark's parallel computing platform and MapReduce model to evaluate the reliability of large-scale distribution networks. Spark allows fast, iterative computations on massive datasets using its resilient distributed datasets (RDDs) held in memory. The authors introduce using minimum cut-sets algorithms adapted to Spark to calculate reliability indices. They implement their method on Spark to analyze a 10kV distribution network, verifying its faster computation speed and accuracy compared to traditional methods.


Parallel Computing Based Distribution Network

Reliability Evaluation Technology Research

1st Wei Feng, 2nd Liang Guo, 3rd Qian Wu, 4th Shaoxing Yan
China State Grid Taizhou Power Supply Company, Taizhou, China

5th Wei Jiang, 7th Lili Huang
Southeast University, Nanjing, China

6th Tianshi Cheng
NARI Technology Corporation, Nanjing, China

Abstract: With the development of the social economy and the expansion of distribution networks, the demand for power supply reliability is steadily rising. However, traditional reliability analysis of distribution network power supply is better suited to simplified small-scale systems and struggles to meet the reliability calculation needs of large-scale distribution networks in the data age. In this paper, an analysis method based on the Spark parallel computing platform is proposed, in which the system reliability analysis relies on Map-Reduce. First, the network topology data is hash-mapped to determine the connections of every element. Then the minimum cut-sets of each load point are determined by depth-first search using Map-Reduce, and the reliability index is calculated according to the series-parallel relationship of the components. On this basis, the reliability index of the system is corrected by considering the influence of the standby power supply and the distribution automation equipment in the network. Finally, the reliability evaluation of a 10 kV distribution network in a certain area is carried out on the Spark platform, and the calculation speed and accuracy of the algorithm are verified.

Keywords: Spark, Map-Reduce, reliability of distribution network.

I. INTRODUCTION

With the rapid development of the distribution network (DN) in China, it has become one of the most complex systems in the city. Considering the huge investment in the DN and the importance of the economic entities it supports, its reliability evaluation is an essential component of power system analysis [1-3].

One main difference between power system reliability evaluation and that of the distribution network lies in the calculation scale [4]. The number of distribution lines and appliances in a DN system is considerable. In a typical Chinese city, the number of feeder lines can exceed 5000 or even 10000. In applications with strict real-time requirements, the required evaluation speed is challenging for traditional computation platforms and algorithms [5-6]. In recent research, parallel computing technologies have been used in reliability evaluation. In [7], an IEEE reliability test system is studied using a parallel genetic algorithm with multiple processors; the results show that this method can improve the calculation efficiency of power system reliability evaluation. In [8], state space pruning with a parallel correction technique is shown to improve the convergence speed of reliability computation. In addition, Map-Reduce is a typical parallel programming model that processes massive data quickly, hides the low-level implementation details, and reduces the difficulty of parallel programming [9].

This paper focuses on introducing a recently developed parallel computing platform, Apache Spark, into DN reliability evaluation [10]. The use of the minimum cut-sets algorithm in system reliability analysis is introduced, and the algorithm is adapted to the Spark computing platform. Based on the Spark computing platform and the minimum cut-sets algorithm, fast and accurate reliability calculation of the distribution network is realized.

II. SPARK BASED PARALLEL COMPUTING PLATFORM

A. Apache Spark based distributed parallel computing technology

Parallel computing is defined in contrast to traditional serial computing. It can be implemented with multithreading or with a computing cluster to achieve better performance. In this paper, we use computing cluster technology to realize the parallel computing based reliability analysis of distribution networks, for three advantages:

1) The computing cluster technology can be easily extended. If more computing power is needed, more nodes can be added to the cluster since the structure is extendable.

2) Powerful computing platforms are available for computing clusters. Thanks to the rapid development of big data technologies, foundations and companies release their own parallel computing platforms, which integrate storage, task management and data structures. Most of them are open source.

3) Cloud computing technology reduces the cost of a computing cluster. Many internet companies such as Amazon and Alibaba provide cloud computing services that can be applied for and managed online. Meanwhile, the configuration of the clusters can be adjusted in real time according to the performance requirement. Compared to purchasing expensive multi-core CPUs, renting a cloud computing environment is cheaper and more convenient.

The MapReduce computing model is the most widely used parallel programming model, and it can be easily adapted to a driver-worker distributed parallel computing platform. Since Map operations and Reduce operations are independent, the scheduling and synchronization of worker nodes are easy. Programmers do not need to know the details of task management on different nodes; they only need to split the algorithm into a Map step and a Reduce step.

However, traditional Hadoop MapReduce parallel computing is based on an open-loop data flow model. Every data processing step involves reading and writing files on hard drives. Meanwhile, different worker nodes cannot communicate during computation. This characteristic prevents efficient iterative computation, which is commonly required in power system analysis, including distribution network reliability evaluation.

These problems can be solved with in-memory computing and the Resilient Distributed Datasets (RDD) provided by the Apache Spark computing platform. An RDD is a fault-tolerant, parallel data structure that allows users to persist data in memory or on hard drives and to control data partitioning. Spark also provides a set of operations on RDDs, which are classified into transformations, including map, flatMap, filter, join, groupBy and reduceByKey, and actions, such as reduce, collect and count. The RDD abstraction makes Spark suitable for efficient in-memory MapReduce operations. For these reasons, Spark is selected as the basic parallel computing platform in our work.

B. Design of the parallel computing platform

A practical parallel computing platform for power system reliability evaluation includes three components: data storage, computing framework, and interface/development environment. Considering the requirements of analyzing large-scale distribution networks, the platform needs to:

1) Store and read large volumes of hybrid data, including historical data, real-time data, structured data and unstructured data.

2) Provide high-performance computing that accelerates algorithms with a distributed parallel structure and lets programs be written generally, efficiently and flexibly.

3) Offer a friendly development environment to assist prototype testing, algorithm inspection and application deployment.
According to the above requirements, we designed the structure of the parallel computing platform for distribution network reliability evaluation, as shown in Fig. 1. The data storage level is compatible with historical data stores such as HDFS, HBase and MySQL, and with real-time databases such as MongoDB and Redis. The Apache Spark module extracts data from these sources with Spark SQL and uses the RDD model to implement the reliability evaluation algorithm. On the top level, the Jupyter Notebook interactive computing environment is deployed. Users can access data with different programming languages and kernels. Jupyter Notebook also supports browser-based programming and plotting, which allows users to verify the performance of their algorithms.

[Fig. 1: layered block diagram. Top: interactive user interface (Jupyter Notebook) with Jupyter kernels (Python, Scala, R, Java, C++, ...). Middle: application interface over the parallel computation module (Apache Spark with Spark Streaming, GraphX (graph), SQL and MLlib (machine learning)) and other computation modules (Ipyparallel on the IPython kernel, ...). Bottom: data storage (historical data storage and real-time data storage).]

Fig. 1. The structure of parallel computing platform for distribution network reliability evaluation

III. DESIGN OF PARALLEL COMPUTING ALGORITHM FOR DISTRIBUTION NETWORK RELIABILITY EVALUATION

A. Cut-set based reliability evaluation with consideration of backup power supply

The structure of a distribution network can be modelled as an undirected graph, so it can be analyzed with graph theory based algorithms. The minimal path-set method is a classic algorithm for analyzing complex networks. However, it is hard to apply to closed-loop graphs, which are common in distribution networks. Thus its dual model, the minimum cut-set method, is more often used in distribution network reliability evaluation [11-12].

The effect of the backup power supply on reliability should be considered in distribution networks that use the hand-in-hand power supply mode. The existence of a backup power supply shortens the average interruption duration. For that reason, the possibility of shifting load to the backup power supply needs to be counted in the cut-set based reliability evaluation. Fig. 2 shows the operation state transfer of a load. The white circle represents the normal operation state of the load, and the circles with vertical lines represent the recovery states. The P values indicate the occurrence probabilities of each state, which are listed in Table I.

[Fig. 2: state transfer diagram with states labelled P0 to P6.]

Fig. 2. Operation mode transfer of load

TABLE I. OCCURRENCE PROBABILITIES OF EACH STATE

No. | Indicator | Meaning
1 | P0 | The load is normal.
2 | P1 | Component failure occurs but the load can be transferred to the backup power supply when automation equipment operates normally.
3 | P2 | Component failure occurs and the load cannot be transferred to the backup power supply (the coupling components of the main source and the backup power supplies fail).
4 | P3 | Automation equipment operates normally.
5 | P4 | Automation equipment fails.
6 | P5 | Backup line is available.
7 | P6 | Backup line is unavailable.

The average interruption duration of a load is determined by the final state and the transfer path. The backup cut-set is calculated from the backup node and the load node. If the intersection of the backup cut-set and the load failure cut-set fails, the load cannot be transferred to the backup power supply and the interruption duration is unaffected. Correspondingly, if the backup cut-set fails, the load can be transferred to the backup power supply and the interruption duration changes. Meanwhile, the unavailability of the backup line should also be included.

With Fig. 2, the average interruption duration with consideration of the backup power supply can be calculated as

= ( + ( + + ))    (1)

where P = λ /λ, P = 1 − P , P = 1 − P , P = 1 − P , P = U /8760, and U is the average interruption duration of the backup power supply; r = r r⋂ /(r⋂ + r ), and r is the mean time to resolution (MTTR) when the backup line is unavailable; λ is the annual failure rate of the load; A represents the power supply cut-set and B represents the backup power supply cut-set; r is the transfer duration, which differs depending on whether the transfer is accomplished manually or automatically.

B. Implementation of parallel computing based reliability evaluation algorithm

As shown in Fig. 3, the implementation of parallel computing based reliability evaluation includes four steps: data preprocessing, parallel computation of the minimum cut-sets, parallel computation of the reliability of load points, and calculation of the reliability indexes. These steps are introduced in the following paragraphs.
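The map-then-reduce flow of these steps can be sketched on a single machine. This is a minimal illustration under stated assumptions, not the authors' Spark code: the per-cut-set values are assumed to be precomputed, a thread pool stands in for Spark workers, and all names (`split`, `map_task`, `reduce_task`, `run`) and the sample records are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(records, n_splits):
    """Partition the load-point records into n_splits roughly equal splits."""
    return [records[i::n_splits] for i in range(n_splits)]


def map_task(split_records):
    """Map step: emit one (load_id, lambda, U) triple per load point.

    Assumption: each record already carries (lambda_c, U_c) per cut-set;
    in the real pipeline the DFS and cut-set search would run here.
    """
    out = []
    for load_id, cut_values in split_records:
        lam = sum(l for l, _ in cut_values)
        u = sum(u for _, u in cut_values)
        out.append((load_id, lam, u))
    return out


def reduce_task(acc, triples):
    """Reduce step: merge partial results into one dict keyed by load point."""
    for load_id, lam, u in triples:
        acc[load_id] = (lam, u)
    return acc


def run(records, n_splits=2):
    """Run the map tasks concurrently, then merge their outputs."""
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        mapped = list(pool.map(map_task, split(records, n_splits)))
    return reduce(reduce_task, mapped, {})


# Hypothetical input: load point id -> list of (lambda_c, U_c) per cut-set.
records = [
    ("L1", [(0.04, 0.32)]),
    ("L2", [(0.04, 0.32), (0.01, 1.0)]),
    ("L3", [(0.02, 0.1)]),
]
result = run(records)
```

Because each map task only touches its own split, the tasks need no communication, which is the property that makes the per-load-point work easy to distribute.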

[Fig. 3: flowchart. Initialization (copy the load node file, read the topology, determine the source and backup supply, read the backup supply reliability parameters) feeds the input, which is divided into splits 0 to 4. Each map task receives (L, topo) pairs, runs a depth-first search, determines the minimum cut-sets between the load node and the source and backup supply, and calculates the failure rate and outage time of the load node according to the cut-sets. Reduce tasks collect the (L_i, λ_i, U_i) records, the data is merged, and the reliability index of the system is calculated from the node reliability.]

Fig. 3. Parallel computation of reliability based on MapReduce
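A minimal pure-Python sketch of the data preprocessing and cut-set search steps follows. It rests on assumptions that are not taken from the paper: edges are stored in a dict keyed by the sorted vertex pair, every simple path found by DFS is treated as a minimal path-set, and minimal cut-sets are found by brute force up to second order. The function names and the toy ring network are hypothetical, and the built-in `map` only stands in for Spark's map transformation.

```python
from itertools import combinations


def edge_key(f, t):
    """Normalised edge tuple (f, t) with f < t, usable as a dict key."""
    return (f, t) if f < t else (t, f)


def build_graph(edges):
    """Build an adjacency map from a dict keyed by edge tuples."""
    adj = {}
    for f, t in edges:
        adj.setdefault(f, set()).add(t)
        adj.setdefault(t, set()).add(f)
    return adj


def minimal_paths(adj, start, goal):
    """Iterative DFS; returns each simple path as a frozenset of edge keys."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(frozenset(edge_key(a, b) for a, b in zip(path, path[1:])))
            continue
        for nxt in adj.get(node, ()):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return paths


def minimal_cut_sets(paths, max_order=2):
    """Edge sets that intersect every path, skipping supersets of known cuts."""
    edges = sorted(set().union(*paths)) if paths else []
    cuts = []
    for k in range(1, max_order + 1):
        for combo in combinations(edges, k):
            s = set(combo)
            if any(c <= s for c in cuts):
                continue  # a smaller cut is contained in s, so s is not minimal
            if all(s & p for p in paths):
                cuts.append(s)
    return cuts


# Hypothetical network: source 0, ring 0-1-3-2-0, radial load 4 behind node 3.
# The dict values are illustrative failure rates looked up by hashed edge key.
edges = {edge_key(0, 1): 0.04, edge_key(1, 3): 0.04, edge_key(2, 0): 0.04,
         edge_key(3, 2): 0.04, edge_key(3, 4): 0.04}
adj = build_graph(edges)
source, loads = 0, [3, 4]

# One independent task per load point, as a map step would run it.
results = dict(map(lambda lp: (lp, minimal_cut_sets(minimal_paths(adj, lp, source))), loads))
```

In this toy network the ring-fed load 3 has only second-order cut-sets, while load 4 also has the first-order cut-set formed by its radial edge. Brute-force enumeration is chosen here only for brevity; it does not scale to large networks.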

1) Data preprocessing

The original model of the distribution network has to be transferred into a data structure that suits parallel computing. Since most distribution networks contain closed loops, the original model is a complex graph, which is hard to analyze. Thus, the first step of data preprocessing is transferring the original complex graph into an undirected simple graph. The create_simple_graph(net) function from the PandaPower library [13] is used to create the simple graph in our work.

Meanwhile, during the parallel computing of reliability, the state occurrence probabilities are accessed at high frequency. How the probabilities are stored determines the computing speed and efficiency. The traditional approaches to caching probabilities are searching the vertex indexes of the target edge, or using a two-dimensional matrix containing all vertexes. The former has high time complexity and the latter high space complexity. In our research, we use the following tuple to indicate the edge from vertex f to t:

\[ e = (f, t), \quad f < t \tag{2} \]

A hash mapping can be used to store the probability of the above tuple [14]. In our work, the hash function of CPython is used for this purpose.

2) Parallel computing on minimal cut-sets

With the data structure from data preprocessing, the cut-sets of each load point to the main power sources and the backup power sources are searched. The depth-first search (DFS) algorithm is used to obtain the minimal path-sets of the model. The minimal path-sets matrix can then be used to calculate the minimum cut-sets [15-16]. In the parallel computing design, these processes are implemented with the map transformation of Spark. With the map method, the cut-sets of a large number of load points can be searched simultaneously.

3) Parallel computing on reliability of load points

[Fig. 4: flowchart in two stages. Reliability calculation under main power supply: read the components of the minimum cut-sets; judge the connection relation between components and cut-sets; handle the elements in a cut-set with the parallel system formula; use the series system formula to deal with the cut-sets, giving λL1' and UL1'. Calculation of reliability correction for backup supply: read the minimum cut-set and the backup minimum cut-set; use difference and intersection to calculate the reliability index of the components in the backup supply that are not in the minimum cut-set; correct the parameters λL1' and UL1' to get λL1 and UL1; compute min(λL1, UL1) to get the best backup; output min(λL1, UL1).]

Fig. 4. Algorithm of reliability computing of load point

The algorithm for reliability computing of a load point is shown in Fig. 4. First, the reliability calculation under the main power supply is implemented. The probabilities of every main power supply cut-set of each load point form a parameter matrix. The series formula and the parallel formula are applied according to the connection between components and cut-sets. Thus, the failure rate λL1' and interruption duration UL1' can be calculated with the following equations:

\[ \lambda_{ci} = \Big(\prod_i \lambda_i r_i\Big)\Big(\sum_i \frac{1}{r_i}\Big), \qquad U_{ci} = \prod_i \lambda_i r_i \tag{3} \]

\[ \lambda'_{L1} = \sum_i \lambda_{ci}, \qquad U'_{L1} = \sum_i \lambda_{ci} r_{ci} \tag{4} \]

The elements in a cut-set are handled by the parallel system formula (3), and the series formula (4) is used to deal with the cut-sets. In the above formulas, λi and ri are the fault rate and repair time of the components in cut-set ci; λci and Uci are the failure rate and fault time of cut-set ci, and rci = Uci/λci is the corresponding average outage duration.

λL1' and UL1' are then corrected by using set difference and intersection to calculate the reliability index of the components in the backup supply that are not in the minimum cut-set. The minimum of the corrected failure rate and interruption duration, λL1 and UL1, is finally used to obtain the reliability indexes. These processes are also implemented as a map transformation and executed in parallel on multiple computing nodes.

4) Parallel computing on reliability of load points and calculating reliability indexes

The reliability indexes include the system average interruption frequency index (SAIFI), the system average interruption duration index (SAIDI) and the average service availability index (ASAI), which are defined in (5)-(7):

\[ \mathrm{SAIFI} = \frac{\sum_i \lambda_i N_i}{\sum_i N_i} \tag{5} \]

\[ \mathrm{SAIDI} = \frac{\sum_i U_i N_i}{\sum_i N_i} \tag{6} \]

\[ \mathrm{ASAI} = 1 - \frac{\sum_i U_i N_i}{8760 \cdot \sum_i N_i} \tag{7} \]

where λi and Ui are the failure rate and annual interruption duration of load point i, and Ni is the number of customers at load point i.

These indexes are computed as the mathematical expectation of the failure rate and interruption duration over all load points. This process is carried out in parallel as a reduce action of Spark.

IV. EXPERIMENT AND ANALYSIS

The experiments are implemented on Ubuntu Server 16.04, Spark 2.1.0, Python 3.5, Pandas 0.19.2 and PandaPower 1.2. The hardware configuration is an Intel i7-6700K CPU (4.0-4.2 GHz) and 32 GB of DDR4 2133 MHz memory. The IEEE RBTS-BUS4 33 kV model used in [17] is used to verify the parallel computing technology, as shown in Fig. 5. The reliability parameters are listed in Table II.

TABLE II. RELIABILITY PARAMETERS

Equipment | Line | Circuit Breaker | Transformer
λ (times/year) | 0.04 | 0.00514 | 0.01
r (hours/time) | 8 | 8 | 100
λ' (times/year) | 0.0143 | 0.2055 | 0.1444
r' (hours/time) | 2.18 | 8 | 10
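As a worked illustration of formulas (3) to (7), the sketch below evaluates two hypothetical load points using the line parameters of Table II (λ = 0.04 times/year, r = 8 hours/time). The assumptions are mine rather than the paper's: repair times are converted from hours to years inside the parallel formula so the units stay consistent, the cut-set layout and customer counts are invented for the example, and the indices use customer-weighted definitions.

```python
HOURS_PER_YEAR = 8760.0


def cut_set_indices(components):
    """Parallel formula (3) for one cut-set.

    components: list of (failure rate per year, repair time in hours).
    Returns (lambda_c in failures/year, U_c in hours/year).
    """
    prod = 1.0      # product of lambda_i * r_i, with r_i expressed in years
    inv_sum = 0.0   # sum of 1 / r_i, with r_i expressed in years
    for lam, r_hours in components:
        r_years = r_hours / HOURS_PER_YEAR
        prod *= lam * r_years
        inv_sum += 1.0 / r_years
    return prod * inv_sum, prod * HOURS_PER_YEAR


def load_point_indices(cut_sets):
    """Series formula (4): sum failure rates and outage times over cut-sets."""
    per_cut = [cut_set_indices(c) for c in cut_sets]
    lam = sum(l for l, _ in per_cut)
    u = sum(u for _, u in per_cut)
    return lam, u


def system_indices(load_points):
    """Formulas (5)-(7) from (lambda_i, U_i, N_i) triples."""
    total_n = sum(n for _, _, n in load_points)
    saifi = sum(lam * n for lam, _, n in load_points) / total_n
    saidi = sum(u * n for _, u, n in load_points) / total_n
    asai = 1.0 - saidi / HOURS_PER_YEAR
    return saifi, saidi, asai


LINE = (0.04, 8.0)  # Table II line parameters

# Hypothetical load points: A has one first-order cut-set (a single line);
# B has that cut-set plus a second-order cut-set of two lines.
lp_a = load_point_indices([[LINE]])
lp_b = load_point_indices([[LINE], [LINE, LINE]])

# Hypothetical customer counts of 100 and 300.
saifi, saidi, asai = system_indices([(*lp_a, 100), (*lp_b, 300)])
```

For a first-order cut-set the parallel formula reduces to the component's own values (0.04 failures/year and 0.32 hours/year here), while the second-order cut-set contributes only a few millionths of a failure per year, which shows why higher-order cut-sets are often truncated.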
Fig. 5. System network structure

TABLE III. RELIABILITY EVALUATION RESULTS

Reliability indices | SAIFI (times/year) | SAIDI (h/year) | ASAI
Value | 0.1266 | 0.2658 | 0.999970

The reliability evaluation result is listed in Table III, which is identical to the result in [17]. The parallel computing performance is evaluated with a model of the distribution network of Taizhou, China. The model contains 207 nodes, 203 lines, 45 switches/breakers, 29 load points and 5 backup power supplies. The traditional serial computing performance is compared with the parallel computing performance. Serial computing takes 4.116 s while parallel computing takes 2.75 s, a reduction in run time of about 33% (a speedup factor of about 1.5). This result verifies the efficiency of the proposed parallel computing based reliability evaluation. The performance can be further enhanced when more nodes participate in the cluster.

V. CONCLUSION

In this paper, an Apache Spark based parallel computing platform is designed for distribution network reliability evaluation. The evaluation algorithm contains four steps and is designed to run in parallel with RDD technology. The experimental result indicates that the proposed parallel computing method can enhance the efficiency and speed of distribution network reliability evaluation.

REFERENCES

[1] Alan J. McBride, Andrew R. McGee. "Assessing smart Grid security," Bell Labs Technical Journal, vol. 17, pp. 87-103, 2012.
[2] Andrej Schreiner, Gerd Balzer, Armin Precht. "Risk sensitivity of failure rate and maintenance expenditure," 2011 IEEE 11th International Conference on Probabilistic Methods Applied to Power System, pp. 137-142, 2010.
[3] R.M. Vitorino, L.P. Neves, H.M. Jorge. "Network reconfiguration to improve reliability and efficiency in distribution system," 2009 IEEE Bucharest PowerTech, pp. 1-7, 2009.
[4] Qian Xie, Haozhong Cheng, Yi Zhang, et al. "Active distribution network planning based on active management," 2014 China International Conference on Electricity Distribution, pp. 1261-1265, 2014.
[5] Baek, Joonsang, et al. "A Secure Cloud Computing Based Framework for Big Data Information Management of SmartGrid," IEEE Transactions on Cloud Computing, vol. 3, pp. 233-244, 2015.
[6] Song, Yaqi, G. Zhou, and Y. Zhu. "Present Status and Challenges of Big Data Processing in SmartGrid," Power System Technology, vol. 37, pp. 927-935, 2013.
[7] Lingfeng Wang, Chanan Singh. "Multi-deme parallel genetic algorithm in reliability analysis of composite power systems," 2009 IEEE Bucharest PowerTech, pp. 1-6, 2009.
[8] Robert C. Green, Lingfeng Wang, Mansoor Alam, et al. "Intelligent and parallel state space pruning for power system reliability analysis using MPI on a multicore platform," ISGT 2011, pp. 1-8, 2011.
[9] Dean J, Ghemawat S. "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, pp. 107-113, 2008.
[10] Liu, Keyan, et al. "Big Data Application Requirements and Scenario Analysis in Smart Distribution Network," Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, vol. 35, pp. 287-293, 2015.
[11] Zefang Zhou, Zhean Gong, Bo Zeng, et al. "Reliability analysis of distribution system based on the minimum cut-set method," 2012 International Conference on Quality, Reliability, Risk, Maintenance, and Safety Engineering, pp. 112-116, 2012.
[12] Vijay Venu Vadlamudi, Oddbjørn Gjerde, Gerd Kjølle. "Impact of protection system reliability on power system reliability: A new minimal cutset approach," 2014 International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), pp. 1-6, 2014.
[13] https://pandapower.readthedocs.io/en/v1.3.0/elements.html#
[14] Tang H, Gulbeden A, Zhou JY, et al. "A self-organizing storage cluster for parallel data-intensive applications," IEEE Computer Society, pp. 52-63, 2004.
[15] Jane C C, Lin J S, Yuan J. "Reliability evaluation of a limited-flow network in terms of minimal cutsets," IEEE Transactions on Reliability, vol. 42, pp. 354-361, 1993.
[16] R. Billinton, R. Allan. Reliability Evaluation of Engineering Systems: Concepts and Techniques (second edition). New York and London: Plenum Press, 1992.
[17] R.N. Allan, R. Billinton, I. Sjarief, et al. "A reliability test system for educational purposes-basic distribution system data and results," IEEE Transactions on Power Systems, vol. 6, pp. 813-820, 1991.