
Big Data
Volume 00, Number 00, 2020
© Mary Ann Liebert, Inc.
DOI: 10.1089/big.2019.0125

ORIGINAL ARTICLE

Moth-Flame Optimization-Bat Optimization: Map-Reduce Framework for Big Data Clustering Using the Moth-Flame Bat Optimization and Sparse Fuzzy C-Means

Vasavi Ravuri1,* and S. Vasundra2

1VNRVJIET, Hyderabad, India.
2Department of CSE and NSS Coordinator, JNTUA University, Ananthapuramu, India.

*Address correspondence to: Vasavi Ravuri, VNRVJIET, Pragathi Nagar, Hyderabad, Telangana 500090, India, E-mail: [email protected]

Abstract
The technical advancements in big data have become popular and most desirable among users for storing, processing, and handling huge data sets. However, clustering using these big data sets has become a major challenge in big data analysis. The conventional clustering algorithms lack scalable solutions for managing such huge data sets. Thus, this study proposes a technique for big data clustering using the spark architecture. The proposed technique undergoes two steps for clustering the big data: feature selection, performed in the initial cluster nodes of the spark architecture, and clustering, performed in the final cluster nodes. At first, the initial cluster nodes read the big data from various distributed systems, and the optimal features are selected and placed in the feature vector based on the proposed moth-flame optimization-based bat (MFO-Bat) algorithm, which is designed by integrating the MFO and Bat algorithms. Then, the selected features are fed to the final cluster nodes of spark, which use the sparse-fuzzy C-means method for performing optimal clustering. The proposed MFO-Bat outperformed other existing methods, with a maximal classification accuracy of 95.806%, Dice coefficient of 99.181%, and Jaccard coefficient of 98.376%.
Keywords: big data; big data clustering; fuzzy; optimization algorithm; spark architecture

Introduction
The advancements in information technologies and information societies have led to different types of data. Different types of data are collected in databases and data libraries, and it becomes very difficult to understand the data from a human's perspective.1 The accumulation of data from various scientific researches, industrial productions, and daily life updates has endorsed the explosive progress of information technologies2,3 and has produced an advanced subject in the academic as well as industrial fields called big data.4,5 Big data are described by the industrial field as expanding information with mass force and a high growth rate for making better decisions and processing the optimization ability.6 The major difference between traditional data and big data relies on the fact that the former are compiled of semistructured and structured information, whereas big data are generally unstructured and concentrate on three principles, namely velocity, variety, and volume.7 Data mining techniques are used in big data for transforming the accumulated big data into extensive knowledge that is understandable to humans.6 Data mining is described as the process of extracting useful information and knowledge, unknown to humans, and it also eliminates the noisy, incomplete, fuzzy, random, and disordered data from the large data sets.1 Cluster analysis6 is a type of functional data mining technique whose aim is to partition a given group of unlabeled data samples into subgroups or clusters. The samples that belong to the same cluster are similar to each other, whereas the data samples belonging to different clusters are dissimilar to each other. Clustering is applicable in various scientific fields.1


Various techniques have been devised for big data clustering,8 which is considered an active research field.9 Big data clustering has been widely studied in many areas, such as medicine and chemistry.10 The significant features are utilized to construct the clusters, whereas the insignificant features are not helpful for constructing the clusters.8 Many features influence the performance of the inductive learning algorithm.11 Insignificant features are noisy and can be eliminated to reduce the size of the data and yield improved clustering. This also minimizes the noise and is beneficial for storing a large amount of data and processing those stored data. Feature selection is one of the important tasks in data mining; it removes the unrelated and inconsistent features and enhances the performance of learning. Various clustering algorithms have been devised for clustering unsupervised data to initiate the classification.12 Feature extraction methods, such as principal component analysis, singular value decomposition, or the Karhunen-Loeve transformation, and dimensionality reduction methods are used for clustering the data. In Dash and Liu,8 the CLIQUE algorithm was devised, in which each dimension is divided into user-defined divisions, and clustering starts by determining the dense regions in the dimensional data. In this method, k-dimensional dense regions are determined using the candidate generation algorithm named Apriori. The method is responsible for clustering the complete data. Projected clustering, in turn, determines the regions of interest in subspaces of high-dimensional data; this method determines the clusters and chooses features for each cluster. Moreover, this method investigates the features by adapting a restriction on the lowest and the highest number of features.4,8

The issues of big data algorithms focus on designing the algorithm and addressing the complexities raised by big data volumes and complex, distributed data. The challenge consists of the following phases, namely the heterogeneous, sparse, uncertain, and incomplete phases. Different data are preprocessed using different data fusion methodologies.13 Heterogeneous data refer to any data with a high variability of data formats. They are perhaps indefinite and pose low quality due to missing values, high data redundancy, and untruthfulness. It is very complicated to combine heterogeneous data to meet the demands of business information.13 Multiple heterogeneous big data pose the uniqueness of multiple sources and dimensions, with structured, unstructured, or semistructured heterogeneous data. Hence, the storage of huge amounts of data in a relational database becomes complicated.10,14 Of the data sets obtained by acquisition devices, only a small amount of the data is important. There exist two types of data heterogeneity: syntactic heterogeneity and conceptual heterogeneity. Syntactic heterogeneity occurs when two data are not articulated in the same language. Conceptual heterogeneity, also named semantic heterogeneity or logical mismatch, represents the differences in modeling the domain of interest.4,14 Moreover, terminological heterogeneity depicts the dissimilarities in names when the same entities are referred to by different sources of data, and semiotic heterogeneity, also termed pragmatic heterogeneity, represents the different interpretations of entities by people.13

The primary intention of this research is to develop a technique for clustering big data sets using the spark architecture. The method consists of two phases, namely, feature selection and clustering. The initial cluster nodes read the big data from various distributed systems and form a feature vector based on the proposed moth-flame optimization-based bat (MFO-Bat) algorithm, which selects the optimal features for clustering. The proposed MFO-Bat is designed by integrating the MFO algorithm and the Bat optimization algorithm to acquire the advantages of both for selecting the optimal features. The selected features are then provided to the final cluster nodes of spark, in which the clustering is performed using the sparse-fuzzy C-means (FCM) algorithm. Thus, optimal clustering is carried out on the final cluster nodes of spark using the available data.

The major contribution of the proposed method used for big data clustering is as follows:

• Proposed MFO-Bat for feature selection: MFO is combined with the Bat optimization algorithm to design a novel algorithm, MFO-Bat, that selects significant features for big data clustering. The proposed MFO-Bat is utilized in the slave nodes for selecting the optimal features of the big data, and sparse FCM is then introduced for clustering the big data.

The organization of the article is as follows. The Introduction section explains the introductory part based on big data clustering.
The literature survey using different methods for big data clustering, along with the challenges, is given in the Literature Review section. The Proposed Big Data Clustering Based on the Spark Architecture section describes the proposed MFO-Bat technique developed for big data clustering. In the Results and Discussion section, the results and the comparative analysis are presented to evaluate the performance of the proposed technique. Finally, the article is concluded in the Conclusion section.

Literature Review
This section deliberates the literature survey of the big data clustering methods along with the disadvantages of the methods. Also, the challenges of the existing methods are deliberated.

Son and Tien15 developed a hybrid clustering method by integrating incremental clustering and FCM for addressing the big data problem. The first algorithm determined the rectangle meshes and termed the data points as representatives, whereas the second algorithm adapted the data points that had a strong influence over other representatives. These representatives were clustered using FCM, and then the new centers were chosen for clustering the data, but this method failed to consider a half-sphere representative method for increasing the dimension, and additional exact boundaries were needed for using the density-based method. Hidri et al.16 developed a technique named consensus clustering for handling big data clustering. Here, sampling was integrated with a split-and-merge strategy for fragmenting the data into smaller subsets, and then basic partitions were generated using the MapReduce model for computing the final result. The scalability analysis was performed by increasing the number of computing nodes and using the sample size to fulfill the volume and velocity dimensions, but this method was ineffective while dealing with the heterogeneity and speed of the data flow.

Wang et al.17 developed an analytical model for providing the optimal clusters in the wireless sensor network. Here, the centralized cluster algorithm was devised based on the spectral partitioning method. Also, the distributed execution of the clustering algorithm using FCM was devised for the clustering. The method revealed that the used algorithm outperformed other algorithms based on cost, network lifetime, and energy, but this algorithm faced certain limitations: the first was the network topology, which was dynamic and changes over time, and another was that all nodes tend to be homogeneous. Also, the fuzzy logic-based clustering algorithm was heuristic in nature, which may lead to clustering failure. Zhang et al.18 designed an algorithm named the secure weighted possibilistic C-means algorithm (SWPCM) on the basis of the background verification (BGV) encryption scheme for clustering big data in the cloud environment. Here, the BGV was utilized for encrypting the raw data to preserve privacy in the cloud infrastructure. Moreover, the Taylor theorem was utilized for approximating the functions to calculate the weight value of each object and simultaneously update the membership matrix. At last, the weighted possibilistic C-means (PCM) algorithm was applied for computing the addition and multiplication operations for establishing encrypted data on the cloud. The method attained good stability over the cloud for clustering the big data but failed to involve a homomorphic encryption scheme for improving the clustering efficiency without disclosing the confidential data. Ilango et al.19 developed a technique named the artificial bee colony (ABC) approach for minimizing the execution time and for optimizing the best cluster among the different sizes of clusters. The method used the real bee's behavior for addressing the problems of numerical optimization in clustering. The algorithm was executed in a Hadoop environment using Mapper and Reducer programming. The method minimized the execution time and gave improved performance based on time efficiency. Bijari et al.20 developed a heuristic method named the Big Bang–Big Crunch algorithm for solving the problems of clustering. The method adapted the advantage of heuristics for alleviating the clustering algorithms. The drawbacks of the method were solved using the memory of previously created solutions. Moreover, these solutions were used for obtaining new candidates in a probabilistic random-walk manner for improving the exploitation and exploration. The method improved the exploitation and exploration but suffered from slow convergence.

Sreedhar et al.21 developed a method named K-Means Hadoop MapReduce (KMHMR) for clustering huge-scale data using MapReduce. The method concentrated on the implementation of MapReduce using standard K-means. The method also improved the cluster quality by generating clusters with minimum intracluster and maximum intercluster distances for the huge data set.
The method was efficient and improved the performance while processing huge data sets, but it failed to consider multilevel queues for scheduling the jobs using huge data sets. Chormunge and Jena22 developed a method named correlation-based feature selection with clustering for solving the dimensionality problem, integrating clustering with a correlation measure to produce a good feature subset. Initially, irrelevant features were removed using the K-means clustering method, and then the nonredundant features were chosen from each cluster using the correlation measure. The method posed the ability to solve dimensionality problems and minimize the good feature subset, but a filter measure was not used for selecting the significant features to enhance the performance.

Kushwaha and Pant23 developed a feature selection method named link-based particle swarm optimization for clustering the big data. The method provided a new neighbor selection strategy for selecting the important features and used an update-based neighbor position for improving the exploitation and exploration capabilities. The method required less cost and computation time. However, it failed to detect informative features, which were liable to improve the search abilities of the algorithm. Shukla and Muhuri24 developed a technique for big data clustering using interval type-2 fuzzy sets (IT2 FSs) in gene expression data. The IT2 FS-based technique was more efficient in providing better clustering results for uncertain gene expression data sets, and it was scalable to large gene expression data sets. Heidari et al.25 introduced an algorithm for big data clustering with varied density using a Hadoop platform running MapReduce. This method utilized the local density to determine the density of every point, which avoids the situation of connecting clusters with varying densities. This method offered the best varying-density clustering capability and scalability.

Challenges
The challenges faced by the existing techniques are enlisted as follows:

• The PCM is considered an effective technique for analyzing big data, but the conventional PCM algorithm is unable to obtain the desired clustering results for data sets that contain noisy objects.18

In this work, sparse FCM is used for the clustering process, which uses sparse regularization for assigning zero weights to the noisy features. Hence, the clustering results cannot be affected by noisy objects.

• Among the conventional techniques, evolutionary techniques are effectively utilized in selecting the optimal features. However, the extreme increment of the individual size limits their applicability, and they are thereby not able to offer a preprocessed data set in a specific amount of time while addressing huge problems. In the existing works, there are no standard methods for addressing the problems of feature space with evolutionary big data models.12

In this work, the MFO-Bat algorithm is used for feature selection, which solves the nonlinear problems with complex constraints and selects the effective features.

• The major challenge faced by machine-learning techniques is to determine the important and nonredundant data from applications consisting of a huge number of data. While operating on the data, a huge number of irrelevant and redundant features are generated that maximize the computation cost and diminish the accuracy of machine-learning techniques. Due to the dimensionality issues, most of the existing learning algorithms fail to scale to huge data with more features. Moreover, the presence of noise can badly affect the system performance and also mortifies the learning algorithms.22

The proposed work utilizes the spark architecture, which processes the huge data using machine-learning algorithms and interactive data analysis tools and reuses the data in parallel operation while maintaining the scalability. Also, the sparse FCM used in this work offers the best results even if the data have noise.

Proposed Big Data Clustering Based on the Spark Architecture
This section uses the spark architecture and the proposed method for clustering the big data. The block diagram of big data clustering using the proposed MFO-Bat algorithm-based spark architecture is shown in Figure 1. Initially, requests from clients, remote employees, and office employees are gathered for processing the desired requests. The users are connected to the system using Internet connectivity. The spark architecture is used for clustering the big data.
FIG. 1. Block diagram of big data clustering using the proposed MFO-Bat algorithm-based spark architecture. FCM, fuzzy C-means; MFO-Bat, moth-flame optimization-based bat.

Spark26 is a framework used to process huge data-processing tasks with a general-purpose programming language on the big data. Spark supports several interactive data analysis tools and machine-learning algorithms and reuses the data in parallel operation while maintaining the scalability. The spark architecture poses two main modules, namely a master node and slave nodes. The master node is responsible for managing and distributing the tasks obtained from the requests of the user by partitioning the obtained tasks into different subtasks for each slave node. These subtasks are processed by the slave nodes for processing the request.
In this model, assume that the size of the master node is m × n, which is divided into four slave nodes, each of size p × q. In each slave node, the feature selection process is carried out using the proposed MFO-Bat for selecting the optimal features. The proposed MFO-Bat is designed by integrating the MFO and Bat optimization algorithms for effective feature selection. Each extracted feature set is of size u × v, and these features are combined for initiating the clustering process, which poses size r × s. The clustering is carried out using the sparse-FCM algorithm, and each cluster is of size u × v. The sparse FCM works by dividing the data into different subsets, and then the clustering is carried out on the different subsets. In clustering, the cluster centers and membership functions are given as input for clustering the subsequent subset. The clustering is carried out on Apache Spark. The obtained membership information of all the processed subsets is used for covering the sample space. Thus, using the cluster centers produced by the membership information avoids the issues related to highly deviated clusters, results in faster convergence, and works well with huge data sets.

Thus, the proposed big data clustering method involves two processes, feature selection and clustering, performed in the cluster nodes of the spark architecture in a parallel manner.

Parallel implementation of slave nodes on spark
Parallelizing the algorithms tends to be useful for addressing large-scale data and solves issues of optimization and analytics based on clustering and optimal feature selection in less time. In the parallelized algorithm, the data points are not influenced by each other when they are reassigned to the closest centroids. The process of selecting the optimal features can be parallelized by utilizing the spark architecture with master and slave nodes. Consider C as the database having i number of data with j attributes, represented as

$$C = \{C_{i,j}\}; \quad 0 \le i \le k \text{ and } 0 \le j \le l. \quad (1)$$

The data $C_{i,j}$ in the database are split into a finite number of blocks, equal to the number of slave nodes. In each slave node, the feature selection is carried out based on the proposed MFO-Bat algorithm; Algorithm 1 outlines the procedure, and the detailed steps follow in the next subsection.

Algorithm 1. Pseudocode for Selecting the Features

Procedure Parallel_Feature selection (Solution M)
{
Master:
  Call the proposed MFO-Bat algorithm to select the optimal features
  Each block acquires its block of features of the data set
Slave (parallel):
  Perform feature selection on each cluster node using the proposed MFO-Bat
Master:
  M = merge the optimal features from all slaves by calling the proposed MFO-Bat algorithm
  Return M
}
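As a rough sketch of this master-slave flow (illustrative only; the paper does not publish its implementation), the snippet below partitions a data matrix across Spark workers, lets each partition score and pick its own features, and merges the per-partition picks on the master. The variance-based score and the names (select_features, top_k) are assumptions; the MFO-Bat search that the paper actually runs in each slave is detailed in the next subsection.

```python
from pyspark.sql import SparkSession
import numpy as np

spark = SparkSession.builder.appName("parallel-feature-selection").getOrCreate()
sc = spark.sparkContext

def select_features(rows, top_k=8):
    # Slave step: score the features of this partition's block of C_{i,j}.
    # A simple variance score stands in for the MFO-Bat fitness search.
    block = np.array(list(rows))
    if block.size == 0:
        return []
    scores = block.var(axis=0)
    return [np.argsort(scores)[-top_k:].tolist()]

data = np.random.rand(1000, 16)                   # C_{i,j}: i rows, j attributes
rdd = sc.parallelize(data.tolist(), numSlices=4)  # four "slave" partitions

# Master step: merge the optimal features returned by all slaves.
partial = rdd.mapPartitions(select_features).collect()
votes = np.bincount(np.concatenate(partial), minlength=16)
M = np.argsort(votes)[-8:]                        # merged optimal feature indices
print(sorted(M.tolist()))
```

Each partition plays the role of a slave node in Algorithm 1, and the final vote-and-merge corresponds to the master's merging call.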
Proposed MFO-Bat algorithm for feature selection. Feature selection is the process of selecting the essential features to reduce the feature space with respect to specified targets. Moreover, the optimal features are selected using the optimization algorithm, which enhances the clustering rate by minimizing the dimensionality and eliminating irrelevant features. Here, the effective feature selection is performed using the proposed MFO-Bat optimization algorithm. In this method, a new hybrid optimization algorithm, named MFO-Bat, is designed for feature selection, and the Apache Spark architecture is adapted for handling the big data. The proposed MFO-Bat algorithm is designed by the hybridization of the MFO27 and Bat28 algorithms, and the integration inherits the advantages of both. The MFO algorithm is highly efficient, as it can provide competitive exploration using multimodal functions. Moreover, MFO can balance exploitation and exploration, is effectual for solving issues with unknown search spaces, and saves the best solutions so that they are not lost. The demerits of MFO are that it possesses slower convergence, is highly sensitive to the hyperparameters, and cannot solve multiobjective problems. These demerits are overcome using Bat, which offers a better convergence rate while obtaining a global optimal solution. The echolocation characteristics of microbats motivate the Bat optimization algorithm. This method is very efficient for generating improved features for solving multiobjective optimization problems, and it can solve highly nonlinear problems with complex constraints. Thus, MFO and Bat are integrated for solving the optimization problems to gain effective feature selection.

At first, the initial cluster nodes acquire the big data from different distributed systems to obtain the features and form a feature vector by selecting the optimal features using the proposed MFO-Bat algorithm. The steps involved in the proposed MFO-Bat for effective feature selection are enlisted as follows:

Step 1: Initialization. The first step is to initialize the solution space with the positions of the moths. The solution of MFO is in the form of a vector. Thus, the solution space of the moths is given by the following:

$$R = \begin{bmatrix} R_{1,1} & R_{1,2} & \cdots & R_{1,h} \\ R_{2,1} & R_{2,2} & \cdots & R_{2,h} \\ \vdots & & & \vdots \\ R_{g,1} & R_{g,2} & \cdots & R_{g,h} \end{bmatrix}, \quad (2)$$

where $g$ represents the total number of moths, indexed by $1 \le c \le g$, and $h$ indicates the total number of dimensions. Once the solution space of the moths is initiated, the fitness value computed for each random solution is stored in an array given by the following:

$$F = \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_g \end{bmatrix}, \quad (3)$$

where $g$ indicates the total number of moths. Similarly, the solution space of the flames is given by the following:

$$S = \begin{bmatrix} S_{1,1} & S_{1,2} & \cdots & S_{1,h} \\ S_{2,1} & S_{2,2} & \cdots & S_{2,h} \\ \vdots & & & \vdots \\ S_{g,1} & S_{g,2} & \cdots & S_{g,h} \end{bmatrix}, \quad (4)$$

where $S_{g,h}$ indicates the $g$th flame in the $h$th dimension. Once the solution space of the flames is initiated, the corresponding fitness values are stored in an array given by the following:

$$F' = \begin{bmatrix} F'_1 \\ F'_2 \\ \vdots \\ F'_g \end{bmatrix}, \quad (5)$$

where $g$ indicates the total number of flames.

Step 2: Evaluation of fitness. The fitness of the solution is computed on the basis of a distance measure. The Bhattacharyya distance is used for computing the distance between two classes. Thus, the fitness of the solution is given by the following:

$$K = \{B_D\}, \quad (6)$$

where $K$ represents the fitness function and $B_D$ indicates the Bhattacharyya distance between two classes, given by the following:

$$B_D(x_1, x_2) = \frac{1}{4} \ln\!\left( \frac{1}{4} \left( \frac{\sigma_{x_1}^2}{\sigma_{x_2}^2} + \frac{\sigma_{x_2}^2}{\sigma_{x_1}^2} + 2 \right) \right) + \frac{1}{4} \left( \frac{(\mu_{x_1} - \mu_{x_2})^2}{\sigma_{x_1}^2 + \sigma_{x_2}^2} \right), \quad (7)$$

where $B_D(x_1, x_2)$ is the Bhattacharyya distance between the two classes $x_1$ and $x_2$, whose variances are $\sigma_{x_1}^2$ and $\sigma_{x_2}^2$ and whose means are $\mu_{x_1}$ and $\mu_{x_2}$.

Step 3: Update solution based on the proposed MFO-Bat algorithm. The MFO algorithm updates the solution space on the basis of the flame intensity; the alterations in the flame intensity make the moth move in one direction. Thus, the update solution of MFO is given by the following:

$$R_c = Z_c \cdot e^{uv} \cdot \cos(2\pi v) + S_d, \quad (8)$$

where $Z_c$ is the distance between the $c$th moth and the $d$th flame, $u$ represents the constant describing the shape of the logarithmic spiral, $v$ specifies a random number in the range $[-1, 1]$, and $S_d$ represents the $d$th flame. Here, the solution update of the Bat optimization algorithm is used to formulate the update equation of the proposed MFO-Bat algorithm. The position update of a bat is based on the following equation:

$$P' = P + a\,T_y, \quad (9)$$

where $a$ represents a random number in $[-1, 1]$, $T_y$ is the average loudness of the bats, $P'$ is the new solution for each bat, and $P$ is the old solution of each bat. After rearranging the above equation, the value of the random number is given by the following:

$$a = \frac{P' - P}{T_y}. \quad (10)$$

Assume that the random numbers obtained from the MFO algorithm and the Bat algorithm acquire the same value for a particular iteration, that is, $a = v$. Thus, the above equation becomes

$$v = \frac{P' - P}{T_y}. \quad (11)$$

After substituting Equation (11) into Equation (8), the obtained solution for the proposed optimization is given by the following:

$$R_c = Z_c \cdot e^{u \left( \frac{P' - P}{T_y} \right)} \cos\!\left( 2\pi\, \frac{P' - P}{T_y} \right) + S_d. \quad (12)$$

Step 4: Determination of the best solution. The best solution $R$ is obtained using the fitness function, which is derived using Equation (6).

Step 5: Termination. The algorithm terminates when the maximum iteration limit $t_{max}$ is crossed, and at the end of the iterations, the algorithm determines the best solution. Thus, the features selected using the proposed MFO-Bat are given by the following:

$$A = \{A_1, A_2, \ldots, A_f\}, \quad (13)$$

where $f$ is the total number of features and $A$ represents a feature vector of size r × s. The selected features are subjected as input to the final cluster nodes of the spark, which are provided with the sparse-FCM method. The optimal clustering is performed in the final cluster nodes of spark such that they form optimal clusters. Algorithm 2 specifies the steps of the proposed MFO-Bat algorithm.

Algorithm 2. Procedure Moth-Flame Optimization-Bat Algorithm

Input: R moths and S flames
Output: R // best solution
Initialize the positions of the moths R
Initialize the positions of the flames S
While (t < t_max)
  Evaluate the fitness of each solution
  Update the solution using Equation (12)
  t = t + 1
  Compute the fitness of the new position using Equation (6)
  Update the position
End while
Return R
Terminate
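To illustrate Steps 1-5 end to end, the sketch below pairs the Bhattacharyya fitness of Equations (6) and (7) with the hybrid position update of Equation (12). It is a minimal reading of the text, not the authors' code: the population sizes, the loudness value T_y, and the way per-feature distances are aggregated into one fitness score are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bhattacharyya(x1, x2):
    # Distance between two 1-D class samples, Equation (7)
    v1, v2 = x1.var() + 1e-12, x2.var() + 1e-12
    m1, m2 = x1.mean(), x2.mean()
    return (0.25 * np.log(0.25 * (v1 / v2 + v2 / v1 + 2.0))
            + 0.25 * (m1 - m2) ** 2 / (v1 + v2))

def fitness(weights, data, labels):
    # Fitness K of Eq. (6): here read as a weighted sum of per-feature
    # Bhattacharyya distances between the two classes (an assumption).
    a, b = data[labels == 0], data[labels == 1]
    return sum(w * bhattacharyya(a[:, l], b[:, l])
               for l, w in enumerate(weights))

g, h = 20, 16                    # moths and dimensions
u, T_y = 1.0, 0.9                # spiral constant; average bat loudness (assumed)
X = rng.normal(size=(200, h)); X[:100, :4] += 2.0
y = np.array([0] * 100 + [1] * 100)

R = rng.random((g, h))           # moth positions = candidate feature weights
score = lambda M: np.array([fitness(m, X, y) for m in M])
S = R[np.argsort(-score(R))]     # flames: best solutions found so far

for t in range(50):
    P_new = R + rng.uniform(-1, 1, (g, h)) * T_y   # bat move, Eq. (9)
    v = (P_new - R) / T_y                          # shared random factor, Eq. (11)
    Z = np.abs(S - R)                              # moth-to-flame distance Z_c
    R = np.clip(Z * np.exp(u * v) * np.cos(2 * np.pi * v) + S, 0, 1)  # Eq. (12)
    pool = np.vstack([R, S])                       # elitism: flames keep the best
    S = pool[np.argsort(-score(pool))][:g]

best_weights = S[0]              # the best feature-weight vector, cf. Eq. (13)
```

Since v is derived from the bat move rather than drawn independently, the spiral flight of MFO is steered by the Bat dynamics, which is how the hybrid inherits Bat's faster convergence.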
Due to the complexity in the mapping of large data sets, a clustering algorithm, namely sparse FCM,29 is used for clustering the available data. Here, the clustering is done in the final cluster nodes of spark and is initiated by the sparse-FCM method. This approach can enhance scalability by reducing the problem size. Here, the sparse FCM is used to find the cluster centroids and is elaborated in the following section. Algorithm 3 illustrates the algorithmic steps of parallelized clustering.

Algorithm 3. Procedure for Performing Parallelized Clustering

Parallelized_Clustering
{
Slave (parallel):
  Call sparse FCM on the final cluster nodes containing the selected features
  H = clustered data
  Return H
Master:
  H = merge the partial clusters from all slaves by calling the sparse-FCM method
  Return H
}

Sparse-FCM method for clustering huge data. In this section, the sparse FCM29 is used for clustering the huge data. Many data sets pose a cluster structure that involves only a limited set of relevant features rather than the whole feature set. However, for huge data, identifying the significant features and determining the cluster structure becomes complex. For solving these issues, sparse FCM is used for initiating the clustering process using the selected features from the previous step. The sparse FCM uses sparse regularization for assigning zero weights to the noisy features to cluster huge data. Here, the similarity is represented using a distance measure, and the method tries to determine the collection of clusters that reduces the intracluster distances and increases the intercluster distance. The data to be clustered are represented as data points, and the set of data is termed a data set.

Step 1: Initialization. The first step is the initialization of the cluster centers $O$, the attribute weights $x$, and the dissimilarity measure $e(\mathcal{R})$.

Step 2: Partition matrix update. At first, fix the cluster centers $O$ and the attribute weights $x$; the dissimilarity measure $e(\mathcal{R})$ is minimized if the following condition is satisfied:

$$P_{kt} = \begin{cases} \dfrac{1}{M_k}, & \text{if } B_{kt} = 0 \text{ and } M_k = \operatorname{card}\{\, j : B_{kj} = 0 \,\} \\[4pt] 0, & \text{if } B_{kt} \ne 0 \text{ but } B_{kj} = 0 \text{ for some } j,\ j \ne t \\[4pt] \left[ \displaystyle\sum_{j=1}^{n} \left( \dfrac{B_{kt}}{B_{kj}} \right)^{\!\frac{1}{b-1}} \right]^{-1}, & \text{otherwise,} \end{cases} \quad (14)$$

where $\operatorname{card}(A)$ denotes the cardinality of set $A$. The distance measure in the standard sparse FCM is the feature-weighted Euclidean distance between the data and the cluster centroid, given as follows:

$$B_{kt} = \sum_{l=1}^{q} x_l \left( V_{kl} - V_{tl} \right)^2, \quad (15)$$

where $n$ denotes the number of clusters, $q$ the number of features, $x_l$ the weight of the $l$th feature, $V_{kl}$ the $l$th feature of the $k$th data point, and $V_{tl}$ the $l$th feature of the $t$th cluster centroid.
Step 3: Update cluster center O. Let $x$ and $\mathcal{R}$ be fixed; $e(O)$ is minimized if

$$O_{tl} = \begin{cases} 0, & \text{if } x_l = 0 \\[4pt] \dfrac{\sum_{k} P_{kt}^{\,b}\, V_{kl}}{\sum_{k} P_{kt}^{\,b}}, & \text{if } x_l \ne 0, \end{cases} \quad (16)$$

where $t = 1, \ldots, n$ and $l = 1, \ldots, q$, and $b$ refers to the weight component responsible for controlling the degree of membership sharing between the fuzzy clusters. The contribution of the $l$th feature to the objective function is denoted as $x_l$, and the dissimilarity measure is indicated as $\mathcal{R}$.

Step 4: Determine the class. The class value is determined using the fixed clusters $\{o_1, o_2, \ldots, o_i, \ldots, o_n\}$. The class $G_l$ is computed based on the following objective:

$$\max_{x}\ \sum_{l=1}^{q} x_l\, G_l \quad \text{such that } \|x\|_2^2 \le 1 \text{ and } \|x\|_f \le \ell, \text{ obtaining } x^{*}, \quad (17)$$

where $\ell$ is the tuning parameter, $0 \le f \le 1$, and $\|x\|_f^f = \sum_{l=1}^{q} |x_l|^f$.

Step 5: Terminate. The iteration is repeated until the stopping criterion is attained.

The algorithmic steps of the sparse-FCM algorithm are depicted in Algorithm 4.

Algorithm 4. Sparse-Fuzzy C-Means Algorithm

E: number of clusters
C_{i,j}: data matrix
Procedure sparse FCM (E, C_{i,j})
// select the first centroids: clusters I_1, I_2, ..., I_E and weights x~
i) Initialize x as x~_1 = x~_2 = ... = x~_b = 1/sqrt(b)
ii) Update the partition matrix P_kt using Equation (14)
iii) Update the cluster centers I using Equation (16)
iv) Fix the values of I_1, I_2, ..., I_E, calculate G_l, and solve the objective of Equation (17) to obtain x
v) Repeat steps ii, iii, and iv until the stopping criterion
   (sum_{l=1}^{q} |x_l - x~_l|) / (sum_{l=1}^{q} |x~_l|) < 10^-4
   is satisfied
The result from the sparse FCM is the clustered data, which is of size u × v.
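Before moving to the experiments, the sketch below condenses Steps 1-5 into one NumPy loop. The membership update follows Equation (14), the center update Equation (16), and the stopping rule the 10^-4 criterion of Algorithm 4. The feature-weight update is only a simplified surrogate for the constrained maximization of Equation (17), so the scoring, thresholding, and parameter values here are assumptions rather than the authors' implementation.

```python
import numpy as np

def sparse_fcm(data, E=3, b=2.0, iters=50, tol=1e-4, seed=0):
    """A self-contained sparse-FCM loop; a sketch, not the authors' code."""
    n_pts, q = data.shape
    rng = np.random.default_rng(seed)
    x = np.full(q, 1.0 / np.sqrt(q))                 # step i: equal initial weights
    O = data[rng.choice(n_pts, E, replace=False)]    # initial cluster centers
    for _ in range(iters):
        x_old = x.copy()
        # Step ii: weighted distances B_kt (Eq. 15), then memberships P_kt (Eq. 14)
        B = ((data[:, None, :] - O[None, :, :]) ** 2 * x).sum(-1)
        B = np.maximum(B, 1e-12)                     # guard the B_kt = 0 branch
        P = 1.0 / ((B[:, :, None] / B[:, None, :]) ** (1.0 / (b - 1.0))).sum(-1)
        # Step iii: cluster centers (Eq. 16); zero-weight features are zeroed
        Pb = P ** b
        O = (Pb.T @ data) / Pb.sum(axis=0)[:, None]
        O[:, x == 0] = 0.0
        # Step iv: per-feature dispersion G_l, then a sparse weight vector
        # (a simple surrogate for the constrained maximization of Eq. 17)
        G = (Pb[:, :, None] * (data[:, None, :] - O[None, :, :]) ** 2).sum((0, 1))
        score = G.max() - G                          # favor low-dispersion features
        if score.any():
            x = score / np.linalg.norm(score)        # enforce ||x||_2 <= 1
            x[x < 0.1 * x.max()] = 0.0               # zero out noisy features
        # Step v: stop once the relative weight change drops below 10^-4
        if np.abs(x - x_old).sum() / max(np.abs(x_old).sum(), 1e-12) < tol:
            break
    return O, P, x

centers, members, weights = sparse_fcm(np.random.rand(500, 12))
```

Features whose weight x_l is driven to zero stop contributing to the distance in Equation (15), which is how the method keeps noisy features from affecting the clusters.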
Results and Discussion
This section illustrates the results produced by the proposed method for big data clustering, and the effectiveness of the proposed method is evaluated with a performance analysis by varying the population size and feature vector size. Moreover, a comparative analysis is performed by comparing the proposed method with different existing methodologies.

Experimental setup
The experimentation of the proposed method is performed in MATLAB running on a PC with a Windows 8-based 64-bit operating system, 8 GB RAM, and a 1.60 GHz processor.

Data set description
The experimentation is performed using a standard data set, named the Global Terrorism Database (GTD),30 taken from the Kaggle repository. The GTD is an open-source database that contains information regarding terrorist attacks from the year 1970 to 2015. The GTD contains systematic data on domestic and international terrorism incidents that occurred during this period and includes more than 150,000 cases. The database is maintained by the researchers of the National Consortium for the Study of Terrorism and Responses to Terrorism, headquartered at the University of Maryland. The data in the database consist of variables that are based on location, tactics, perpetrators, targets, and outcomes.

Competing methods
The proposed method of big data clustering is compared with the existing methods, such as SWPCM,18 ABC,19 MFO,27 and KMHMR,21 to prove the effectiveness of the proposed method. Thus, the existing methods are compared with the proposed MFO-Bat algorithm based on performance metrics.

Performance metric
The analysis of the existing methods with respect to the proposed method is done in terms of the Jaccard coefficient, Dice coefficient, and classification accuracy.

i. Jaccard coefficient: The Jaccard coefficient measures the similarity between two different sets of data, and its value ranges from 0% to 100%. The data are more similar if the percentage of the Jaccard coefficient is high; it is formulated as follows:

$$J_c = \frac{S_s}{S_s + S_d + D_s}, \quad (18)$$

where the quantities count the possible pairs of data points: $S_s$ denotes the pairs of data points that belong to the same cluster and the same group, $S_d$ specifies the pairs that belong to the same cluster but different groups, and $D_s$ indicates the pairs that belong to different clusters but the same group.

ii. Dice coefficient: The Dice coefficient is a measure utilized to weigh the similarity between two data samples. The Dice coefficient is formulated as follows:

$$D_c = \frac{2|A \cap B|}{|A| + |B|}, \quad (19)$$

where $|A|$ and $|B|$ denote the cardinalities of the two sets.

iii. Classification accuracy: The classification accuracy is defined as the percentage of correct predictions, which is formulated as follows:

$$C_a = \frac{N_c}{T}, \quad (20)$$

where $N_c$ represents the number of correct predictions and $T$ indicates the total number of predictions.
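These three definitions translate directly into code; a small illustration (with hypothetical label vectors) is given below, using the pair-counting reading of Equation (18) described above.

```python
from itertools import combinations

def jaccard(pred, truth):
    # Pair counting over all point pairs, Eq. (18):
    # Ss = same cluster & same group, Sd = same cluster & different group,
    # Ds = different cluster & same group.
    Ss = Sd = Ds = 0
    for i, j in combinations(range(len(pred)), 2):
        same_c, same_g = pred[i] == pred[j], truth[i] == truth[j]
        Ss += same_c and same_g
        Sd += same_c and not same_g
        Ds += (not same_c) and same_g
    return Ss / (Ss + Sd + Ds)

def dice(A, B):
    A, B = set(A), set(B)
    return 2 * len(A & B) / (len(A) + len(B))   # Eq. (19)

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)   # Eq. (20)

pred  = [0, 0, 1, 1, 2, 2]
truth = [0, 0, 1, 1, 1, 2]
print(jaccard(pred, truth), dice({1, 2, 3}, {2, 3, 4}), accuracy(pred, truth))
```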
Performance analysis
This section presents the performance analysis of the proposed MFO-Bat by varying the population size and feature size. The analysis is performed on the basis of the performance metrics, namely accuracy, Jaccard coefficient, and Dice coefficient.

Analysis based on population size. Figure 2 illustrates the performance analysis of the proposed MFO-Bat with varying population sizes ranging from 10 to 50. The analysis performed with respect to the classification accuracy is depicted in Figure 2a. When the total slaves are 2, the classification accuracies computed by the proposed MFO-Bat with population sizes 10, 20, 30, 40, and 50 are 57.202%, 67.766%, 70.775%, 71.153%, and 85.905%, respectively. Similarly, for 10 slaves, the classification accuracies measured by the proposed MFO-Bat with population sizes 10, 20, 30, 40, and 50 are 68.443%, 77.444%, 88.420%, 93.843%, and 97.694%, respectively. The analysis based on the Dice coefficient with varying population sizes is depicted in Figure 2b. When the total slaves are 2, the corresponding Dice coefficient values measured by the proposed MFO-Bat with population sizes 10, 20, 30, 40, and 50 are 71.620%, 73.317%, 83.623%, 93.649%, and 94.372%, respectively. Likewise, for 10 slaves, the corresponding Dice coefficient values measured by the proposed MFO-Bat with population sizes 10, 20, 30, 40, and 50 are 93.156%, 94.244%, 96.368%, 99.620%, and 99.629%, respectively. The analysis in terms of the Jaccard coefficient with varying population sizes is depicted in Figure 2c. When the total slaves are 2, the corresponding values of the Jaccard coefficient computed by the proposed MFO-Bat with population sizes 10, 20, 30, 40, and 50 are 55.788%, 58.772%, 71.856%, 88.056%, and 89.345%, respectively. Similarly, for 10 slaves, the corresponding values of the Jaccard coefficient computed by the proposed MFO-Bat with population sizes 10, 20, 30, 40, and 50 are 87.189%, 89.115%, 92.991%, 99.243%, and 99.261%, respectively. From the above analysis, it is noted that the performance of the proposed MFO-Bat increases with the increase in population size.

FIG. 2. Performance analysis of proposed MFO-Bat in terms of population size. (a) Classification accuracy. (b) Dice coefficient. (c) Jaccard coefficient.

Analysis based on feature size. The performance analysis of the proposed MFO-Bat with varying feature sizes ranging from 8 to 16 is depicted in Figure 3. The analysis carried out in terms of classification accuracy is depicted in Figure 3a. When the total slaves are 2, the classification accuracies computed by the proposed MFO-Bat with feature sizes 8, 10, 12, 14, and 16 are 49.594%, 59.950%, 66.136%, 66.289%, and 66.384%, respectively. Similarly, for 10 slaves, the classification accuracies measured by the proposed MFO-Bat with feature sizes 8, 10, 12, 14, and 16 are 86.521%, 86.554%, 86.582%, 86.599%, and 86.611%, respectively. The analysis in terms of the Dice coefficient with varying feature sizes is depicted in Figure 3b. When the total slaves are 2, the corresponding Dice coefficient values measured by the proposed MFO-Bat with feature sizes 8, 10, 12, 14, and 16 are 24.638%, 74.083%, 87.144%, 90.724%, and 90.752%, respectively. Likewise, for 10 slaves, the corresponding Dice coefficient values measured by the proposed MFO-Bat with feature sizes 8, 10, 12, 14, and 16 are 84.818%, 99.429%, 99.476%, 99.519%, and 99.840%, respectively. The analysis on the basis of the Jaccard coefficient with varying feature sizes is depicted in Figure 3c. When the total slaves are 2, the corresponding values of the Jaccard coefficient computed by the proposed MFO-Bat with feature sizes 8, 10, 12, 14, and 16 are 17.849%, 58.835%, 77.218%, 83.024%, and 83.070%, respectively. Similarly, for 10 slaves, the corresponding values of the Jaccard coefficient computed by the proposed MFO-Bat with feature sizes 8, 10, 12, 14, and 16 are 74.263%, 98.865%, 98.958%, 99.043%, and 99.682%, respectively. From the above analysis, it is noted that the performance of the proposed MFO-Bat increases with the increase in feature size.
FIG. 3. Performance analysis of proposed MFO-Bat in terms of feature size. (a) Classification accuracy. (b) Dice coefficient. (c) Jaccard coefficient.

Comparative analysis
This section presents the comparative analysis of the proposed MFO-Bat with respect to the existing methodologies on the basis of the performance metrics, namely accuracy, Jaccard coefficient, and Dice coefficient. Figure 4 depicts the comparative analysis of the existing SWPCM, ABC, MFO, and KMHMR and the proposed MFO-Bat with respect to classification accuracy, Dice coefficient, and Jaccard coefficient. The analysis of the existing and proposed methods in terms of classification accuracy is depicted in Figure 4a. When the total slaves are 2, the corresponding classification accuracies measured by the existing SWPCM, ABC, MFO, and KMHMR are 66.573%, 67.053%, 67.055%, and 69.957%, whereas the proposed MFO-Bat acquired a classification accuracy of 84.855%.
Similarly, for 10 slaves, the corresponding classification accuracy values computed by the existing SWPCM, ABC, MFO, and KMHMR and the proposed MFO-Bat are 67.055%, 67.084%, 89.491%, 92.740%, and 95.806%, respectively. From the above data, the proposed MFO-Bat shows the maximum classification accuracy compared with the existing methods. The analysis of the existing SWPCM, ABC, MFO, and KMHMR and the proposed MFO-Bat based on the Dice coefficient is depicted in Figure 4b. When the total slaves are 2, the corresponding Dice coefficient values computed by the existing SWPCM, ABC, MFO, and KMHMR and the proposed MFO-Bat are 70.456%, 71.951%, 71.955%, 72.647%, and 85.462%, respectively. Likewise, for 10 slaves, the corresponding Dice coefficient values computed by the existing SWPCM, ABC, MFO, and KMHMR and the proposed MFO-Bat are 71.951%, 71.960%, 90.103%, 93.282%, and 99.181%, respectively. The analysis based on the Jaccard coefficient is depicted in Figure 4c.

FIG. 4. Comparative analysis. (a) Classification accuracy. (b) Dice coefficient. (c) Jaccard coefficient. ABC, artificial bee colony; KMHMR, K-Means Hadoop MapReduce; SWPCM, secure weighted possibilistic C-means algorithm.
When the total slaves are 2, the corresponding Jaccard coefficient values computed by the existing SWPCM, ABC, MFO, and KMHMR and the proposed MFO-Bat are 56.851%, 59.267%, 60.384%, 60.501%, and 74.870%, respectively. Similarly, when the total slaves are 10, the corresponding Jaccard coefficient values measured by the existing SWPCM, ABC, MFO, and KMHMR and the proposed MFO-Bat are 60.501%, 60.528%, 82.719%, 87.509%, and 98.376%, respectively.

Table 1. Comparative discussion

Methods                       SWPCM    ABC      MFO      KMHMR    Proposed MFO-Bat
Classification accuracy (%)   67.055   67.084   89.491   92.740   95.806
Dice coefficient (%)          71.951   71.960   90.103   93.282   99.181
Jaccard coefficient (%)       60.501   60.528   82.719   87.509   98.376

ABC, artificial bee colony; KMHMR, K-Means Hadoop MapReduce; MFO-Bat, moth-flame optimization-based bat; SWPCM, secure weighted possibilistic C-means algorithm.

Comparative discussion
Table 1 shows the comparative discussion of the methods based on the performance metrics: classification accuracy, Dice coefficient, and Jaccard coefficient. The classification accuracies of the methods SWPCM, ABC, MFO, and KMHMR and the proposed MFO-Bat are 67.055%, 67.084%, 89.491%, 92.740%, and 95.806%, respectively, at the end of the iteration. The Dice coefficients of the methods SWPCM, ABC, MFO, and KMHMR and the proposed MFO-Bat are 71.951%, 71.960%, 90.103%, 93.282%, and 99.181%, respectively, at the end of the iteration. The Jaccard coefficients of the methods SWPCM, ABC, MFO, and KMHMR and the proposed MFO-Bat are 60.501%, 60.528%, 82.719%, 87.509%, and 98.376%, respectively.

Conclusion
This study proposes an enhanced clustering method for clustering huge data sets. Here, big data clustering is performed using the spark architecture such that the data from the distributed sources are handled in a parallel manner simultaneously. The big data are analyzed by the spark architecture to yield the clustering results, and the processing consists of two steps, namely feature selection and clustering. In feature selection, the optimal features are selected from the big data, and the feature vector is formed using the newly designed MFO-Bat algorithm. The proposed MFO-Bat is designed by integrating the MFO and Bat algorithms. The selected features are fed to the final cluster nodes of spark, which use the sparse-FCM method for the clustering process. The optimal clustering is performed at the final cluster nodes of spark to obtain optimal clusters of different data. The experimentation of the proposed MFO-Bat confirms that the proposed method outperforms the existing methods, with a maximal classification accuracy of 95.806%, a maximal Dice coefficient of 99.181%, and a maximal Jaccard coefficient of 98.376%. The proposed method handles big data with a large sample size and offers good clustering performance. However, it is limited to the FCM algorithm; it will be extended to prototype-based clustering methods. Also, the evaluation will be performed using various criteria, such as the silhouette coefficient. Moreover, to extend the application scenarios of the proposed method, high-dimensionality features will be considered in the future.

Author Disclosure Statement
No competing financial interests exist.

Funding Information

References
1. Fan T. Research and implementation of user clustering based on MapReduce in multimedia Big Data. Multimed Tools Appl. 2018;77:10017–10031.
2. Diamantini C, Potena D, Storti E. Multidimensional query reformulation with measure decomposition. Inf Syst. 2018;78:23–39.
3. Kulkarni YR, Senthil Murugan T. Hybrid weed-particle swarm optimization algorithm and C mixture for data publishing. Multimed Res. 2019;2:33–42.
4. Rao BT, Sridevi NV, Reddy VK, Reddy LSS. Performance issues of heterogeneous Hadoop clusters in cloud computing. Global J Comput Sci Tech. 2012;11:6.
5. Gu Z, Saberi M, Sarvi M, Liu Z. A Big Data approach for clustering and calibration of link fundamental diagrams for large-scale network simulation applications. Transp Res Part C Emerg Technol. 2017;94:151–171.
6. Ward JS, Barker A. Undefined by data: A survey of Big Data definitions. 2013.
7. Liu G, Yang J, Hao Y, Zhang Y. Big Data-informed energy efficiency assessment of China industry sectors based on K-means clustering. J Clean Prod. 2018;183:304–314.
8. Dash M, Liu H. Feature selection for clustering. 2000, pp. 110–121.
9. Tsapanos N, Tefas A, Nikolaidis N, et al. Fast kernel matrix computation for Big Data clustering. Procedia Comput Sci. 2015;51:2445–2452.
10. Cannuccia E, Ta Phuoc V, Brière B, et al. Combined first-principles calculations and experimental study of the phonon modes in the multiferroic compound GeV4S8. J Phys Chem C. 2017;121:3522–3529.
11. Gangurde HD. Feature selection using clustering approach for Big Data. 2014, pp. 1–3.
12. Río S, et al. Evolutionary feature selection for Big Data classification: A MapReduce approach. 2015;2015:11.
13. Wang L. Heterogeneous data and Big Data analytics. Autom Control Inf Sci. 2017;3:8–15.
14. Liu Y, Wang Q, Chen HQ. Research on IT architecture of heterogeneous Big Data. J Appl Sci Eng. 2015;18:135–142.
15. Son LH, Tien ND. Tune up fuzzy C-means for Big Data: Some novel hybrid clustering algorithms based on initial selection and incremental clustering. Int J Fuzzy Syst. 2017;19:1585–1602.
16. Sassi Hidri M, Zoghlami MA, Ben Ayed R. Speeding up the large-scale consensus fuzzy clustering for handling Big Data. Fuzzy Sets Syst. 2017;1:1–25.
17. Wang Q, Guo S, Hu J, Yang Y. Spectral partitioning and fuzzy C-means based clustering algorithm for big data wireless sensor networks. J Wireless Com Network. 2018;54:11.
18. Zhang Q, Yang LT, Castiglione A, et al. Secure weighted possibilistic c-means algorithm on cloud for clustering big data. Inf Sci. 2019;479:515–525.
19. Ilango SS, Vimal S, Kaliappan M, Subbulakshmi P. Optimization using artificial bee colony based clustering approach for big data. Cluster Comput. 2018;22:12169–12177.
20. Bijari K, Zare H, Veisi H, Bobarshad H. Memory-enriched big bang–big crunch optimization algorithm for data clustering. Neural Comput Appl. 2018;29:111–121.
21. Sreedhar C, Kasiviswanath N, Chenna Reddy P. Clustering large datasets using K-means modified inter and intra clustering (KM-I2C) in Hadoop. J Big Data. 2017;4.
22. Chormunge S, Jena S. Correlation based feature selection with clustering for high dimensional data. J Electr Syst Inf Technol. 2018;5:542–549.
23. Kushwaha N, Pant M. Link based BPSO for feature selection in big data text clustering. Futur Gener Comput Syst. 2018;82:190–199.
24. Shukla AK, Muhuri PK. Big-data clustering with interval type-2 fuzzy uncertainty modeling in gene expression datasets. Eng Appl Artif Intel. 2019;77:268–282.
25. Heidari S, Alborzi M, Radfar R, et al. Big Data clustering with varied density based on MapReduce. J Big Data. 2019;6:77.
26. Kusuma I, Ma'sum MA, Habibie N, et al. Design of intelligent k-means based on spark for big data clustering. In Proceedings of International Workshop on Big Data and Information Security, 2016, pp. 89–96.
27. Mirjalili S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowledge Based Syst. 2015;89:228–249.
28. Yang X-S. Bat algorithm for multi-objective optimization. Int J Bioinspired Computation. 2012;3:267–274.
29. Chang X, Wang Q, Liu Y, Wang Y. Sparse regularization in fuzzy c-means for high-dimensional data clustering. IEEE Trans Cybernet. 2017;47:2616–2627.
30. Global Terrorism Database. Available online at https://www.kaggle.com/bstaff/global-terrorism-database (last accessed February 2019).

Cite this article as: Ravuri V, Vasundra S (2020) Moth-flame optimization-bat optimization: map-reduce framework for big data clustering using the moth-flame bat optimization and sparse fuzzy C means. Big Data 3:X, 1–15, DOI: 10.1089/big.2019.0125.

Abbreviations Used
ABC = artificial bee colony
BGV = background verification
FCM = fuzzy C-means
GTD = Global Terrorism Database
IT2 FSs = interval type-2 fuzzy sets
KMHMR = K-Means Hadoop MapReduce
MFO = moth-flame optimization
MFO-Bat = moth-flame optimization-based bat
PCM = possibilistic C-means
SWPCM = secure weighted possibilistic C-means algorithm