Applied Computing and Informatics 15 (2019) 102–108


Original Article

A SI model for social media influencer maximization

Jyoti Sunil More (a, ⇑, 1), Chelpa Lingam (b, 1)
(a) Department of Computer Engineering, Ramrao Adik Institute of Technology, Nerul, Navi Mumbai, India
(b) Pillai's HOC College of Engineering and Technology, Rasayani, India

Article info

Article history:
Received 11 May 2017
Revised 11 November 2017
Accepted 14 November 2017
Available online 15 November 2017

Keywords: Influencers; Social network analysis; Diffusion model; SI model; Multithreading; Marketing strategies

Abstract

Social network mining can be divided into two categories, namely the study of structural characteristics and content analysis. One of the most significant problems in the context of a social network is finding the most influential entities within the network. This task is significant for viral marketing, since the most influential entities can be targeted to endorse new products in the market. However, the problem of discovering the most persuasive nodes in a social network has been proved to be NP-hard, so no efficient exact algorithm is known. This creates wide scope for developing approximation methods and algorithms that produce solutions with proven approximation guarantees. The greedy algorithm serves as the basis for most of the existing algorithms designed for these problems. It can achieve a good approximation, but it is computationally expensive. Therefore, in this paper we propose a two-level approach based on the Susceptible-Infected (SI) epidemic model for maximizing the influence spread. We further propose a multithreaded implementation of the algorithm for the proposed SI model, which further improves its performance in terms of influence spread per second.

© 2017 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Viral marketing has been acknowledged as an effective marketing strategy. A large number of people are now connected through social networks such as Facebook, Flickr, and Twitter, and the impact of social networks on their lives has increased significantly. Social influence acts as a motivating force governing the diffusion of information in the network. Although there are millions of users on social platforms, only the activities of a selected number of users are acknowledged and spread through the network. These dominant users generate trends and play a significant role in shaping or manipulating opinions in social networks. These opinions are crucial in areas such as marketing and opinion mining. Many companies have started targeting key individuals called influencers, who are in contextual alignment with their brand and operate indirectly on the companies' behalf to find potential customers. This indirect form of marketing is also called influencer marketing.

Social media influencers are the entities in a social network who help potential customers make a buying decision by influencing their opinions through social networking. An influencer can be any person who reviews a product or posts a blog about a new product, any industry expert, or any person who has the potential to influence people. The problem of influencer identification can be stated as follows: given a group of individuals who are to be motivated to adopt a new product or piece of information, find the optimal target subset of individuals (the seed set) that can further influence the other nodes. The ultimate goal is to maximize the spread of the information to a large population.

Recently, there have been large advances in the field of social networks. Research has focused on the study of relationships, including quantitative measures of social networks such as influence, authority, centrality, modularity and connectedness [1]. Influence maximization can be defined as the problem of forming an objective function for selecting appropriate target nodes in a social network such that the influence spread is maximized. These target nodes in turn propagate the influence to their connected nodes. This is helpful for designing marketing strategies or diffusing a new idea in a network.

⇑ Corresponding author.
E-mail addresses: [email protected] (J.S. More), [email protected] (C. Lingam).
1 Affiliated to Mumbai University.
Peer review under responsibility of King Saud University.

https://doi.org/10.1016/j.aci.2017.11.001
2210-8327/© 2017 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University.

The major issue of concern here is how to improve the diffusion for a given seed selection. The greedy algorithm proposed by Kempe et al. [2] is a significant algorithm for selecting the seed nodes. Influence maximization is an NP-hard problem. Greedy algorithms make locally optimal choices, and the solutions they provide are suboptimal. Hence there is a need for a better algorithm that provides maximum influence spread. The problem of finding the minimal set of activated nodes to spread information to the whole network, or to optimally immunize a network against epidemics, can be exactly mapped onto optimal percolation. The most influential nodes are the ones forming the minimal set that guarantees a global connection of the network. At a general level, the optimal influence problem can be stated as follows: find the minimal set of nodes which, if removed, would break down the network into many disconnected pieces. The natural measure of influence is, therefore, the size of the largest (giant) connected component as the influencers are removed from the network. Hence an epidemic model, Susceptible-Infected (SI), is used. It is well suited to the progressive nature of the model in this context.

2. Related work

Kempe et al. [2] proposed a greedy approach for finding K influential nodes out of all existing nodes. They provided, for the first time, an approximation guarantee for the greedy algorithm. They proposed an analysis framework for finding the seed nodes, based on submodular functions. The framework also showed that a feasible solution can be obtained using a greedy strategy. They also proposed the triggering model and showed that their approximation algorithm worked better than other known node selection strategies in social networks.

The performance of the greedy algorithm for influence maximization can be improved by exploiting submodularity, through an approach called Cost-Effective Lazy Forward selection (CELF) [3]. Later, an improved algorithm called CELF++ [4] was proposed, which further exploits the submodularity of the spread function.

On the other hand, social network diffusion was also modelled [5] using theories such as bond percolation, resulting in the proposal of the Susceptible-Infected-Recovered (SIR) model. Meanwhile, graph evolution parameters, such as densification and shrinking diameters, were analysed [6] for modelling social networks. Based on global social network metrics, such as betweenness centrality and closeness centrality, a semi-local centrality measure was proposed to design an effective ranking method. This design, along with the SIR epidemic model, was used to evaluate the performance of the diffusion model by considering parameters such as the rate of influence spread and the number of infected (i.e., influenced) nodes [7,8]. Later, a new heuristic and scalable solution based on maximum influence paths was proposed [9].

Social networks were analysed to quantify user influence, and web semantics were used to learn about influence in heterogeneous social networks [10–12]. Social influence was further exploited for the study of human dynamics and human behaviour [13–17]. Social networks were evaluated for influence maximization [18,19], and a two-phase model for information diffusion was employed [20], selecting the seed nodes and further activating them in multiple levels. Recently, a linear-time implementation of the Collective Influence (CI) algorithm was used to find the minimal set of influencers in networks via optimal percolation [21,22]. Further, scalable algorithms were proposed for massively large social networks [23].

Bond percolation theory and epidemic models are studied here, and the use of the SI epidemic model [24] for modelling diffusion in social networks is proposed. The majority of existing work is based on bond percolation and on the SIR model. The SI model is preferred since it is a progressive model and hence can be better exploited in the influence maximization problem.

3. Problem discussion

Different models and frameworks have been defined by different researchers to obtain an optimal solution for the above-stated problem. Some of the approaches are discussed in this section.

A social network can be interpreted as a directed graph $G = (V, E)$, where $V$ denotes the nodes of the graph, which represent the users of the social network, and $E$ denotes the edges, which represent the relationships between the users. In this context the relationship is that of influencer and influenced node, i.e., who influences whom. The influence maximization problem deals with optimally selecting the seed set of users such that they maximize the expected spread of influence, or diffusion, in the given social network under a given propagation model.

Let $S_t \subseteq V$ be the active set, containing the active nodes at a given time $t$. The active nodes that participate in spreading the influence to the next level will be termed seed nodes. Let $S_0$ be the seed set, containing the seed nodes; in other words, the nodes in this set are the seeds of influence diffusion. These seed nodes are the initial nodes at the root level, which are selected to propagate the influence throughout the network. An example is the set of initial users selected by the promotional campaign for a new product as part of a marketing strategy.

In progressive diffusion models, the active sets are monotonically non-decreasing. Since the superset $V$ is finite, after a finite number of steps the set of active nodes $S_t$ remains unchanged. This final active set is denoted $\phi(S_0)$, where $S_0$ is the initial seed set.

Two classic progressive models, originally proposed in mathematical sociology, are described as follows:

3.1. Independent cascade model (IC model)

The independent cascade (IC) model was the first progressive model [25,26]. The key characteristic of this model is that the diffusion events associated with the edges of the given social graph are mutually independent.

3.2. Linear threshold model (LT model)

The linear threshold (LT) model is a progressive, stochastic information diffusion model proposed by Granovetter [27].

3.3. Triggering model

Kempe et al. [2] proposed the triggering model, which is mainly based on the two basic propagation models discussed above, namely the IC and LT models. Under the IC and LT propagation models, the influence maximization problem was proved to be NP-hard by Kempe et al. [2]. They also showed that the function being maximized, $\sigma_m(S)$, is monotone and submodular.
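To make the independent cascade model described above concrete, the following minimal Python sketch simulates one IC diffusion and estimates the expected spread by Monte Carlo averaging. It is an illustration only: the adjacency-list representation, the single activation probability `p` shared by all edges, the function names and the toy graph are assumptions introduced here, not details taken from the paper.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, p, rng):
    """Run one independent-cascade diffusion and return the final active set.

    graph: dict mapping node -> list of out-neighbours (directed edges).
    seeds: iterable of initially active nodes (the seed set S0).
    p:     activation probability used for every edge (an assumption; the
           model in general allows a separate probability per edge).
    rng:   random.Random instance, so runs are reproducible.
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v in graph.get(u, []):
            # Each edge gets one independent activation attempt, made when
            # its source node u has just become active.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def estimate_spread(graph, seeds, p=0.1, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of active nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy graph, used only to exercise the functions.
toy_graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
print(estimate_spread(toy_graph, seeds=[1], p=0.5))
```

Because active nodes never become inactive, the active set only grows during a run, which is the progressive behaviour the section relies on.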

3.4. Submodularity and monotonicity of a function

The propagation models, i.e., the IC, LT and triggering models, satisfy two important properties in terms of their influence spread function $\sigma$: submodularity and monotonicity. Submodularity can be interpreted in this context as diminishing marginal returns: the gain in spread obtained by adding a node to the seed set becomes smaller as the seed set grows. Monotonicity means that adding more elements to a seed set cannot reduce the size of the final set of active (influenced) nodes.
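The two properties can be restated compactly using the standard set-function definitions, where $\sigma$ is the influence spread function. The symbols $T$ (a larger seed set) and $v$ (a candidate node) are introduced here for illustration only:

```latex
\begin{align*}
\text{Monotonicity:}\quad & \sigma(S) \le \sigma(T)
  && \text{for all } S \subseteq T \subseteq V, \\
\text{Submodularity:}\quad & \sigma(S \cup \{v\}) - \sigma(S) \ge \sigma(T \cup \{v\}) - \sigma(T)
  && \text{for all } S \subseteq T \subseteq V,\ v \in V \setminus T.
\end{align*}
```

The second inequality says that the same node $v$ contributes at least as much to the smaller seed set $S$ as to the larger set $T$.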

3.5. Greedy algorithm proposed by Kempe et al. [2]

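Input: $G$, $k$, $\sigma_m(S)$
Output: seed set $S$, $h$
1. $S \leftarrow \emptyset$
2. while $|S| < k$ do
3.     $u \leftarrow \arg\max_{w \in V \setminus S} (\sigma_m(S + w) - \sigma_m(S))$;
4.     $S \leftarrow S \cup \{u\}$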

Line 3 of the greedy algorithm is the most important. It selects the node that provides the maximum influence spread, in other words, the largest marginal gain $\sigma_m(S + w) - \sigma_m(S)$ with respect to the total expected influence spread of the current seed set $S$, where $S + w$ denotes $S \cup \{w\}$. This step relies on the submodularity and monotonicity of $\sigma_m$ for its approximation guarantee.
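The greedy selection loop above can be sketched in a few lines of Python. This is a minimal illustration, assuming the spread function is passed in as a callable; the name `greedy_seed_selection`, the `spread` parameter and the small coverage example are hypothetical and stand in for $\sigma_m$, which in practice is estimated by simulation.

```python
def greedy_seed_selection(nodes, k, spread):
    """Pick up to k seeds, each time adding the node with the largest marginal gain.

    nodes:  iterable of candidate nodes (V).
    k:      desired seed-set size.
    spread: callable mapping a set of seeds to a number; a stand-in for the
            influence spread function (hypothetical, supplied by the caller).
    """
    candidates = set(nodes)
    seeds = set()
    while len(seeds) < k and candidates:
        base = spread(seeds)
        # Line 3 of the pseudocode: argmax over w in V \ S of the marginal gain.
        best = max(candidates, key=lambda w: spread(seeds | {w}) - base)
        seeds.add(best)
        candidates.remove(best)
    return seeds


# Hypothetical example: each node "covers" a set of users, and the spread of
# a seed set is the number of distinct users covered (monotone and submodular).
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def cover(seed_set):
    return len(set().union(*(coverage[s] for s in seed_set)))


print(greedy_seed_selection(coverage, 2, cover))
```

Every iteration re-evaluates the spread for all remaining candidates, which is why the greedy approach becomes expensive when each evaluation is itself a simulation.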
Greedy algorithms make locally optimal choices. A greedy algorithm works only if locally optimal choices can lead to a global optimum and the subproblems are optimal; otherwise it performs poorly. The same is observed here. The greedy algorithm only finds the local minimum edge at every iteration and hence fails to reach more nodes. In the larger picture, the local optimum may be far weaker than the global optimum. Hence we tried to exploit the structure of the graph.

4. Proposed model

This maximization problem can be expressed as a discrete optimization problem. It can be modelled as a graphical model for learning a tree distribution. A discrete approach aims to choose the optimal set of nodes that constitute an optimal path in a spanning tree emerging from the seed node. In other words, the aim is to find a spanning tree of the social graph $G$ that best fits the triggering nodes (seed nodes), such that tracing the nodes along the given path length (also termed the threshold) provides a subset of the solution, i.e., a subset of the final active set.

4.1. SI epidemic model (Susceptible-Infected model)

The SI model [24,28] divides the entire population under consideration into two groups: the susceptible individuals, who are likely to get infected by the given disease, and the infected individuals, who have been infected and may further carry or spread the disease to the next set of susceptible individuals. Once a susceptible entity becomes infected, it is added to the infected set, thereby increasing the size of the infected set and decreasing the size of the susceptible set. We utilize this characteristic of the epidemic model to model the influence spread across a social network. Fig. 1 shows the proposed two-phase SI model for the influence maximization problem.

Fig. 1. SI-based two phase model.

4.2. Proposed method

Fig. 2 represents the proposed method. In phase I, we propose that the initial population of nodes are the candidates, i.e., susceptible nodes, represented as {S1}. The nodes that get triggered, i.e., the active nodes, now act as the seed set and are categorized as infected nodes {I}. These infected nodes serve as the influence carriers. In phase II, all nodes other than the seed nodes are susceptible. Once these susceptible nodes are influenced by the seed nodes, they become active nodes. Nodes that are once active do not become inactive. This is the progressive behaviour stated earlier; hence it fits the framework of the SI model.

Let G be a graph whose initial population is assumed to be susceptible to the spread. Let {S} denote the set of seed nodes obtained from the greedy algorithm, {S1} the set of susceptible nodes and {I} the set of infected nodes, i.e., the nodes responsible for spreading the influence. l represents the threshold, considered as eccentricity, i.e., the maximum distance (using the spanning tree) from a given node v to any other node in the graph. It is the diameter of the subgraph used to find the diffusion for one infected node in {I}. This is how we actually compute the reachability of the nodes. We assume that nodes which are reachable are more likely to get infected. Hence, l is a significant parameter in the influence spread algorithm. The set {S1} represents all the reachable nodes that are at distance l. The algorithm can be stated as follows:

4.2.1. Algorithm

Algorithm SI_Influence_Spread (G, S, S1, h, I, W)

Input: S, l
1. Initialize set $S \leftarrow \emptyset$
2. Data preprocessing: build the social network graph.
3. Assume the initial population is susceptible (S) and identify the seed set from the greedy algorithm depending on threshold h
4. Phase I: identified nodes become infected, i.e., $\{I\} \leftarrow \{S\}$
5. Phase II: for each node $v \in \{I\}$, find the set $\{S1\}_{v_i} \subseteq \{S1\}$ such that $v_i \in \{S1\}$ only if $v_i$ is at distance l
6. Total_influence_spread, $W(S1) = \sum \{S1\}_{v_i}$
7. return $W(S1)$
8. end For

Fig. 2. Flowchart for the proposed SI model.

The shortest path is traversed using a spanning tree, which makes sure that the vertex with maximum influence spread is passed to the next iteration. This leads to an incremental influence spread. The nodes returned by the above algorithm represent the set of nodes influenced by the source node. It is observed that the spread is wider than that of the greedy algorithm.

Theorem 1. In the SI model, the number of susceptible individuals decreases monotonically, that is, $S_{n+1} \le S_n$ for all $n$. We also have that the number of infected individuals increases monotonically, i.e., $I_{n+1} \ge I_n$ for all $n$ [28].

Proof. The input for the algorithm is a set of infected nodes {I}, which are derived from the population by using the greedy algorithm. The spread function from which the greedy algorithm obtains the target seed set is submodular as well as monotone [2].

Hence, if we use an incremental approach to find the influence spread, monotonicity is preserved. Further, we have observed experimentally that the number of nodes influenced by a seed set at an earlier step $n$ is more than at the next step, i.e., $n + 1$.

Theorem 2. The number of susceptible individuals is never negative, $S_n \ge 0$, and the number of infected individuals is never more than the total population size, $I_n \le N$ [28].

Proof. In our approach, the initial population, i.e., the set of infected nodes, is never empty. Hence, even with a minimum of one seed node, it is not possible to have fewer than zero susceptible individuals. As we follow a graphical model, at least two nodes and one edge are required. Hence, if one node is selected as the seed node, i.e., the infected node, then the other is susceptible (as it is reachable from the infected node). Therefore, the number of susceptible individuals can never be negative. In the worst case, where all the nodes are infected, the number of susceptible nodes can be 0 but not negative.

On the other hand, the set of seed nodes, i.e., the infected node set {I}, can contain the entire population in the best case. The greedy algorithm terminates when the seed node set contains all the nodes of the entire population N. In this case, the seed set becomes the universal set U. Hence, any infected node added later already belongs to the universal set, and |U| = N, i.e., the total number of nodes. Hence, the number of infected nodes can be at most N.

The computational time can be reduced substantially. Once we get the target nodes using the greedy algorithm, we can execute the algorithm on all the seeds simultaneously, which reduces the computational time. To further improve time efficiency, we propose a multithreading approach.

5. Dataset description

We consider four datasets available from the Stanford Large Network Dataset Collection (SNAP), published by Stanford University (available at: http://snap.stanford.edu/data/com). Table 1 presents the features of the datasets.

Table 1
Features of the datasets.

Datasets                        CA-AstroPh  Cit-HepTh  Cit-HepPh  Soc-Epinions
Nodes                           18772       27770      34546      75879
Edges                           198110      352807     421578     508837
Average clustering coefficient  0.6306      0.3120     0.2848     0.1378
Diameter                        14          13         12         14

6. Working of algorithm

The algorithm works as follows:

Step 1:
Input: set of susceptible nodes
After processing using the greedy algorithm: susceptible nodes → infected nodes
P → {v1, v2, v3, v4, v5, v6, v7} → seed sets for influence spread (assuming threshold as 7)

Step 2:
Input: set of susceptible nodes (those found as infected in the previous step), i.e., {v1, v2, v3, v4, v5, v6, v7}
After modelling: infected nodes
{v1} → {v11, v12, ..., v1n}, assuming the threshold as l

For example, for the dataset Cit-HepTh, the seed node 204089 with l = 7 influences 664 nodes (shown in Fig. 3).

Fig. 3. Sample hybrid algorithm for l = 7 (664 nodes) for seed node 204089 for dataset cit-HepTh (visualization using R).

Fig. 4. Sample hybrid algorithm for l = 6 (2650) for seed node 204089 for dataset cit-HepTh (visualization using R).

Figs. 3 and 4 illustrate the visualization of the influence spread for a node. The central hub (node) is the seed node selected during phase I, whereas the other nodes are the nodes infected during phase II. The influence is spread by the set of seed nodes. The set of all active nodes at this phase represents the final active set. The figures clearly show the effect of the threshold on the total spread: as the threshold l increases, the total spread shrinks.
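The two-phase procedure of Section 4.2.1, as walked through above, might be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes the graph is a dictionary of adjacency lists, reads "at distance l" as a breadth-first (shortest-path) distance of exactly l, and returns the union of the per-seed sets so that no node is counted twice. The function names and the toy graph are hypothetical.

```python
from collections import deque


def nodes_at_distance(graph, source, l):
    """Return the nodes whose BFS (shortest-path) distance from source is exactly l."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == l:
            continue  # do not expand beyond the threshold
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d == l}


def si_influence_spread(graph, seeds, l):
    """Two-phase SI sketch.

    Phase I:  the seeds (e.g. chosen by a greedy step) become the infected set.
    Phase II: for every infected seed, the susceptible nodes at distance l
              are collected; their union is returned.
    """
    infected = set(seeds)
    reached = set()
    for v in infected:
        reached |= nodes_at_distance(graph, v, l)
    return reached - infected


# Hypothetical toy graph, used only to exercise the functions.
toy_graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
print(si_influence_spread(toy_graph, seeds=[1, 2], l=2))
```

The size of the returned set plays the role of the total influence spread; summing the per-seed set sizes instead, as step 6 of the algorithm is written, would count shared nodes more than once.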

7. Results discussion

Initially, the seed set is computed using the greedy algorithm. To find the influence spread, the next cascade is found by considering that an edge connecting two nodes means that one is influencing the other. Further, by considering the longest among the shortest paths, the nodes influenced (computed by the cascaded operation) are restricted by the specified threshold (which specifies the path length). The SI model is based on an incremental approach in which the spread is cumulative. It exploits the graph properties. The spanning tree enables the SI approach to find the best possible longest path, which helps in increasing the influence spread. The greedy approach is restricted to local search. CELF++ is based on the submodularity imposed on the greedy algorithm, and CI is based on an adaptive bottom-up approach utilizing the finite radius of the sphere of social networks.

The influence spread for different values of the threshold h (l = 6 for the SI model) is illustrated in Fig. 5a. Linear (SI) represents the trend of the performance of the algorithm as the threshold varies. The performance increases linearly with the threshold.

Fig. 5a. Influence spread for CA-AstroPh Dataset.

Fig. 5b shows the influence spread per second for the different algorithms on the CA-AstroPh dataset. It is clear that the influence spread is increased to a large extent by the SI model as compared to the greedy algorithm. The influence spread per second improves significantly when a multithreaded approach is used. CI gives the best influence spread per second, but the total influence spread is best achieved by the SI model, as shown in Fig. 5c. In the SI-based algorithm, a spanning tree data structure is used: during pre-processing, the reachability of the nodes is checked for the seed nodes shortlisted by the greedy algorithm. This enables us to obtain the infected nodes. However, the time complexity of the CI algorithm is better than that of the proposed algorithm because it uses a max-heap data structure for storing and processing the CI values. The finite radius $\ell$ of the CI sphere allows the CI values to be processed in a max-heap data structure [22,23]. The basic idea is that, after each node removal, CI needs to be recomputed for only an O(1) number of nodes before the new largest value is found. It follows a bottom-up approach. In the SI-based approach, by contrast, the computations are based on the longest possible path among all the shortest paths, which is an incremental approach. Here, the execution time of CI is better than that of the SI model, but CI still fails to achieve an influence spread as good as that of the SI model. We further propose a multithreaded approach to improve, to some extent, the execution time of the SI model.

Fig. 5b. Influence spread per second for CA-AstroPh Dataset.

Fig. 5c. Influence spread gain chart for CA-AstroPh Dataset.

The performance gain in influence spread of the different algorithms on the CA-AstroPh dataset is depicted in Fig. 5c. Table 2 compares the performance, i.e., influence spread, of the proposed SI model with other models for different datasets. Table 3 reports the performance gain (influence spread per second) obtained by using the multithreaded approach for the SI model, compared with the basic SI model. Although multithreading does not provide the best influence spread per second, it still gives a significant boost to the speed of the influence spread.
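The multithreaded variant discussed above processes each seed independently, so the per-seed work can be handed to a pool of threads. The source does not show its threading code; the sketch below only illustrates the structure using Python's standard `concurrent.futures.ThreadPoolExecutor`. The helper `reach_at_distance`, the `workers` parameter and the toy graph are hypothetical, and "at distance l" is again read as an exact breadth-first distance.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def reach_at_distance(graph, source, l):
    """Nodes whose BFS (shortest-path) distance from source is exactly l."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == l:
            continue  # do not expand beyond the threshold
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d == l}


def parallel_spread(graph, seeds, l, workers=4):
    """Compute the per-seed reach in a thread pool and merge the results."""
    seeds = list(seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per seed; the graph is only read, never modified,
        # so the tasks need no locking.
        per_seed = pool.map(lambda s: reach_at_distance(graph, s, l), seeds)
        reached = set().union(*per_seed)
    return reached - set(seeds)


# Hypothetical toy graph, used only to exercise the functions.
toy_graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
print(parallel_spread(toy_graph, seeds=[1, 2], l=2))
```

In standard CPython, the global interpreter lock prevents threads from running pure-Python CPU-bound code in parallel, so this sketch shows the decomposition rather than a guaranteed speed-up; an implementation in a language with native threads, or one using processes, would be needed to realise gains of the kind reported in Table 3.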

Table 2
Performance comparison (Influence Spread) of SI model with other models (l = 6). For each dataset, the two columns give the % gain w.r.t. greedy and the % gain w.r.t. CI.

               Cit-HepTh         CA-AstroPh        Cit-HepPh         Soc-Epinions
Seed set size  greedy   CI       greedy   CI       greedy   CI       greedy   CI
7              685%     552%     189%     102%     444%     405%     56%      20%
10             966%     757%     275%     135%     553%     499%     107%     48%
15             1246%    922%     405%     178%     698%     581%     172%     79%
20             1716%    1222%    476%     193%     859%     686%     285%     143%
25             1880%    1283%    595%     233%     963%     735%     355%     179%
30             2257%    1489%    692%     262%     1039%    764%     415%     200%

Table 3
Performance gain (Influence spread per second) by using multithreaded approach for SI model.

Seed set size  Cit-HepTh  CA-AstroPh  Cit-HepPh  Soc-Epinions
7              123%       150%        71%        129%
10             133%       150%        48%        105%
15             127%       134%        48%        114%
20             154%       142%        51%        104%
25             141%       147%        84%        116%
30             145%       142%        82%        185%

8. Conclusion and future scope

As discussed earlier, greedy algorithms make locally optimal choices at each stage in the hope of finding a global optimum. If locally optimal choices yield a global optimum and the subproblems are optimal, then the algorithm works; otherwise, the greedy algorithm performs poorly. This has been confirmed by this study too. The greedy algorithm only finds the local minimum influence spread at every iteration and hence fails to reach more nodes. The influence spread observed for the greedy algorithm is limited, and the algorithm generally requires more run time. The proposed two-phase SI-based algorithm performs better than the greedy algorithm in terms of time and overall influence spread. Hence, we show that the graphical structure of the social network can be exploited to improve reachability and hence the influence spread.

Here, a novel approach is proposed based on the SI epidemic model for influence spread, the longest-shortest-path concept for reachability, and multithreading for improving time efficiency, which iteratively improves on the greedy cascaded model. The influence spread in this model is larger than in the basic greedy model. The efficiency in terms of speed is an added benefit. In this study, we evaluated the algorithm for different seed sizes on different datasets against different approaches proposed earlier. We observed that our ultimate aim of maximizing the influence spread is achieved using the SI model, but at the cost of execution time. Hence we used multithreading to improve the total number of nodes influenced per second, i.e., indirectly decreasing the computational time.

This work provided an overview of influencer identification and influence maximization. This study concludes that by identifying the influential users in social media, different business strategies can be planned, e.g., efficiently launching and marketing new products, targeting potential consumers, etc. Influence maximization and social influence mining together form significant components for enabling extensive viral marketing through online social networks.

Identifying influential users may be approached through different models, algorithms and statistical techniques. Parallel problems such as link prediction and social network content analysis could also be considered as potential problems for social network mining to deal with in future.

References

[1] Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2011.
[2] David Kempe, Jon Kleinberg, Eva Tardos, Maximizing the spread of influence through a social network, in: KDD 03, USA, 2003, pp. 137–146.
[3] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance, Cost-effective outbreak detection in networks, in: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, 2007, pp. 420–429.
[4] Amit Goyal, Wei Lu, Laks V.S. Lakshmanan, CELF++: optimizing the greedy algorithm for influence maximization in social networks, in: ACM Proceeding, 2011, pp. 47–48.
[5] Masahiro Kimura, Kazumi Saito, Ryohei Nakano, Hiroshi Motoda, Extracting influential nodes on a social network for information diffusion, Data Min. Knowl. Disc., Springer, 2009, pp. 70–97.
[6] Jure Leskovec, Jon Kleinberg, Christos Faloutsos, Graph evolution: densification and shrinking diameters, ACM Trans. Knowl. Disc. Data 1 (1) (2007).
[7] Duanbing Chen, Linyuan Lu, Ming Sheng Shang, Yi-Cheng Zhang, Tao Zhou, Identifying influential nodes in complex networks, Phys. A: Stat. Mech. Appl. (2011) 47–55.
[8] Christine Kiss, Martin Bichler, Identification of influencers: measuring influence in customer networks, Decis. Supp. Syst. (2008) 233–253.
[9] Chi Wang, Wei Chen, Yajun Wang, Scalable influence maximization for independent cascade model in large-scale social networks, Data Mining and Knowledge Discovery, Springer, 2012, pp. 1029–1038.
[10] Eytan Bakshy, Brian Karrer, Lada Adamic, Social influence and the diffusion of user-created content, in: Proceedings of the 10th ACM Conference on Electronic Commerce, 2009, pp. 325–334.
[11] Eytan Bakshy, Jake M. Hofman, Winter A. Mason, Duncan J. Watts, Everyone's an influencer: quantifying influence on Twitter, in: WSDM Proceedings of Fourth ACM International Conference on Web Search and Data Mining, 2011, pp. 65–74.
[12] Lei Tang, Huan Liu, Leveraging social media networks for classification, Data Mining and Knowledge Discovery, Springer, 2011, pp. 447–478.
[13] Lu Liu, Jie Tang, Jiawei Han, Shiqiang Yang, Learning influence from heterogeneous social networks, Data Mining and Knowledge Discovery, Springer, 2012, pp. 511–544.
[14] Bogart Yail Márquez, Manuel Castañón-Puga, Juan R. Castro, Eugenio D. Suarez, José Sergio Magdaleno-Palencia, Fuzzy models applied to complex social systems: modeling poverty using distributed agencies, Int. J. New Comput. Arch. Appl. (IJNCAA) 1 (2) (2011) 292–303.
[15] Na Li, Denis Gillet, Identifying influential scholars in academic social media platforms, in: ASONAM Proceedings IEEE/ACM International Conference on Advances in Social Network Analysis and Mining, 2013, pp. 608–614.
[16] Wei Pan, Wen Dong, Manuel Cebrian, Taemie Kim, James H. Fowler, Alex Sandy Pentland, Modeling dynamical influence in human interaction, ACM Trans. Web (ACM TWEB) (2012) 77–86.
[17] Symeon Papadopoulos, Yiannis Kompatsiaris, Athena Vakali, Ploutarchos Spyridonos, Community detection in social media: performance and application considerations, Data Mining and Knowledge Discovery, Springer, 2012, pp. 515–554.
[18] Kundu Suman, C.A. Murthy, S.K. Pal, A new centrality measure for influence maximization in social networks, in: 4th International Conference on Pattern Recognition and Machine Intelligence (PReMI11), Springer-Verlag, 2011, pp. 242–247.
[19] Sankar K. Pal, S.C.A. Murthy, Centrality measures, upper bound, and influence maximization in large scale directed social networks, Fundam. Inform. (2014) 317–342.
[20] Swapnil Dhamal, K.J. Prabuchandran, Y. Narahari, Information diffusion in social networks in two phases, IEEE Trans. Network Sci. Eng. (2016) 197–210.
[21] F. Morone, H. Makse, Influence maximization in complex networks through optimal percolation, Nature 524 (2015) 65–68.
[22] Flaviano Morone, Byungjoon Min, Lin Bo, Romain Mari, Hernan A. Makse, Collective Influence Algorithm to find influencers via optimal percolation in massively large social media, Sci. Rep. 6 (2016) 30062.
[23] Sen Pei, Xian Teng, Jeffrey Shaman, Flaviano Morone, Hernan A. Makse, Efficient Collective Influence maximization in cascading processes with first-order transitions, Sci. Rep. 7 (2017) 45240.
[24] Linda J.S. Allen, Some discrete-time SI, SIR, and SIS epidemic models, Math. Biosci. 124 (1) (1994) 83–105.
[25] Jacob Goldenberg, B. Libai, E. Muller, Talk of the network: a complex systems look at the underlying process of word-of-mouth, Market. Lett. (2001) 211–223.
[26] Jacob Goldenberg, B. Libai, E. Muller, Using complex systems analysis to advance marketing theory development: modeling heterogeneity effects on new product growth through stochastic cellular automata, Acad. Market. Sci. Rev. (2001) 1–18.
[27] M. Granovetter, Threshold models of collective behavior, Am. J. Sociol. (1978) 1420–1443.
[28] Kacie M. Sutton, Discretizing the SI epidemic model, Rose-Hulman Undergraduate Math. J. 15 (2014) 192–208.