SMA Exp 2

1. The document describes an experiment using network analysis on a Facebook dataset to analyze relationships between users based on attributes like age and date of birth.
2. It introduces libraries like Pandas, NetworkX, SciPy, and Matplotlib that are used for tasks like data handling, graph creation, analysis, and visualization.
3. Theoretical concepts discussed include degree centrality, closeness centrality, bridges, clustering coefficients, and their implications for identifying influential users, understanding information flow, detecting network vulnerabilities, and recognizing community formations.

Uploaded by

pameluft

SOCIAL MEDIA ANALYTICS

EXPERIMENT 2

Aim: To perform network analysis using NetworkX. The dataset is assumed to contain
Facebook user attributes, and the code creates a graph based on the connections
between users.

Libraries Used:

Pandas: Pandas is a Python library for working with data sets. It has functions for
analyzing, cleaning, exploring, and manipulating data. It is a fast, powerful, flexible,
and easy-to-use open-source data analysis and manipulation tool built on top of the
Python programming language.

NetworkX: In Python, we can create, manipulate, and analyze networks or graphs with
the help of the NetworkX library. It provides functions for working with directed,
undirected, and weighted graphs as well as multigraphs. NetworkX is widely used in
areas such as transportation, infrastructure planning, and biological networks.

SciPy: SciPy is an open-source scientific library for Python, distributed under a BSD
license. It is used to solve complex scientific and mathematical problems. SciPy is
built on top of NumPy and depends on it for fast N-dimensional array manipulation;
NumPy must still be imported separately when its functions are used directly. SciPy is
pronounced "Sigh Pie". It provides many user-friendly and efficient routines for tasks
such as numerical integration and optimization.

Matplotlib: Matplotlib is a visualization library in Python for 2D plots of arrays. It
is a multi-platform data visualization library built on NumPy arrays and designed to
work with the broader SciPy stack. It was introduced by John Hunter in the year 2002.
One of the greatest benefits of visualization is that it presents large amounts of data
in an easily digestible form. Matplotlib supports several plot types, such as line, bar,
scatter, and histogram plots.

Here are some theoretical points related to the experiment:

1. Graph Creation:

In network analysis, a graph is a mathematical representation of a set of objects (nodes)
and the relationships (edges) between them. In this code, a graph is created to
represent the Facebook network.

Nodes in this graph represent Facebook users, and the relationships between them are
formed based on their attributes, specifically age and date of birth. When two users
have similar attributes, a connection (edge) is established between them.

This approach allows us to study the structure of the Facebook friendship network
based on user attributes.
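
The construction described above can be sketched as follows. This is a minimal illustration, not the experiment's actual code: the column names `user_id`, `age`, and `dob`, the sample rows, and the rule "connect two users when their ages are equal" are all assumptions made for the example.

```python
from itertools import combinations

import networkx as nx
import pandas as pd

# Hypothetical sample data; the real dataset's columns and values may differ.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "age": [21, 21, 25, 25, 30],
    "dob": ["2002-03-14", "2002-07-01", "1998-03-14", "1998-11-20", "1993-05-05"],
})

G = nx.Graph()

# Every user becomes a node that carries its attributes.
for row in users.itertuples(index=False):
    G.add_node(row.user_id, age=row.age, dob=row.dob)

# Assumed similarity rule: link two users when their ages are equal.
for u, v in combinations(list(G.nodes), 2):
    if G.nodes[u]["age"] == G.nodes[v]["age"]:
        G.add_edge(u, v)

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```

Comparing every pair of users is quadratic in the number of users, so for a large dataset it would likely be better to group users by attribute value first and connect users within each group.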

2. Node Centrality (Degree Centrality):

Degree centrality measures how well-connected a node is within a network: it is the
number of edges attached to the node, usually normalised by the maximum possible
degree. In this code, it's calculated for each user in the Facebook network.

Users with a higher degree centrality have more connections, indicating that they are
potentially more influential or well-connected within the network.

Degree centrality is useful for identifying key individuals who may play important roles in
information spread or network dynamics.
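
As a small illustration, the sketch below computes degree centrality with `nx.degree_centrality` on a toy graph (the node names and edges are made up for the example, not taken from the dataset). NetworkX divides each node's degree by n - 1, where n is the number of nodes.

```python
import networkx as nx

# Toy graph for illustration only: user "A" is connected to everyone else.
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])

# Degree of each node divided by (n - 1), the maximum possible degree.
centrality = nx.degree_centrality(G)

# Rank users from most to least connected.
ranked = sorted(centrality.items(), key=lambda item: item[1], reverse=True)
for user, score in ranked:
    print(f"{user}: {score:.2f}")
```

Here "A" has degree 3 in a 4-node graph, so its degree centrality is 3 / 3 = 1.0, while "D" with a single connection scores 1 / 3.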

3. Visualization:

Visualization is a crucial step in network analysis to understand the structure and
connections within a network.

The code attempts to visualize the Facebook network using nx.draw(), but there appear
to be issues with sorting and visualization.

Proper visualization can help researchers and analysts identify patterns, clusters, and
important nodes within the network.
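
One possible way to draw such a network with `nx.draw()` is sketched below. It is not a fix for the experiment's own plotting code, which is not shown here; the toy graph, the output file name, and the choice to scale node size by degree are assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show() instead
import matplotlib.pyplot as plt
import networkx as nx

# Toy graph for illustration only.
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])

# A fixed seed makes the force-directed layout reproducible between runs.
pos = nx.spring_layout(G, seed=42)

# Scale node size by degree so well-connected users stand out.
sizes = [300 * G.degree(n) for n in G.nodes]

fig, ax = plt.subplots(figsize=(5, 4))
nx.draw(G, pos, ax=ax, with_labels=True, node_size=sizes, node_color="lightblue")

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "facebook_network.png")
fig.savefig(out_path)
plt.close(fig)
```

Computing the layout once and passing it as `pos` keeps node positions stable if the same graph is drawn again, for example with certain edges highlighted.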

4. Closeness Centrality:

Closeness centrality measures how close a node is to all other nodes in the network. It
indicates the efficiency of a node in reaching other nodes.
In the context of a Facebook network, users with high closeness centrality can quickly
reach a wide range of other users, potentially making them influential in information flow
or network dynamics.
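
For a connected graph with n nodes, the closeness centrality of a node is (n - 1) divided by the sum of its shortest-path distances to all other nodes. The sketch below applies `nx.closeness_centrality` to a toy chain of five users (made up for the example) to show that the middle user scores highest.

```python
import networkx as nx

# Toy chain A - B - C - D - E, for illustration only.
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])

closeness = nx.closeness_centrality(G)

# The user who can reach everyone else in the fewest steps on average.
most_central = max(closeness, key=closeness.get)

for user, score in closeness.items():
    print(f"{user}: {score:.3f}")
print("Most central user:", most_central)
```

For "C" the distances to the other users are 2, 1, 1, and 2, giving 4 / 6, whereas the end user "A" has distances 1, 2, 3, and 4, giving 4 / 10.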

5. Bridges:

Bridges are edges in a network whose removal would disconnect the network or split it
into separate components.

Identifying bridges is crucial for understanding the network's vulnerability and ensuring
its connectivity.

The code checks for bridges and visualizes them, which can help in understanding
potential points of network vulnerability.
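
A minimal sketch of bridge detection, assuming a toy graph of two triangles joined by a single edge (not the experiment's data). It uses `nx.has_bridges` and `nx.bridges`, then removes the bridges to confirm that the network splits.

```python
import networkx as nx

# Two tightly connected triangles joined by the single edge (3, 4).
G = nx.Graph([
    (1, 2), (2, 3), (1, 3),   # first triangle
    (4, 5), (5, 6), (4, 6),   # second triangle
    (3, 4),                   # the only link between the two groups
])

print("Has bridges:", nx.has_bridges(G))

# nx.bridges returns a generator of edges, so collect it into a list.
bridge_edges = list(nx.bridges(G))
print("Bridges:", bridge_edges)

# Removing the bridges from a copy shows how the network would fall apart.
H = G.copy()
H.remove_edges_from(bridge_edges)
print("Components after removal:", nx.number_connected_components(H))
```

Edges inside a triangle are not bridges because each of them lies on a cycle, so an alternative path remains when one is removed.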

6. Clustering Coefficient:

The clustering coefficient measures the degree to which nodes in a network tend to
cluster together.

A higher clustering coefficient suggests that nodes are more likely to form tightly-knit
groups or communities.

Analyzing the clustering coefficient helps in understanding the presence of communities
or subgroups within the Facebook network.
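
For a single node, the local clustering coefficient is the fraction of pairs of its neighbours that are themselves connected. The sketch below uses `nx.clustering` and `nx.average_clustering` on a toy graph (one triangle plus one loosely attached user, made up for the example).

```python
import networkx as nx

# Triangle A-B-C plus user D, who is connected only to A.
G = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")])

# Local clustering coefficient of every node.
local = nx.clustering(G)

# Mean of the local coefficients over all nodes.
average = nx.average_clustering(G)

for user, score in local.items():
    print(f"{user}: {score:.2f}")
print(f"Average clustering: {average:.3f}")
```

"A" has three neighbours (B, C, D) but only the pair B-C is connected, so its coefficient is 1 / 3; "B" and "C" score 1.0, and "D", with a single neighbour, scores 0.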

7. Data Quality and Error Handling:

Data quality is essential for accurate network analysis. Errors or inconsistencies in the
dataset can lead to incorrect insights.

Proper data cleaning, validation, and formatting are necessary to ensure the reliability of
the analysis.

Error handling is important to address issues that may arise during data processing or
analysis.
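
The cleaning and validation steps above might look like the following sketch in Pandas. The column names, the sample rows, and the `YYYY-MM-DD` date format are assumptions for illustration; the real dataset may need different rules.

```python
import pandas as pd

# Hypothetical raw data with a duplicate row and two invalid values.
raw = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4],
    "age": ["21", "25", "25", "unknown", "30"],
    "dob": ["2002-03-14", "1998-11-20", "1998-11-20", "1999-01-01", "not a date"],
})

# Keep one row per user.
clean = raw.drop_duplicates(subset="user_id").copy()

# Convert to proper types; values that cannot be parsed become NaN / NaT
# instead of raising an exception.
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")
clean["dob"] = pd.to_datetime(clean["dob"], format="%Y-%m-%d", errors="coerce")

# Drop users whose age or date of birth could not be validated.
clean = clean.dropna(subset=["age", "dob"])

print(clean)
```

Using `errors="coerce"` turns bad values into missing values that can be counted and reported before they are dropped, which is often easier to audit than letting the conversion fail on the first bad row.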

8. Interpretation:

The results of these analyses can provide valuable insights into the Facebook network:

Understanding the overall structure of the network.


Identifying influential users with high degree centrality or closeness centrality.

Recognizing patterns of clustering or community formation.

Detecting potential vulnerabilities through bridge analysis.

These insights can be applied to various domains, including marketing, community
management, and understanding information diffusion within the network.

Conclusion:

In summary, network analysis of the Facebook dataset involves creating a graph,
calculating centrality measures, visualizing the network, and drawing conclusions based
on the network's structure and characteristics. Proper data quality and error handling
are crucial for meaningful results and interpretations.
