Network Traffic Analysis Visualization in R
Last Updated: 19 Jun, 2025
The goal of a typical network traffic analysis project is to monitor, analyze, and visualize data flow across a network. This helps in identifying patterns and trends, detecting security threats, and making informed decisions about network infrastructure.
The Theory Behind Synthetic Data Generation
- Timestamps: These are generated at regular intervals to simulate network activity over time, enabling the analysis of temporal patterns in network traffic.
- IPv4 Addresses: Randomly generated IPv4 addresses are used to simulate communication between devices on the network.
- Bytes Transferred: Randomly sampled within a realistic range to reflect real-world variations in data transfer volumes.
- Dataframe Creation: The synthetic data is organized into a dataframe, where each row represents a record of network activity.
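One consequence of drawing every octet independently is worth seeing in numbers: with roughly 4.3 billion possible addresses, 1000 random draws almost never repeat, so any later per-IP aggregation has very little to aggregate. The sketch below is a minimal illustration of that effect and of one possible alternative, sampling from a small pool of hosts. The pool `host_pool`, its 192.168.1.x range, and its size of 20 are assumptions made for this example, not part of the original dataset.

```r
set.seed(42)

# Hypothetical pool of 20 hosts in a private address range (assumption)
host_pool <- paste0("192.168.1.", 1:20)

# Draw 1000 addresses from the pool, so the same hosts appear many times
pooled_ips <- sample(host_pool, 1000, replace = TRUE)

# Fully random octets, for comparison
random_ips <- paste(sample(0:255, 1000, replace = TRUE),
                    sample(0:255, 1000, replace = TRUE),
                    sample(0:255, 1000, replace = TRUE),
                    sample(0:255, 1000, replace = TRUE), sep = ".")

# Count distinct addresses produced by each approach
n_pooled <- length(unique(pooled_ips))   # at most 20
n_random <- length(unique(random_ips))   # very close to 1000
print(c(pooled = n_pooled, random = n_random))
```

If repeated hosts matter for the analysis, sampling from a pool like this gives each address many records; the fully random approach is simpler but makes nearly every address unique.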
1. Creating the Dataset
We begin by generating a synthetic dataset that lets us explore network traffic patterns. It contains timestamps, random IPv4 addresses, and bytes transferred, and is stored in a dataframe for easy analysis. Here,
- Timestamps allow us to track when traffic occurred.
- Source and Destination IPs help identify which devices are communicating.
- Bytes Transferred shows the volume of data being transferred, which helps us identify high-traffic sources or destinations.
R
set.seed(123)  # make the random draws reproducible
num_records <- 1000
# One timestamp per hour, starting at 2024-06-03 00:00:00
timestamps <- seq.POSIXt(from = as.POSIXct("2024-06-03 00:00:00"),
                         by = "hour", length.out = num_records)
# Build n dotted-quad addresses from four independently sampled octets
generate_ipv4 <- function(n) {
  paste(sample(0:255, n, replace = TRUE), sample(0:255, n, replace = TRUE),
        sample(0:255, n, replace = TRUE), sample(0:255, n, replace = TRUE), sep = ".")
}
source_ips <- generate_ipv4(num_records)
destination_ips <- generate_ipv4(num_records)
# Bytes per record, drawn uniformly between 100 and 10000
bytes_transferred <- sample(100:10000, num_records, replace = TRUE)
# One row per record of network activity
traffic_data <- data.frame(
  timestamp = timestamps,
  source_ip = source_ips,
  destination_ip = destination_ips,
  bytes_transferred = bytes_transferred
)
head(traffic_data)  # show the first six rows
Output:
Creating the Dataset
2. Visualizations for Network Traffic Analysis
We will plot several visualizations to analyze the data.
2.1 Time Series Analysis
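We plot bytes transferred against time to see how network traffic changes from one interval to the next. The synthetic dataset has exactly one record per hour, so each point on the line is a single record; with real captures, you would typically sum bytes per interval first.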
R
install.packages("ggplot2")  # only needed once; skip if already installed
library(ggplot2)
# Plot size for notebook front ends that honor the repr options
options(repr.plot.width = 10, repr.plot.height = 6)
ggplot(traffic_data, aes(x = timestamp, y = bytes_transferred)) +
  geom_line() +
  labs(title = "Network Traffic Over Time",
       x = "Timestamp",
       y = "Bytes Transferred")
Output:
Time Series Analysis
From the time series plot, we can analyze traffic patterns, peak hours, and anomalies that might indicate unusual activity or congestion.
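To make the idea of peak hours concrete, one option is to average bytes by hour of day. The following is a minimal sketch that rebuilds a small synthetic data frame so it runs on its own; the names `traffic`, `hourly`, and `peak`, and the choice of the UTC time zone, are assumptions for this example. Because the bytes are uniformly random, the "peak" it reports is noise rather than a real pattern.

```r
set.seed(123)
n <- 1000

# Hourly timestamps with random byte counts (same shape as the article's data)
traffic <- data.frame(
  timestamp = seq(as.POSIXct("2024-06-03 00:00:00", tz = "UTC"),
                  by = "hour", length.out = n),
  bytes_transferred = sample(100:10000, n, replace = TRUE)
)

# Extract the hour of day (0-23) from each timestamp
traffic$hour <- as.integer(format(traffic$timestamp, "%H"))

# Average bytes for each hour of day
hourly <- aggregate(bytes_transferred ~ hour, data = traffic, FUN = mean)

# The hour with the highest average is a candidate peak hour
peak <- hourly[which.max(hourly$bytes_transferred), ]
print(peak)
```

On real traffic, the same aggregation tends to be more informative than the raw line plot, since it collapses many days onto a single 24-hour profile.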
2.2 Top Talkers Analysis
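Next, we identify the "top talkers" by summing bytes per source IP to find which devices are responsible for the most traffic. Note that with fully random addresses, a source IP rarely appears more than once, so each total here usually reflects a single record; in real traffic, each device would contribute many records.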
R
top_talkers <- aggregate(bytes_transferred ~ source_ip, data = traffic_data, FUN = sum)
top_talkers <- top_talkers[order(top_talkers$bytes_transferred, decreasing = TRUE), ]
top_talkers <- head(top_talkers, 10) # Selecting top 10 talkers for visualization
ggplot(top_talkers, aes(x = reorder(source_ip, bytes_transferred),
                        y = bytes_transferred, fill = bytes_transferred)) +
  geom_bar(stat = "identity", color = "black") +
  scale_fill_gradient(low = "lightblue", high = "blue") +
  labs(title = "Top 10 Network Talkers",
       x = "Source IP Address",
       y = "Bytes Transferred") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
        axis.title.x = element_text(size = 12, face = "bold"),
        axis.title.y = element_text(size = 12, face = "bold"),
        legend.position = "none") +
  geom_text(aes(label = scales::comma(bytes_transferred)), vjust = -0.3, size = 3.5)
Output:
Top Talkers Analysis
The bar plot of top talkers helps us pinpoint which devices are using the most bandwidth. This information is important for identifying potential network congestion or suspicious activity from specific devices.
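Absolute byte counts are easier to interpret alongside each device's share of total traffic. The sketch below is one way to compute that, under stated assumptions: it builds its own small data frame `flows` from a hypothetical pool of 25 hosts (10.0.0.x) so that each source has many records, and the column names `share_pct` and `cumulative_pct` are introduced here for illustration.

```r
set.seed(123)

# Hypothetical pool of 25 source hosts (assumption, so that sources repeat)
pool <- paste0("10.0.0.", 1:25)
flows <- data.frame(
  source_ip = sample(pool, 1000, replace = TRUE),
  bytes_transferred = sample(100:10000, 1000, replace = TRUE)
)

# Total bytes per source, largest first
by_source <- aggregate(bytes_transferred ~ source_ip, data = flows, FUN = sum)
by_source <- by_source[order(by_source$bytes_transferred, decreasing = TRUE), ]

# Each source's percentage of all traffic, plus a running total
by_source$share_pct <- 100 * by_source$bytes_transferred / sum(by_source$bytes_transferred)
by_source$cumulative_pct <- cumsum(by_source$share_pct)

head(by_source, 10)
```

The cumulative column shows how much of the total the top few sources account for, which helps judge whether traffic is concentrated in a handful of devices or spread evenly.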
2.3 Destination Analysis
We analyze the distribution of bytes transferred based on destination IP addresses to identify which resources are most accessed.
R
destination_summary <- aggregate(bytes_transferred ~ destination_ip, data = traffic_data,
                                 FUN = sum)
destination_summary <- destination_summary[order(destination_summary$bytes_transferred,
                                                 decreasing = TRUE), ]
top_destinations <- head(destination_summary, 10)
ggplot(top_destinations, aes(x = bytes_transferred, y = reorder(destination_ip,
                                                                bytes_transferred))) +
  geom_segment(aes(x = 0, xend = bytes_transferred, y = reorder(destination_ip,
                                                                bytes_transferred),
                   yend = reorder(destination_ip, bytes_transferred)),
               color = "grey") +
  geom_point(aes(color = bytes_transferred), size = 4) +
  scale_color_gradient(low = "lightgreen", high = "darkgreen") +
  labs(title = "Top 10 Network Destinations",
       x = "Bytes Transferred",
       y = "Destination IP Address") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
        axis.title.x = element_text(size = 12, face = "bold"),
        axis.title.y = element_text(size = 12, face = "bold"),
        axis.text.y = element_text(size = 10),
        legend.position = "none") +
  geom_text(aes(label = scales::comma(bytes_transferred)), hjust = -0.1, size = 3.5)
Output:
Destination Analysis
The lollipop chart highlights which destinations are receiving the most traffic. This insight is useful for identifying heavily accessed resources, potential bottlenecks, or areas requiring optimization.
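Byte totals show how much data a destination receives, but not how many different devices are contacting it. A complementary measure is the number of distinct sources per destination. The sketch below is a minimal illustration under assumptions: the data frame `flows`, the 50 hypothetical clients (10.0.0.x), and the 5 hypothetical servers (10.0.1.x) are invented for this example so that destinations are shared by many sources.

```r
set.seed(123)

# Hypothetical clients and servers (assumption, so destinations repeat)
clients <- paste0("10.0.0.", 1:50)
servers <- paste0("10.0.1.", 1:5)
flows <- data.frame(
  source_ip = sample(clients, 1000, replace = TRUE),
  destination_ip = sample(servers, 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# For each destination, count how many distinct sources contacted it
fan_in <- tapply(flows$source_ip, flows$destination_ip,
                 function(x) length(unique(x)))

# Destinations reached by the most distinct sources come first
fan_in <- sort(fan_in, decreasing = TRUE)
print(fan_in)
```

A destination with high byte volume but very few distinct sources suggests a small number of heavy transfers, while one with many distinct sources is more likely a widely shared resource.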
2.4 Geographical Visualization
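Visualizing network traffic geographically can provide insights into the regions generating or receiving traffic. This can be particularly useful for detecting unusual traffic patterns from specific locations. In this example, the coordinates are random placeholders rather than locations derived from the IP addresses; with real data, they would come from an IP geolocation lookup.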
R
install.packages("leaflet")  # only needed once; skip if already installed
library(leaflet)
set.seed(123)
num_records <- nrow(traffic_data)
# Random coordinates stand in for geolocation; they are not derived from the IPs
traffic_data$source_latitude <- runif(n = num_records, min = -90, max = 90)
traffic_data$source_longitude <- runif(n = num_records, min = -180, max = 180)
traffic_data$destination_latitude <- runif(n = num_records, min = -90, max = 90)
traffic_data$destination_longitude <- runif(n = num_records, min = -180, max = 180)
# Interleave source, destination, NA so each record is drawn as its own segment
path_lng <- as.vector(rbind(traffic_data$source_longitude,
                            traffic_data$destination_longitude, NA))
path_lat <- as.vector(rbind(traffic_data$source_latitude,
                            traffic_data$destination_latitude, NA))
leaflet(data = traffic_data) %>%
  addTiles() %>%
  addCircleMarkers(~source_longitude, ~source_latitude, radius = 5,
                   color = "blue", fillOpacity = 0.5,
                   label = ~paste("Source:", source_ip)) %>%
  addCircleMarkers(~destination_longitude, ~destination_latitude, radius = 5,
                   color = "red", fillOpacity = 0.5,
                   label = ~paste("Destination:", destination_ip)) %>%
  addPolylines(lng = path_lng, lat = path_lat,
               color = "green", weight = 2, opacity = 0.7) %>%
  addLegend("bottomright", colors = c("blue", "red", "green"),
            labels = c("Source IP", "Destination IP", "Traffic Path"),
            title = "Legend") %>%
  setView(lng = mean(traffic_data$source_longitude),
          lat = mean(traffic_data$source_latitude), zoom = 1)
Output:
Geographical Visualization
The map allows us to visually explore the geographical distribution of network traffic. With real geolocated addresses, observing the locations of source and destination IPs can show which regions traffic comes from and goes to, potentially highlighting abnormal activity. Here the points are scattered uniformly because the coordinates are random.
Conclusion
The visualizations discussed in this article are designed to surface the following kinds of insight into network traffic patterns. Because the dataset here is synthetic and uniformly random, the plots demonstrate the techniques rather than real findings; applied to captured traffic, they can reveal:
- Trends in Network Traffic: Peak traffic periods, which can help network administrators optimize network resources and ensure smooth performance during high-traffic times.
- Anomalies in Traffic Patterns: Patterns that deviate from the norm, indicating potential issues like network congestion or malicious activity and enabling proactive intervention.
- Potential Security Threats: Abnormal behaviors in traffic that could signal security threats, allowing for early detection and a quicker response to mitigate risks.
These insights contribute to better network management, enhanced security, and more effective decision-making.