Network Traffic Analysis Visualization in R

Last Updated : 19 Jun, 2025

The goal of a typical network traffic analysis project is to monitor, analyze, and visualize data flow across a network. This helps in identifying patterns and trends, detecting security threats, and making informed decisions about network infrastructure.

The Theory Behind Synthetic Data Generation

  • Timestamps: These are generated at regular intervals to simulate network activity over time, enabling the analysis of temporal patterns in network traffic.
  • IPv4 Addresses: Randomly generated IPv4 addresses are used to simulate communication between devices on the network.
  • Bytes Transferred: Randomly sampled within a realistic range to reflect real-world variations in data transfer volumes.
  • Data Frame Creation: The synthetic data is organized into a data frame, where each row represents one record of network activity.

1. Creating the Dataset

We begin by generating a synthetic dataset of timestamps, random IPv4 addresses and bytes transferred, which lets us explore network traffic patterns without real captures. The data is stored in a data frame for easy analysis. Here,

  • Timestamps allow us to track when traffic occurred.
  • Source and Destination IPs help identify which devices are communicating.
  • Bytes Transferred shows the volume of data being transferred, which helps us identify high-traffic sources or destinations.
R
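set.seed(123)  # make the random draws reproducible
num_records <- 1000

# One timestamp per hour, starting 2024-06-03 00:00:00 in the session's time zone
timestamps <- seq(from = as.POSIXct("2024-06-03 00:00:00"),
                  by = "hour", length.out = num_records)

# Build n random dotted-quad strings; each octet is drawn uniformly from 0 to 255
generate_ipv4 <- function(n) {
  paste(sample(0:255, n, replace = TRUE), sample(0:255, n, replace = TRUE), 
        sample(0:255, n, replace = TRUE), sample(0:255, n, replace = TRUE), sep = ".")
}

source_ips <- generate_ipv4(num_records)
destination_ips <- generate_ipv4(num_records)

# Bytes per record, drawn uniformly from 100 to 10000
bytes_transferred <- sample(100:10000, num_records, replace = TRUE)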

traffic_data <- data.frame(
  timestamp = timestamps,
  source_ip = source_ips,
  destination_ip = destination_ips,
  bytes_transferred = bytes_transferred
)

head(traffic_data)

Output:

Creating the Dataset

2. Visualizations for Network Traffic Analysis

We now plot several visualizations to analyze the data.

2.1 Time Series Analysis

We plot the bytes transferred in each hourly record over time to see how network traffic changes across the observation period.

R
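# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)

# Plot size for Jupyter (IRkernel) notebooks; has no effect in other environments
options(repr.plot.width = 10, repr.plot.height = 6)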

ggplot(traffic_data, aes(x = timestamp, y = bytes_transferred)) +
  geom_line() +
  labs(title = "Network Traffic Over Time",
       x = "Timestamp",
       y = "Bytes Transferred")

Output:

Time Series Analysis

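On real traffic, a time series plot like this helps reveal recurring patterns, peak hours and anomalies that might indicate unusual activity or congestion. Because the synthetic values here are sampled uniformly at random, the line shows noise rather than a trend.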
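To make "peak hours" and "anomalies" more concrete, here is a minimal sketch that averages traffic by hour of day and flags unusually large records. It rebuilds a small synthetic dataset so that it runs on its own. The names `traffic`, `hourly_profile`, `peak_hour` and `unusual` are illustrative, and the 99th-percentile cut-off is an arbitrary assumption, not a recommended threshold.

```r
# Rebuild a small synthetic dataset (timestamps fixed to UTC for reproducibility).
set.seed(123)
n <- 1000
traffic <- data.frame(
  timestamp = seq(as.POSIXct("2024-06-03 00:00:00", tz = "UTC"),
                  by = "hour", length.out = n),
  bytes_transferred = sample(100:10000, n, replace = TRUE)
)

# Average bytes per hour of day (0-23) to look for peak hours.
traffic$hour <- as.integer(format(traffic$timestamp, "%H"))
hourly_profile <- aggregate(bytes_transferred ~ hour, data = traffic, FUN = mean)
peak_hour <- hourly_profile$hour[which.max(hourly_profile$bytes_transferred)]

# Flag records above the 99th percentile as candidates for a closer look.
# The 0.99 cut-off is an assumption made for this illustration.
threshold <- quantile(traffic$bytes_transferred, 0.99)
unusual <- traffic[traffic$bytes_transferred > threshold, ]

# Show the three busiest hours of the day and the number of flagged records.
print(head(hourly_profile[order(hourly_profile$bytes_transferred, decreasing = TRUE), ], 3))
print(nrow(unusual))
```

With uniformly random data the hourly averages differ only by chance, so the "peak" found here is not meaningful. On real traffic the same aggregation typically shows a clear daily cycle.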

2.2 Top Talkers Analysis

Next, we identify the "top talkers" by aggregating the data based on source IPs to find which devices are responsible for the most traffic.

R
top_talkers <- aggregate(bytes_transferred ~ source_ip, data = traffic_data, FUN = sum)
top_talkers <- top_talkers[order(top_talkers$bytes_transferred, decreasing = TRUE), ]
top_talkers <- head(top_talkers, 10) # Selecting top 10 talkers for visualization

ggplot(top_talkers, aes(x = reorder(source_ip, bytes_transferred),
                        y = bytes_transferred, fill = bytes_transferred)) +
  geom_bar(stat = "identity", color = "black") +
  scale_fill_gradient(low = "lightblue", high = "blue") +
  labs(title = "Top 10 Network Talkers",
       x = "Source IP Address",
       y = "Bytes Transferred") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
        axis.title.x = element_text(size = 12, face = "bold"),
        axis.title.y = element_text(size = 12, face = "bold"),
        legend.position = "none") +
  geom_text(aes(label = scales::comma(bytes_transferred)), vjust = -0.3, size = 3.5)

Output:

Top Talkers Analysis

The bar plot of top talkers helps us pinpoint which devices are using the most bandwidth. This information is important for identifying potential network congestion or suspicious activity from specific devices.
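Because each address in the dataset above is drawn independently from the full IPv4 range, a source IP is very unlikely to repeat, so each per-IP sum usually covers a single record. The sketch below shows one way to make the aggregation more realistic by sampling sources from a small pool of hosts. The pool of 20 addresses in the `10.0.0.x` range, the skewed weights, and the names `host_pool`, `pooled_traffic` and `talkers` are all assumptions chosen for illustration.

```r
set.seed(123)
n <- 1000

# Hypothetical pool of 20 internal hosts (an assumption, not part of the
# original dataset).
host_pool <- paste0("10.0.0.", 1:20)

# Skewed sampling weights so that a few hosts send most of the records.
weights <- 1 / seq_along(host_pool)

pooled_traffic <- data.frame(
  source_ip = sample(host_pool, n, replace = TRUE, prob = weights),
  bytes_transferred = sample(100:10000, n, replace = TRUE)
)

# Sum bytes per source and sort from largest to smallest sender.
talkers <- aggregate(bytes_transferred ~ source_ip, data = pooled_traffic, FUN = sum)
talkers <- talkers[order(talkers$bytes_transferred, decreasing = TRUE), ]

# The five largest senders.
print(head(talkers, 5))
```

The resulting `talkers` data frame has the same columns as `top_talkers`, so it can be passed to the same bar plot code.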

2.3 Destination Analysis

We analyze the distribution of bytes transferred based on destination IP addresses to identify which resources are most accessed.

R
destination_summary <- aggregate(bytes_transferred ~ destination_ip, data = traffic_data,
                                 FUN = sum)
destination_summary <- destination_summary[order(destination_summary$bytes_transferred,
                                                 decreasing = TRUE), ]
top_destinations <- head(destination_summary, 10)

ggplot(top_destinations, aes(x = bytes_transferred, y = reorder(destination_ip, 
                                                                bytes_transferred))) +
  geom_segment(aes(x = 0, xend = bytes_transferred, y = reorder(destination_ip, 
                                                                bytes_transferred), 
                   yend = reorder(destination_ip, bytes_transferred)),
               color = "grey") +
  geom_point(aes(color = bytes_transferred), size = 4) +
  scale_color_gradient(low = "lightgreen", high = "darkgreen") +
  labs(title = "Top 10 Network Destinations",
       x = "Bytes Transferred",
       y = "Destination IP Address") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
        axis.title.x = element_text(size = 12, face = "bold"),
        axis.title.y = element_text(size = 12, face = "bold"),
        axis.text.y = element_text(size = 10),
        legend.position = "none") +
  geom_text(aes(label = scales::comma(bytes_transferred)), hjust = -0.1, size = 3.5)

Output:

Destination Analysis

The lollipop chart highlights which destinations are receiving the most traffic. This insight is useful for identifying heavily accessed resources, potential bottlenecks or areas requiring optimization.
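Alongside the chart, it can help to express each destination's traffic as a share of the total. The following is a minimal sketch on a small hand-made data frame; the addresses and byte counts in `example_traffic` are made up for illustration.

```r
# Small hand-made example; addresses and byte counts are invented.
example_traffic <- data.frame(
  destination_ip = c("10.0.0.5", "10.0.0.5", "10.0.0.9",
                     "10.0.0.7", "10.0.0.9", "10.0.0.5"),
  bytes_transferred = c(4000, 2000, 1500, 500, 1000, 1000)
)

# Total bytes per destination, largest first.
dest <- aggregate(bytes_transferred ~ destination_ip, data = example_traffic, FUN = sum)
dest <- dest[order(dest$bytes_transferred, decreasing = TRUE), ]

# Share of all traffic, and running share down the ranking.
dest$share <- dest$bytes_transferred / sum(dest$bytes_transferred)
dest$cumulative_share <- cumsum(dest$share)

print(dest)
```

Here `10.0.0.5` receives 7,000 of the 10,000 bytes (a share of 0.70), and the top two destinations together account for 0.95 of the traffic. A steep cumulative share like this suggests that a few resources dominate the load.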

2.4 Geographical Visualization

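Visualizing network traffic geographically can provide insights into the regions generating or receiving traffic. This can be particularly useful for detecting unusual traffic patterns from specific locations. Since the synthetic IP addresses do not correspond to real locations, the code below assigns random latitude and longitude values to each record; with real traffic, these coordinates would come from an IP geolocation lookup.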

R
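# Install leaflet only if it is not already available
if (!requireNamespace("leaflet", quietly = TRUE)) install.packages("leaflet")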
library(leaflet)

set.seed(123)
num_records <- nrow(traffic_data)
traffic_data$source_latitude <- runif(n = num_records, min = -90, max = 90)
traffic_data$source_longitude <- runif(n = num_records, min = -180, max = 180)
traffic_data$destination_latitude <- runif(n = num_records, min = -90, max = 90)
traffic_data$destination_longitude <- runif(n = num_records, min = -180, max = 180)

leaflet(data = traffic_data) %>%
  addTiles() %>%
  addCircleMarkers(~source_longitude, ~source_latitude, radius = 5, 
                   color = "blue", fillOpacity = 0.5, 
                   label = ~paste("Source:", source_ip)) %>%
  addCircleMarkers(~destination_longitude, ~destination_latitude, radius = 5, 
                   color = "red", fillOpacity = 0.5, label = ~paste("Destination:", 
                                                                    destination_ip)) %>%
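  # Interleave source, destination and NA so each record is drawn as its own segment
  addPolylines(lng = ~as.vector(rbind(source_longitude, destination_longitude, NA)),
               lat = ~as.vector(rbind(source_latitude, destination_latitude, NA)),
               color = "green", weight = 2, opacity = 0.7) %>%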
  addLegend("bottomright", colors = c("blue", "red", "green"), 
            labels = c("Source IP", "Destination IP", "Traffic Path"), 
            title = "Legend") %>%
  setView(lng = mean(traffic_data$source_longitude), 
          lat = mean(traffic_data$source_latitude), zoom = 1)

Output:

Geographical Visualization

The map allows us to visually explore the geographical distribution of network traffic. By observing the locations of source and destination IP addresses, we can identify traffic sources and destinations from specific regions, potentially highlighting abnormal activity.
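Drawing one line per record requires a break between consecutive source and destination pairs; otherwise all points are joined into a single continuous path. A minimal sketch of building such coordinate vectors is shown below, using two made-up flows. It assumes that `NA` acts as a separator between line segments, which is how leaflet's shape documentation describes matrix input; the names `flows`, `path_lng` and `path_lat` are illustrative.

```r
# Two made-up flows; the coordinates are illustrative only.
flows <- data.frame(
  source_longitude      = c(-74.0, 2.35),
  source_latitude       = c(40.7, 48.86),
  destination_longitude = c(139.7, -0.13),
  destination_latitude  = c(35.7, 51.5)
)

# rbind() stacks source, destination and NA as rows; as.vector() then reads the
# matrix column by column, giving source1, dest1, NA, source2, dest2, NA.
path_lng <- as.vector(rbind(flows$source_longitude, flows$destination_longitude, NA))
path_lat <- as.vector(rbind(flows$source_latitude, flows$destination_latitude, NA))

print(path_lng)  # -74.00 139.70 NA 2.35 -0.13 NA
print(path_lat)  # 40.70 35.70 NA 48.86 51.50 NA
```

These vectors can then be passed as the `lng` and `lat` arguments of `addPolylines()`, so that each source is connected only to its own destination.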

Conclusion

The visualizations in this article were built on synthetic, uniformly random data, so they demonstrate technique rather than real findings. Applied to real traffic captures, the same plots can support the following:

  • Trends in Network Traffic: The time series plot can reveal peak traffic periods, which helps network administrators plan capacity and keep performance smooth during high-traffic times.
  • Anomalies in Traffic Patterns: Deviations from the usual pattern can point to issues such as network congestion or malicious activity and prompt early investigation.
  • Potential Security Threats: The top talkers, destination and geographical views can surface hosts or regions with abnormal behavior, supporting earlier detection and a quicker response.

These insights contribute to better network management, enhanced security, and more effective decision-making.

