
GraphBLAS on the Edge:

High Performance Streaming of Network Traffic


Michael Jones1 , Jeremy Kepner1 , Daniel Andersen2 , Aydın Buluç3 , Chansup Byun1 , K Claffy2 , Timothy Davis4 ,
William Arcand1 , Jonathan Bernays1 , David Bestor1 , William Bergeron1 , Vijay Gadepally1 , Micheal Houle1 ,
Matthew Hubbell1 , Hayden Jananthan1 , Anna Klein1 , Chad Meiners1 , Lauren Milechin1 , Julie Mullen1 ,
Sandeep Pisharody1 , Andrew Prout1 , Albert Reuther1 , Antonio Rosa1 , Siddharth Samsi1 ,
Jon Sreekanth5 , Doug Stetson1 , Charles Yee1 , Peter Michaleas1
1 MIT, 2 CAIDA, 3 LBNL, 4 Texas A&M, 5 Accolade Technology
arXiv:2203.13934v1 [cs.NI] 25 Mar 2022

Abstract—Long range detection is a cornerstone of defense in many operating domains (land, sea, undersea, air, space, ...). In the cyber domain, long range detection requires the analysis of significant network traffic from a variety of observatories and outposts. Construction of anonymized hypersparse traffic matrices on edge network devices can be a key enabler by providing significant data compression in a rapidly analyzable format that protects privacy. GraphBLAS is ideally suited for both constructing and analyzing anonymized hypersparse traffic matrices. The performance of GraphBLAS on an Accolade Technologies edge network device is demonstrated on a near worst case traffic scenario using a continuous stream of CAIDA Telescope darknet packets. The performance for varying numbers of traffic buffers, threads, and processor cores is explored. Anonymized hypersparse traffic matrices can be constructed at a rate of over 50,000,000 packets per second, exceeding a typical 400 Gigabit network link. This performance demonstrates that anonymized hypersparse traffic matrices are readily computable on edge network devices with minimal compute resources and can be a viable data product for such devices.

Index Terms—Internet defense, packet capture, streaming graphs, hypersparse matrices

This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001, National Science Foundation CCF-1533644, and United States Air Force Research Laboratory and Artificial Intelligence Accelerator Cooperative Agreement Number FA8750-19-2-1000. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering, the National Science Foundation, or the United States Air Force. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

I. INTRODUCTION

The Internet has become as essential as land, sea, air, and space for enabling activities as diverse as commerce, education, health, and entertainment [1], [2]. Long range detection has been a cornerstone of defense in many operating domains since antiquity [3]–[10]. In the cyber domain, observatories and outposts have been constructed to gather data on Internet traffic and provide a starting point for exploring long range detection [11]–[17] (see Figure 1). The largest public Internet observatory is the Center for Applied Internet Data Analysis (CAIDA) Telescope that operates a variety of sensors including a continuous stream of packets from an unsolicited darkspace representing approximately 1/256 of the Internet [18]–[21]. In general, long range detection requires the analysis of significant network traffic from a variety of observatories and outposts [22], [23].

The data volumes, processing requirements, and privacy concerns of analyzing a significant fraction of the Internet have been prohibitive. The North American Internet generates billions of non-video Internet packets each second [1], [2]. The GraphBLAS standard provides significant performance and compression capabilities which improve the feasibility of analyzing these volumes of data [24]–[38]. Specifically, the GraphBLAS is ideally suited for both constructing and analyzing anonymized hypersparse traffic matrices. Prior work with the GraphBLAS has demonstrated rates of 75 billion packets per second (pps) [39], while achieving compressions of 1 bit per packet [17], and enabling the analysis of the largest publicly available historical archives with over 40 trillion packets [40]. Analysis of anonymized hypersparse traffic matrices from a variety of sources has revealed power-law distributions [41], [42], novel scaling relations [17], [40], and inspired new models of network traffic [43].

GraphBLAS anonymized hypersparse traffic matrices represent one set of design choices for analyzing network traffic, specifically the use case requiring some data on all packets (no down-sampling), high performance, high compression, matrix-based analysis, anonymization, and open standards. There are a wide range of alternative graph/network analysis technologies, and many good implementations achieve performance close to the limits of the underlying computing hardware [44]–[54]. Likewise, there are many network analysis tools that focus on providing a rich interface to the full diversity of data found in network traffic [55]–[57]. Each of these technologies has appropriate use cases in the broad field of Internet traffic analysis.

Sending large volumes of raw Internet traffic to a central location to construct anonymized hypersparse traffic matrices is prohibitive. To meet the goal of providing some data on all packets without down-sampling requires constructing the anonymized hypersparse traffic matrices in the network itself in order to realize the full data compression benefits. The goal of this paper is to explore the viability of this approach by
Fig. 1. Internet Observatories and Outposts. Traffic matrix view of the Internet depicting selected observatories and outposts and their notional proximity
to various types of network traffic [11]–[17].

measuring the performance of GraphBLAS on an edge network device. The performance is measured on a near worst case traffic scenario using a continuous stream of CAIDA Telescope darknet packets (mostly botnets and scanners) which have an irregular distribution and almost no data payload (i.e., all header).

The outline of the rest of the paper is as follows. First, the CAIDA Telescope test data and some basic network quantities are defined in terms of traffic matrices. Next, the anonymized hypersparse traffic matrix pipeline is described, followed by a description of the experimental setup and implementation. Finally, the results, conclusions, and directions for further work are presented.

II. TEST DATA AND TRAFFIC MATRICES

The test data is drawn from the CAIDA Telescope darknet packets (mostly botnets and scanners) and is a near worst case with a highly irregular distribution and almost no data payload (i.e., all header). The CAIDA Telescope monitors the traffic into and out of a set of network addresses providing a natural observation point of network traffic. These data can be viewed as a traffic matrix where each row is a source and each column is a destination. The CAIDA Telescope traffic matrix can be partitioned into four quadrants (see Figure 2). These quadrants represent different flows between nodes internal and external to the set of monitored addresses. Because the CAIDA Telescope network addresses are a darkspace, only the upper left (external → internal) quadrant will have data.

Fig. 2. Network traffic matrix. The traffic matrix can be divided into quadrants separating internal and external traffic. The CAIDA Telescope monitors a darkspace, so only the upper left (external → internal) quadrant will have data.

Internet data must be handled with care, and CAIDA has pioneered trusted data sharing best practices that combine anonymizing source and destinations using CryptoPAN [58] with data sharing agreements. These data sharing best practices are the basis of the architecture presented here and include the following principles [22]:
• Data is made available in curated repositories
• Using standard anonymization methods where needed: hashing, sampling, and/or simulation
• Registration with a repository and demonstration of legitimate research need
• Recipients legally agree to neither repost a corpus nor deanonymize data
• Recipients can publish analysis and data examples necessary to review research
• Recipients agree to cite the repository and provide publications back to the repository
• Repositories can curate enriched products developed by researchers

A primary benefit of constructing anonymized hypersparse traffic matrices with the GraphBLAS is the efficient computation of a wide range of network quantities via matrix mathematics. Figure 3 illustrates essential quantities found in all streaming dynamic networks. These quantities are all computable from anonymized traffic matrices created from the
source and destinations found in Internet packet headers.

Fig. 3. Streaming network traffic quantities. Internet traffic streams of NV valid packets are divided into a variety of quantities for analysis: source packets, source fan-out, unique source-destination pair packets (or links), destination fan-in, and destination packets.

TABLE I
NETWORK QUANTITIES FROM TRAFFIC MATRICES
Formulas for computing network quantities from traffic matrix At at time t in both summation and matrix notation. 1 is a column vector of all 1's, ^T is the transpose operation, and | |_0 is the zero-norm that sets each nonzero value of its argument to 1 [65]. These formulas are unaffected by matrix permutations and will work on anonymized data.

Aggregate Property             | Summation Notation       | Matrix Notation
Valid packets NV               | Σ_i Σ_j At(i,j)          | 1^T At 1
Unique links                   | Σ_i Σ_j |At(i,j)|_0      | 1^T |At|_0 1
Link packets from i to j       | At(i,j)                  | At
Max link packets (dmax)        | max_ij At(i,j)           | max(At)
Unique sources                 | Σ_i |Σ_j At(i,j)|_0      | 1^T |At 1|_0
Packets from source i          | Σ_j At(i,j)              | At 1
Max source packets (dmax)      | max_i Σ_j At(i,j)        | max(At 1)
Source fan-out from i          | Σ_j |At(i,j)|_0          | |At|_0 1
Max source fan-out (dmax)      | max_i Σ_j |At(i,j)|_0    | max(|At|_0 1)
Unique destinations            | Σ_j |Σ_i At(i,j)|_0      | |1^T At|_0 1
Destination packets to j       | Σ_i At(i,j)              | 1^T At
Max destination packets (dmax) | max_j Σ_i At(i,j)        | max(1^T At)
Destination fan-in to j        | Σ_i |At(i,j)|_0          | 1^T |At|_0
Max destination fan-in (dmax)  | max_j Σ_i |At(i,j)|_0    | max(1^T |At|_0)

The network quantities depicted in Figure 3 are computable from anonymized origin-destination traffic matrices that are widely used to represent network traffic [59]–[62]. It is common to filter the packets down to a valid set for any particular analysis. Such filters may limit particular sources, destinations, protocols, and time windows. To reduce statistical fluctuations, the streaming data should be partitioned so that for any chosen time window all data sets have the same number of valid packets [63]. At a given time t, NV consecutive valid packets are aggregated from the traffic into a hypersparse matrix At, where At(i,j) is the number of valid packets between the source i and destination j. The sum of all the entries in At is equal to NV:

  Σ_i Σ_j At(i,j) = NV

Fig. 4. Multi-temporal streaming traffic matrices. Efficient computation of network quantities on multiple time scales can be achieved by hierarchically aggregating data in different time windows.
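The formulas in Table I reduce to a handful of sparse matrix-vector operations. As a rough illustration, the aggregates can be computed directly from a sparse At; this is a Python/SciPy sketch on a toy window, not the paper's GraphBLAS C implementation, and the addresses and window size are made up for the demo:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy packet window: (source, destination) pairs standing in for the
# anonymized 32-bit addresses of a real 2^32 x 2^32 hypersparse At.
src = np.array([0, 0, 1, 2, 2, 2])
dst = np.array([1, 1, 3, 3, 3, 4])
n = 5  # toy address-space size

# At(i, j) = number of valid packets from source i to destination j;
# converting COO -> CSR sums duplicate (i, j) pairs into packet counts.
A = coo_matrix((np.ones_like(src), (src, dst)), shape=(n, n)).tocsr()

valid_packets  = A.sum()                             # 1^T At 1
unique_links   = A.nnz                               # 1^T |At|_0 1
max_link       = A.max()                             # max(At)
packets_from   = np.asarray(A.sum(axis=1)).ravel()   # At 1
fan_out        = np.diff(A.indptr)                   # |At|_0 1
unique_sources = np.count_nonzero(packets_from)      # 1^T |At 1|_0

assert valid_packets == 6 and unique_links == 4
assert max_link == 2            # two packets on links (0,1) and (2,3)
assert unique_sources == 3
assert fan_out.tolist() == [1, 1, 2, 0, 0]
```

The permutation invariance noted in Table I is visible here: relabeling rows and columns leaves every aggregate unchanged, which is why these formulas work on anonymized data.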
Constant packet, variable time samples simplify the statistical analysis of the heavy-tail distributions commonly found in network traffic quantities [41], [42], [64]. All the network quantities depicted in Figure 3 can be readily computed from At using the formulas listed in Table I. Because matrix operations are generally invariant to permutation (reordering of the rows and columns), these quantities can readily be computed from anonymized data.

The contiguous nature of these data allows the exploration of a wide range of packet windows from NV = 2^17 (subsecond) to NV = 2^27 (minutes), providing a unique view into how network quantities depend upon time. These observations provide new insights into normal network background traffic that could be used for anomaly detection, AI feature engineering, polystore index learning, and testing theoretical models of streaming networks [66]–[68].

Network traffic is dynamic and exhibits varying behavior on a wide range of time scales. A given packet window size NV will be sensitive to phenomena on its corresponding timescale. Determining how network quantities scale with NV provides insight into the temporal behavior of network traffic. Efficient computation of network quantities on multiple time scales can be achieved by hierarchically aggregating data in different time windows [63]. Figure 4 illustrates a binary aggregation of different streaming traffic matrices. Computing each quantity at each hierarchy level eliminates redundant computations that would be performed if each packet window was computed separately. Hierarchy also ensures that most computations are performed on smaller matrices residing in faster memory. Correlations among the matrices mean that adding two matrices each with NV entries results in a matrix with fewer than 2NV entries, reducing the relative number of operations as the matrices grow.

One of the important capabilities of the SuiteSparse GraphBLAS library is direct support of hypersparse matrices where the number of nonzero entries is significantly less than either dimension of the matrix. If the packet source and destination identifiers are drawn from a large numeric range, such as those used in the Internet protocol, then a hypersparse representation of At eliminates the need to keep track of additional indices and can significantly accelerate the computations [39].

III. HYPERSPARSE MATRIX PIPELINE

The aforementioned analysis goals set the requirements for the GraphBLAS hypersparse traffic matrix pipeline. Specifically, the compression benefits are maximized if the Graph-
Fig. 5. Network Packet Description. A network packet consists of a header and a payload. To avoid downsampling and minimize privacy concerns only the source and destination are selected. The depicted sample IPv4 header begins 45 00 01 13 00 00 40 00 40 06 18 46 c0 a8 c8 1a 34 3d 64 9f, carrying Source IP Address 192.168.200.26 and Destination IP Address 52.61.100.159.
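Because the IPv4 source and destination sit at fixed offsets in the header, the extraction step in Figure 5 amounts to two 32-bit reads per packet. A minimal Python sketch using the sample header bytes from Figure 5 (the actual pipeline does the equivalent in C directly on the capture buffer):

```python
import struct

# First 20 bytes (the fixed IPv4 header) of the Fig. 5 sample packet.
header = bytes.fromhex("450001130000400040061846c0a8c81a343d649f")

# Source and destination occupy header bytes 12-15 and 16-19; the
# pipeline keeps only these, stored as 32-bit unsigned integers.
src, dst = struct.unpack_from("!II", header, 12)

def to_dotted(ip: int) -> str:
    return ".".join(str((ip >> s) & 0xFF) for s in (24, 16, 8, 0))

assert to_dotted(src) == "192.168.200.26"
assert to_dotted(dst) == "52.61.100.159"
```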

Fig. 6. Anonymized Hypersparse Matrix Pipeline. Continuous sequences of NV = 2^17 packets are extracted from packet headers, anonymized, formed into a GraphBLAS hypersparse matrix, serialized, and saved in groups of 64 GraphBLAS matrices to a UNIX TAR file.
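The Figure 6 flow can be approximated in a few lines. This is an illustrative Python sketch, not the C/GraphBLAS implementation: np.savez_compressed over sorted coordinate triples stands in for the GraphBLAS CSR + LZ4 serialization, and a group size of 4 (the pipeline uses 64) keeps the demo small:

```python
import io
import tarfile
import numpy as np

NV = 2**17   # packets per matrix, as in Fig. 6
GROUP = 4    # the pipeline groups 64 matrices per TAR; 4 keeps the demo small

def serialize_window(src, dst):
    """Sum duplicate (src, dst) pairs into packet counts and serialize.

    Sorted coordinate triples stand in for the GraphBLAS CSR + LZ4
    serialization; 32-bit keys model the 2^32 x 2^32 address space.
    """
    links = (src.astype(np.uint64) << np.uint64(32)) | dst.astype(np.uint64)
    keys, counts = np.unique(links, return_counts=True)
    buf = io.BytesIO()
    np.savez_compressed(buf,
                        rows=(keys >> np.uint64(32)).astype(np.uint32),
                        cols=keys.astype(np.uint32),   # low 32 bits
                        vals=counts)
    return buf.getvalue()

# Build one TAR holding GROUP serialized matrices from synthetic addresses.
rng = np.random.default_rng(0)
tar_bytes = io.BytesIO()
with tarfile.open(fileobj=tar_bytes, mode="w") as tar:
    for k in range(GROUP):
        src = rng.integers(0, 2**32, NV, dtype=np.uint64)
        dst = rng.integers(0, 2**32, NV, dtype=np.uint64)
        blob = serialize_window(src, dst)
        info = tarfile.TarInfo(name=f"A_{k}.npz")
        info.size = len(blob)
        tar.addfile(info, io.BytesIO(blob))

with tarfile.open(fileobj=io.BytesIO(tar_bytes.getvalue())) as tar:
    assert len(tar.getnames()) == GROUP
```

Note that a hypersparse representation stores only the occupied coordinates, which is what makes a 2^32 × 2^32 matrix with at most NV = 2^17 nonzeros practical; a dense row-pointer (plain CSR) array over 2^32 rows would by itself exceed memory.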

BLAS hypersparse traffic matrix is constructed in the network as close to the network traffic as possible, as this minimizes the amount of data that needs to be sent over the network. In addition, collection at the network source allows the data owner to construct and own the anonymization scheme and only share anonymized data under trusted data sharing agreements with the parties tasked with analyzing the data [69].

The first step in the GraphBLAS hypersparse traffic matrix pipeline is to capture a packet, discard the data payload, and extract the source and destination Internet Protocol (IP) addresses (Figure 5). For the purposes of the current performance testing, only IPv4 packets are used, which are stored as 32 bit unsigned integers. Collections of NV = 2^17 consecutive packets are then each anonymized using a cryptoPAN generated anonymization table. The resulting anonymized source and destination IPs are then used to construct a 2^32 × 2^32 hypersparse GraphBLAS matrix. 64 consecutive hypersparse GraphBLAS matrices are each serialized in compressed sparse rows (CSR) format with LZ4 compression and saved to a UNIX TAR file (Figure 6). The TAR files can be further compressed using other compression methods (if desired) and then transmitted to the appropriate parties tasked for analysis. For example, standard gzip compression reduces file size by 40% but also reduces performance by 80%.

Fig. 7. Experimental Setup. Test system consisted of two compute nodes, each with an Accolade card, connected over a network (a Mellanox 40GBASE-CR4 DAC between QSFP28 ports on two HP ProLiant DL360 G9 servers with Intel Xeon E5-2683 processors, the Accolade cards on PCIe Gen 3.0 x16). The sender node reads CAIDA Telescope packet data into memory and then the Accolade card sends the data directly from memory over the network to the receiver node. The Accolade card on the receiver takes the data off the network, places the data in a hardware ring buffer, and makes the data available to be processed by the receiver processor.
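The anonymization step above is a precomputed table lookup rather than a per-address cipher call. A hedged sketch of the idea, with a plain random permutation over a toy 2^16 address space standing in for the actual 2^32-entry cryptoPAN table (cryptoPAN is additionally prefix-preserving and keyed, which this stand-in is not):

```python
import numpy as np

# Stand-in for the pipeline's precomputed anonymization table: a seeded
# random permutation of a toy 2^16 address space. The real table is a
# 2^32-entry prefix-preserving cryptoPAN mapping generated offline.
SPACE = 2**16
rng = np.random.default_rng(42)   # the "key" of this toy scheme
anon_table = rng.permutation(SPACE).astype(np.uint32)

def anonymize(ips: np.ndarray) -> np.ndarray:
    # One vectorized gather per block replaces a per-address cipher
    # call (~700,000 addresses/s on a single core, per Section V).
    return anon_table[ips]

ips = np.array([1, 2, 1, 500], dtype=np.uint32)
out = anonymize(ips)

# Equal inputs map to equal outputs, so link structure is preserved...
assert out[0] == out[2]
# ...and the table is a permutation, hence invertible by the key holder.
inverse = np.empty_like(anon_table)
inverse[anon_table] = np.arange(SPACE, dtype=np.uint32)
assert np.array_equal(inverse[out], ips)
```

Because the mapping is one-to-one, the anonymized traffic matrix is a row and column permutation of the true one, which is exactly the invariance the Table I formulas rely on.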
IV. IMPLEMENTATION

Effective implementation of the GraphBLAS pipeline requires that the anonymization, creation, and saving of the resulting files can keep up with the data rates of typical high bandwidth links. To measure this performance, two Accolade Technology ANIC-200Kq dual port 100 gigabit flow classification and shunting adapters were installed into the PCIe slots of two HP ProLiant DL360 G9 servers (Figure 7). These servers were connected via a Mellanox 40 gigabit network connection. ANIC-200Kq cards are capable of a wide range of analysis techniques; this experiment only used their data transmission, shunting, and buffering capabilities.

The C implementation of the GraphBLAS hypersparse traffic matrix pipeline shown in Figures 5 and 6 was run on dual Intel Xeon E5-2683 processors in the receiver server. The Accolade card on the receiver server collects packets in ring buffers, the number of which can be set at initialization. Using C pthreads [70], an Accolade worker thread is assigned to each Accolade hardware ring. Within each Accolade worker thread, a block of 2^23 IPv4 packets is collected and a GraphBLAS worker thread is spawned to process the block in subblocks of 2^17 packets. Each subblock is anonymized using a cryptoPAN generated anonymization vector and the resulting anonymized sources and destinations are used to construct a GraphBLAS matrix. The matrix is then serialized and appended to a TAR buffer, which is saved to a file after all 64 subblocks have been processed. A more detailed outline of the code is as follows:

Main Thread
• Set number of Accolade hardware rings
• Load 2^32 entry IPv4 anonymization table
• For each Accolade hardware ring
  – Launch Accolade Worker thread

Accolade Worker
• Create libpcap handle to Accolade device
• Allocate 64MB buffer for packet processing
• For each packet
  – Retrieve packet from Accolade device buffers
  – Append source and destination IP addresses to buffer
  – If buffer has 2^23 packets
    ◦ Launch GraphBLAS Worker thread with pointer to packet buffer
    ◦ Allocate new 64MB buffer for packet processing

GraphBLAS Worker
• Initialize GraphBLAS
• For each subblock of 2^17 packets
  – Create new GraphBLAS matrix
  – Create row, column and value vectors
  – For each packet in subblock
    ◦ Lookup in anonymization table
    ◦ Insert into row, column and value vectors
  – Build GraphBLAS matrix from row, column and value vectors, summing duplicate entries
  – Serialize and compress GraphBLAS matrix
  – Append to TAR buffer
• Write TAR buffer to file

Fig. 8. Performance Results. Packets per second processed versus the number of hardware rings used. The number of hardware rings is approximately proportional to the number of threads/cores being used. The packet performance can be converted into an estimated equivalent bandwidth using a representative packet size of 10,000 bits per packet (see right vertical axis).

V. RESULTS

Using the experimental setup shown in Figure 7, a number of performance experiments were conducted. In these experiments the sender server would load 100 × 2^23 CAIDA Telescope packets into the sender Accolade card and send the packets at a defined rate over the network to the receiver server, where the receiver Accolade card loads the packets into its hardware rings. Likewise, on the receiver server the GraphBLAS hypersparse traffic matrix pipeline described in the previous section was executed. The cryptoPAN anonymization table was created offline, stored, and loaded at startup. The reason for this is that the single core cryptoPAN performance is approximately 700,000 IP addresses per second. The 2^32 entry cryptoPAN anonymization lookup table dramatically speeds up the performance. Generation of the table is readily run in parallel and can be completed in a few seconds (if desired).

The rate of packets being sent was adjusted to achieve the maximum rate without dropped packets, which indicated that the GraphBLAS hypersparse traffic matrix pipeline was able to keep up with the incoming traffic. The number of hardware rings was varied; these are approximately proportional to the number of threads/cores being used.

CAIDA Telescope data are a near worst case scenario because almost all of the packets have no payload and very few packets use the same source and destination connection, so the resulting hypersparse traffic matrices have very few entries that are greater than 1. An advantage of the few payloads in the CAIDA Telescope data is that it allows the emulation of packet streams that are representative of much higher bandwidth networks than the current experimental setup is capable of.

Figure 8 shows the performance in terms of packets per second versus the number of hardware rings used. GraphBLAS matrix construction is highly cache sensitive due to the underlying index sorting required. The performance of a single ring using a few processing threads/cores is over 50M packets per second. The performance increases significantly using two rings, and then drops with 3 rings because of cache effects. Using 16 rings makes up for the cache effects with increased parallelism and the performance reaches the maximum number of packets the sender server can send to the receiver (approximately 88M packets per second). The packet performance can be converted into an estimated equivalent bandwidth using a representative packet size of 10,000 bits per packet (see Figure 8 right vertical axis). The performance measurements indicate that a standard server is capable of constructing anonymized hypersparse traffic matrices at a rate above that corresponding to a typical 400 Gigabit network link.

VI. CONCLUSIONS AND FUTURE WORK

For many operating domains (land, sea, undersea, air, space, ...) long range detection is a cornerstone of defense. Long range detection in the cyber domain requires significant network traffic to be analyzed from a variety of observatories and outposts. Construction of anonymized hypersparse traffic matrices on edge network devices can be a key enabler by providing significant data compression in a rapidly analyzable format that protects privacy. Constructing and analyzing anonymized hypersparse traffic matrices are operations ideally suited to the GraphBLAS high performance library. Using an Accolade Technologies edge network device, the performance of the GraphBLAS is demonstrated on a near worst case traffic scenario using a continuous stream of CAIDA Telescope darknet packets. The performance was explored by varying the number of traffic ring buffers, which are proportional to the number of threads and processor cores used. Rates of over 50,000,000 packets per second for constructing anonymized hypersparse traffic matrices were achieved, which exceeds a typical 400 Gigabit network link.

This performance demonstrates that anonymized hypersparse traffic matrices are readily computable on edge network devices with minimal compute resources and can be a viable data product for such devices. This work suggests a variety of future directions that could be pursued: (1) exploring additional network cards; (2) developing the appropriate key management architecture for multiple observatories and outposts; (3) analysis of spatial temporal patterns in anonymized traffic matrices to identify adversarial activities; (4) cross-correlation of data from different observatories and outposts; (5) development of AI algorithms for classification of background traffic; (6) creation of underlying models of traffic.

ACKNOWLEDGMENTS

The authors wish to acknowledge the following individuals for their contributions and support: Bob Bond, Stephen Buckley, Ronisha Carter, Cary Conrad, Alan Edelman, Tucker Hamilton, Jeff Gottschalk, Nathan Frey, Chris Hill, Mike Kanaan, Tim Kraska, Andrew Morris, Charles Leiserson, Dave Martinez, Mimi McClure, Joseph McDonald, Sandy Pentland, Christian Prothmann, John Radovan, Steve Rejto, Daniela Rus, Matthew Weiss, Marc Zissman.

REFERENCES

[1] "Cisco Visual Networking Index: Forecast and Trends." https://newsroom.cisco.com/press-release-content?articleId=1955935.
[2] "Cisco Visual Networking Index: Forecast and Trends, 2018–2023." https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html.
[3] W. P. Delaney, Perspectives on Defense Systems Analysis. MIT Press, 2015.
[4] S. Topouzi, A. Sarris, Y. Pikoulas, S. Mertikas, X. Frantzis, and A. Giourou, "Ancient Mantineia's defence network reconsidered through a GIS approach," BAR INTERNATIONAL SERIES, vol. 1016, pp. 559–566, 2002.
[5] Y. Shu and Y. He, "Research on the historical and cultural value of and protection strategy for rammed earth watchtower houses in Chongqing, China," Built Heritage, vol. 5, no. 1, pp. 1–16, 2021.
[6] R. Cacciotti, "The 'guardian of the pontifical state': structural assessment of a damaged coastal watchtower in south Lazio," Master's thesis, Universidade do Minho, 2010.
[7] R. A. Watson-Watt, Three Steps to Victory: A Personal Account by Radar's Greatest Pioneer. London: Odhams Press, 1957.
[8] W. P. Delaney, "Air defense of the United States: Strategic missions and modern technology," International Security, vol. 15, no. 1, pp. 181–211, 1990.
[9] J. Geul, E. Mooij, and R. Noomen, "Modelling and assessment of the current and future space surveillance network," 7th ECSD, 2017.
[10] K. W. O'Haver, C. K. Barker, G. D. Dockery, and J. D. Huffaker, "Radar development for air and missile defense," Johns Hopkins APL Tech. Digest, vol. 34, no. 2, pp. 140–153, 2018.
[11] "CAIDA Anonymized Internet Traces Dataset (April 2008 - January 2019)." https://www.caida.org/catalog/datasets/passive dataset/.
[12] "UCSD Network Telescope." https://www.caida.org/projects/network telescope/.
[13] "Global Cyber Alliance." https://www.globalcyberalliance.org/.
[14] "Greynoise." https://greynoise.io/.
[15] "MAWI Working Group Traffic Archive." http://mawi.wide.ad.jp/mawi/.
[16] "Shadowserver Foundation." https://www.shadowserver.org/.
[17] J. Kepner, C. Meiners, C. Byun, S. McGuire, T. Davis, W. Arcand, J. Bernays, D. Bestor, W. Bergeron, V. Gadepally, R. Harnasch, M. Hubbell, M. Houle, M. Jones, A. Kirby, A. Klein, L. Milechin, J. Mullen, A. Prout, A. Reuther, A. Rosa, S. Samsi, D. Stetson, A. Tse, C. Yee, and P. Michaleas, "Multi-temporal analysis and scaling relations of 100,000,000,000 network packets," in 2020 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6, 2020.
[18] K. Claffy, "Measuring the internet," IEEE Internet Computing, vol. 4, no. 1, pp. 73–75, 2000.
[19] B. Li, J. Springer, G. Bebis, and M. H. Gunes, "A survey of network flow applications," Journal of Network and Computer Applications, vol. 36, no. 2, pp. 567–581, 2013.
[20] M. Rabinovich and M. Allman, "Measuring the internet," IEEE Internet Computing, vol. 20, no. 4, pp. 6–8, 2016.
[21] k. claffy and D. Clark, "Workshop on internet economics (WIE 2019) report," SIGCOMM Comput. Commun. Rev., vol. 50, pp. 53–59, May 2020.
[22] J. Kepner, J. Bernays, S. Buckley, K. Cho, C. Conrad, L. Daigle, K. Erhardt, V. Gadepally, B. Greene, M. Jones, R. Knake, B. Maggs, P. Michaleas, C. Meiners, A. Morris, A. Pentland, S. Pisharody, S. Powazek, A. Prout, P. Reiner, K. Suzuki, K. Takhashi, T. Tauber, L. Walker, and D. Stetson, "Zero botnets: An observe-pursue-counter approach." Belfer Center Reports, 6 2021.
[23] S. Weed, "Beyond zero trust: Reclaiming blue cyberspace," Master's thesis, United States Army War College, 2022.
[24] J. Kepner and J. Gilbert, Graph Algorithms in the Language of Linear Algebra. SIAM, 2011.
[25] J. Kepner, D. Bader, A. Buluç, J. Gilbert, T. Mattson, and H. Meyerhenke, "Graphs, matrices, and the GraphBLAS: Seven good reasons," Procedia Computer Science, vol. 51, pp. 2453–2462, 2015.
[26] J. Kepner, P. Aaltonen, D. Bader, A. Buluç, F. Franchetti, J. Gilbert, D. Hutchison, M. Kumar, A. Lumsdaine, H. Meyerhenke, S. McMillan, C. Yang, J. D. Owens, M. Zalewski, T. Mattson, and J. Moreira, "Mathematical foundations of the GraphBLAS," in 2016 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–9, 2016.
[27] A. Buluç, T. Mattson, S. McMillan, J. Moreira, and C. Yang, "Design of the GraphBLAS API for C," in 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 643–652, 2017.
[28] J. Kepner, M. Kumar, J. Moreira, P. Pattnaik, M. Serrano, and H. Tufo, "Enabling massive deep neural networks with the GraphBLAS," in 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–10, IEEE, 2017.
[29] C. Yang, A. Buluç, and J. D. Owens, "Implementing push-pull efficiently in GraphBLAS," in Proceedings of the 47th International Conference on Parallel Processing, pp. 1–11, 2018.
[30] T. A. Davis, "Algorithm 1000: SuiteSparse:GraphBLAS: Graph algorithms in the language of sparse linear algebra," ACM Trans. Math. Softw., vol. 45, Dec. 2019.
[31] J. Kepner and H. Jananthan, Mathematics of Big Data: Spreadsheets, Databases, Matrices, and Graphs. MIT Press, 2018.
[32] T. A. Davis, "Algorithm 1000: SuiteSparse:GraphBLAS: Graph algorithms
[45] M. Kumar, W. P. Horn, J. Kepner, J. E. Moreira, and P. Pattnaik, "IBM POWER9 and cognitive computing," IBM Journal of Research and Development, vol. 62, no. 4/5, pp. 10–1, 2018.
[46] J. Ezick, T. Henretty, M. Baskaran, R. Lethin, J. Feo, T.-C. Tuan, C. Coley, L. Leonard, R. Agrawal, B. Parsons, and W. Glodek, "Combining tensor decompositions and graph analytics to provide cyber situational awareness at HPC scale," in 2019 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–7, 2019.
[47] P. Gera, H. Kim, P. Sao, H. Kim, and D. Bader, "Traversing large graphs on GPUs with unified memory," Proceedings of the VLDB Endowment, vol. 13, no. 7, pp. 1119–1133, 2020.
[48] A. Azad, M. M. Aznaveh, S. Beamer, M. Blanco, J. Chen, L. D'Alessandro, R. Dathathri, T. Davis, K. Deweese, J. Firoz, H. A. Gabb, G. Gill, B. Hegyi, S. Kolodziej, T. M. Low, A. Lumsdaine, T. Manlaibaatar, T. G. Mattson, S. McMillan, R. Peri, K. Pingali, U. Sridhar, G. Szarnyas, Y. Zhang, and Y. Zhang, "Evaluation of graph analytics frameworks using the GAP benchmark suite," in 2020
in the language of sparse linear algebra,” ACM Transactions on Mathe- IEEE International Symposium on Workload Characterization (IISWC),
matical Software (TOMS), vol. 45, no. 4, pp. 1–25, 2019. pp. 216–227, 2020.
[33] T. Mattson, T. A. Davis, M. Kumar, A. Buluc, S. McMillan, J. Moreira, [49] Z. Du, O. A. Rodriguez, J. Patchett, and D. A. Bader, “Interactive graph
and C. Yang, “Lagraph: A community effort to collect graph algorithms stream analytics in arkouda,” Algorithms, vol. 14, no. 8, p. 221, 2021.
built on top of the graphblas,” in 2019 IEEE International Parallel and
[50] S. Acer, A. Azad, E. G. Boman, A. Buluç, K. D. Devine, S. Ferdous,
Distributed Processing Symposium Workshops (IPDPSW), pp. 276–284,
N. Gawande, S. Ghosh, M. Halappanavar, A. Kalyanaraman, A. Khan,
IEEE, 2019.
M. Minutoli, A. Pothen, S. Rajamanickam, O. Selvitopi, N. R. Tal-
[34] P. Cailliau, T. Davis, V. Gadepally, J. Kepner, R. Lipman, J. Lovitz, and
lent, and A. Tumeo, “Exagraph: Graph and combinatorial methods
K. Ouaknine, “Redisgraph graphblas enabled graph database,” in 2019
for enabling exascale applications,” The International Journal of High
IEEE International Parallel and Distributed Processing Symposium
Performance Computing Applications, vol. 35, no. 6, pp. 553–571, 2021.
Workshops (IPDPSW), pp. 285–286, IEEE, 2019.
[35] T. A. Davis, M. Aznaveh, and S. Kolodziej, “Write quick, run fast: [51] M. P. Blanco, S. McMillan, and T. M. Low, “Delayed asynchronous
Sparse deep neural network in 20 minutes of development time via iterative graph algorithms,” in 2021 IEEE High Performance Extreme
suitesparse: Graphblas,” in 2019 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–7, IEEE, 2021.
Computing Conference (HPEC), pp. 1–6, IEEE, 2019. [52] N. K. Ahmed, N. Duffield, and R. A. Rossi, “Online sampling of
[36] M. Aznaveh, J. Chen, T. A. Davis, B. Hegyi, S. P. Kolodziej, T. G. temporal networks,” ACM Transactions on Knowledge Discovery from
Mattson, and G. Szárnyas, “Parallel graphblas with openmp,” in 2020 Data (TKDD), vol. 15, no. 4, pp. 1–27, 2021.
Proceedings of the SIAM Workshop on Combinatorial Scientific Com- [53] A. Azad, O. Selvitopi, M. T. Hussain, J. R. Gilbert, and A. Buluç, “Com-
puting, pp. 138–148, SIAM, 2020. binatorial blas 2.0: Scaling combinatorial algorithms on distributed-
[37] B. Brock, A. Buluç, T. G. Mattson, S. McMillan, and J. E. Moreira, memory systems,” IEEE Transactions on Parallel and Distributed Sys-
“Introduction to graphblas 2.0,” in 2021 IEEE International Parallel and tems, vol. 33, no. 4, pp. 989–1001, 2021.
Distributed Processing Symposium Workshops (IPDPSW), pp. 253–262, [54] D. Koutra, “The power of summarization in graph mining and learning:
IEEE, 2021. smaller data, faster methods, more interpretability,” Proceedings of the
[38] M. Pelletier, W. Kimmerer, T. A. Davis, and T. G. Mattson, “The VLDB Endowment, vol. 14, no. 13, pp. 3416–3416, 2021.
graphblas in julia and python: the pagerank and triangle centralities,” in [55] R. Hofstede, P. Čeleda, B. Trammell, I. Drago, R. Sadre, A. Sperotto,
2021 IEEE High Performance Extreme Computing Conference (HPEC), and A. Pras, “Flow monitoring explained: From packet capture to data
pp. 1–7, 2021. analysis with netflow and ipfix,” IEEE Communications Surveys &
[39] J. Kepner, T. Davis, C. Byun, W. Arcand, D. Bestor, W. Bergeron, Tutorials, vol. 16, no. 4, pp. 2037–2064, 2014.
V. Gadepally, M. Hubbell, M. Houle, M. Jones, A. Klein, P. Michaleas, [56] R. Sommer, “Bro: An open source network intrusion detection system,”
L. Milechin, J. Mullen, A. Prout, A. Rosa, S. Samsi, C. Yee, and Security, E-learning, E-Services, 17. DFN-Arbeitstagung über Kommu-
A. Reuther, “75,000,000,000 streaming inserts/second using hierarchical nikationsnetze, 2003.
hypersparse graphblas matrices,” in 2020 IEEE International Parallel [57] P. Lucente, “pmacct: steps forward interface counters,” Tech. Rep., 2008.
and Distributed Processing Symposium Workshops (IPDPSW), pp. 207–
[58] J. Fan, J. Xu, M. H. Ammar, and S. B. Moon, “Prefix-preserving ip
210, 2020.
address anonymization: measurement-based security evaluation and a
[40] J. Kepner, M. Jones, D. Andersen, A. Buluç, C. Byun, K. Claffy, new cryptography-based scheme,” Computer Networks, vol. 46, no. 2,
T. Davis, W. Arcand, J. Bernays, D. Bestor, W. Bergeron, V. Gadepally, pp. 253–272, 2004.
M. Houle, M. Hubbell, A. Klein, C. Meiners, L. Milechin, J. Mullen,
S. Pisharody, A. Prout, A. Reuther, A. Rosa, S. Samsi, D. Stetson, [59] A. Soule, A. Nucci, R. Cruz, E. Leonardi, and N. Taft, “How to
A. Tse, C. Yee, and P. Michaleas, “Spatial temporal analysis of identify and estimate the largest traffic matrix elements in a dynamic
40,000,000,000,000 internet darkspace packets,” in 2021 IEEE High environment,” in ACM SIGMETRICS Performance Evaluation Review,
Performance Extreme Computing Conference (HPEC), pp. 1–8, 2021. vol. 32, pp. 73–84, ACM, 2004.
[41] J. Kepner, K. Cho, K. Claffy, V. Gadepally, P. Michaleas, and [60] Y. Zhang, M. Roughan, C. Lund, and D. L. Donoho, “Estimating point-
L. Milechin, “Hypersparse neural network analysis of large-scale internet to-point and point-to-multipoint traffic matrices: an information-theoretic
traffic,” in 2019 IEEE High Performance Extreme Computing Confer- approach,” IEEE/ACM Transactions on Networking (TON), vol. 13,
ence (HPEC), pp. 1–11, 2019. no. 5, pp. 947–960, 2005.
[42] J. Kepner, K. Cho, K. Claffy, V. Gadepally, S. McGuire, L. Milechin, [61] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela,
W. Arcand, D. Bestor, W. Bergeron, C. Byun, M. Hubbell, M. Houle, “Community structure in time-dependent, multiscale, and multiplex
M. Jones, A. Prout, A. Reuther, A. Rosa, S. Samsi, C. Yee, and networks,” science, vol. 328, no. 5980, pp. 876–878, 2010.
P. Michaleas, “New phenomena in large-scale internet traffic,” in Mas- [62] P. Tune, M. Roughan, H. Haddadi, and O. Bonaventure, “Internet traffic
sive Graph Analytics (D. Bader, ed.), pp. 1–53, Chapman and Hall/CRC, matrices: A primer,” Recent Advances in Networking, vol. 1, pp. 1–56,
2022. 2013.
[43] P. Devlin, J. Kepner, A. Luo, and E. Meger, “Hybrid power-law models [63] J. Kepner, V. Gadepally, L. Milechin, S. Samsi, W. Arcand, D. Bestor,
of network traffic,” arXiv preprint arXiv:2103.15928, 2021. W. Bergeron, C. Byun, M. Hubbell, M. Houle, M. Jones, A. Klein,
[44] A. Tumeo, O. Villa, and D. Sciuto, “Efficient pattern matching on P. Michaleas, J. Mullen, A. Prout, A. Rosa, C. Yee, and A. Reuther,
gpus for intrusion detection systems,” in Proceedings of the 7th ACM “Streaming 1.9 billion hypersparse network updates per second with
International Conference on Computing Frontiers, CF ’10, (New York, d4m,” in 2019 IEEE High Performance Extreme Computing Conference
NY, USA), p. 87–88, Association for Computing Machinery, 2010. (HPEC), pp. 1–6, 2019.
[64] J. Nair, A. Wierman, and B. Zwart, “The fundamentals of heavy tails:
Properties, emergence, and estimation,” Preprint, California Institute of
Technology, 2020.
[65] J. Karvanen and A. Cichocki, “Measuring sparseness of noisy signals,”
in 4th International Symposium on Independent Component Analysis
and Blind Signal Separation, pp. 125–130, 2003.
[66] A. J. Elmore, J. Duggan, M. Stonebraker, M. Balazinska, U. Cetintemel,
V. Gadepally, J. Heer, B. Howe, J. Kepner, T. Kraska, et al., “A
demonstration of the bigdawg polystore system,” Proceedings of the
VLDB Endowment, vol. 8, no. 12, p. 1908, 2015.
[67] T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis, “The case
for learned index structures,” in Proceedings of the 2018 International
Conference on Management of Data, SIGMOD 18, (New York, NY,
USA), pp. 489–504, Association for Computing Machinery, 2018.
[68] E. H. Do and V. N. Gadepally, “Classifying anomalies for network
security,” in ICASSP 2020 - 2020 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 2907–2911,
2020.
[69] S. Pisharody, J. Bernays, V. Gadepally, M. Jones, J. Kepner, C. Meiners,
P. Michaleas, A. Tse, and D. Stetson, “Realizing forward defense in the
cyber domain,” in 2021 IEEE High Performance Extreme Computing
Conference (HPEC), pp. 1–7, IEEE, 2021.
[70] B. Nichols, D. Buttlar, and J. P. Farrell, Pthreads programming. O’Reilly
& Associates, Inc., 1996.