IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 28, NO. 5, OCTOBER 2020

Detecting IoT Devices in the Internet


Hang Guo and John Heidemann, Fellow, IEEE, Senior Member, ACM

Abstract— Distributed Denial-of-Service (DDoS) attacks launched from compromised Internet-of-Things (IoT) devices have shown how vulnerable the Internet is to large-scale DDoS attacks. To understand the risks of these attacks requires learning about these IoT devices: where are they? how many are there? how are they changing? This paper describes three new methods to find IoT devices on the Internet: server IP addresses in traffic, server names in DNS queries, and manufacturer information in TLS certificates. Our primary methods (IP addresses and DNS names) use knowledge of servers run by the manufacturers of these devices. Our third method uses TLS certificates obtained by active scanning. We have applied our algorithms to a number of observations. With our IP-based algorithm, we report detections from a university campus over 4 months and from traffic transiting an IXP over 10 days. We apply our DNS-based algorithm to traffic from 8 root DNS servers from 2013 to 2018 to study AS-level IoT deployment. We find substantial growth (about 3.5×) in AS penetration for 23 types of IoT devices and modest increase in device type density for ASes detected with these device types (at most 2 device types in 80% of these ASes in 2018). DNS also shows substantial growth in IoT deployment in residential households from 2013 to 2017. Our certificate-based algorithm finds 254k IP cameras and network video recorders from 199 countries around the world.

Index Terms— Internet-of-Things (IoT), measurement techniques.

I. INTRODUCTION

THERE is huge growth in sales and the installed base of Internet-of-Things (IoT) devices like Internet-connected cameras, light-bulbs, and TVs. Gartner forecasts the global IoT installed base will grow from 3.81 billion in 2014 to 20.41 billion in 2020 [12].

This large and growing number of devices, coupled with multiple security vulnerabilities, brings an increasing concern about the security threats they raise for the Internet ecosystem. A significant risk is that compromised IoT devices can be used to mount large-scale Distributed Denial-of-Service (DDoS) attacks. In 2016, the Mirai botnet, with over 100k compromised IoT devices, launched a series of DDoS attacks that set records in attack bit-rates. Estimated attack sizes include a 620 Gb/s attack against cybersecurity blog KrebsOnSecurity.com (2016-09-20) [21], and a 1 Tb/s attack against French cloud provider OVH (2016-09-23) [33] and DNS provider Dyn (2016-10-21) [10]. The size of the Mirai botnet used in these attacks has been estimated at 145k [33] and 100k [10]. Source code to the botnet was released [25], showing it targeted IoT devices with multiple vulnerabilities.

If we are to defend against IoT security threats, we must understand how many and what kinds of IoT devices are deployed. Our paper proposes three algorithms to discover the location, distribution and growth of IoT devices. We believe our algorithms and results could help guide the design and deployment of future IoT security solutions by revealing the scale of IoT security problem (how wide-spread are certain IoT devices in the whole or certain part of Internet?), the problem’s growth (how quickly do new IoT devices spread over the Internet?) and the distribution of the problem (which countries or autonomous systems have certain IoT devices?). Our goal here is to assess the scope of the IoT problem; improving defenses is complementary future work.

Our IoT detection algorithms can also help network researchers study the distribution and growth of target IoT devices and help IT administrators discover and monitor IoT devices in their network. As more every-day objects get connected into the Internet, our algorithms may even help understand the physical world by, for example, detecting and tracking network-enabled vehicles for crime investigation.

Our first contribution is to propose three IoT detection methods. Our two main methods detect IoT devices from observations of network traffic: IPs in Internet flows (§II-A.2) and stub-to-recursive DNS queries (§II-A.3). They both use knowledge of servers run by manufacturers of these devices (called device servers). Our third method detects IoT devices supporting HTTPS remote access (called HTTPS-Accessible IoT devices) from the TLS (Transport Layer Security [8]) certificates they use (§II-B). (We reported an early version of IP-based detection method [18]; here we add additional methods and better evaluate our prior method in §III-A.2.)

Our second contribution is to apply our three detection methods to multiple real-world network measurements (Table II). We apply our IP-based method to flow-level traffic from a college campus over 4 months (§III-A.2) and a regional IXP (Internet Exchange Point [6]) over 10 days (§III-A.3). We apply our DNS-based method to DNS traffic at 8 root name servers from 2013 to 2018 (§III-B.1) to study IoT deployment by Autonomous Systems (ASes [22]). We find about 3.5× growth in AS penetration for 23 types of IoT devices and modest increase in device type density for ASes detected with these device types (we find at most 2 known device types in 80% of these ASes in 2018). We confirm substantial deployment growth at household-level by applying DNS-based method to DNS traffic from a residential neighborhood from 2013 to 2017 (§III-B.2). We apply our certificate-based method to a public TLS certificate dataset (§III-C) and find 254K IP cameras and network video recorders (NVR) from 199 countries.

Manuscript received March 26, 2019; revised November 8, 2019 and June 23, 2020; accepted July 10, 2020; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor H. Seferoglu. Date of publication July 28, 2020; date of current version October 15, 2020. This work was supported by the Air Force Research Laboratory under Agreement FA8750-17-2-0280. (Corresponding author: Hang Guo.)
The authors are with the Department of Computer Science, University of Southern California (USC), Los Angeles, CA 90292-6695 USA, and also with the Information Sciences Institute, University of Southern California (USC), Marina Del Rey, CA 90292 USA (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/TNET.2020.3009425
1063-6692 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Indian Institute of Information Technology Kottayam. Downloaded on August 21,2024 at 07:44:37 UTC from IEEE Xplore. Restrictions apply.
This paper builds on prior work in the area. We draw on data from University of New South Wales (UNSW) [38]. Others are currently studying the privacy and vulnerabilities of individual devices (for example [1]); we focus on detection. Prior work has studied detection [3], [5], [9], [28], [36]–[38], but we use different detection signals to observe devices behind NATs (Network Address Translation devices [11]) as well as those on public IP addresses (detailed comparisons in §V). We published an early version of IP-based detection in a workshop [18]. This paper adds two new detection methods: DNS-based detection (§II-A.3) and certificate-based detection (§II-B) and adds a new 4-month study of IoT devices on college campus for IP-based detection (§III-A.2).

TABLE I: THE 10 IOT DEVICES THAT WE PURCHASED [table not reproduced]

Our studies of IP-based and DNS-based detections are approved by USC IRB as non-human subject research (IRB IIR00002433 on 2018-03-27 and IRB IIR00002456 on 2018-04-19). We make data captured from our 10 IoT devices (Table I) public at [16].

II. METHODOLOGY

We next describe our three methods to find IoT devices. Our three detection methods use different types of network measurements (IPs in Internet flows §II-A.2, stub-to-recursive DNS queries §II-A.3 and TLS certificates §II-B) to achieve different coverage of IoT devices. Combining our three methods reveals a more complete picture of IoT deployment in the Internet. (However, even with all three methods, we do not claim complete coverage of global IoT deployment.)

1) Identifying Device Server Names: Our approach depends on knowing what servers devices talk to. Our goal is to find domain names for all servers that IoT devices regularly and uniquely talk to. However, we need to remove server names that are often shared across multiple types of devices, since they would otherwise produce false detections.

Note that even with our filtering of common shared server names, we sometimes find servers that are shared across multiple types of devices. We handle this ambiguity from shared servers by not trying to distinguish these device types in detection, as we explain later in this section.

Identifying Candidate Server Names: We bootstrap our list of candidate server names by purchasing samples of IoT devices and recording who they talk to. We describe the list of devices we purchased in Table I and provide the information we learned as a public dataset [16].

A. IP and DNS-Based Detection Methods

Our two main methods detect general IoT devices from two types of passive measurements: Internet flows, measured from any vantage point in the Internet (IP-based method); and DNS queries, measured between stub and recursive servers (DNS-based method). These two methods cover IoT devices that are visible to these two data sources, including those that use public IP addresses or are behind NAT devices.

Our methods exploit the observation that most IoT devices exchange traffic regularly with device-specific servers. (For example, the IoT Inspector project observes 44,956 IoT devices from 53 manufacturers talking to cloud servers during normal operation [20].) If we know these servers, we can identify IoT devices by watching traffic for these packet exchanges. Since servers are usually unique for each class of IoT device, we can also identify the types of devices. Our approaches consider only with whom IoT devices exchange traffic, not patterns like timing or rates, because patterns are often obscured when traffic mixes (such as with multiple devices behind a NAT).

Our two methods depend on identifying servers that devices talk to (§II-A.1), and looking for these servers by IP address (§II-A.2) and DNS name (§II-A.3).

For each IoT device we purchase, we boot it and record the traffic it sends. We extract the domain names of server candidates from type A DNS requests made by the target IoT device in operation. We capture DNS queries at the ingress side of the recursive DNS resolver to mitigate effects of DNS caching.

Filtering Candidate Server Names: We exclude domain names for two kinds of servers that would otherwise cause false positives in detection. One is third-party servers: servers not run by IoT manufacturers that are often shared across many device types. The other is human-facing servers: servers that also serve humans.

Third-party servers usually offer public services like time, news, music streaming and video streaming. If we include them, they would cause false positives because they interact with many different clients.

We consider server name S as a third-party server for some IoT product P if neither P’s manufacturer nor the sub-brand P belongs to (if any) is a substring of S’s domain (regardless of case). We define domain of a URL as the immediate left neighbor of the URL’s public suffix. (We identify public suffix based on public suffix list from Mozilla Foundation [30].) We use Python library tldextract to identify TLD suffixes [23].
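
The third-party test defined above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the paper relies on tldextract with Mozilla's full Public Suffix List, whereas this sketch substitutes a tiny hard-coded suffix set (`PUBLIC_SUFFIXES`) so that it runs offline. The function names and the suffix set are assumptions made for illustration.

```python
from typing import Iterable

# Hypothetical, deliberately tiny stand-in for the Public Suffix List.
PUBLIC_SUFFIXES = {"com", "net", "org", "tv", "co.uk", "com.au"}


def domain_label(server_name: str) -> str:
    """Return the label immediately left of the public suffix ("" if unknown)."""
    labels = server_name.lower().rstrip(".").split(".")
    # Scan from the longest candidate suffix to the shortest,
    # so that "co.uk" is preferred over a bare "uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return labels[i - 1]
    return ""


def is_third_party(server_name: str, brand_names: Iterable[str]) -> bool:
    """True if no manufacturer or sub-brand name is a substring of the domain."""
    domain = domain_label(server_name)
    return not any(b.lower() in domain for b in brand_names if b)
```

Under these assumptions, `is_third_party("prime.amazon.com", ["Amazon"])` is false because "amazon" is a substring of the domain, while a name whose domain carries no manufacturer string is classified as third-party regardless of who actually operates it.
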
Although our method is general, it requires knowledge of what servers devices talk to, and therefore it requires device-specific data obtained by us or others. We still detect devices that change the servers with which they interact provided they continue to talk to most of their old servers. For IoT devices behind NAT, our methods only identify the existence of each type of IoT devices but cannot know the exact number of devices for each type because we cannot count NATted devices outside the NAT.

Human-facing servers serve both humans and devices (note that all server candidates serve devices because they are DNS queried by IoT devices in the first place). They may cause us to misclassify a laptop or cellphone (operated by a human) as an IoT device.

We identify human-facing servers by whether they respond to web requests (HTTP or HTTPS GET) with human-focused content. We define respond as returning an HTML page with status code 200. We define human-focused content as the existence of any web content instead of place-holder content. Typically place-holder content is quite short. (For example, http://appboot.netflix.com shows place holder “Netflix appboot” and is just 487 bytes.) So we treat HTML text longer than 630 bytes as human-focused content. We determined this threshold empirically from HTTP and HTTPS content at 158 server domain names queried by our 10 devices (Table I).

We call the remaining server names device-facing manufacturer servers, or just device servers, because they are run by IoT manufacturers and serve devices only. We use device servers for detection.

Completeness Threshold Selection: Since some device servers may serve both devices and individuals (because we use a necessary condition to determine device-facing servers in §II-A.1 and risk mis-classifying a human-facing manufacturer server as a device server) and sometimes we might miss traffic to a server name due to observation duration or lost captures, we set a threshold of server names required to indicate the presence of each IoT device type. This threshold is typically a majority, but not all, of the server names we observe a representative device talk to in the lab. (This majority-but-not-all threshold also mitigates potential detection misses caused by devices that start talking to new servers.)
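
The human-facing test described above (an HTML response with status code 200 whose text is longer than 630 bytes) can be sketched as a pure function over an already-fetched response. This is an illustrative sketch under assumptions: the text does not say exactly how the HTML length is measured, so the sketch counts UTF-8 bytes of the returned text, and the function and parameter names are hypothetical.

```python
HUMAN_CONTENT_MIN_BYTES = 630  # empirical threshold reported in the text


def is_human_facing(status: int, content_type: str, html_text: str) -> bool:
    """True if the server answered a GET with an HTML page (status 200)
    whose text is longer than the place-holder threshold."""
    responded = status == 200 and "html" in content_type.lower()
    # Assumption: length is measured in UTF-8 bytes of the HTML text.
    return responded and len(html_text.encode("utf-8")) > HUMAN_CONTENT_MIN_BYTES


def classify_server(third_party: bool, human_facing: bool) -> str:
    """Apply the two filters in order; what remains is a device server."""
    if third_party:
        return "third-party"
    if human_facing:
        return "human-facing"
    return "device"
```

A 487-byte place-holder page, like the one in the example above, falls under the threshold and is therefore not treated as human-focused content.
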
Handling Shared Server Names: Some device server names are shared among multiple types of IoT devices from the same manufacturer and can cause ambiguity in detection.

If different device types share the exact set of server names, then we cannot distinguish them and simply treat them as the same type—a device merge.

If different device types have partially overlapping sets of device server names, we cannot guarantee they are distinguishable. If we treat them as separate types, we risk false positives and confusing the two types. We avoid this problem with detection merge: when we detect device types sharing common server names, we conservatively report we detect at least one of these device types. (Potentially we could look for unique device servers in each type; we do not currently do that.)

Handling Future Server Name Change: The server names that our devices (Table I) use are quite stable over 1 to 1.5 years (as shown in §IV-B). However, both our IP-based and DNS-based detection risk missing devices that get software updates that cause them to talk to new server names. We mitigate these potential missed detections by reporting that a device exists when we see a majority of server names for that device (both IP-based method §II-A.2 and DNS-based method §II-A.3). For DNS-based method, we also propose a technique to discover new device server names during detection (§II-A.3).

2) IP-Based IoT Detection Method: Our first method detects IoT devices by identifying packet exchanges between IoT devices and device servers. For each device type, we track a device-type-to-server-name mapping: a list of device server names that type of device talks to. We then define a threshold number of server names; we interpret the presence of traffic to that number of server names (identified by server IP) from a given IP address as indicating the presence of that type of IoT device.

Tracking Server IP Changes: We search for device servers by IP addresses in traffic, but we discover device servers by domain names in sample devices. We therefore need to track when DNS resolution for a server name changes.

We assume server names are long-lived, but the IP addresses they use sometimes change.

We also assume server-name-to-IP mappings could be location-dependent.

We track changes of server-name-to-IP mapping by resolving server names to IP addresses every hour (frequent enough to detect possible DNS-based load balancing). To make sure IPs for detection are correct, we track server IPs across the same time period and at roughly the same geo-location as the measurement of network traffic under detection.

Most devices talk to a handful of device server names (up to 20, from our laboratory measurements §III-A.1). For these types of devices, we require seeing at least 2/3 of the device server names to believe a type of IoT device exists at a given source IP address. Threshold 2/3 is chosen because for devices with 3 or more server names, requiring seeing anything more than 2/3 of server names will be equivalent to requiring seeing all server names for some devices. For example, requiring at least 4/5 of server names is equivalent to requiring all server names for devices with 3 to 4 device server names.

For devices that talk to many device server names (more than 20), we lower our threshold to 1/2. Typically these are devices with many functions and the manufacturer uses a large pool of server names. (For example, our Amazon_FireTV, as in Table I, has 41 device server names.) Individual devices will most likely talk to only a subset of the pool, at least over short observations.

Limitation: Although effective, IP-based detection faces two limitations. First, it cannot detect IoT devices in previously stored traces, since we usually do not know device server IPs in the past, and coverage of commercial historical DNS datasets can be limited ([18]). Second, we assume we can learn the set of servers the IoT devices talk to. If we do not learn all servers during bootstrapping (§II-A.1), or if device behavior changes (perhaps due to a firmware update), we need to learn new servers. However we cannot learn new device servers during IP-based detection because we find it hard to judge if an unknown IP is a device server, even with help of reverse DNS and TLS certificates from that IP. These limitations motivate our next detection method.

3) DNS-Based IoT Detection Method: Our second method detects IoT devices by identifying the DNS queries prior to actual packet exchanges between IoT devices and device servers.

Strengths: This method addresses the two limitations for IP-based detection (§II-A.2). First, we can directly apply DNS-based detection to old network traces because server names are stable while server IP can change. Second, we can learn new device server names during DNS-based detection by examining unknown server names DNS queried by detected IoT devices and learning those that look like device servers (using rules in §II-A.1).

Limitations: This method requires observation of DNS queries between end-user machines and recursive DNS servers, limiting its use to locations that can see “under” recursive DNS resolvers. This method also works with recursive-to-authority DNS queries (see §III-B) when observations last longer than DNS caching, since then we see users-driven queries for server names even above the recursive. Detection
with recursive-to-authority DNS queries reveals presence of IoT devices at the AS-level, since recursives are usually run by ISPs (Internet service providers [39]) for their users.

but forwarded to a public port. However, certificate scanning will miss devices behind NATs that lack public-facing IP addresses and IoT devices that do not use TLS.

Note that prior work has mapped TLS certificates to IoT devices, both by matching text (like “IP camera”) with certificates [36], and by using community-maintained annotations [9]. In comparison, our method uses multiple techniques to improve the accuracy of certificate matching, and also confirms that matched certificates come from HTTPS servers running in IoT devices.

We use existing public crawls of IPv4 TLS certificates. We first identify candidate certificates: the TLS certificates that contain target devices’ manufacturer names and (optionally) product information. Candidate certificates most likely come from HTTPS servers related to target devices such as HTTPS servers run by their manufacturers and HTTPS servers run directly in them. We then identify IoT certificates: the candidate certificates that come from HTTPS servers running directly in target devices. Each IoT certificate represents an HTTPS-Accessible IoT device.

Method Description: Our DNS-based method has three components: detection, server learning and device splitting. Figure 1 illustrates this method’s overall workflow: it repeatedly conducts detections with the latest knowledge of IoT device server names, learns new device server names after each detection, and terminates when no new server names are learned (see the loop of “Detection” and “Server Learning” in Figure 1). This method also revises newly learned server names by device splitting if it suspects they are incorrect, as signaled by decreased detection after new server names are added (see “Device Splitting” in Figure 1).

Detection: Similar to §II-A.2, for each type of IoT device, we track a list of device server names that type of device talks to. We interpret presence of DNS queries for at least a threshold number (same as §II-A.2) of device server names from a given IP address as presence of that IoT device type. (We call this IP an IoT user IP.)
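
The detection rule can be sketched as follows, using the thresholds given in §II-A.2 (at least 2/3 of the names for devices with up to 20 device server names, 1/2 for devices with more). This is a hedged sketch rather than the authors' implementation: the data layout (sets of queried names per source IP) and all function names are assumptions, and "at least 2/3" is read here as observed/total >= 2/3.

```python
import math
from fractions import Fraction
from typing import Dict, Set

SMALL_POOL_MAX = 20  # devices with up to 20 server names use the 2/3 rule


def required_names(total: int) -> int:
    """Smallest number of observed names that meets the threshold fraction."""
    fraction = Fraction(2, 3) if total <= SMALL_POOL_MAX else Fraction(1, 2)
    return math.ceil(fraction * total)


def detect(queries_by_ip: Dict[str, Set[str]],
           servers_by_device: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each source IP to the device types whose threshold it meets."""
    detections: Dict[str, Set[str]] = {}
    for ip, queried in queries_by_ip.items():
        for device, names in servers_by_device.items():
            seen = len(names & queried)  # known device servers this IP queried
            if names and seen >= required_names(len(names)):
                detections.setdefault(ip, set()).add(device)
    return detections
```

With this reading, a device type with 3 known server names needs 2 of them, one with 4 needs 3, and one with 41 (the pool size reported for Amazon_FireTV) needs 21.
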
1) Identify Candidate Certificates: We identify candidate certificates for every target device by testing each TLS certificate against a set of text strings we associate with each device (called matching keys). (We describe where our list of target devices is found in §III-C.)

To cover possible variants of known device servers, in detection, we treat digits in server name’s sub-domain as matching any digit. We define sub-domain of a URL as everything on the left of the URL’s domain (URL’s domain as defined in §II-A.1).
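
The digit rule for server-name variants can be sketched with a regular expression built from each known name. This is an illustration under two assumptions that the text does not settle: each digit is taken to match exactly one digit, and the domain plus public suffix is taken to be the last `tail_labels` labels (2 by default) instead of being derived from the Public Suffix List. The function name is hypothetical.

```python
import re

DIGITS = "0123456789"


def wildcard_digits(known_name: str, tail_labels: int = 2) -> re.Pattern:
    """Compile a pattern where digits in the sub-domain match any digit.

    Digits in the domain and public suffix stay literal.
    """
    labels = known_name.lower().split(".")
    sub = ".".join(labels[:-tail_labels])
    tail = ".".join(labels[-tail_labels:])
    # Replace each sub-domain digit with \d; escape everything else.
    sub_regex = "".join(r"\d" if ch in DIGITS else re.escape(ch) for ch in sub)
    separator = r"\." if sub else ""
    return re.compile(sub_regex + separator + re.escape(tail))


def matches_known(queried_name: str, known_name: str) -> bool:
    """True if the queried name equals the known name up to sub-domain digits."""
    return wildcard_digits(known_name).fullmatch(queried_name.lower()) is not None
```

For a hypothetical known name `srv1.iot.example.com`, the query `srv7.iot.example.com` counts as the same device server, while `srv1.iot.example.org` does not.
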
Server Learning: After each detection, we learn new device server names and use them in subsequent detections. Specifically, we examine unknown server names DNS queried by IoT user IPs and if we find any unknown server names that resemble device servers for a certain IoT device detected at a certain IoT user IP (judged by rules in §II-A.1), we extend this IoT device’s server name list with these unknown server names.

Device Splitting: We may incorrectly merge two types of devices that talk to different sets of servers if we only know their shared server names prior to detection.

Incorrect device merges can reduce detection rates. When we falsely merge different device types P1 and P2 as P, we risk learning new server names for the merged type P that P1 and P2 devices do not both talk to and causing reduced detections of P in subsequent iterations because we miss some P1 (or P2) devices by searching for the newly-acquired server names that P1 (or P2) do not talk to.

Device splitting addresses this problem by reverting the incorrect merge. If we detect fewer device types P at a certain IP after learning new server names, we know P is an incorrect merge of two different device types, P1 and P2, and that the new server names learned for P do not apply for both P1 and P2. We thus split P into P1 and P2, with P1 talking to P’s server names before the last server learning (without newly-learned server names) and P2 talking to P’s latest server names (with the new server names). We show an example of how device splitting reverts an incorrect device merge later in a controlled experiment (§IV-B).

Matching Keys: We build a set of matching keys for each target device with the goal to suppress false positives in finding candidate certificates. If a target device’s manufacturer does not produce any other type of Internet-enabled products (per product information on manufacturer websites), its matching key is simply the name of its manufacturer (called manufacturer key). Otherwise, its matching keys will be manufacturer key plus its product type (like “IP Camera”). We also include IoT-specific sub-brands (if any). For example, “American Dynamics” is the sub-brand associated with the IP cameras manufactured by Tyco International.

We do two kinds of matching between a matching key K and a field S in a TLS certificate: Match means K is a substring of S (ignoring case); Good-Match means K is a Match of S and the character(s) adjacent to K’s match in S are neither letters nor digits. For example, “GE” is a Match but not a Good-Match of “Privilege” because the adjacent character of “GE” in “Privilege” is “e” (a letter). (We do not simply look for identical K and S because often S uses a prefix or suffix. For example, a certificate’s subject-organization field “Amcrest Technologies LLC” will be a Good-Match with manufacturer key “Amcrest”, but is not identical due to the suffix “Technologies LLC”.)

Requiring Good-Match for manufacturer keys reduces false positives caused by IoT manufacturer names being substrings of other companies. For example, the name of IP camera manufacturer “Axis_Communications” is a substring of Telecom company “Maxis_Communications” but they are not a Good-Match.
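
The two matching rules can be sketched directly from their definitions. This is a minimal sketch, not the authors' code: the function names are hypothetical, and since the text does not say which occurrence of K is examined when K appears several times in S, the sketch accepts a Good-Match if any occurrence qualifies.

```python
def is_match(key: str, field: str) -> bool:
    """Match: key is a case-insensitive substring of field."""
    return bool(key) and key.lower() in field.lower()


def is_good_match(key: str, field: str) -> bool:
    """Good-Match: a Match whose neighbouring characters (if any)
    are neither letters nor digits."""
    k, s = key.lower(), field.lower()
    if not k:
        return False
    start = s.find(k)
    while start != -1:
        end = start + len(k)
        left_ok = start == 0 or not s[start - 1].isalnum()
        right_ok = end == len(s) or not s[end].isalnum()
        if left_ok and right_ok:
            return True
        start = s.find(k, start + 1)  # assumption: try later occurrences too
    return False
```

Run on the examples from the text, “GE” is a Match but not a Good-Match of “Privilege”, “Amcrest” is a Good-Match of “Amcrest Technologies LLC”, and “Axis_Communications” is a Match but not a Good-Match of “Maxis_Communications”.
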
B. Certificate-Based IoT Detection Method

Our third method detects IoT devices using HTTPS by active scanning for TLS certificates and identifying target IoT devices’ TLS certificates. This method thus covers HTTPS-Accessible IoT devices either with public IPs or behind NATs

We use the Match (not Good-Match) rule for other keys (product types and sub-brand) because they require greater flexibility. For example, product type “NVR” can be matched to text string like “myNVR”.

Key Matching Algorithm: We test each TLS certificate (input) with matching keys from each target device. Specifically, we examine four subject fields in a TLS certificate
C (organization $C_O$, organization units $C_{OU}$, common name $C_{CN}$ and SubjectAltNames $C_{DN}$, if present) and consider C a candidate certificate for device P if P’s manufacturer key ($K_m^P$) Good-Matches $C_O$ and any non-manufacturer keys for P Match any of these four subject fields in C.

We handle two edge cases when testing if $K_m^P$ Good-Matches $C_O$. If $C_O$ is empty, or a default (“SomeOrganization” or “company”), we instead test if $K_m^P$ Good-Matches any of the other three fields we examine ($C_{OU}$, $C_{CN}$ and $C_{DN}$). If we compare $K_m^P$ to a field that is a URL, we only match $K_m^P$ against the URL’s domain part (URL’s domain as defined in §II-A.1) because domain shows ownership of a server name. (For example, Accedo Broadband owns *.sharp.accedo.tv, not Sharp.)

2) Identify IoT Certificate: We identify IoT-specific certificates because they are not typically signed by a certificate authority (CA). We identify them by two properties: they are self-signed and lack valid domain names.

Fig. 1. Workflow for DNS-based IoT detection with server learning. [figure not reproduced]

TABLE II: DATASETS FOR REAL-WORLD IOT DETECTION [table not reproduced]

III. RESULTS: IOT DEVICES IN THE WILD

We next apply our detection methods with real-world network traffic (Table II) to learn about the distribution and growth of IoT devices in the wild.

Although ground truth for the entire Internet is impossible to get, we demonstrate our methods show high detection accuracy in controlled experiments with controlled ground truth in §IV, and we evaluate at our university and with an IXP next.

A. IP-Based IoT Detection Results

To apply our IP-based detection, we first extract device server names from 26 devices by 15 vendors (§III-A.1). We then apply detection to Internet flows at a college campus from a 4-month period (§III-A.2) and partial traffic from an IXP (§III-A.3).

1) Identifying Device Server Names: We use device servers from two sets of IoT devices in detection: 10 IoT devices we purchased (Table I) and 21 IoT devices from data provided by the UNSW (devices as listed in Figure 1b of [38]). (Our 10 devices were chosen for their popularity on amazon.com in 2018.) We extract device server names from both sets of devices with the method in §II-A.1.

Self Signing: Many HTTPS servers on IoT devices use self-signed certificates rather than CA-signed certificates to avoid the cost and complexity of CA-signing. We consider a candidate certificate C (for device P) self signed if C’s issuer organization $C_{iO}$ is either a copy of any of the 4 subject fields we examined ($C_O$, $C_{OU}$, $C_{CN}$ and $C_{DN}$) or is Good-Matched by P’s manufacturer key ($K_m^P$).

Lacking Valid Domain Names: Often IoT users lack dedicated DNS domain names for their home network. The only exception we found is some devices use “www.”+manufacturer+“.com” as a place holder for $C_{CN}$. (For example, www.amcrest.com for Amcrest IP Camera.)
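
The self-signing test stated above can be sketched over a minimal, hypothetical certificate record. Several details are assumptions rather than facts from the text: “copy” is read as exact string equality, an empty issuer organization is never treated as a copy, and Good-Match is approximated with an ASCII-only regular expression. The `Cert` class and all names are illustrative only.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cert:
    """Hypothetical minimal view of the certificate fields used by the test."""
    subject_o: str = ""
    subject_ou: str = ""
    subject_cn: str = ""
    subject_alt_names: List[str] = field(default_factory=list)
    issuer_o: str = ""


def good_match(key: str, text: str) -> bool:
    """Case-insensitive substring with no ASCII letter or digit on either side."""
    if not key:
        return False
    pattern = r"(?<![A-Za-z0-9])" + re.escape(key) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, re.IGNORECASE) is not None


def is_self_signed(cert: Cert, manufacturer_key: str) -> bool:
    """Issuer organization copies a subject field, or is Good-Matched by the key."""
    subject_values = [cert.subject_o, cert.subject_ou, cert.subject_cn,
                      *cert.subject_alt_names]
    copied = cert.issuer_o != "" and cert.issuer_o in subject_values
    return copied or good_match(manufacturer_key, cert.issuer_o)
```

A certificate whose issuer organization repeats its own subject organization passes the test, as does one whose issuer organization is simply the manufacturer name.
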
We consider a candidate certificate C lacking valid domain We break-down server names we found. Of the 171 candi-
names if none of the values in CCN and CDN (if present) date server names from our 10 devices, about half (56%, 96)
is a valid domain name. We ignore Dynamic DNS names are third-party servers, providing time, news or music stream-
(using a public list of dynamic DNS providers [32]) and ing, while the other half (44%, 75) are manufacturer servers.
default names. Of these manufacturer servers, only a small portion (7%, 5)
are human-facing (like prime.amazon.com). The majority
of manufacturer servers (93%, 70) are device-facing and will
C. Adversarial Prevention of Detection be used in detection.
Although our methods generally work well in IoT detection, We manually examine the 171 candidate server names
they are not designed to prevent an adversary from hiding and confirm the classifications for most of them are correct
IoT devices. For example, use of a VPN (Virtual Private (for 157 out of 171, ownership of server domain is verified by
Network [13]) that tunnels traffic from the IoT to its servers whois or websites).
would evade IP-based detection. IoT devices that access device We cannot verify ownership of 11 candidate server names.
servers with hard-coded IP addresses rather than DNS names Luckily, our method lists them as third-party servers and
will avoid our DNS-based detection. Although an adversary they will not be used in detection. We find three candidate
can hide IoT devices, since they are designed for consumer server-names (api.xbcs.net, heartbeat.lswf.net,
use and to minimize costs, we do not anticipate widespread and nat.xbcs.net) are falsely classified as third-party
intentional concealment of IoT devices. (We did not observe servers. We confirm they are run by IoT manufacturer Belkin
any devices intentionally avoiding detection during our based on “whois lswf.net” and prior work [34] and add
study) them back to our list. These three server names fail our test

Authorized licensed use limited to: Indian Institute of Information Technology Kottayam. Downloaded on August 21,2024 at 07:44:37 UTC from IEEE Xplore. Restrictions apply.

TABLE III
4-Month IoT Detection Results on USC Campus and Our Estimations of IoT Users and Devices

TABLE IV
August IoT Detection Results on USC Campus (Merging IPs With Identical Detections)

for manufacturer servers (§II-A.1) because their domains show no manufacturer information.

Similarly, we extracted 48 device servers from 18 of 21 IoT devices from UNSW (using datasets available on their website https://iotanalytics.unsw.edu.au). The remaining 3 of their devices are not detectable with our method because they only visit third-party and human-facing servers.

Combining server names measured from our 10 devices and the 18 detectable devices from UNSW (merging two duplicate devices, Amazon_Echo and TPLink_Plug) gives us 26 detectable IoT devices. Among these 26 detectable IoT devices, we merge TPLink_IPCam, TPLink_Plug and TPLink_Lightbulb as a meta-device because they talk to the same set of device servers (a device merge, recall §II-A.1). Similarly, we merge Belkin_Switch and Belkin_MotionSensor. After device merge, we are left with 23 merged devices talking to 23 distinct sets of device server names. (Together they have 99 distinct device server names.) By detecting with these server names, we are essentially looking for 23 types of IoT devices that talk to these 23 sets of server names, including but not limited to the 26 IoT devices owned by us and UNSW.

2) IoT Deployment in a College Campus: We apply our IP-based detection method to partial network traffic from our university campus for a 4-month period in 2018.

Input Datasets: We use passive Internet measurements at the University of Southern California (USC) guest WiFi for 4 different 4-day-long periods from August to November in 2018 (Table II). To protect user privacy, packet payloads are not kept and IPs are anonymized by scrambling the last byte of each IP address in a prefix-preserving manner.

Input Server IPs: Since server-name-to-IP bindings could vary over time and physical locations (as discussed in §II-A.2), we collect the latest IPv4 addresses for our 99 device server names daily at USC, as described in §II-A.1. Ideally we would always use the latest server IPs in detection. However, due to outages in our infrastructure, we can only ensure that the server IPs we use in detection are no more than one month old.

IoT Detection Results: As shown in Table III, IoT detections increase on campus from August to September (from 13 to 23), but decrease in October and November (to 19 and then 10). In comparison, IoT user IPs on campus remain the same from August to October (6) and drop in November (3). (We discuss reasons behind these variations in campus IoT deployment later in this section.)

We show our August detection results in Table IV. (Detections in other months are similar.) Note that “Amazon_*” in Table IV stands for at least one of Amazon_FireTV and Amazon_Echo. Similarly, “Withings_*” stands for at least one of Withings_Scale and Withings_SleepSensor (recall detection merge in §II-A.1). We find that IoT user IPs are often detected with multiple device types, suggesting the use of network-address translation (NAT) devices. We also find two sets of IoT user IPs (A and H; C and F), each sharing the exact set of IoT device types. A likely explanation is that these two sets of IPs belong to two IoT users using dynamically assigned IP addresses, and these addresses change one time during our 4-day observation. (We discuss IoT users on campus further below.)

Since USC guest WiFi dynamically assigns IPs, our counts of IoT detections and IoT user IPs risk over-estimating actual IoT deployments on campus. When one user gets multiple IPs, our IoT user IP count over-estimates the IoT user count. When one user's devices show up at multiple IPs, our IoT detection count gets inflated. (We validate our claim that dynamic IPs inflate detection in §IV-A.)

Estimating Numbers of IoT Users and Devices: To get a better knowledge of actual IoT deployments on campus, we estimate the number of IoT users on campus based on the insight that although one user could get assigned different IPs, they may still be identified by the combination of IoT device types they own. We then infer the number of IoT devices we see on campus given this many users.

We infer the existence of IoT users by clustering IoT user IPs from the same month or adjacent months that have similar detections. We consider detections at two IPs (represented by two sets of detected IoT device types d1 and d2, without detection merge) to be similar if they satisfy the following heuristic: size(intersect(d1, d2))/size(union(d1, d2)) ≥ 0.8.

While our technique risks under-estimating the number of IoT users by combining different users who happen to own the same set of device types into one user, we argue this error is unlikely because most IP addresses that have IoT devices (16 out of 21, 76%) show multiple device types (at least 4, without detection merge), and the chance that two different users have identical sets of device types seems low.

We find three clusters of IPs, spanning 4, 3 and 2 months respectively. These three clusters of IPs likely belong to three campus residents who could install their IoT devices relatively permanently on campus, such as students living on campus and faculty (or staff) who have offices on campus.

We find four IPs that do not belong to any cluster. These four IPs likely belong to four campus non-residents who only brought their devices to campus briefly, such as students living off-campus and other campus visitors.

We then estimate the number of IoT devices on campus in each month by adding up the devices owned by estimated IoT users in each month. We estimate the devices owned by a user in a given month by taking the union of device types detected from this user's IPs in this month and assuming this user owns exactly one device from each detected type. (Recall from §II-A that for NATted IoT devices, our method only identifies the existence of device types but cannot know the device count for each type.)
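The clustering and per-month device estimate described above can be illustrated with a short sketch. It applies the stated similarity heuristic (intersection over union of at least 0.8) to observations from the same or adjacent months. The grouping strategy (connected components of similar observations), the integer month encoding, and all sample IPs and device sets are assumptions made for illustration; the text does not specify the clustering procedure beyond the heuristic.

```python
from itertools import combinations

def similar(d1, d2, threshold=0.8):
    """Heuristic from the text: size(intersect) / size(union) >= threshold."""
    union = d1 | d2
    return bool(union) and len(d1 & d2) / len(union) >= threshold

def cluster_user_ips(observations, threshold=0.8):
    """Group (month, ip, device_types) observations into inferred users.

    Observations are linked when they fall in the same or adjacent months
    and have similar device-type sets. Clusters are the connected
    components of these links (an assumption of this sketch).
    """
    parent = list(range(len(observations)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(observations)), 2):
        (m1, _, d1), (m2, _, d2) = observations[i], observations[j]
        if abs(m1 - m2) <= 1 and similar(d1, d2, threshold):
            parent[find(i)] = find(j)

    clusters = {}
    for idx, obs in enumerate(observations):
        clusters.setdefault(find(idx), []).append(obs)
    return list(clusters.values())

def devices_in_month(cluster, month):
    """Union of device types seen at one inferred user's IPs in a month."""
    types = set()
    for m, _, d in cluster:
        if m == month:
            types |= d
    return types

# Hypothetical observations: (month, anonymized IP label, detected types).
observations = [
    (8, "ip_a", {"Amazon_Echo", "HP_Printer", "Nest_IPCam",
                 "Philips_LightBulb", "TPLink_Plug"}),
    (9, "ip_b", {"Amazon_Echo", "HP_Printer", "Nest_IPCam",
                 "Philips_LightBulb"}),
    (9, "ip_c", {"LiFX_LightBulb"}),
    (11, "ip_d", {"LiFX_LightBulb"}),  # not adjacent to month 9
]
clusters = cluster_user_ips(observations)
print(len(clusters))  # 3
```

In this sample, ip_a and ip_b share 4 of 5 device types (ratio 0.8) in adjacent months and are treated as one user, while the two LiFX_LightBulb observations are two months apart and stay separate.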


Fig. 2. Overall AS penetration for our 23 device types from 2013 to 2018.

Fig. 3. ECDF for device type density in IoT-ASes from 2013 to 2018.

Fig. 4. Detected IoT-ASes under extended observation at B root.

We summarize our estimated numbers of IoT users and devices in Table III. (Our estimated IoT device counts are ranges because we do not always know the exact number of detected device types due to detection merge.) Our first observation is that campus residents are mostly stable, except that an existing resident disappears in November (likely because they stop using their only detected device type, LiFX_LightBulb) and a new resident shows up in October (likely due to a faculty or staff member installing new IoT devices in their office).

Our second observation is that the number of campus non-residents differs a lot by month. While we find 3 non-residents in September and 1 non-resident in October, we find none in August and November. One explanation for this trend is that there are more campus events in the middle of the semester (September and October), which attract more campus visitors (potentially bringing IoT devices).

We argue that the small number of IoT users and devices we detect under-estimates the actual campus IoT deployment, since our measurements only cover campus guest WiFi and we expect IoT devices to be deployed on wired networks and secure WiFi that we do not cover.

3) IoT Devices at an IXP: We also apply IP-based detection to partial traffic from an IXP, using the FRGPContinuousFlowData (FRGP) dataset [41] collected by Colorado State University from 2015-05-10 to 2015-05-19 (10 days), as in Table II. We find 122 triggered detections of 10 to 11 device types (we do not know the exact number of types due to detection merge, §II-A.1) from 111 IPs. (Similar to §III-A.2, since clients of FRGP may use dynamically assigned IPs, our detection counts and IoT user IP counts risk being inflated.) Please see our tech report for details [17].
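The ranges reported above (for example, "10 to 11 device types") follow from detection merge: a merged label stands for at least one of several device types. The sketch below shows one way such a range might be computed. The two merged labels are the ones named in this section; the function name and the sample input are hypothetical, and the paper does not describe its counting procedure in this form.

```python
# Merged labels named in the text; each stands for at least one member type.
MERGED = {
    "Amazon_*": {"Amazon_FireTV", "Amazon_Echo"},
    "Withings_*": {"Withings_Scale", "Withings_SleepSensor"},
}

def type_count_range(labels):
    """Return (low, high) bounds on the number of distinct device types.

    A plain label counts as exactly one type. A merged label counts as at
    least one type and at most all of its member types.
    """
    low = high = 0
    for label in set(labels):
        low += 1
        high += len(MERGED.get(label, {label}))
    return low, high

# Hypothetical detections at one vantage point.
print(type_count_range(["Amazon_*", "HP_Printer", "Nest_IPCam"]))  # (3, 4)
```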
B. DNS-Based IoT Detection Results

We next apply our DNS-based detections to two real-world DNS datasets.

1) Global AS-Level IoT Deployments: We apply detection to Day-in-the-Life of the Internet (DITL) datasets from 2013 to 2018 to explore growth of AS-level deployments for our 23 device types.

Input Datasets: Our detection uses DITL datasets from 8 out of 13 root DNS servers (each a root letter) between 2013 and 2018 (excluding G, D, E and L roots for not participating in all these DITL data and I root for using anonymized IPs) to show growth in AS-level IoT deployment in this period, as summarized in Table II. Each DITL dataset contains DNS queries received by a root letter in a 2-day window.

Since root DNS servers see requests from recursive DNS resolvers (usually run by ISPs for their users), our results detect devices at the AS level, not for households. Our results thus show the existence of device types in ASes. (They do not show exact device counts, as described earlier in §II-A.) To find out the ASes where detected devices come from, we map recursive DNS resolvers' IPs to AS numbers (ASN) with CAIDA's Prefix to AS mappings dataset [4].

Since the data represents ASes instead of households, we do detection only (§II-A.3) and omit the server-learning portion of our algorithm. With many households mixed together, AS-size aggregation risks learning wrong servers. To count per-device-type detections, we do not use detection merge (§II-A.1).

With more than half of all 13 root letters (62%, 8 out of 13), we expect to observe queries from the majority of recursives in the Internet because prior work has shown that under 2-day observation, most (at least 80%) recursives query multiple root letters (with 60% of recursives querying at least 6 root letters) [31]. However, even with visibility to the majority of recursives, our detection still risks under-estimating AS-level IoT deployment because the 2-day DITL measurement is too short to observe DNS queries from all known IoT device types behind these visible recursives. (Under short observation, IoT DNS queries could be hidden from root letters by both DNS caching and non-IoT overshadowing: if a non-IoT device queries a TLD before an IoT device behind the same recursive does, the IoT DNS query, instead of being sent to a root letter, will be answered by the DNS caches created or renewed by the non-IoT DNS query.) Consequently, we mainly focus on the trend shown in our detection results instead of the exact number of detections.

Growth in AS Penetrations: We first study the “breadth” of AS-level IoT deployment by examining the number of ASes that our 23 IoT device types have penetrated into.

We show overall AS penetration for our 23 IoT device types (the number of ASes where we find at least one of our 23 IoT device types) in Figure 2 as the blue crosses. We find that the overall AS penetration for our device types increases significantly from 2013 to 2017 (from 244 to 846 ASes, about 3.5 times) but plateaus from 2017 to 2018 (from 846 to 856 ASes).

We believe the reason that overall AS penetration for our 23 IoT device types plateaus between 2017 and 2018 is the sales


Fig. 5. Per-device-type AS penetrations (omitting 7 device types appearing in fewer than 10 ASes).

and deployment decline as these models are replaced by newer releases. To support this hypothesis, we estimate release dates for our device types and compare these estimated release dates with per-device-type AS penetration (the number of ASes where each of our 23 device types is found) from 2013 to 2018 (Figure 5).

We estimate release dates for 22 of our 23 device types based on estimated release dates for our 26 detectable IoT devices (recall §II-A.1). (We exclude device type HP_Printer here because there are many HP wireless printers released over a wide range of years and it would be inaccurate to estimate the release date of this whole device type based on any HP_Printer devices.) If a device type includes more than one of our 26 detectable IoT devices (due to device merge), we estimate release dates for all these devices and use the earliest date for this device type. We estimate the release date for a given IoT device from one of three sources (ordered by priority, high to low): release date found online, the device's first appearance date, and the device's first customer comment date on Amazon.com. We confirm all the 22 device types are released at least two years before 2017 (2 in 2011, 7 in 2012, 3 in 2013, 5 in 2014 and 5 in 2015), consistent with our claim that their sales are declining in 2017.

We compare estimated release dates with per-device-type AS penetration results (Figure 5) and find that detections of device types tend to plateau after release, consistent with product cycles and a decrease in sales and use of these devices. For example, Withings_SmartScale and Netatmo_WeatherStation, which are released in 2012, stop growing roughly after 2016-10-04 and 2017-04-11 respectively, suggesting a product cycle of about 4 and 5 years. In comparison, TPLink-IPCam/Plug/LightBulb is the only device type released around 2016 (TPLink_IPCam on 2015-12-15, TPLink_Plug on 2016-01-01 and TPLink_Lightbulb on 2016-08-09) and its AS penetration continues to rise even on 2018-04-10, while AS penetration of other device types (released between 2011 and 2015) roughly stops increasing by 2017.

Note that the fact that the AS penetrations of our 23 device types plateau does not contradict the constant growth of overall IoT deployment, because new IoT devices are constantly appearing.

Growth in Device Type Density: Having shown that our 23 IoT device types penetrate into about 3.5 times more ASes from 2013 to 2018, we next study how many IoT device types are found in these ASes, that is, their device type density. We use device type density to show the “depth” of AS-level IoT deployment.

For every AS detected with at least one of our 23 IoT device types (referred to as IoT-AS for simplicity) from 2013 to 2018, we compute its device type density. We present the empirical cumulative distribution (ECDF) for device type densities of IoT-ASes from 2013 to 2018 in Figure 3.

Our first observation from Figure 3 is that, from 2013 to 2018, not only are there 3.5 times more IoT-ASes (as shown by AS penetration), but the device type density in these IoT-ASes is also constantly growing.

Our second observation is that, despite the constant growth, device type density in IoT-ASes is still very low as of 2018. In 2018, most (79%) of the IoT-ASes have at most 2 of our 23 device types, which is a modest increase compared to 2013, where a similar percentage (80%) of IoT-ASes have at most 1 of our 23 device types.

Our results suggest that for IoT devices, besides the potential to further grow in AS penetration (which would lead to growth in household penetration), there exists even larger potential to grow in device type density (which would lead to growth in device density). This unique potential of two-dimensional growth (penetration and density) sets IoT devices apart from other fast-growing electronic products in recent history such as cell phones and personal computers (PCs), which mostly grow in penetration (considering that while a person may only own 1 to 2 cell phones and PCs, they could own many more IoT devices).

We rule out the possibility that the increasing AS penetration and device type density we observe are artifacts of the device servers we use in detection (measured around 2017) not applying to IoT devices in the past, by showing in §IV-B that IoT device-type-to-server-name mappings are stable over time.

ASes with Highest Device Type Density in 2018: We examined the top 10 ASes with the highest device type density in 2018 (detected with 8 to 14 of our 23 device types). Our first observation is that they are predominantly from the U.S. (4 ASes) and Europe (3 ASes). There are also 2 ASes from Eastern Asia (Korea and China) and 1 from Haiti. This distribution also consistently shows up in the top 20 ASes, with 10 ASes from the U.S. and 5 ASes from Europe. Our second observation is that these top 10 ASes are mostly major consumer ISPs in their operating regions, such as Comcast, Charter, AT&T and Verizon from the U.S., Korea Telecom from South Korea and Deutsche Telekom from Germany.

Estimating Actual Overall AS Penetration in 2018: Recall that the overall AS penetrations for our 23 device types reported in Figure 2 are under-estimations of the ground truth, because our DITL data is not complete (8 of 13 root letters


TABLE V
IoT Deployment for One House in CCZ Data

provide visibility to most but not all global recursives), and because two-day data will miss many queries due to DNS caching and non-IoT overshadowing.

We estimate actual overall AS penetration in 2018 by applying detection to extended measurement at B root. With this extended measurement, we expect to observe queries from most global recursives at B root because most global recursives rotate among root letters (at least 80% [31]). We also hope to observe IoT DNS queries that would otherwise get hidden by DNS caching and non-IoT overshadowing in short observation. (Ideally, when adding more observations leads to no new detections, we know we have detected all IoT-ASes that could be visible to B root.)

To evaluate how many IoT-ASes we could see, we extend the 2-day 2018 DITL observation at B root to 112 days. As shown in Figure 4, we see a constant increase in detection of IoT-ASes over longer observation. With 112-day observation, we detect 3106 IoT-ASes, 8× more than what we see in 2 days of B root only (388 IoT-ASes), and 3.6× more than 2 days with 8 roots (856 IoT-ASes, as in Figure 2). In 112 days, we see about 5% of all unique ASes in the routing system in early 2018 (about 60,000, reported by CIDR-report.org [42]). However, we do not see the detection curve in Figure 4 flattening even after 112 days.

We model IoT query rates from an IoT-AS as seen by a single root letter. Simple models (a root letter receives 1/13th of the traffic) show a curve flattening after at least 300 days, consistent with what we see in Figure 4. However, a detailed model requires understanding the IoT query rates and the aggregate (IoT and non-IoT) query rates, more information than we have. We conclude that the real numbers of IoT-ASes are much higher than our detections with DITL in Figure 2.

2) IoT Deployments in a Residential Neighborhood: We next explore deployments of our 23 device types in a residential neighborhood from 2013 to 2017.

Input Datasets: We use DNS datasets from Case Connection Zone (CCZ) to study a residential neighborhood [2]. This dataset records DNS lookups made by around 100 residential houses in Cleveland, OH that connected to the CCZ Fiber-To-The-Home experimental network and covers a random 7-day interval in each month between 2011 and 2017. Specifically, we apply DNS-based detection (both with and without server learning) to the January captures of 2013 to 2017 CCZ DNS data (Table II).

Results without Server Learning: As shown in Figure 6, from 2013 to 2017, we see a rough increase in detections and in the types of devices detected each year from this neighborhood. (Similar to §III-B.1, to count per-device-type detection, we do not use detection merge.)

Fig. 6. IoT deployments for all houses in CCZ data.

We believe our detection counts in Figure 6 lower-bound the actual IoT device counts in this neighborhood for two reasons: first, unlike our study on the USC campus where dynamically assigned IPs inflate IoT detection counts (§III-A.2), IPs in CCZ data are static to each house and do not cause such inflation; second, recalling that for NATted devices, our method only detects the existence of device types but cannot know the device counts for each type (§II-A), our detection counts in Figure 6 under-estimate IoT device counts if any household owns multiple devices of the same type. We conclude that the lower bound of the IoT device count in this neighborhood increases about 4 times from 2013 (at least 3 devices) to 2017 (at least 13 devices), consistent with our observation of increasing AS-level IoT deployment in this period.

We want to track IoT deployment by house but we can do that for only about half the houses because (according to the author of this dataset) although IPs are almost static to each house, about half of the houses are rentals and see natural year-to-year variation from student tenants. Our detection results are consistent with this variation: most IPs with IoT detections in one year cannot be re-detected with the same set of device types in the following years.

We show that the increasing IoT deployment can also be observed from a single house by tracking one house whose tenant looks very stable (since it is detected with a consistent set of IoT device types over the 5 years). As shown in Table V, this household owns none of our known device types in 2013 (omitted in the table) and acquires HP_Printer in 2014, Nest_IPCam and Nest_SmokeAlarm in 2015, as well as Philips_LightBulb and Withings_SmartScale in 2016. Withings_SmartScale is missed in the 2017 detection, potentially because this type of device generates no background traffic and it is not used during the 7-day measurement of the 2017-01 CCZ data.

Results with Server Learning: With server learning, we see no additional detections. We do observe that while applying detection to 5 years of CCZ DNS data, 951 distinct server names are learned and 3 known IoT device types are split. By analyzing these new server names, we conclude that server learning could discover new sub-types of known IoT device types but risks learning wrong servers from NATted traffic.

We first show that server learning could learn new device server names and even new sub-types for known IoT device types. HP_Printer is originally mapped to 3 server names (per prior knowledge obtained in §III-A.1). In the 2015-01 detection (others are similar), we learn 9 new server names for it in the first iteration. But with these updated 12 server names, we find 2 fewer HP_Printer devices in subsequent detection, suggesting HP_Printer is in fact an aggregation of two sub-types (just


TABLE VI
IPCam Detection Break-Down

like we merge Belkin_Switch and Belkin_MotionSensor as one type in §II-A.1): one sub-type talks to the original 3 server names while the other talks to the updated 12 server names. We split HP_Printer into two sub-types and re-discover the two missed HP_Printer devices in subsequent detection.

We show that our method risks learning wrong servers for a given IoT device type P behind NAT if there are non-IoT devices behind the same NAT visiting servers run by P's manufacturer. This is caused by two limitations in our method design: first, our method tries learning all unknown server names queried by IoT user IPs (§II-A.3) because we cannot distinguish between DNS queries from detected IoT devices and DNS queries from other non-IoT devices behind the same NAT; second, we risk mis-classifying human-facing manufacturer servers (that also serve non-IoT devices) as device servers because we use a necessary condition to determine device-facing servers in §II-A.1. In the 2015-01 detection (others are similar), we learn a suspiciously high 176 device servers for Amazon_Echo and 277 device servers for Amazon_FireTV in the first iteration, suggesting many of these new servers are learned from non-IoT devices (like laptops using Amazon services) behind the same NAT as the detected Amazon devices (because IoT devices usually only talk to at most 10 servers per day [38]). This false learning poisons our knowledge of device servers and causes us to detect two fewer Amazon_FireTV and one fewer Amazon_Echo in the second iteration. Luckily, our method splits Amazon_Echo and Amazon_FireTV into two sub-types where one sub-type is still mapped to the original, un-poisoned set of device servers, allowing us to re-discover these missing Amazon devices in subsequent detections. (We observe good performance in validation, §IV-B, where we apply server learning inside the NAT.)

C. Certificate-Based IoT Detection Results

Certificate-based detection only applies to devices that directly provide public web pages. IP cameras and Network Video Recorders (NVR) both often export their content, so we search for these. We find distinguishing them is hard because IP camera manufacturers often also produce NVRs, and distinguishing them requires finding the non-manufacturer keys “IP Camera” and “NVR” in TLS certificates (per rules in §II-B.1). Since we find certificates rarely contain these two text strings, we do not try to distinguish them and report them together as “IPCam”.

Input Datasets: We apply detection to ZMap's 443-https-ssl_3-full_ipv4 TLS certificate dataset captured on 2017-07-26 [43] (as in Table II). This dataset consists of certificates found by ZMap TCP SYN scans on port 443 in the public IPv4 address space.

We target IPCam devices from 31 manufacturers (obtained from market reports [14], [15] and top Amazon sellers). We build matching keys for these IPCams based on rules in §II-B.1.

TABLE VII
Partial Validation of Certificate-Based Detection Results

Initial Detection Results: Table VI shows the 244,058 IPCam devices we detect (represented by IoT certificates, 0.46% of all 52,968,272 input TLS certificates) from 9 manufacturers (29% of 31 input manufacturers; we do not see any detection from the other 22 manufacturers). Among the detected devices, most (228,045, 93.43%) come from the top manufacturer Dahua. (Dahua is responsible for most IP cameras used in one DDoS attack [29].) Almost all (243,916, 99.94%) detected devices come from the top 5 manufacturers.

Partial Validation: Due to lack of ground truth, it is not possible to directly validate our results. We indirectly validate our results by accessing (via browser) the IPs of 50 random candidate certificates from each IPCam manufacturer where we found at least one candidate certificate. If browser access shows a login screen with the correct manufacturer name on it, we consider it valid. This validation is limited since even a true positive may not pass it, because the device may be off-line or not show the manufacturer when we try it. (Our validation tests were done only 3 days after TLS certificate collection, to minimize IP address changes.)

Table VII shows our results, with 66% of detections correct. For the 106 false positives, in 40 cases the IP address did not respond and in 53 cases, we get a login screen showing no manufacturer information. All 33 false negatives are due to Foscam IPCams failing our two rules to find IoT certificates in §II-B.2: they are signed by a CA called “WoSign” and have an uncommon CCN placeholder, *.myfoscam.org.

By adding a special rule for Foscam devices (candidate certificates of Foscam that are signed by WoSign and have *.myfoscam.org as CCN are IoT certificates), our detection correctness percentage increases to 70% (283 out of 404, with 15 true negatives becoming false positives because we cannot confirm ground truth for 15 newly detected Foscam IPCams) and the false negative percentage drops to 0%.

Revised Detection Results: Last row of §II-B shows our revised detection results with the special rule for Foscam: with 10,524 more detected Foscam devices, we have a total of 254,582 IPCam detections. (Our results show the subset of IPCams that are on the public Internet using TLS, but omit devices on private addresses and those not using TLS, as per §II-B.)

Geo-Location Analysis: We geo-locate our revised detection result with Maxmind data published on 2017-07-18 (8 days before collection of the TLS certificate data we use) and find our detected IPCams come from 199 countries.
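The special Foscam rule used in the revised results earlier in this section can be sketched as a simple predicate over certificate fields. This is only an illustration: the `Cert` record, its field names, the case-insensitive substring match on the issuer, and the sample certificates are assumptions, and the general rules of §II-B.2 are not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cert:
    issuer: str       # name of the signing CA (hypothetical field)
    common_name: str  # subject CommonName, CCN in the text

FOSCAM_PLACEHOLDER = "*.myfoscam.org"

def matches_foscam_rule(cert):
    """Special rule: signed by WoSign and CCN is the Foscam placeholder."""
    signed_by_wosign = "wosign" in cert.issuer.lower()
    has_placeholder = cert.common_name.strip().lower() == FOSCAM_PLACEHOLDER
    return signed_by_wosign and has_placeholder

def revise(detected, candidates):
    """Add Foscam candidates caught by the special rule to prior detections."""
    extra = [c for c in candidates
             if c not in detected and matches_foscam_rule(c)]
    return list(detected) + extra

# Hypothetical candidate certificates for Foscam.
candidates = [
    Cert("WoSign CA Limited", "*.myfoscam.org"),
    Cert("Some Other CA", "*.myfoscam.org"),
    Cert("WoSign CA Limited", "www.example.com"),
]
revised = revise([], candidates)
print(len(revised))  # 1
```

Requiring both conditions keeps the rule narrow, so a certificate that merely shares the CA or merely shares the placeholder name is not counted.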


TABLE VIII
Detected IP Cameras and NVRs by Countries

We examine what devices are in each country to gain confidence in what we detect. Table VIII shows the top ten countries by number of detected devices, and breaks down how many devices are found in each country by manufacturer. (We show only manufacturers with at least 1000 global detections in Table VI.)

We find manufacturers prefer different operating regions. We believe these preferences are related to their business strategies. While Dahua, Foscam and Hikvision are global, the latter two show substantially more deployment in the U.S. and China, respectively. Amcrest (formerly Foscam U.S. [7]) is almost exclusive to the American market. The German company Mobotix, while present in Europe and America, seems completely absent from Asian markets.

IV. VALIDATION

We validate the accuracy of our two main methods by controlled experiments.

Validation requires ground truth, so we turn to controlled experiments with devices we own. We have 10 devices (Table I) from 7 different manufacturers and at different prices (from $5 to $85, in 2018). This diversity provides a range of test subjects, but the requirement to own the devices means our controlled experiment is limited in size. In principle, we could scale up testing by crowd-sourcing traffic captures, as shown in [20].

Our experiments also show that our method correctly detects multiple devices from the same manufacturer (3 devices from Amazon and 2 from TP-Link, as in Table I) using device merge and detection merge (recalling §II-A.1).

A. Accuracy of IP-Based IoT Detection

We validate the correctness and completeness of our IP-based method by controlled experiments. We set up our experiment by placing our 10 IoT devices (Table I) and 15 non-IoT devices in a wireless LAN behind a home router. We assign static IPs to these 25 devices. We run tcpdump inside the wireless LAN to observe all traffic from the LAN to the Internet.

device use. On day 5, we reboot each device, looking at how a restart affects device traffic.

Our detection algorithm uses the same set of device server names that we describe in §III-A.1. We collect IPv4 addresses for these device server names (by issuing DNS queries every 10 minutes) during the same 5-day period at the same location as our controlled experiments.

Detection During Inactive Days: We begin with detection using the first 2 days of data when the devices are inactive. We detect more than half of the devices (6 true positives out of 10 devices); we miss the remaining 4 devices: Amazon_Button, Foscam_IPCam, Amcrest_IPCam, and Amazon_Echo (4 false negatives). We see no false positives. (All 15 non-IoT devices are detected as non-IoT.) This result shows that short measurements will miss some inactive devices, but background traffic from even unused devices is enough to detect more than half.

Detection During Inactive and Active Days: We next consider the first four days of data, including both inactive periods and active use of the devices. When observations include device interactions, we find all devices.

We also see one false positive: a laptop is falsely classified as Foscam_IPCam. We used the laptop to configure the device and change the device's dynamic DNS setting. As part of this configuration, the laptop contacts ddns.myfoscam.org, a device-facing server name. Since the Foscam_IPCam has only one device server name, this overlap is sufficient to detect the laptop as a camera. This example shows that IoT devices that use only a few device server names are liable to false positives.

Applying Detection to All Data: When we apply detection to the complete dataset, including inactivity, active use, and reboots, we see the same results as without reboots. We conclude that user device interaction is sufficient for IoT detection; we do not need to ensure observations last long enough to include reboots.

Simulating Dynamic IPs: We next show how dynamically assigned IPs can inflate IoT detections (both at USC, §III-A.2, and at an IXP, §III-A.3).

We simulate dynamically assigned IPs by manually re-assigning random static IPs to our 25 devices every day during our 5-day experiment.

Our IP-based detection with these simulated 5-day dynamic-IP measurements finds 26 true positive IoT detections from 25 dynamic IPs. One IP is detected with two IoT devices because they were each assigned to this IP on a different day. Similar to our 4-day and 5-day static-IP detection, we see a false detection of a laptop as Foscam_IPCam, and no false negatives. This experiment showed 2.6× more IoT devices than we have, less than the 5× inflation that would have occurred with each device being detected on a different IP each day.

We conclude that dynamic addresses can inflate device
We run our experiments for 5 days to simulate 3 possible counts, and the degree depends on address lease times.
cases in real-world IoT measurements.
On Day 1 to 2 (inactive days), we do not interact with B. Accuracy of DNS-Based IoT Detections
IoT devices at all. So first 2 days’ data simulates observations We validate correctness and completeness of our DNS-based
of unused devices and contains only background traffic from detection method by controlled experiments. We use the same
the devices, not user-driven traffic. On day 3 to 4 (active set up, devices and device server names as in §IV-A. We also
days), we trigger the device-specific functionality of each of validate our claim that DNS-based detection can be applied
the 10 devices like viewing the cameras and purchasing items to old network measurements by showing IoT device-type-to-
with Amazon_Button. The first 4 days’ data shows extended server-name mappings are stable over time.
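Returning briefly to the dynamic-IP experiment of §IV-A, the reported inflation figures follow from simple ratios, which the minimal sketch below reproduces. The function names are hypothetical and chosen only for illustration; the inputs (26 detections, 10 IoT devices, 5 days) are the values reported in that experiment, and the worst case assumes every device is detected once on a different IP each day.

```python
def inflation_factor(detections: int, true_devices: int) -> float:
    """Ratio of reported IoT detections to devices actually present."""
    if true_devices <= 0:
        raise ValueError("true_devices must be positive")
    return detections / true_devices


def worst_case_inflation(true_devices: int, days: int) -> float:
    """Upper bound, assuming each device is detected on a new IP every day."""
    return inflation_factor(true_devices * days, true_devices)


# Values reported for the simulated dynamic-IP experiment.
observed = inflation_factor(detections=26, true_devices=10)
bound = worst_case_inflation(true_devices=10, days=5)
print(f"observed inflation: {observed:.1f}x, worst case: {bound:.1f}x")
```

The gap between the observed ratio and the bound is consistent with the conclusion that the degree of inflation depends on how often addresses change relative to the observation period.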


TABLE IX
RESILIENCE OF DETECTION AND SERVER LEARNING

We run our experiments for 7 days and trigger device-specific functionality of each of the 10 devices every day to mitigate the effect of DNS caching.

We first apply detections with the complete set of device server names to evaluate the detection correctness and server learning performance of our DNS-based method. We then detect with an incomplete set of device server names to test the resilience of detection and server learning to incomplete prior knowledge of device servers.

Detection With Complete Server Names: Results show 100% correctness (10 true positives and 15 true negatives), with 13 new device server names learned and 1 known device type split.

By analyzing the detection log, we show server learning and device splitting can correct incorrect device merges. Recall in §III-A.1, we merge TPLink_Plug and TPLink_LightBulb as one type (TPLink_Plug/Bulb) because, per our prior knowledge, they talk to the same server name devs.tplinkcloud.com. After the first iteration of detection, we learn a new server deventry.tplinkcloud.com for TPLink_Plug/Bulb (from a detected TPLink_LightBulb, as shown by ground truth). However, with now 2 server names mapped to TPLink_Plug/Bulb, we see one less detection of it in the second iteration (ground truth shows a TPLink_Plug becomes un-detected). This reduced detection suggests TPLink_LightBulb and TPLink_Plug are in fact different device types: the former talks to the updated set of servers (devs.tplinkcloud.com and deventry.tplinkcloud.com) while the latter talks to the original set of servers (devs.tplinkcloud.com). We split TPLink_Plug/Bulb back into two to fix this incorrect device merge and re-discover the missed TPLink_Plug in subsequent detections.

Detection With Incomplete Set of Server Names: We detect with an incomplete set of device server names to test resilience of detection and server learning to incomplete prior knowledge. Our goal is to simulate cases where we do not know all servers devices contact. We can have incomplete information should we not learn for long enough from them prior to detection (§II-A.1), or because they change servers over time (perhaps due to firmware changes).

We randomly drop 10%, 20%, up to 50% of known device-type-to-server-name mappings while ensuring each device type is still mapped to at least one server. We then compare the detection correctness and the learn-back ratio (how many dropped mappings are learned back after detections) of each experiment.

Results (Table IX) show our detection correctness is fairly stable: with 50% of servers dropped we still have 96% correctness. We believe two reasons cause this high correctness: our detection method suppresses false positives (by ensuring device servers are not likely to serve humans and IoT devices from other manufacturers) and the way we drop servers (ensuring each device is mapped to at least one server name) guarantees low false negatives.

We also find the learn-back ratio is relatively stable, fluctuating around 50%. To explore how false detections happen and why about half of dropped mappings cannot be learned back, we closely examine the detection and server learning with 20% (15) mappings dropped (others are similar). This experiment has only one false detection: Belkin_Plug is not detected because 2 of its 3 server names are dropped while the remaining 1 server name is not queried in validation data. This experiment fails to learn back 9 of 15 dropped mappings: 4 due to server names not seen in validation data, 2 due to non-detection of Belkin_Plug (recall we only try to learn servers from detected devices) and the remaining 3 because the server names are not considered unknown (recall we only try to learn unknown servers): they are originally mapped to both Amazon_FireTV and Amazon_Echo and we only dropped them from the server list of Amazon_Echo.

Stability of Device Server Names: We support our claim that DNS-based detection can be applied to old network measurements by verifying IoT device-type-to-server-name mappings are stable over time. We show 8 of our 10 IoT devices (Table I) and a newly purchased Samsung_IPCam talk to an almost identical set of device server names across 1 to 1.5 years. We exclude Amazon_Echo and Amazon_FireTV from this experiment because they talk to a large number of device servers (previously measured 15 and 45) and it is hard to track all of them over time. We update these 9 devices to the latest firmware in May 2018, measure the latest server names they talk to and compare these server names with those we used in detection (measured in Oct. 2016 for 1 device, in Dec. 2016 for 6 devices and in June 2017 for 2 devices). We found these 9 devices still talk to 17 of the 18 device server names we measured from them 1 to 1.5 years ago. The only difference is D-Link_IPCam, which changes 1 of its 3 device server names from signal.mydlink.com to signal.auto.mydlink.com. A close inspection shows signal.auto.mydlink.com is a CNAME of signal.mydlink.com, suggesting that although D-Link_IPCam changes the server names it queries (making it less detectable for our DNS-based method), it still talks to the same set of actual servers (meaning our IP-based method is un-affected).

V. RELATED WORK

Prior groups considered detection of IoT devices:

Heuristic-Based Traffic Analysis

IoTScanner detects LAN-side devices by passive measurement within the LAN [37]. They intercept wireless signals such as WiFi packets and identify existence of IoT devices by packets' MAC addresses. While their work requires LAN access and cannot generalize to Internet-wide detection, our three methods apply to whatever parts of the Internet are visible in available network measurements, and are able to categorize device types.

Work from Georgia Institute of Technology detects existence of Mirai-infected IoT devices by watching for hosts doing Mirai-style scanning (probes with TCP sequence numbers equal to destination IP addresses) [3]. Their detection reveals existence of Mirai-specific IoT devices, but does


not further characterize device types. In comparison, our three detection methods reveal both existence and type of IoT devices. Our IP and DNS-based methods cover general IoT devices talking to device servers rather than just Mirai-infected devices.

Work from University of Maryland detects Hajime-infected IoT devices by measuring the public distributed hash table (DHT) that Hajime uses for C&C communication [19]. They characterize device types with Censys [9], but types for most of their devices remain unknown. In comparison, our three detection methods detect existence of known devices and always characterize their device types. Our IP and DNS-based methods cover general IoT devices talking to device servers rather than just those infected by Hajime.

Machine-Learning-Based Traffic Analysis

Work from Ben-Gurion University of the Negev (BGUN) detects IoT devices from LAN-side measurement by identifying their traffic flow statistics with machine learning (ML) models such as random forest and GBM [26], [27]. They use a wide range of features (over 300) extracted from network, transport and application layers, such as number of bytes and number of HTTP GET requests.

Similarly, work from the University of New South Wales (UNSW) characterizes the traffic statistics of 21 IoT devices, such as packet rates and average packet sizes, and briefly discusses detecting these devices from the LAN side by identifying their traffic statistics with an ML model (random forest) [38].

Compared to work from BGUN and UNSW, our work uses different features: packet exchanges with particular device servers and TLS certificates for IoT remote access rather than traffic statistics or traffic flow features. While they use LAN-side measurement where traffic from each device can be separated by IP or MAC addresses, our IP-based and DNS-based methods can work with aggregated traffic from outside the NAT and cover IoT devices both on the public Internet and behind NAT. Not requiring LAN-side measurement also enables our IP-based and DNS-based methods to do Internet-wide detection. Our certificate-based method covers HTTPS-Accessible IoT devices on the public Internet by crawling TLS certificates in IPv4 space.

Work from IBM transforms DNS names into embeddings, the numeric representations that capture the semantics of DNS names, and classifies devices as either IoT or non-IoT based on embeddings of their DNS queries using an ML model (multilayer perceptron) [24]. In comparison, our three methods not only detect existence of IoT devices, but also categorize their device types. While they rely on LAN-side measurement to aggregate DNS queries by device IPs, our three methods do not require measuring from inside the LAN.

IPv4 Scanners

Shodan is a search engine that provides information (mainly service banners, the textual information describing services on a device, like certificates from HTTPS TLS Service) about Internet-connected devices on public IP (including IoT devices) [36]. Shodan actively crawls all IPv4 addresses on a small set of ports to detect devices by matching texts (like “IP camera”) with service banners and other device-specific information.

Censys is similar to Shodan but they also support community-maintained annotation logic that annotates manufacturer and model of Internet-connected devices by matching texts with banner information [9].

Compared to Shodan and Censys, our IP-based and DNS-based methods cover IoT devices using both public and private IP addresses, because we use passive measurements to look for signals that work with devices behind NATs. These two methods thus cover all IoT devices that exchange packets with device servers during operation. Our certificate-based method, while also relying on TLS certificates crawled from IPv4 space, provides a better algorithm to match TLS certificates with IoT-related text strings (with multiple techniques to improve matching accuracy) and ensures matched certificates come from HTTPS servers running in IoT devices.

Work from Concordia University infers compromised IoT devices by identifying the fraction of IoT devices detected by Shodan that send packets to allocated but un-used IPs monitored by CAIDA [40]. Their focus on compromised IoT devices is different from our focus on general IoT devices. Due to their reliance on Shodan data, they cover devices with public IPs while our IP-based and DNS-based methods cover devices on both public and private IPs. We also report IoT deployment growth over a much longer period (6 years) than they do (6 days).

Northeastern University infers devices hosting invalid certificates (including IoT devices) by manually looking up model numbers in certificates and inspecting web pages hosted on certificates' IP addresses [5]. In comparison, our certificate-based method introduces an algorithm to map certificates to IoT devices and does not fully rely on manual inspection.

Work from University of Michigan detects industrial control systems (ICS) by scanning the IPv4 space with ICS-specific protocols and watching for positive responses [28]. Unlike their focus on ICS-protocol-compliant devices and protocols, our approach considers general IoT devices. Our approach also uses different measurements and signals for detection.

VI. CONCLUSION

To understand the security threats of IoT devices requires knowledge of their location, distribution and growth. To help provide this knowledge, we propose two methods that detect general IoT devices from passive network measurements (IPs in network flows and stub-to-recursive DNS queries) with the knowledge of their device servers. We also propose a third method to detect HTTPS-Accessible IoT devices from their TLS certificates. We apply our methods to multiple real-world network measurements. Our IP-based algorithm reports detections from a university campus over 4 months and from traffic transiting an IXP over 10 days. Our DNS-based algorithm finds about 3.5× growth in AS penetration for 23 device types from 2013 to 2018 and a modest increase in device type density in ASes detected with these device types. Our DNS-based method also confirms substantial growth in IoT deployments at household level in a residential neighborhood. Our certificate-based algorithm finds 254K IP cameras and NVRs from 199 countries around the world.

ACKNOWLEDGMENT

The authors would like to thank Arunan Sivanathan at the University of New South Wales for sharing their IoT


device data with us [38]. They thank Paul Vixie for providing historical DNS data from Farsight [35]. They also especially thank Mark Allman for sharing his CCZ DNS Transactions datasets [2] and help run our code on partially un-encrypted version of this dataset.

The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.

REFERENCES

[1] G. Acar, N. Apthorpe, N. Feamster, D. Y. Huang, Frank, and A. Narayanan. IoT Inspector Project from Princeton University. Accessed: Nov. 2019. [Online]. Available: https://iot-inspector.princeton.edu/
[2] M. Allman. (Jan. 2018). Case Connection Zone DNS Transactions. [Online]. Available: http://www.icir.org/mallman/data.html
[3] M. Antonakakis et al., “Understanding the mirai botnet,” in Proc. 26th USENIX Secur. Symp., 2017, pp. 1093–1110.
[4] CAIDA. Routeviews Prefix to AS Mappings Dataset. Accessed: Mar. 2019. [Online]. Available: https://www.caida.org/data/routing/routeviews-prefix2as.xml
[5] T. Chung et al., “Measuring and applying invalid SSL certificates: The silent majority,” in Proc. Internet Meas. Conf., 2016, pp. 527–541.
[6] Cloudflare. What is an IXP. Accessed: Nov. 2019. [Online]. Available: https://www.cloudflare.com/learning/cdn/glossary/internet-exchange-point-ixp/
[7] Dahua. Important Message from Foscam Digital Technologies Regarding US Sales and Service. Accessed: Jul. 2018. [Online]. Available: http://foscam.us/products.html/
[8] T. Dierks and E. Rescorla, The Transport Layer Security (TLS) Protocol, document RFC 4346, Internet Request For Comments, 2006.
[9] Z. Durumeric, D. Adrian, A. Mirian, M. Bailey, and J. A. Halderman, “A search engine backed by Internet-wide scanning,” in Proc. 22nd ACM SIGSAC Conf. Comput. Commun. Secur. (CCS), 2015, pp. 542–553.
[10] Dyn. Analysis of October 21 Attack. Accessed: Jul. 2018. [Online]. Available: http://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/
[11] K. Egevang and P. Francis, The IP Network Address Translator (NAT), document RFC 1631, Internet Request For Comments, 1994.
[12] Gartner. IoT Installed Base Forcast. Accessed: Mar. 2019. [Online]. Available: https://www.statista.com/statistics/370350/internet-of-things-installed-base-by-category/
[13] B. Gleeson, A. Lin, J. Heinanen, T. Finland, G. Armitage, and A. Malis, A Framework for IP Based Virtual Private Networks, document RFC 2764, Internet Request For Comments, 2000.
[14] GlobalInfoResearch. IP Cam Market Report. Accessed: Jul. 2018. [Online]. Available: https://goo.gl/254g2M
[15] GlobalInfoResearch. NVR Market Report. Accessed: Jul. 2018. [Online]. Available: https://goo.gl/sxQRis
[16] H. Guo and J. Heidemann. IoT Traces From 10 Devices we Purchased. Accessed: Jul. 2018. [Online]. Available: https://ant.isi.edu/datasets/iot/
[17] H. Guo and J. Heidemann, “Detecting IoT devices in the Internet (extended),” USC/ISI, Marina del Rey, CA, USA, Tech. Rep. ISI-TR-726B, 2018.
[18] H. Guo and J. Heidemann, “IP-based IoT device detection,” in Proc. Workshop IoT Secur. Privacy, 2018, pp. 36–42.
[19] S. Herwig, K. Harvey, G. Hughey, R. Roberts, and D. Levin, “Measurement and analysis of Hajime, a peer-to-peer IoT botnet,” in Proc. Netw. Distrib. Syst. Secur. Symp., 2019, pp. 1–15.
[20] D. Y. Huang, N. Apthorpe, G. Acar, F. Li, and N. Feamster, “IoT inspector: Crowdsourcing labeled network traffic from smart home devices at scale,” Sep. 2019, arXiv:1909.09848. [Online]. Available: https://arxiv.org/abs/1909.09848
[21] B. Krebs. Krebs Hit With DDoS. Accessed: Jul. 2018. [Online]. Available: https://krebsonsecurity.com/2016/09/krebsonsecurity-hit-with-record-ddos/
[22] P. Krzyzanowski. Understanding Autonomous Systems. Accessed: Nov. 2019. [Online]. Available: https://www.cs.rutgers.edu/~pxk/352/notes/autonomous_systems.html
[23] J. Kurkowski. Lib Tldextract. Accessed: Jul. 2018. [Online]. Available: https://pypi.python.org/pypi/tldextract
[24] F. Le, M. Srivatsa, and D. Verma, “Unearthing and exploiting latent semantics behind DNS domains for deep network traffic analysis,” in Proc. Workshop AI for Internet of Things, 2019, pp. 1–6.
[25] P. Loshin. (2016). Details Emerging on Dyn DDoS Attack. [Online]. Available: http://searchsecurity.techtarget.com/news/450401962/Details-emerging-on-Dyn-DNS-DDoS-attack-Mirai-IoT-botnet
[26] Y. Meidan et al., “ProfilIoT: A machine learning approach for IoT device identification based on network traffic analysis,” in Proc. SAC, 2017, pp. 506–509.
[27] Y. Meidan et al., “Detection of unauthorized IoT devices using machine learning techniques,” 2017, arXiv:1709.04647. [Online]. Available: https://arxiv.org/abs/1709.04647
[28] A. Mirian et al., “An Internet-wide view of ICS devices,” in Proc. 14th Annu. Conf. Privacy, Secur. Trust (PST), Dec. 2016, pp. 96–103.
[29] Motherboard. 1.5 Million Hijacked Cameras Make an Unprecedented Botnet. Accessed: Jul. 2018. [Online]. Available: https://motherboard.vice.com/en_us/article/8q8dab/15-million-connected-cameras-ddos-botnet-brian-krebs
[30] Mozilla. Public Suffix List. Accessed: Jul. 2018. [Online]. Available: https://www.publicsuffix.org/
[31] M. Müller, G. C. M. Moura, R. O. de Schmidt, and J. Heidemann, “Recursives in the wild: Engineering authoritative DNS servers,” in Proc. ACM Internet Meas. Conf., 2017, pp. 489–495.
[32] No-IP. Domain Names Provided by No-IP. Accessed: Jul. 2018. [Online]. Available: http://www.noip.com/support/faq/free-dynamic-dns-domains/
[33] OVH. DDoS didn’t Break VAC. [Online]. Available: https://www.ovh.com/us/news/articles/a2367.the-ddos-that-didnt-break-the-camels-vac
[34] SCIP. Belkin WeMo Switch Communications Analysis. Accessed: Jul. 2018. [Online]. Available: https://www.scip.ch/en/?labs.20160218
[35] Farsight Security. Passive DNS Historical Internet Database: Farsight DNSDB. Accessed: Jul. 2018. [Online]. Available: https://www.farsightsecurity.com/solutions/dnsdb/
[36] Shodan. Shodan Search Engine Front Page. Accessed: Jul. 2018. [Online]. Available: https://www.shodan.io/
[37] S. Siby, R. R. Maiti, and N. O. Tippenhauer, “IoTscanner: Detecting privacy threats in IoT neighborhoods,” in Proc. Workshop IoT Privacy, Trust, Secur., 2017, pp. 23–30.
[38] A. Sivanathan et al., “Characterizing and classifying IoT traffic in smart cities and campuses,” in Proc. IEEE Conf. Comput. Commun. Workshops (INFOCOM WKSHPS), May 2017, pp. 559–564.
[39] ThousandEyes. What is an ISP? Accessed: Nov. 2019. [Online]. Available: https://www.thousandeyes.com/learning/glossary/isp-internet-service-provider
[40] S. Torabi, E. Bou-Harb, C. Assi, M. Galluscio, A. Boukhtouta, and M. Debbabi, “Inferring, characterizing, and investigating Internet-scale malicious IoT device activities: A network telescope perspective,” in Proc. 48th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw. (DSN), Jun. 2018, pp. 562–573.
[41] USC/LANDER. (May 19, 2015). FRGP Continuous Flow Dataset. [Online]. Available: http://www.isi.edu/ant/lander
[42] Wikipedia. Autonomous System (Internet). Accessed: Mar. 2019. [Online]. Available: https://en.wikipedia.org/wiki/Autonomous_system_(Internet)
[43] ZMap. ZMap 443 HTTPS SSL Full IPv4 Datasets. Accessed: Jul. 2018. [Online]. Available: https://censys.io/data/443-https-ssl_3-full_ipv4

Hang Guo received the B.S. degree from the Beijing University of Posts and Telecommunications in 2014 and the Ph.D. degree from the University of Southern California in 2020. His research interests include Internet traffic analysis, network security, and the Internet of Things (IoT). In 2020, he joined Microsoft Azure Team, as a Software Engineer.

John Heidemann (Fellow, IEEE) received the B.S. degree from the University of Nebraska-Lincoln in 1989, and the M.S. and Ph.D. degrees from the University of California at Los Angeles in 1991 and 1995, respectively. He is currently a Principal Scientist at the University of Southern California/Information Sciences Institute (USC/ISI) and a Research Professor at USC in computer science. At ISI, he leads the Analysis of Network Traffic (ANT) Lab, observing and analyzing Internet topology and traffic to improve network reliability, security, protocols, and critical services. He is a Senior Member of the ACM.
