Efficient Spam Detection Technique for IoT Devices Using Machine Learning
ABSTRACT
The Internet of Things (IoT) is a network of millions of devices equipped with
sensors and actuators and linked over wired or wireless channels for data
transmission. IoT has grown rapidly over the past decade, with more than 25
billion devices expected to be connected by 2020. The volume of data released by
these devices will increase many-fold in the years to come. In addition to this
increased volume, IoT devices produce large amounts of data in a number of
different modalities, with varying data quality defined by its speed in terms of
time and position dependency. In such an environment, machine learning
algorithms can play an important role in ensuring biotechnology-based security
and authorization and in anomaly detection, improving the usability and security
of IoT systems. On the other hand, attackers often use learning algorithms to
exploit the vulnerabilities of smart IoT-based systems. Motivated by this, in
this paper we propose to secure IoT devices by detecting spam using machine
learning. To achieve this objective, a Spam Detection in IoT using Machine
Learning framework is proposed. In this framework, five machine learning models
are evaluated using various metrics with a large collection of input feature
sets. Each model computes a spam score by considering the refined input
features. This score depicts the trustworthiness of an IoT device under various
parameters. The REFIT Smart Home dataset is used to validate the proposed
technique. The results obtained demonstrate the effectiveness of the proposed
scheme in comparison to other existing schemes.
CONTENTS
ABSTRACT
LIST OF FIGURES
1 INTRODUCTION
2 LITERATURE SURVEY
3 SYSTEM REQUIREMENTS
4.1 PURPOSE
4.2 SCOPE
5 SYSTEM DESIGN
6 MODULES
7 SYSTEM IMPLEMENTATION
8 SYSTEM TESTING
8.2 VERIFICATION
8.3 VALIDATION
10 CONCLUSION
11 REFERENCE
LIST OF FIGURES:
CHAPTER 1
INTRODUCTION
The safety measures for IoT devices depend on the size and type of the
organization in which they are deployed. The behavior of users forces the
security gateways to cooperate. In other words, the location, nature, and
application of IoT devices determine the security measures. For instance, smart
IoT security cameras in a smart organization can capture different parameters
for analysis and intelligent decision making. The greatest care must be taken
with web-based devices, as most IoT devices are web dependent. It is common in
the workplace that the IoT devices installed in an organization can be used to
implement security and privacy features efficiently. For example, wearable
devices that collect and send a user’s health data to a connected smartphone
should prevent leakage of information to ensure privacy. Market observations
indicate that 25-30% of working employees connect their personal IoT devices to
the organizational network. The expanding nature of IoT attracts both audiences,
i.e., the users and the attackers. However, with the emergence of ML in various
attack scenarios, IoT devices must choose a defensive strategy and decide the
key parameters in the security protocols for the trade-off between security,
privacy, and computation. This task is challenging because it is usually
difficult for an IoT system with limited resources to estimate the current
network and attack status in a timely manner.
The random forest algorithm can be used for both classification and regression
problems. This section explains how the random forest algorithm works in
machine learning for the classification task.
A random forest consists of many decision trees. The ‘forest’ generated by the
random forest algorithm is trained through bagging, also called bootstrap
aggregating. Bagging is an ensemble meta-algorithm that improves the accuracy of
machine learning algorithms by training each model on a random sample of the
data drawn with replacement and then combining the models' outputs.
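The sampling step of bagging can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the helper names `bootstrap_sample` and `make_bags`, the seed, and the toy `rows` list are all assumptions introduced here.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]

def make_bags(rows, n_bags, seed=0):
    """Create one bootstrap sample per tree; each tree would train on its own bag."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    return [bootstrap_sample(rows, rng) for _ in range(n_bags)]

# Hypothetical toy dataset: ten row identifiers.
rows = list(range(10))
bags = make_bags(rows, n_bags=3)
for i, bag in enumerate(bags):
    print(f"bag {i}: {bag}")
```

Because sampling is done with replacement, a bag usually repeats some rows and omits others, which is what makes the individual trees differ from one another.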
The diagram below illustrates the working of the Random Forest algorithm:
Fig 1.1: Working of the Random Forest algorithm
The following points explain why the Random Forest algorithm is a good choice:
● It often takes less training time than many other algorithms.
● It predicts output with high accuracy and runs efficiently even on large
datasets.
● It can maintain accuracy when a large proportion of data is missing.
● It can produce a reasonable prediction without hyper-parameter tuning.
● It reduces the overfitting seen in individual decision trees.
● In every random forest tree, a subset of features is selected randomly at the
node’s splitting point.
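The random feature selection at a split point can be sketched as follows. This is an illustrative assumption rather than the method used in this work: the function name `candidate_features` is hypothetical, and the default subset size (the square root of the number of features) is a common heuristic for classification, not a value taken from this report.

```python
import math
import random

def candidate_features(n_features, rng, k=None):
    """Pick k distinct feature indices to consider at one split.

    If k is not given, fall back to sqrt(n_features), a common
    heuristic (assumed here, not prescribed by the text above).
    """
    if k is None:
        k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(42)  # fixed seed (an assumption) for repeatable runs
subset = candidate_features(16, rng)
print(subset)  # four distinct indices in the range 0..15
```

Restricting each split to a random subset keeps the trees from all choosing the same dominant feature, so their errors are less correlated.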
A random forest relies on multiple decision trees. Every decision tree consists
of decision nodes, leaf nodes, and a root node. The leaf node reached in each
tree is the final output produced by that specific decision tree. The final
output is selected by majority voting: the output chosen by the majority of the
decision trees becomes the final output of the random forest. The diagram below
shows a simple random forest classifier.
Fig 1.2: Random Forest classifier
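The majority-voting step can be sketched in Python as shown here. The function name `majority_vote` and the example labels are assumptions made for illustration; the per-tree outputs are hard-coded rather than produced by trained trees.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    # most_common(1) yields [(label, count)] for the most frequent label;
    # on a tie, the label that was seen first is returned.
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of five decision trees for one input.
tree_outputs = ["spam", "not spam", "spam", "spam", "not spam"]
final = majority_vote(tree_outputs)
print(final)  # spam
```

Using an odd number of trees avoids ties in a two-class problem such as spam versus not spam.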
The working of the algorithm can be better understood through the following
example.
Example: Suppose there is a dataset that contains multiple fruit images. This
dataset is given to the Random Forest classifier. The dataset is divided into
subsets, and each subset is given to one decision tree. During the training
phase, each decision tree learns from its subset and produces its own
prediction; when a new data point arrives, the Random Forest classifier predicts
the final decision based on the majority of the trees' results.
Consider the image below:
Fig 1.3: Random Forest classifier algorithm with an example
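The fruit example can be approximated end to end with a toy sketch. Several simplifications are assumed: the images are replaced by two hypothetical numeric features (weight and redness), each tree is reduced to a single split (a decision stump), and all names, data values, and the seed are invented for illustration. It is not the classifier evaluated in this work.

```python
import random
from collections import Counter

# Hypothetical toy data: (weight in grams, redness from 0 to 1) -> label.
DATA = [
    ((150, 0.90), "apple"), ((170, 0.80), "apple"),
    ((160, 0.85), "apple"), ((140, 0.95), "apple"),
    ((120, 0.10), "banana"), ((130, 0.20), "banana"),
    ((110, 0.15), "banana"), ((125, 0.05), "banana"),
]

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(sample, rng):
    """Fit a one-split tree on one randomly chosen feature."""
    f = rng.randrange(len(sample[0][0]))  # random feature for this tree
    best = None
    for x, _ in sample:  # try every observed value as a threshold
        t = x[f]
        left = [y for xi, y in sample if xi[f] <= t]
        right = [y for xi, y in sample if xi[f] > t]
        left_label = majority(left)
        right_label = majority(right) if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, f, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def train_forest(data, n_trees, seed=0):
    """Train each stump on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # sample with replacement
        forest.append(train_stump(sample, rng))
    return forest

def predict_forest(forest, x):
    """Final decision is the majority vote over all trees."""
    return majority([predict_stump(s, x) for s in forest])

forest = train_forest(DATA, n_trees=25)
prediction = predict_forest(forest, (155, 0.9))
print(prediction)
```

A heavy, red fruit such as `(155, 0.9)` falls on the apple side of nearly every stump, so the vote settles on "apple" even if an individual tree was trained on an unrepresentative sample.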