Lab2

Uploaded by

abdulhamid.m.321

CYB419 Laboratory Practical

Lecture 5 - Introduction to R and R Studio


The purpose of today's lab is to become familiar with the R and R Studio environments.
During the last practical session, students were guided through installing R and R Studio on
their workstations.

In today's lab session, students will work through the following steps:

1. Launch the R Studio app on your workstation. This should open the R Studio
programming window.

2. On the File menu, click New File and choose R Script from the drop-down menu.
This will open a new working file where you can write and run your R code
snippets.

3. Create a folder on your desktop and add the training dataset to that folder. Additionally,
save the R script file, with any name you want, in the same folder as your
dataset.

Loading the training dataset into R and R Studio

#LOADING THE DATASET INTO R STUDIO AND SHOWING A BRIEF DESCRIPTION OF THE DATASET
#Note: this line is a comment. In R, # marks the start of a comment.

Steps:

1. In the Environment pane on the right side of the R Studio window, click Import Dataset,
then select the second option in the drop-down list, i.e. From Text (readr)...

2. In the pop-up window, browse to the folder where you saved your dataset and your
R script file. Select the training dataset file and click Import. This should import
the training dataset into your R environment.

3. Use the following commands to view the dataset you just imported:

> View(practical_dataset) ---- hit enter

> head(practical_dataset) ---- hit enter

This should show you the dataset you just imported: View() opens it in a spreadsheet-style tab, while head() prints the first six rows in the console.
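The point-and-click import above can also be written as a script line, which makes the lab easier to repeat. The following is a minimal sketch, not the exact code R Studio generates. It assumes the training file is a comma-separated text file with two columns and no header row; the temporary file and its two rows are made-up stand-ins for the real dataset.

```r
library(readr)

# Hypothetical stand-in for the training file: two columns, no header row.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("google.com,benign", "x7f9q2kd0z.net,malicious"), csv_path)

# col_names = FALSE tells readr that the first line is data, not a header.
# col_types = "cc" reads both columns as character.
practical_dataset <- read_csv(csv_path, col_names = FALSE, col_types = "cc")

# Print the first rows in the console.
head(practical_dataset)
```

In the lab itself, csv_path would be replaced by the path to the dataset file saved in your desktop folder.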

Now, create column names (headers) for the dataset. The data contains a list of the most
visited websites in the world as well as some maliciously generated websites. The column of
website names should be called 'domain_name', while the column stating whether each website is
benign or malicious should be called 'class'. Use the commands below to set the column names and view the
dataset.

> colnames(practical_dataset)<-c("domain_name", "class")


> head(practical_dataset)

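To see what the renaming does without touching the real data, here is a small sketch on a toy data frame. The two rows and the default column names X1 and X2 are assumptions made for illustration only.

```r
# Toy two-column data frame standing in for the imported dataset (values are made up).
practical_dataset <- data.frame(
  X1 = c("google.com", "x7f9q2kd0z.net"),
  X2 = c("benign", "malicious")
)

# Assign the two column names in order: first column, then second column.
colnames(practical_dataset) <- c("domain_name", "class")

# Show the structure, including the new column names.
str(practical_dataset)
```

The vector passed to colnames() must have one name per column, in the same order as the columns.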
Some specialised or statistical functions in R come from additional packages. You need to
install these packages and then load them before you can use them.

To install a package, go to Tools > Install Packages on the menu bar, type the package name in
the dialog, then click Install. Once a package is installed, you need to load it so that it becomes
available to use. Install the following packages: stringr, caret, qdapDictionaries, qdapRegex, e1071,
randomForest, qdapTools, readr, class and cregg.
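As an alternative to the menu, the same installation can be scripted. This is a sketch only: missing_packages is a hypothetical helper name, and the install line is left commented out so that nothing is downloaded until you choose to run it.

```r
# Packages listed in this lab.
required <- c("stringr", "caret", "qdapDictionaries", "qdapRegex", "e1071",
              "randomForest", "qdapTools", "readr", "class", "cregg")

# Hypothetical helper: return the packages from pkgs that are not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)
print(to_install)

# Uncomment the next line to install whatever is missing:
# install.packages(to_install)
```

Installing only the missing packages avoids re-downloading ones that are already on the workstation.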

#Now load the required libraries after installing them, to help with data
#visualization and data preparation:

>library(stringr) ---- hit enter.


>library(qdapDictionaries) ---- hit enter.
>library(qdapRegex) ---- hit enter.
>library(caret) ---- hit enter.

#THIS IS THE FEATURE EXTRACTION PHASE - NECESSARY FEATURES THAT CAN HELP IN
#CLASSIFYING THE DOMAIN NAMES AS EITHER BENIGN OR MALICIOUS ARE ADDED

1. #FIRST FEATURE: EXTRACT THE LENGTH OF THE DOMAIN NAME

> practical_dataset$domain_name_length<-(nchar(practical_dataset$domain_name))

> head(practical_dataset)

Now, view the dataset to ensure that the feature has been extracted. nchar() counts the number of
characters in each domain name, which is an important feature for classifying a domain name as
benign or malicious.
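A quick way to check the length feature is to run nchar() on a couple of names by hand. The two domain names below are made-up examples, not rows from the training dataset.

```r
# Made-up sample domain names for illustration.
domains <- c("google.com", "x7f9q2kd0z.net")

# nchar() counts every character, including the dot and the suffix.
domain_lengths <- nchar(domains)
print(domain_lengths)  # 10 14
```

Note that the dot and the top-level domain (".com", ".net") are included in the count.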

2. Next, check whether the domain name contains digits. Make sure the stringr package is
installed and loaded before running the code snippet below.

>practical_dataset$has_numbers<-(as.integer(str_detect(practical_dataset$domain_name, "[0-9]")))

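The digit check combines two steps: str_detect() returns TRUE or FALSE, and as.integer() turns that into 1 or 0. The sketch below shows both steps on made-up sample names.

```r
library(stringr)

# Made-up sample domain names for illustration.
domains <- c("google.com", "x7f9q2kd0z.net")

# Step 1: TRUE where the name contains at least one digit.
has_digit <- str_detect(domains, "[0-9]")

# Step 2: convert TRUE/FALSE to 1/0 so the feature is numeric.
has_numbers <- as.integer(has_digit)
print(has_numbers)  # 0 1
```

Storing the flag as 0 or 1 keeps every extracted feature numeric.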
3. Next, check whether the domain name is alphanumeric. Make sure the stringr package is
installed and loaded before running the code snippet below. As written, the pattern flags any
name that contains at least one letter or digit.

> practical_dataset$Is_Alpha_Numeric<-(as.integer(str_detect(practical_dataset$domain_name, "[0-9]|[A-Z]|[a-z]")))
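It may help to compare the lab's pattern with a stricter one. The sketch below is illustrative only: the anchored pattern "^[A-Za-z0-9]+$" is an assumption about what "alphanumeric" could also mean (letters and digits only), not something the lab prescribes, and the sample strings are made up.

```r
library(stringr)

# Made-up sample strings for illustration.
samples <- c("google.com", "abc123", "-.-")

# Lab pattern: 1 if the string contains at least one letter or digit anywhere.
contains_alnum <- as.integer(str_detect(samples, "[0-9]|[A-Z]|[a-z]"))

# Hypothetical stricter variant: 1 only if every character is a letter or digit.
only_alnum <- as.integer(str_detect(samples, "^[A-Za-z0-9]+$"))

print(contains_alnum)  # 1 1 0
print(only_alnum)      # 0 1 0
```

Because real domain names include a dot, the stricter variant would be 0 for "google.com", so which definition is wanted depends on how the feature will be used.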

4. Next, check whether the domain name has special characters. Use the code snippet below to do
that.

> practical_dataset$Has_Special_Char<-(as.integer(str_detect(practical_dataset$domain_name, '[~`!@#$%^&*|:;"|-]')))
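The character class in this step lists the symbols that count as special. A small sketch on made-up names shows which ones it flags. The pattern is copied from the step above with its two wrapped lines joined, which is an assumption about how the original line read.

```r
library(stringr)

# Pattern from the step above, written on one line.
special_pattern <- '[~`!@#$%^&*|:;"|-]'

# Made-up sample names for illustration.
samples <- c("google.com", "my-site.com", "pay$pal.com")

has_special <- as.integer(str_detect(samples, special_pattern))
print(has_special)
```

The dot is not in the class, so an ordinary name such as "google.com" is not flagged, while the hyphen is in the class, so hyphenated names are.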
5. Next, check whether the domain name has consecutive digits. The pattern in the code snippet
below looks for a run of three or more digits.

> practical_dataset$Has_Consecutive_Numbers<-
(as.integer(str_detect(practical_dataset$domain_name, "([0-9]){3,}")))
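The quantifier {3,} means "three or more in a row", so scattered digits do not count. A sketch on made-up names:

```r
library(stringr)

# Made-up sample names for illustration.
samples <- c("abc123.com", "a1b2c3.com", "web2024.net")

# 1 only where at least three digits appear next to each other.
has_digit_run <- as.integer(str_detect(samples, "([0-9]){3,}"))
print(has_digit_run)  # 1 0 1
```

"a1b2c3.com" contains three digits in total but never three in a row, so it is not flagged.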

6. Next, check whether the domain name has consecutive letters. The pattern below looks for a run of three or more lowercase letters:

> practical_dataset$Has_Consecutive_Chars<-
(as.integer(str_detect(practical_dataset$domain_name, "([a-z]){3,}")))
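Because the class [a-z] covers lowercase letters only, the result depends on the case of the input. The sketch below uses made-up names, and the tolower() step is an optional assumption, useful only if the dataset contains uppercase names.

```r
library(stringr)

# Made-up sample names for illustration.
samples <- c("google.com", "a1b2.io", "ABC.IO")

# Pattern as used in the lab: three or more lowercase letters in a row.
as_is <- as.integer(str_detect(samples, "([a-z]){3,}"))

# Optional: lowercase the names first so uppercase input is treated the same way.
lowered <- as.integer(str_detect(tolower(samples), "([a-z]){3,}"))

print(as_is)    # 1 0 0
print(lowered)  # 1 0 1
```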

7. Next, check whether the domain name has consecutive consonants. The pattern below uses a
backreference (\1), so it flags the same consonant repeated back to back, such as "tt".

> practical_dataset$Has_Consecutive_Consonants<-
(as.integer(str_detect(practical_dataset$domain_name, "([bcdfghjklmnpqrstvwxyz])\\1{1,}")))
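The backreference in this pattern matters: \\1 repeats whatever the group just matched, so only a doubled consonant is flagged. The sketch below contrasts it with a hypothetical alternative that flags any two consonants in a row. The alternative is shown for comparison and is not part of the lab; the sample names are made up.

```r
library(stringr)

# Made-up sample names for illustration.
samples <- c("twitter.com", "stack.com", "banana.com")

# Backreference version: the same consonant twice in a row (for example "tt").
doubled <- as.integer(str_detect(samples, "([bcdfghjklmnpqrstvwxyz])\\1{1,}"))

# Hypothetical alternative: any two consonants in a row (for example "st" or "ck").
any_run <- as.integer(str_detect(samples, "[bcdfghjklmnpqrstvwxyz]{2,}"))

print(doubled)  # 1 0 0
print(any_run)  # 1 1 0
```

The letters inside a character class need no | separators; a | inside square brackets is treated as a literal character.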

8. Now we check if the domain name has consecutive vowels.

> practical_dataset$Has_Consecutive_Vowels<-
(as.integer(str_detect(practical_dataset$domain_name, "[aeiou]{2}")))

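9. Next, count the number of digits in the domain name. Use the following code snippet.
str_count() returns how many digits each name contains, whereas str_detect() would only report
whether any digit is present.

> practical_dataset$Number_of_Digits<-(str_count(practical_dataset$domain_name,
"[[:digit:]]"))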

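To make the difference between detecting and counting concrete, here is a sketch that runs both stringr functions on the same made-up names.

```r
library(stringr)

# Made-up sample names for illustration.
domains <- c("google.com", "x7f9q2kd0z.net", "web2024.net")

# str_detect(): is there at least one digit? (0 or 1 after conversion)
any_digit <- as.integer(str_detect(domains, "[[:digit:]]"))

# str_count(): how many digits are there?
digit_count <- str_count(domains, "[[:digit:]]")

print(any_digit)    # 0 1 1
print(digit_count)  # 0 4 4
```

Only str_count() gives a feature that differs from the has_numbers flag extracted earlier.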
Assignment – As a take-home assignment, extract from each domain name whether or not it is a
dictionary word. I have given you the function to help you!

10. Next, check whether the domain name is a dictionary word or not. To do this, we need to define
the following function (GradyAugmented is the word list provided by the qdapDictionaries package):

> Dictionaryword<- function(x)x %in% GradyAugmented
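The %in% operator returns TRUE for each element of x that appears in the word list. The sketch below illustrates the idea with a tiny made-up word list so that it runs without qdapDictionaries. The names is_dictionary_word and toy_words are hypothetical, and stripping the suffix with sub() is one possible approach, not a required part of the assignment.

```r
# Hypothetical stand-in word list; in the lab this role is played by GradyAugmented.
toy_words <- c("apple", "banana", "cherry")

# Same idea as the lab's function, with the word list passed in as an argument.
is_dictionary_word <- function(x, words) x %in% words

# Made-up sample names for illustration.
domains <- c("apple.com", "x7f9q2kd0z.net")

# Drop everything from the first dot onward, so "apple.com" becomes "apple".
labels <- sub("\\..*$", "", domains)

with_suffix <- is_dictionary_word(domains, toy_words)
without_suffix <- is_dictionary_word(labels, toy_words)

print(with_suffix)     # FALSE FALSE
print(without_suffix)  # TRUE FALSE
```

A full name such as "apple.com" will not match a dictionary entry, which is why the sketch compares only the part before the first dot.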
