21NKU14 - Preprocessing Assignment
> class(X100_Data)
[1] "spec_tbl_df" "tbl_df"      "tbl"
[4] "data.frame"
> dim(X100_Data)
[1] 100 14
dim() gives the dimensions of the dataset as the number of rows and columns: 100 and 14 respectively.
> nrow(X100_Data)
[1] 100
> ncol(X100_Data)
[1] 14
> names(X100_Data)
names() shows the column headings of the dataset table. 14 headings = 14 columns.
> str(X100_Data)
$ Region : chr [1:100] "Australia and Oceania" "Central America and the Caribbean" "Europe" "Sub-Saharan Africa" ...
$ Country : chr [1:100] "Tuvalu" "Grenada" "Russia" "Sao Tome and Principe" ...
$ Item Type : chr [1:100] "Baby Food" "Cereal" "Office Supplies" "Fruits" ...
$ Units Sold : num [1:100] 9925 2804 1779 8102 5062 ...
$ Unit Price : num [1:100] 255.28 205.7 651.21 9.33 651.21 ...
$ Unit Cost : num [1:100] 159.42 117.11 524.96 6.92 524.96 ...
$ Total Revenue : num [1:100] 2533654 576783 1158503 75592 3296425 ...
$ Total Cost : num [1:100] 1582244 328376 933904 56066 2657348 ...
$ Total Profit : num [1:100] 951411 248406 224599 19526 639078 ...
- attr(*, "spec")=
.. cols(
.. Region = col_character(),
.. Country = col_character(),
.. )
- attr(*, "problems")=<externalptr>
The structure of the dataset shows which columns are character and which are numeric. For
example, Unit Price is stored as numeric (e.g. 255.28) and Sales Channel is stored as
character (e.g. Offline).
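The same character/numeric check can be sketched on a small stand-in table. The data frame toy below is made up for illustration (only the three Unit Price values are borrowed from the output above); it is not the real X100_Data.

```r
# Hypothetical stand-in data frame; not the real X100_Data
toy <- data.frame(
  `Sales Channel` = c("Offline", "Online", "Offline"),
  `Unit Price`    = c(255.28, 205.70, 651.21),
  check.names = FALSE,        # keep the spaces in the column names
  stringsAsFactors = FALSE    # keep text as character, not factor
)

# Class of every column in one call
col_classes <- sapply(toy, class)
print(col_classes)

# Names of the numeric columns only
numeric_cols <- names(toy)[sapply(toy, is.numeric)]
print(numeric_cols)
```

sapply(toy, class) gives a quicker overview than str() when only the column types are needed.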
> summary(X100_Data)
Region Country
Length:100 Length:100
Class :character Class :character
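summary() gives, for each numeric column, the minimum, 1st quartile, median, mean, 3rd
quartile, and maximum, plus the count of not available (NA) values when there are any. For
character columns it gives only the length, class, and storage mode, not the statistical mode.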
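To see the difference between the two kinds of summary, here is a minimal sketch on made-up vectors. The names units and countries and their values are assumptions for this example only.

```r
# Hypothetical values with one missing entry
units <- c(9925, 2804, 1779, NA, 5062)

# Numeric input: minimum, quartiles, median, mean, maximum and NA count
num_summary <- summary(units)
print(num_summary)

# Character input: only length, class and storage mode
countries <- c("Tuvalu", "Grenada", "Russia")
chr_summary <- summary(countries)
print(chr_summary)
```

The "Mode" entry for the character vector is the storage mode of the object, not the most frequent value.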
> head(X100_Data)
> head(X100_Data,n=15)
head() shows the first 6 rows by default. The command head(X100_Data, n=15) shows
the first 15 rows.
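A small sketch of the n argument, using a made-up 20-row data frame (toy is a hypothetical name, not part of the assignment data):

```r
# Hypothetical 20-row data frame
toy <- data.frame(id = 1:20, value = (1:20)^2)

first_default <- head(toy)          # n defaults to 6
first_fifteen <- head(toy, n = 15)  # ask for the first 15 rows

print(nrow(first_default))
print(nrow(first_fifteen))
```

tail() accepts the same n argument and works from the end of the table instead.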
> tail(X100_Data)
1. Bar plot:
> Region<-table(X100_Data$Region)
> Region
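The plotting call itself is not shown above. One way it could look is sketched below, assuming the frequency table is passed straight to barplot(). The region labels are made up, and the title and axis labels are illustrative choices.

```r
# Hypothetical region labels; the real ones come from X100_Data$Region
region <- c("Europe", "Asia", "Europe", "Sub-Saharan Africa", "Asia", "Europe")

# Frequency of each region
region_counts <- table(region)
print(region_counts)

# One bar per region, bar height = number of rows in that region
bar_mids <- barplot(region_counts,
                    main = "Records per region",
                    ylab = "Count",
                    las = 2)   # rotate labels so long names fit
```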
2. Box plot:
> par(mfrow=c(1,2))
The box plots show that outliers are present.
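As a rough illustration of how outliers appear in a box plot, the sketch below uses a made-up vector (profit is a hypothetical name and the values are invented). boxplot.stats() returns the points that the box plot draws beyond the whiskers.

```r
# Hypothetical values with one clearly extreme point
profit <- c(120, 135, 150, 160, 170, 180, 195, 210, 950)

# Points beyond the whiskers are drawn as separate dots
boxplot(profit, main = "Box plot with an outlier")

# The same points, listed as numbers
outliers <- boxplot.stats(profit)$out
print(outliers)
```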
> X100_Data1<-X100_Data
> attach(X100_Data)
> caps<-quantile(x,probs=c(.05,.95),na.rm=TRUE)
> H <- 1.5 * IQR(x, na.rm = TRUE)
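The commands above compute caps and H but do not show where x comes from or how the two values are used. A minimal sketch of one common way to finish the step is given below, assuming that values outside the 1.5 * IQR fences are replaced by the 5th and 95th percentiles. The vector x, the helper name qnt, and the result name x_capped are assumptions for this example.

```r
# Hypothetical values; in the assignment x would be a numeric column
x <- c(120, 135, 150, 160, 170, 180, 195, 210, 950)

qnt  <- quantile(x, probs = c(.25, .75), na.rm = TRUE)  # 1st and 3rd quartiles
caps <- quantile(x, probs = c(.05, .95), na.rm = TRUE)  # replacement values
H    <- 1.5 * IQR(x, na.rm = TRUE)                      # whisker length

# Replace values beyond the fences with the 5th / 95th percentile
x_capped <- x
x_capped[x < (qnt[1] - H)] <- caps[1]
x_capped[x > (qnt[2] + H)] <- caps[2]

print(x_capped)
```

Capping keeps the number of rows unchanged, which matches the nrow() checks on X100_Data and X100_Data1.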
> data("X100_Data")
> any(is.na(X100_Data[]))
[1] FALSE
> sum(is.na(X100_Data[]))
[1] 0
> colSums(is.na(X100_Data[]))
> nrow(X100_Data)
[1] 100
> nrow(X100_Data1)
[1] 100
[1] 441682
The not available (NA) values were identified from the diagram above. Code was then run for
Total Profit, and the resulting value was written back into the dataset table.
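The replacement code is not shown above. One possible version is sketched below, assuming (this is only an assumption) that the NA entries of Total Profit are replaced by the column mean. The data frame toy and its NA positions are made up for illustration.

```r
# Hypothetical stand-in table with two missing profits
toy <- data.frame(`Total Profit` = c(951411, NA, 224599, 19526, NA),
                  check.names = FALSE)

# Count of NA values per column before the fix
print(colSums(is.na(toy)))

# Assumed replacement value: mean of the available profits
profit_mean <- mean(toy$`Total Profit`, na.rm = TRUE)

# Write the value into the NA positions
toy$`Total Profit`[is.na(toy$`Total Profit`)] <- profit_mean

print(any(is.na(toy)))
```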
> x
Scaling the values can be an important pre-processing step for many machine-learning algorithms.
> scale(x)
The center argument of scale() takes either a logical value or a numeric-alike vector with one entry per column of x.
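The two forms of the center argument can be tried on a small made-up vector. The values of x and the names z, centered, and shifted below are assumptions for this example.

```r
# Hypothetical values
x <- c(10, 20, 30, 40, 50)

# Defaults: subtract the mean, then divide by the standard deviation
z <- scale(x)
print(attr(z, "scaled:center"))  # mean that was subtracted
print(attr(z, "scaled:scale"))   # standard deviation that was divided by

# Logical center: subtract the mean only
centered <- scale(x, center = TRUE, scale = FALSE)

# Numeric center: subtract a chosen value instead of the mean
shifted <- scale(x, center = 10, scale = FALSE)
print(as.vector(shifted))
```

scale() returns a one-column matrix for a plain vector, so as.vector() is used to get back to an ordinary numeric vector.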