bda file

The document is a practical file for a Big Data Analytics course, detailing various tasks and R scripts related to data analysis. It includes operations such as basic mathematical calculations, descriptive statistics, data reading from different formats, and visualizations like histograms and correlation plots. The file is submitted by a student to their instructor and contains an index of tasks performed using R programming.


Practical File

Big Data Analytics


PCC-CSE-404G

Submitted by:
Kajal
27514
CSE-A

Submitted to:
Dr. Chhavi Rana

Index
Sr.No. Title Remarks
1 Perform basic Mathematical Operations using R.
2 Write an R script to find basic descriptive statistics using summary, str, quantile function on mtcars & cars datasets.
3 Write an R script to find subset of dataset by using subset(), aggregate() functions on iris dataset.
4 Reading different types of data sets (.txt, .csv) from web and disk and writing in file in specific disk location.
5 Reading Excel data sheet in R.
6 Reading XML dataset in R.
7 Find the data distributions using box and scatter plot.
8 Find the outliers using the previous plot.
9 Plot a histogram using the given sample data.
10 Plot a bar chart using the given sample data.
11 Plot the pie chart using the given sample data.
12 Find a Correlation matrix and plot the correlation on iris data set.
13 Plot the correlation plot on the dataset and visualize, giving an overview of relationships among data on the iris data.
14 Analysis of covariance for the iris dataset with categorical variables.
15 Plot the given cluster data using R visualizations.

1. Perform basic Mathematical Operations using R.

> A = 1563123

> B = 65132334

> C = A+B

> C

[1] 66695457

> D = B-A

> D

[1] 63569211

> E = A*B

> E

[1] 1.018098e+14

> F = A/B

> F

[1] 0.02399919

> class(A)

[1] "numeric"

> class(B)

[1] "numeric"

> class(C)

[1] "numeric"

> class(D)

[1] "numeric"
> class(E)

[1] "numeric"

> class(F)

[1] "numeric"

> E<F

[1] FALSE
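
Beyond the four operators used above, R also provides exponentiation, integer division, and modulo. The following is a minimal sketch; the values `a` and `b` are hypothetical and are not part of the original transcript.

```r
# Hypothetical values chosen for illustration only
a <- 17
b <- 5

power     <- a ^ b    # exponentiation
quotient  <- a %/% b  # integer division
remainder <- a %% b   # modulo

# Plain numbers are stored as doubles; the L suffix makes an integer
class(b)   # "numeric"
class(5L)  # "integer"
```

This also explains why every `class()` call in the transcript reports "numeric": none of the values were entered with the `L` suffix.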
Screenshots:
2. Write an R script to find basic descriptive statistics using summary, str, quantile function on mtcars & cars datasets.
> mtcars

mpg cyl disp hp drat wt qsec vs am gear carb

Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4

Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4

Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1

Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1

Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2

Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1

Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4

Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2

Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2

Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4

Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4

Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3

Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3

Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3

Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4

Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4

Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4

Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1

Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2


Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1

Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1

Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2

AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2

Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4

Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2

Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1

Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2

Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2

Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4

Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6

Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8

Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

> summary(mtcars)

mpg cyl disp hp

Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0

1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5

Median :19.20 Median :6.000 Median :196.3 Median :123.0

Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7

3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0

Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0

drat wt qsec vs

Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000

1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000

Median :3.695 Median :3.325 Median :17.71 Median :0.0000

Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375


3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000

Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000

am gear carb

Min. :0.0000 Min. :3.000 Min. :1.000

1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000

Median :0.0000 Median :4.000 Median :2.000

Mean :0.4062 Mean :3.688 Mean :2.812

3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000

Max. :1.0000 Max. :5.000 Max. :8.000

> str(mtcars)

'data.frame': 32 obs. of 11 variables:

$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...

$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...

$ disp: num 160 160 108 258 360 ...

$ hp : num 110 110 93 110 175 105 245 62 95 123 ...

$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...

$ wt : num 2.62 2.88 2.32 3.21 3.44 ...

$ qsec: num 16.5 17 18.6 19.4 17 ...

$ vs : num 0 0 1 1 0 1 0 1 1 1 ...

$ am : num 1 1 1 0 0 0 0 0 0 0 ...

$ gear: num 4 4 4 3 3 3 3 4 4 4 ...

$ carb: num 4 4 1 1 2 1 4 2 2 4 ...

> quantile(mtcars$mpg)

0% 25% 50% 75% 100%

10.400 15.425 19.200 22.800 33.900

> cars
speed dist

1 4 2

2 4 10

3 7 4

4 7 22

5 8 16

6 9 10

7 10 18

8 10 26

9 10 34

10 11 17

11 11 28

12 12 14

13 12 20

14 12 24

15 12 28

16 13 26

17 13 34

18 13 34

19 13 46

20 14 26

21 14 36

22 14 60

23 14 80

24 15 20

25 15 26
26 15 54

27 16 32

28 16 40

29 17 32

30 17 40

31 17 50

32 18 42

33 18 56

34 18 76

35 18 84

36 19 36

37 19 46

38 19 68

39 20 32

40 20 48

41 20 52

42 20 56

43 20 64

44 22 66

45 23 54

46 24 70

47 24 92

48 24 93

49 24 120

50 25 85

> summary(cars)
speed dist

Min. : 4.0 Min. : 2.00

1st Qu.:12.0 1st Qu.: 26.00

Median :15.0 Median : 36.00

Mean :15.4 Mean : 42.98

3rd Qu.:19.0 3rd Qu.: 56.00

Max. :25.0 Max. :120.00

> class(cars)

[1] "data.frame"

> dim(cars)

[1] 50 2

> str(cars)

'data.frame': 50 obs. of 2 variables:

$ speed: num 4 4 7 7 8 9 10 10 10 11 ...

$ dist : num 2 10 4 22 16 10 18 26 34 17 ...

> quantile(cars$speed)

0% 25% 50% 75% 100%

4 12 15 19 25
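
The `quantile()` calls above use the default quartiles. As a hedged sketch of how other cut points could be requested, the `probs` argument accepts any vector of probabilities; the variable names below are illustrative only.

```r
# Deciles of mpg rather than the default quartiles
deciles <- quantile(mtcars$mpg, probs = seq(0.1, 0.9, by = 0.1))

# Interquartile range: distance between the 25% and 75% quantiles
mpg_iqr <- IQR(mtcars$mpg)

# Column-wise medians for every variable in cars
cars_medians <- sapply(cars, median)
```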
Screenshots:
3. Write an R script to find subset of dataset by using subset(), aggregate() functions on iris dataset.
> iris

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3.5 1.4 0.2 setosa

2 4.9 3.0 1.4 0.2 setosa

3 4.7 3.2 1.3 0.2 setosa

4 4.6 3.1 1.5 0.2 setosa

5 5.0 3.6 1.4 0.2 setosa

6 5.4 3.9 1.7 0.4 setosa

7 4.6 3.4 1.4 0.3 setosa

8 5.0 3.4 1.5 0.2 setosa

9 4.4 2.9 1.4 0.2 setosa

10 4.9 3.1 1.5 0.1 setosa

11 5.4 3.7 1.5 0.2 setosa

12 4.8 3.4 1.6 0.2 setosa

13 4.8 3.0 1.4 0.1 setosa

14 4.3 3.0 1.1 0.1 setosa

15 5.8 4.0 1.2 0.2 setosa

16 5.7 4.4 1.5 0.4 setosa

17 5.4 3.9 1.3 0.4 setosa

18 5.1 3.5 1.4 0.3 setosa

19 5.7 3.8 1.7 0.3 setosa


20 5.1 3.8 1.5 0.3 setosa

21 5.4 3.4 1.7 0.2 setosa

22 5.1 3.7 1.5 0.4 setosa

23 4.6 3.6 1.0 0.2 setosa

24 5.1 3.3 1.7 0.5 setosa

25 4.8 3.4 1.9 0.2 setosa

26 5.0 3.0 1.6 0.2 setosa

27 5.0 3.4 1.6 0.4 setosa

28 5.2 3.5 1.5 0.2 setosa

29 5.2 3.4 1.4 0.2 setosa

30 4.7 3.2 1.6 0.2 setosa

31 4.8 3.1 1.6 0.2 setosa

32 5.4 3.4 1.5 0.4 setosa

33 5.2 4.1 1.5 0.1 setosa

34 5.5 4.2 1.4 0.2 setosa

35 4.9 3.1 1.5 0.2 setosa

36 5.0 3.2 1.2 0.2 setosa

37 5.5 3.5 1.3 0.2 setosa

38 4.9 3.6 1.4 0.1 setosa

39 4.4 3.0 1.3 0.2 setosa

40 5.1 3.4 1.5 0.2 setosa

41 5.0 3.5 1.3 0.3 setosa

42 4.5 2.3 1.3 0.3 setosa

43 4.4 3.2 1.3 0.2 setosa

44 5.0 3.5 1.6 0.6 setosa

45 5.1 3.8 1.9 0.4 setosa


46 4.8 3.0 1.4 0.3 setosa

47 5.1 3.8 1.6 0.2 setosa

48 4.6 3.2 1.4 0.2 setosa

49 5.3 3.7 1.5 0.2 setosa

50 5.0 3.3 1.4 0.2 setosa

51 7.0 3.2 4.7 1.4 versicolor

52 6.4 3.2 4.5 1.5 versicolor

53 6.9 3.1 4.9 1.5 versicolor

54 5.5 2.3 4.0 1.3 versicolor

55 6.5 2.8 4.6 1.5 versicolor

56 5.7 2.8 4.5 1.3 versicolor

57 6.3 3.3 4.7 1.6 versicolor

58 4.9 2.4 3.3 1.0 versicolor

59 6.6 2.9 4.6 1.3 versicolor

60 5.2 2.7 3.9 1.4 versicolor

61 5.0 2.0 3.5 1.0 versicolor

62 5.9 3.0 4.2 1.5 versicolor

63 6.0 2.2 4.0 1.0 versicolor

64 6.1 2.9 4.7 1.4 versicolor

65 5.6 2.9 3.6 1.3 versicolor

66 6.7 3.1 4.4 1.4 versicolor

67 5.6 3.0 4.5 1.5 versicolor

68 5.8 2.7 4.1 1.0 versicolor

69 6.2 2.2 4.5 1.5 versicolor

70 5.6 2.5 3.9 1.1 versicolor

71 5.9 3.2 4.8 1.8 versicolor


72 6.1 2.8 4.0 1.3 versicolor

73 6.3 2.5 4.9 1.5 versicolor

74 6.1 2.8 4.7 1.2 versicolor

75 6.4 2.9 4.3 1.3 versicolor

76 6.6 3.0 4.4 1.4 versicolor

77 6.8 2.8 4.8 1.4 versicolor

78 6.7 3.0 5.0 1.7 versicolor

79 6.0 2.9 4.5 1.5 versicolor

80 5.7 2.6 3.5 1.0 versicolor

81 5.5 2.4 3.8 1.1 versicolor

82 5.5 2.4 3.7 1.0 versicolor

83 5.8 2.7 3.9 1.2 versicolor

84 6.0 2.7 5.1 1.6 versicolor

85 5.4 3.0 4.5 1.5 versicolor

86 6.0 3.4 4.5 1.6 versicolor

87 6.7 3.1 4.7 1.5 versicolor

88 6.3 2.3 4.4 1.3 versicolor

89 5.6 3.0 4.1 1.3 versicolor

90 5.5 2.5 4.0 1.3 versicolor

91 5.5 2.6 4.4 1.2 versicolor

92 6.1 3.0 4.6 1.4 versicolor

93 5.8 2.6 4.0 1.2 versicolor

94 5.0 2.3 3.3 1.0 versicolor

95 5.6 2.7 4.2 1.3 versicolor

96 5.7 3.0 4.2 1.2 versicolor

97 5.7 2.9 4.2 1.3 versicolor


98 6.2 2.9 4.3 1.3 versicolor

99 5.1 2.5 3.0 1.1 versicolor

100 5.7 2.8 4.1 1.3 versicolor

101 6.3 3.3 6.0 2.5 virginica

102 5.8 2.7 5.1 1.9 virginica

103 7.1 3.0 5.9 2.1 virginica

104 6.3 2.9 5.6 1.8 virginica

105 6.5 3.0 5.8 2.2 virginica

106 7.6 3.0 6.6 2.1 virginica

107 4.9 2.5 4.5 1.7 virginica

108 7.3 2.9 6.3 1.8 virginica

109 6.7 2.5 5.8 1.8 virginica

110 7.2 3.6 6.1 2.5 virginica

111 6.5 3.2 5.1 2.0 virginica

112 6.4 2.7 5.3 1.9 virginica

113 6.8 3.0 5.5 2.1 virginica

114 5.7 2.5 5.0 2.0 virginica

115 5.8 2.8 5.1 2.4 virginica

116 6.4 3.2 5.3 2.3 virginica

117 6.5 3.0 5.5 1.8 virginica

118 7.7 3.8 6.7 2.2 virginica

119 7.7 2.6 6.9 2.3 virginica

120 6.0 2.2 5.0 1.5 virginica

121 6.9 3.2 5.7 2.3 virginica

122 5.6 2.8 4.9 2.0 virginica

123 7.7 2.8 6.7 2.0 virginica


124 6.3 2.7 4.9 1.8 virginica

125 6.7 3.3 5.7 2.1 virginica

126 7.2 3.2 6.0 1.8 virginica

127 6.2 2.8 4.8 1.8 virginica

128 6.1 3.0 4.9 1.8 virginica

129 6.4 2.8 5.6 2.1 virginica

130 7.2 3.0 5.8 1.6 virginica

131 7.4 2.8 6.1 1.9 virginica

132 7.9 3.8 6.4 2.0 virginica

133 6.4 2.8 5.6 2.2 virginica

134 6.3 2.8 5.1 1.5 virginica

135 6.1 2.6 5.6 1.4 virginica

136 7.7 3.0 6.1 2.3 virginica

137 6.3 3.4 5.6 2.4 virginica

138 6.4 3.1 5.5 1.8 virginica

139 6.0 3.0 4.8 1.8 virginica

140 6.9 3.1 5.4 2.1 virginica

141 6.7 3.1 5.6 2.4 virginica

142 6.9 3.1 5.1 2.3 virginica

143 5.8 2.7 5.1 1.9 virginica

144 6.8 3.2 5.9 2.3 virginica

145 6.7 3.3 5.7 2.5 virginica

146 6.7 3.0 5.2 2.3 virginica

147 6.3 2.5 5.0 1.9 virginica

148 6.5 3.0 5.2 2.0 virginica

149 6.2 3.4 5.4 2.3 virginica


150 5.9 3.0 5.1 1.8 virginica

> aggregate(. ~Species, data=iris, mean)

Species Sepal.Length Sepal.Width Petal.Length Petal.Width

1 setosa 5.006 3.428 1.462 0.246

2 versicolor 5.936 2.770 4.260 1.326

3 virginica 6.588 2.974 5.552 2.026

> subset(iris, Sepal.Length==5.0)

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

5 5 3.6 1.4 0.2 setosa

8 5 3.4 1.5 0.2 setosa

26 5 3.0 1.6 0.2 setosa

27 5 3.4 1.6 0.4 setosa

36 5 3.2 1.2 0.2 setosa

41 5 3.5 1.3 0.3 setosa

44 5 3.5 1.6 0.6 setosa

50 5 3.3 1.4 0.2 setosa

61 5 2.0 3.5 1.0 versicolor

94 5 2.3 3.3 1.0 versicolor
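
Both functions take further arguments that the transcript does not use. The sketch below, with illustrative variable names, keeps only two columns through `select` and aggregates a single measurement with a different summary function.

```r
# Columns can be named directly inside subset(); select keeps only some columns
five <- subset(iris, Sepal.Length == 5.0, select = c(Sepal.Length, Species))

# Count the matching rows per species
counts <- table(five$Species)

# Aggregate one measurement by species using max instead of mean
max_petal <- aggregate(Petal.Length ~ Species, data = iris, FUN = max)
```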


Screenshots:
4. Reading different types of data sets (.txt, .csv) from web and disk and writing in file in specific disk location.
> library(utils)

> data<-read.csv("input.csv")

> data

id name salary start_date dept

1 1 Rick 623.30 2012-01-01 IT

2 2 Dan 515.20 2013-09-23 Operations

3 3 Michelle 611.00 2014-11-15 IT

4 4 Ryan 729.00 2014-05-11 HR

5 5 Gary 843.25 2015-03-27 Finance

6 6 Nina 578.00 2013-05-21 IT

7 7 Simon 632.80 2013-07-30 Operations

8 8 Guru 722.50 2014-06-17 Finance

> print(is.data.frame(data))

[1] TRUE

> print(ncol(data))

[1] 5

> print(nrow(data))

[1] 8

> # Getting the max salary.

> sal <- max(data$salary)

> sal
[1] 843.25

> # Getting the details of the person with the max salary.

> details <-subset(data, salary==sal)

> details

id name salary start_date dept

5 5 Gary 843.25 2015-03-27 Finance

> # Getting the details of all the employees working in the IT department.

> it_details<-subset(data, dept=="IT")

> it_details

id name salary start_date dept

1 1 Rick 623.3 2012-01-01 IT

3 3 Michelle 611.0 2014-11-15 IT

6 6 Nina 578.0 2013-05-21 IT

> # Getting the details of the employees employed after 2014-01-01

> join_details <- subset(data, as.Date(start_date)>as.Date("2014-01-01"))

> join_details

id name salary start_date dept

3 3 Michelle 611.00 2014-11-15 IT

4 4 Ryan 729.00 2014-05-11 HR

5 5 Gary 843.25 2015-03-27 Finance

8 8 Guru 722.50 2014-06-17 Finance

> # Writing the join_details into a new file.

> write.csv(join_details, "output.csv")

> newdata <- read.csv("output.csv")

> newdata

X id name salary start_date dept

1 3 3 Michelle 611.00 2014-11-15 IT

2 4 4 Ryan 729.00 2014-05-11 HR

3 5 5 Gary 843.25 2015-03-27 Finance

4 8 8 Guru 722.50 2014-06-17 Finance
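
The extra `X` column in the output above holds the row names that `write.csv()` saves by default. A minimal sketch of how it could be avoided follows; the data frame `emp` is a small hypothetical stand-in, and the file is written to a temporary directory rather than the working directory used in the transcript.

```r
# Small stand-in data frame (hypothetical; mirrors some columns used above)
emp <- data.frame(
  id     = c(3, 4),
  name   = c("Michelle", "Ryan"),
  salary = c(611.00, 729.00)
)

# Write to a temporary location so the sketch does not touch a real directory
out_path <- file.path(tempdir(), "output.csv")
write.csv(emp, out_path, row.names = FALSE)

# Reading it back yields the original columns with no extra X column
roundtrip <- read.csv(out_path)
names(roundtrip)
```

`read.csv()` also accepts a URL in place of a file name, which covers the "from web" part of the task, and `read.table()` handles plain .txt files.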


Screenshots:
5. Reading Excel data sheet in R.
> install.packages("xlsx")

Installing package into ‘C:/Users/Avi/AppData/Local/R/win-library/4.2’

(as ‘lib’ is unspecified)

--- Please select a CRAN mirror for use in this session ---

also installing the dependencies ‘rJava’, ‘xlsxjars’

trying URL 'https://rweb.crmda.ku.edu/cran/bin/windows/contrib/4.2/rJava_1.0-6.zip'

Content type 'application/zip' length 1245703 bytes (1.2 MB)

downloaded 1.2 MB

trying URL 'https://rweb.crmda.ku.edu/cran/bin/windows/contrib/4.2/xlsxjars_0.6.1.zip'

Content type 'application/zip' length 9485708 bytes (9.0 MB)

downloaded 9.0 MB

trying URL 'https://rweb.crmda.ku.edu/cran/bin/windows/contrib/4.2/xlsx_0.6.5.zip'

Content type 'application/zip' length 374907 bytes (366 KB)

downloaded 366 KB

package ‘rJava’ successfully unpacked and MD5 sums checked

package ‘xlsxjars’ successfully unpacked and MD5 sums checked

package ‘xlsx’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in

C:\Users\Avi\AppData\Local\Temp\RtmpyWUuTJ\downloaded_packages
> library("xlsx")

> getwd()

[1] "C:/Users/Avi/Documents"

> setwd("G:/Avi/8th Sem/BDA/LabWork")

> getwd()

[1] "G:/Avi/8th Sem/BDA/LabWork"

> library("xlsx")

> data<-read.xlsx("input.xlsx", sheetIndex=1)

> data

id name salary start_date dept

1 1 Rick 623.30 2012-01-01 IT

2 2 Dan 515.20 2013-09-23 Operations

3 3 Michelle 611.00 2014-11-15 IT

4 4 Ryan 729.00 2014-05-11 HR

5 5 Gary 843.25 2015-03-27 Finance

6 6 Nina 578.00 2013-05-21 IT

7 7 Simon 632.80 2013-07-30 Operations

8 8 Guru 722.50 2014-06-17 Finance


Screenshots:
6. Reading XML dataset in R.
> library("XML")

> library("methods")

> result<-xmlParse(file="input.xml")

> result

<?xml version="1.0"?>

<?xml-stylesheet href="sheet.css"?>

<dataset>

<data>

<id>"1"</id>

<name>"Rick"</name>

<salary>"623.3"</salary>

<start_date>"2012-01-01"</start_date>

<dept>"IT"</dept>

</data>

<data>

<id>"2"</id>

<name>"Dan"</name>

<salary>"515.2"</salary>

<start_date>"2013-09-23"</start_date>

<dept>"Operations"</dept>

</data>

<data>

<id>"3"</id>

<name>"Michelle"</name>
<salary>"611"</salary>

<start_date>"2014-11-15"</start_date>

<dept>"IT"</dept>

</data>

<data>

<id>"4"</id>

<name>"Ryan"</name>

<salary>"729"</salary>

<start_date>"2014-05-11"</start_date>

<dept>"HR"</dept>

</data>

<data>

<id>"5"</id>

<name>"Gary"</name>

<salary>"843.25"</salary>

<start_date>"2015-03-27"</start_date>

<dept>"Finance"</dept>

</data>

<data>

<id>"6"</id>

<name>"Nina"</name>

<salary>"578"</salary>

<start_date>"2013-05-21"</start_date>

<dept>"IT"</dept>

</data>

<data>
<id>"7"</id>

<name>"Simon"</name>

<salary>"632.8"</salary>

<start_date>"2013-07-30"</start_date>

<dept>"Operations"</dept>

</data>

<data>

<id>"8"</id>

<name>"Guru"</name>

<salary>"722.5"</salary>

<start_date>"2014-06-17"</start_date>

<dept>"Finance"</dept>

</data>

</dataset>
Screenshots:
7. Find the data distributions using box and scatter plot.
> library(ggplot2)

> input <- mtcars[,c('mpg','cyl')]

> input

mpg cyl

Mazda RX4 21.0 6

Mazda RX4 Wag 21.0 6

Datsun 710 22.8 4

Hornet 4 Drive 21.4 6

Hornet Sportabout 18.7 8

Valiant 18.1 6

Duster 360 14.3 8

Merc 240D 24.4 4

Merc 230 22.8 4

Merc 280 19.2 6

Merc 280C 17.8 6

Merc 450SE 16.4 8

Merc 450SL 17.3 8

Merc 450SLC 15.2 8

Cadillac Fleetwood 10.4 8

Lincoln Continental 10.4 8

Chrysler Imperial 14.7 8

Fiat 128 32.4 4

Honda Civic 30.4 4


Toyota Corolla 33.9 4

Toyota Corona 21.5 4

Dodge Challenger 15.5 8

AMC Javelin 15.2 8

Camaro Z28 13.3 8

Pontiac Firebird 19.2 8

Fiat X1-9 27.3 4

Porsche 914-2 26.0 4

Lotus Europa 30.4 4

Ford Pantera L 15.8 8

Ferrari Dino 19.7 6

Maserati Bora 15.0 8

Volvo 142E 21.4 4

> boxplot(mpg~cyl, data=mtcars, xlab="Number of Cylinders", ylab="Miles per Gallon", main="Mileage Data")
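
The task also asks for a scatter plot, which the transcript does not show. One possible sketch uses base `plot()` on the same dataset; the choice of `wt` against `mpg` and the axis labels are assumptions made for illustration.

```r
# Scatter plot of car weight against mileage
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon",
     main = "Weight vs Mileage")

# A numeric companion to the picture: Pearson correlation of the two columns
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
```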
Screenshots:
8. Find the outliers using the previous plot.
> v=c(50, 75, 100, 125, 150, 175, 200)

> boxplot(v)
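
The vector `v` above is evenly spaced, so its box plot shows no outlier points. As a hedged sketch, `boxplot.stats()` lists the values a box plot would draw as outliers, meaning those more than 1.5 times the interquartile range beyond the hinges; the vector `w` with its extra value of 1000 is hypothetical and added only to show a detection.

```r
v <- c(50, 75, 100, 125, 150, 175, 200)

# $out holds the points lying beyond 1.5 * IQR from the hinges
out_v <- boxplot.stats(v)$out

# Hypothetical extra value added only to show a detected outlier
w <- c(v, 1000)
out_w <- boxplot.stats(w)$out
```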
Screenshots:
9. Plot a histogram using the given sample data.
Histogram:
> library(graphics)

> v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

> hist(v, xlab="Weight", col="green", border="green")


Screenshots:
10. Plot a bar chart using the given sample data.
Bar Chart:
> H <- c(7, 12, 28, 3, 41)

> M <- c("Jan", "Feb", "Mar", "Apr", "May")

> barplot(H, names.arg=M, xlab="Month", ylab="Revenue", col="blue", main="Revenue Chart", border="blue")
Screenshots:
11. Plot the pie chart using the given sample data.
Pie Chart:
> library(graphics)

> x <- c(21, 62, 10, 53)

> labels <- c("London", "NewYork", "Singapore", "Mumbai")

> pie(x, labels)


Screenshots:
12. Find a Correlation matrix and plot the correlation on iris data set.
> d <- data.frame(x1=rnorm(10), x2=rnorm(10), x3=rnorm(10))

> cor(d)

x1 x2 x3

x1 1.0000000 0.47514914 -0.21575367

x2 0.4751491 1.00000000 0.09190779

x3 -0.2157537 0.09190779 1.00000000

> m <- cor(d)

> library(corrplot)

corrplot 0.92 loaded

> corrplot(m, method="square")

> x <- matrix(rnorm(20), nrow=5, ncol=4)

> y <- matrix(rnorm(15), nrow=5, ncol=3)

> COR <- cor(x, y)

> COR
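
The transcript above works on randomly generated data, while the task names the iris dataset. A minimal sketch of the same steps on iris follows; the `corrplot` call is guarded because the package may not be installed, and the name `iris_cor` is illustrative.

```r
# Correlation matrix of the four numeric iris columns (Species is a factor)
iris_cor <- cor(iris[, 1:4])
round(iris_cor, 2)

# Draw it with corrplot only if the package is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(iris_cor, method = "square")
}
```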
Screenshots:
13. Plot the correlation plot on the dataset and visualize, giving an overview of relationships among data on the iris data.
> X <- seq(dim(x)[2])

> Y <- seq(dim(y)[2])

> Z <- COR

> image(X, Y, Z, xlab="X Column", ylab="Y Column")


Screenshots:
14. Analysis of covariance for the iris dataset with categorical variables.
> data(iris)

> str(iris)

'data.frame': 150 obs. of 5 variables:

$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...

$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...

$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

> ggplot(data=iris, aes(x=Sepal.Length, y=Petal.Length)) +
+ geom_point(size=2, colour="black") +
+ geom_point(size=1, colour="white") +
+ geom_smooth(aes(colour="black"), method="lm") +
+ ggtitle("Sepal.Length vs Petal.Length") +
+ xlab("Sepal.Length") +
+ ylab("Petal.Length") +
+ theme(legend.position="none")
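
The plot above shows the relationship visually but does not fit a model. One way an analysis of covariance could be sketched is with `aov()`, treating `Sepal.Length` as the continuous covariate and `Species` as the categorical factor; this model choice is an assumption and is not taken from the original file.

```r
# ANCOVA sketch: Petal.Length modelled by a continuous covariate
# (Sepal.Length) plus a categorical factor (Species)
fit <- aov(Petal.Length ~ Sepal.Length + Species, data = iris)

# The first element of the summary is the ANOVA table
ancova_table <- summary(fit)[[1]]
ancova_table
```

The rows of the table correspond to the covariate, the factor, and the residuals, with an F test for each of the first two terms.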
Screenshots:
15. Plot the given cluster data using R visualizations.
> library(cluster)

> set.seed(20)

> irisCluster <- kmeans(iris[, 3:4], 3, nstart=20)

> irisCluster

K-means clustering with 3 clusters of sizes 52, 48, 50

Cluster means:

Petal.Length Petal.Width

1 4.269231 1.342308

2 5.595833 2.037500

3 1.462000 0.246000

Clustering vector:

[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
31111111111111111111111111112111112111

[88] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2
222122222222222

Within cluster sum of squares by cluster:

[1] 13.05769 16.29167 2.02200

(between_SS / total_SS = 94.3 %)

Available components:

[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"
Screenshots:
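
The transcript prints the k-means result but does not include the plotting call. A minimal sketch of one way to visualize the clusters with base graphics follows, repeating the same `kmeans()` call so the block is self-contained; the cross-tabulation against `Species` is an added check and not part of the original.

```r
set.seed(20)
irisCluster <- kmeans(iris[, 3:4], 3, nstart = 20)

# Cross-tabulate cluster labels against the known species
cluster_vs_species <- table(irisCluster$cluster, iris$Species)

# Scatter plot of the two petal measurements, coloured by cluster
plot(iris$Petal.Length, iris$Petal.Width,
     col = irisCluster$cluster, pch = 19,
     xlab = "Petal.Length", ylab = "Petal.Width")

# Mark the cluster centres
points(irisCluster$centers, pch = 8, cex = 2)
```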
