Susmita Prajapati, University of Cumberlands, Dr. Cynthia McMahon: Analyzing and Visualizing Data

The document discusses analyzing a dataset on personal computer prices, specifications, and attributes using R Studio. It includes summaries and correlations of the dataset, as well as subsets focusing on specific variables. It determines that 90% of computers were premium, 528 non-premium computers were sold without CDs, and 1945 premium computers priced over $2000 were sold with CDs.


Running Head: “R Studio” 1

Susmita Prajapati

University of Cumberlands

Dr. Cynthia McMahon

Analyzing and Visualizing Data

Date: 30-05-2020

Create summary statistics for the dataset

> dppcomp <- read.csv("dataset_price_personal_computers_Chpt4.csv")
> View(dppcomp)
> summary(dppcomp)
       X            price           speed              hd              ram
 Min.   :   1   Min.   :  949   Min.   : 25.00   Min.   :  80.0   Min.   : 2.000
 1st Qu.:1568   1st Qu.: 1794   1st Qu.: 33.00   1st Qu.: 214.0   1st Qu.: 4.000
 Median :3136   Median : 2145   Median : 50.00   Median : 340.0   Median : 8.000
 Mean   :3136   Mean   : 2244   Mean   : 52.06   Mean   : 417.6   Mean   : 8.303
 3rd Qu.:4703   3rd Qu.: 2595   3rd Qu.: 66.00   3rd Qu.: 528.0   3rd Qu.: 8.000
 Max.   :6270   Max.   :12223   Max.   :100.00   Max.   :2100.0   Max.   :32.000
     screen         cd         multi      premium        ads            trend
 Min.   :14.00   no :3354   no :5395   no : 612   Min.   : 39   Min.   : 1.00
 1st Qu.:14.00   yes:2916   yes: 875   yes:5658   1st Qu.:162   1st Qu.:10.00
 Median :14.00                                    Median :246   Median :16.00
 Mean   :14.61                                    Mean   :221   Mean   :15.96
 3rd Qu.:15.00                                    3rd Qu.:275   3rd Qu.:22.00
 Max.   :17.00                                    Max.   :339   Max.   :35.00


Create a correlation matrix for the dataset

> dppcomp$cd <- gsub("no", "0", dppcomp$cd)
> dppcomp$cd <- gsub("yes", "1", dppcomp$cd)
> dppcomp$multi <- gsub("no", "0", dppcomp$multi)
> dppcomp$multi <- gsub("yes", "1", dppcomp$multi)
> dppcomp$premium <- gsub("no", "0", dppcomp$premium)
> dppcomp$premium <- gsub("yes", "1", dppcomp$premium)
> dppcomptrans <- transform(dppcomp, cd = as.numeric(cd), multi = as.numeric(multi), premium = as.numeric(premium))
> View(dppcomptrans)
> cor(dppcomptrans)
                  X        price       speed          hd        ram       screen
X        1.00000000 -0.129498835  0.39059070  0.55765399  0.2685809  0.184878602
price   -0.12949883  1.000000000  0.25822702  0.38307401  0.5315736  0.239556026
speed    0.39059070  0.258227023  1.00000000  0.37352199  0.2363263  0.189107005
hd       0.55765399  0.383074010  0.37352199  1.00000000  0.7789573  0.232534827
ram      0.26858095  0.531573568  0.23632632  0.77895731  1.0000000  0.209353665
screen   0.18487860  0.239556026  0.18910701  0.23253483  0.2093537  1.000000000
cd       0.45860251  0.166546218  0.25821198  0.50323013  0.4395044  0.130066721
multi    0.21762387 -0.012697293  0.08365003  0.09279931  0.0460338 -0.001503534
premium  0.03826171 -0.058406028  0.11470025  0.19709227  0.1973022  0.018873196
ads     -0.27794995  0.001722223 -0.21900389 -0.32911767 -0.1869304 -0.094398804
trend    0.98948890 -0.117300478  0.40786546  0.58096323  0.2812251  0.188590555
                 cd        multi     premium          ads       trend
X        0.45860251  0.217623870  0.03826171 -0.277949946  0.98948890
price    0.16654622 -0.012697293 -0.05840603  0.001722223 -0.11730048
speed    0.25821198  0.083650026  0.11470025 -0.219003885  0.40786546
hd       0.50323013  0.092799306  0.19709227 -0.329117668  0.58096323
ram      0.43950444  0.046033796  0.19730219 -0.186930380  0.28122505
screen   0.13006672 -0.001503534  0.01887320 -0.094398804  0.18859055
cd       1.00000000  0.431912797  0.21615625 -0.062955373  0.44530962
multi    0.43191280  1.000000000  0.12469609 -0.030723034  0.21011532
premium  0.21615625  0.124696091  1.00000000 -0.152622659  0.04328013
ads     -0.06295537 -0.030723034 -0.15262266  1.000000000 -0.32553576
trend    0.44530962  0.210115320  0.04328013 -0.325535759  1.00000000
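
The yes/no recoding that precedes cor() can also be done without string substitution. The following is a minimal sketch in base R, assuming a small made-up data frame: `toy`, `yes_no_cols`, and all values are illustrative and not taken from the dataset.

```r
# Hypothetical stand-in for the real data; only the column names match.
toy <- data.frame(
  price   = c(1499, 1795, 2595, 3295),
  cd      = c("no", "yes", "yes", "no"),
  premium = c("yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Recode each yes/no column to 1/0 in one step.
# The comparison x == "yes" works for character vectors and for factors.
yes_no_cols <- c("cd", "premium")
toy[yes_no_cols] <- lapply(toy[yes_no_cols], function(x) as.integer(x == "yes"))

# Every column is now numeric, so cor() accepts the whole data frame.
cor_toy <- cor(toy)
```

Comparing against "yes" avoids one risk of gsub(), which replaces the pattern anywhere inside a string rather than matching the whole value.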

What are the Min, Max, Median, and Mean of the Price?

> summary(dppcomptrans)
       X            price           speed              hd              ram
 Min.   :   1   Min.   :  949   Min.   : 25.00   Min.   :  80.0   Min.   : 2.000
 1st Qu.:1568   1st Qu.: 1794   1st Qu.: 33.00   1st Qu.: 214.0   1st Qu.: 4.000
 Median :3136   Median : 2145   Median : 50.00   Median : 340.0   Median : 8.000
 Mean   :3136   Mean   : 2244   Mean   : 52.06   Mean   : 417.6   Mean   : 8.303
 3rd Qu.:4703   3rd Qu.: 2595   3rd Qu.: 66.00   3rd Qu.: 528.0   3rd Qu.: 8.000
 Max.   :6270   Max.   :12223   Max.   :100.00   Max.   :2100.0   Max.   :32.000
     screen            cd             multi           premium            ads
 Min.   :14.00   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   : 39
 1st Qu.:14.00   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:162
 Median :14.00   Median :0.0000   Median :0.0000   Median :1.0000   Median :246
 Mean   :14.61   Mean   :0.4651   Mean   :0.1396   Mean   :0.9024   Mean   :221
 3rd Qu.:15.00   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:275
 Max.   :17.00   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :339
     trend
 Min.   : 1.00
 1st Qu.:10.00
 Median :16.00
 Mean   :15.96
 3rd Qu.:22.00
 Max.   :35.00
> library(mosaic)  # needed for the formula interface to mean(), median(), min(), max(), and for favstats()
> mean(~price, data = dppcomptrans)
[1] 2243.977
> median(~price, data = dppcomptrans)
[1] 2145
> min(~price, data = dppcomptrans)
[1] 949
> max(~price, data = dppcomptrans)
[1] 12223
> favstats(~price, data = dppcomptrans)
 min   Q1 median   Q3   max     mean       sd    n missing
 949 1794   2145 2595 12223 2243.977 726.7628 6270       0
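
The formula-style calls such as mean(~price, data = ...) and favstats() are not part of base R; they match the interface of the mosaic package. As a rough base R equivalent, the same statistics can be computed directly on a numeric vector. This is a sketch under the assumption of a made-up `price` vector; the values below are illustrative only.

```r
# Hypothetical prices, for illustration only (not the dataset's values).
price <- c(1000, 1500, 2000, 2500, 3000)

# Collect the same statistics that favstats() reports, using base R functions.
price_stats <- c(
  min     = min(price),
  median  = median(price),
  max     = max(price),
  mean    = mean(price),
  sd      = sd(price),
  n       = length(price),
  missing = sum(is.na(price))
)
```

On the real data the vector would be dppcomptrans$price.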

What are the correlation values between Price, Ram, and Ads?

> cor(dppcomptrans[c("price", "ram", "ads")])
            price        ram          ads
price 1.000000000  0.5315736  0.001722223
ram   0.531573568  1.0000000 -0.186930380
ads   0.001722223 -0.1869304  1.000000000


Create a subset of the dataset with only Price, CD, and Premium

> dppcomptrans.sub0 <- subset(dppcomptrans, select = c("price", "cd", "premium"))
> View(dppcomptrans.sub0)
> cor(dppcomptrans.sub0)
              price        cd     premium
price    1.00000000 0.1665462 -0.05840603
cd       0.16654622 1.0000000  0.21615625
premium -0.05840603 0.2161562  1.00000000


Create a subset of the dataset with only Price, HD, and Ram where Price is greater than or equal to $1750

> dppcomptrans.sub1 <- subset(dppcomptrans, price >= 1750, select = c("price", "hd", "ram"))
> View(dppcomptrans.sub1)
> cor(dppcomptrans.sub1)
          price        hd       ram
price 1.0000000 0.2996405 0.4224131
hd    0.2996405 1.0000000 0.7761332
ram   0.4224131 0.7761332 1.0000000
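
For comparison, the same kind of row-and-column selection can be written with bracket indexing instead of subset(). The sketch below assumes a small made-up data frame (`toy`, `sub_a`, `sub_b`, and the values are illustrative) and is meant to show that the boundary value 1750 is kept by the >= condition.

```r
# Hypothetical data; the extra "speed" column shows that unselected columns are dropped.
toy <- data.frame(
  price = c(1499, 1750, 2595, 3295),
  hd    = c(80, 210, 340, 528),
  ram   = c(2, 4, 8, 16),
  speed = c(25, 33, 50, 66)
)

# subset(): filter rows by condition and keep the named columns.
sub_a <- subset(toy, price >= 1750, select = c("price", "hd", "ram"))

# Bracket indexing: a logical row filter, then a character vector of column names.
sub_b <- toy[toy$price >= 1750, c("price", "hd", "ram")]
```

One difference to keep in mind: subset() drops rows where the condition evaluates to NA, while bracket indexing returns them as rows of NA values.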

What percentage of Premium computers were sold?

> tally(~premium, data = dppcomptrans, margins = TRUE, format = "perc")
premium
         0          1      Total
  9.760766  90.239234 100.000000
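
The percentages can be checked by hand from the counts that summary(dppcomp) reports for the premium column (612 "no" and 5658 "yes" out of 6270 rows). A short base R sketch of that arithmetic, where the vector name `premium_counts` is only illustrative:

```r
# Counts copied from the summary() output for the premium column.
premium_counts <- c(no = 612, yes = 5658)

# Share of each category as a percentage of all 6270 computers.
premium_pct <- 100 * premium_counts / sum(premium_counts)

round(premium_pct, 2)  # no 9.76, yes 90.24
```

On the full data frame, prop.table(table(dppcomptrans$premium)) gives the same proportions as fractions of 1.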


How many Premium computers with CDs were sold?

> tally(~premium + cd, data = dppcomptrans, margins = TRUE)
       cd
premium    0    1 Total
  0      528   84   612
  1     2826 2832  5658
  Total 3354 2916  6270


How many Premium computers with CDs priced over $2000 were sold?

> tally(~premium + cd | price > 2000, data = dppcomptrans, margins = TRUE)
, , price > 2000 = TRUE

       cd
premium    0    1 Total
  0      329   65   394
  1     1301 1945  3246
  Total 1630 2010  3640

, , price > 2000 = FALSE

       cd
premium    0    1 Total
  0      199   19   218
  1     1525  887  2412
  Total 1724  906  2630
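
A conditional count like this can also be produced in base R, either with a three-way table() or by summing a logical filter. The sketch below assumes a tiny made-up data frame (`toy`, `counts`, `n_match`, `n_direct`, and all values are illustrative) with premium and cd already coded as 0/1.

```r
# Hypothetical data with premium and cd coded as 0/1.
toy <- data.frame(
  price   = c(1500, 2100, 2500, 1900, 3000, 2200),
  cd      = c(0, 1, 1, 1, 0, 1),
  premium = c(1, 1, 1, 0, 1, 0)
)

# Three-way table: premium by cd, split by whether price exceeds 2000.
counts <- table(premium = toy$premium, cd = toy$cd, over_2000 = toy$price > 2000)

# Cell for premium = 1, cd = 1, price > 2000 (table dimnames are strings).
n_match <- counts["1", "1", "TRUE"]

# The same count from a direct logical filter.
n_direct <- sum(toy$premium == 1 & toy$cd == 1 & toy$price > 2000)
```

Both approaches count the rows where all three conditions hold; the table form also shows the other combinations.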
