Practical List

The document outlines a practical list for a Basic Statistics course using R, detailing various tasks such as creating data files, calculating statistical measures (mean, median, mode), and performing data analysis on different datasets. It includes exercises on interpreting results, generating plots, and computing correlation coefficients. The tasks cover a wide range of statistical concepts and applications, aiming to enhance practical skills in data analysis using R.

Uploaded by

Film City

Basic Statistics Using R

Sem-IV
(Practical List)
1 Create file (data.txt) using following data
12,15,17,18,14,12,6,23,22,12,17,18,15,20,25
Generate the mean, median, and mode using separate commands.
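A minimal sketch of item 1, assuming the values are saved as one comma-separated line. Base R has no built-in function for the statistical mode (mode() reports the storage type), so stat_mode below is a hypothetical helper, and a temporary file stands in for data.txt.

```r
# A temporary file stands in for data.txt (assumption: one comma-separated line)
path <- tempfile(fileext = ".txt")
writeLines("12,15,17,18,14,12,6,23,22,12,17,18,15,20,25", path)

x <- scan(path, sep = ",", quiet = TRUE)   # read the numbers back from the file

data_mean   <- mean(x)
data_median <- median(x)

# Hypothetical helper: returns every value that attains the highest frequency
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
data_mode <- stat_mode(x)
```

For the whitespace-separated files in the later items, the sep argument can simply be dropped.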
2 Create file (data.txt) using following data
33.5 57.1 49.7 40.2 44.2 45.2 47.8 38.0
53.9 41.1 41.7 40.8 55.5 43.5 49.1 49.9
Generate the mean and median by reading the file created.
3 Create file (data.txt) using following data
71 70 75 77 85 80 71 79 80 90 70 60 56 45 57 60 67
Generate the mean, median, and mode by reading the file created.
4 Create file (data.txt) using following data
25,13,24,35,36,45,47,12,15,17,18,14,12,6,23,22,12,17,18,15,20,25
Generate the mean, median, and mode by reading the file created.
5 Create file (data.txt) using following data
253 113 104 169 118
80 175 134 131 225
11 158 467 95 124
55 254 198 0 151
101 69 161 129 0
Generate the mean, median, and mode by reading the file created.


6 Use R to answer the following questions. Write down the answers.

Calculate sqrt(100 – 3 x 4.52)


If x = 3, y = 2 and z = -5, what is the value of x^3 - 2xy^2 + 3z + z^2?
Find the mean value of the sample below: 43, 52, 25, 44, 36, 37, 40, 43, 40, 32
You recorded your car’s mileage at your last eight fill-ups as
65311, 65624, 65908, 66219, 66499, 66821, 67145, 67447
Enter these numbers into the variable petrol. Use the function diff( ) on the data.
What does it give?
Interpret what both of these commands return: mean(petrol) and
mean(diff(petrol)).
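A short sketch of the calculator-style parts of item 6, assuming the flattened exponents in the polynomial are powers (x cubed, y squared, z squared). The variable names other than petrol are illustrative.

```r
# Polynomial with the exponents read as powers (assumption)
x <- 3; y <- 2; z <- -5
poly_value <- x^3 - 2 * x * y^2 + 3 * z + z^2

# Mean of the ten-value sample
sample_mean <- mean(c(43, 52, 25, 44, 36, 37, 40, 43, 40, 32))

# Odometer readings at the last eight fill-ups
petrol <- c(65311, 65624, 65908, 66219, 66499, 66821, 67145, 67447)
trip <- diff(petrol)       # distance covered between consecutive fill-ups
mean_trip <- mean(trip)    # average distance per tank
```

Here mean(petrol) is only the average odometer reading, while mean(diff(petrol)) is the average distance driven between fill-ups.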

7 The data below are the number of hours spent watching television per week for a
sample of 34 households.
23.1 15.9 21.0 26.0 25.1 14.7 24.2 16.6
18.2 16.5 20.7 15.3 17.7 19.1 22.7 21.9
14.6 26.3 25.8 9.4 17.0 21.2 17.9 24.7
21.1 17.2 19.1 22.7 24.0 24.7 22.5 8.3 2.5
30.4
Create a file using the given data, then find the mean, median, standard deviation,
quartiles and interquartile range using separate commands.
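One possible set of commands for these summaries, sketched on the television data. The values are typed in directly here for brevity; reading them from a file with scan() would work the same way. The tv_* names are illustrative.

```r
tv <- c(23.1, 15.9, 21.0, 26.0, 25.1, 14.7, 24.2, 16.6,
        18.2, 16.5, 20.7, 15.3, 17.7, 19.1, 22.7, 21.9,
        14.6, 26.3, 25.8,  9.4, 17.0, 21.2, 17.9, 24.7,
        21.1, 17.2, 19.1, 22.7, 24.0, 24.7, 22.5,  8.3,  2.5,
        30.4)

tv_mean      <- mean(tv)
tv_median    <- median(tv)
tv_sd        <- sd(tv)                                   # sample standard deviation
tv_quartiles <- quantile(tv, probs = c(0.25, 0.5, 0.75)) # default (type 7) quantiles
tv_iqr       <- IQR(tv)                                  # third quartile minus first
```

The same five commands apply unchanged to the other "mean, median, standard deviation, quartiles and interquartile range" items.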
8 Use R to answer the following questions. Write down the answers.
a. Create a vector of coefficients for a quadratic equation, using the sample
function. Here, we draw a sample of size 3 from − 20,−19,...,19,20 with
replacement. (2)
b. Determine the names associated with the vector.
c. Prepare to plot the equation, by constructing a regularly spaced vector for the
horizontal axis.
d. Create a vector of type character, and display the second element.
Find the Median of the sample below: 43, 52, 25, 44, 36, 37, 40, 43, 40, 32
9 The data below are the number of hours between charging on a particular type of
mobile phone
45.8 41.1 55.9 46.6 57.0 45.0 58.5 46.7
49.3 52.7 54.9 48.5 40.4 44.4 51.0 44.2
59.1 46.9 50.7 43.7 41.7 52.8 60.5 38.5
60.4 53.8 47.3 50.2 58.8 50.7

Create a file using the given data, then find the mean, median, standard deviation,
quartiles and interquartile range using separate commands.
10 Use R to answer the following questions. Write down the answers.
a. Use the scan command to enter the data 35 62 81 45 24 46.
b. Find the mean of these data.
c. Find the log values of data and store in another variable and display log values.
d. Find the standard deviation of the sample below:
65311, 65624, 65908, 66219, 66499, 66821, 67145, 67447
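A sketch of item 10. In an interactive session scan() with no arguments reads values typed at the console; the text argument is used here only so the example runs non-interactively.

```r
# scan(text = ...) mimics typing the values at the scan() prompt
x <- scan(text = "35 62 81 45 24 46", quiet = TRUE)

x_mean <- mean(x)

log_x <- log(x)    # natural logarithms stored in a second variable
print(log_x)

# Sample standard deviation of the mileage readings
petrol_sd <- sd(c(65311, 65624, 65908, 66219, 66499, 66821, 67145, 67447))
```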
11 The data below are the number of miles travelled to and from work each day by a
sample of 12 company employees.
3.7 14.3 11.0 26.5 5.2 4.8 24.2 16.9 8.2
26.5 40.7 5.3
Create a file using the given data, then find the mean, median, standard deviation,
quartiles and interquartile range using separate commands.
12 Use R to answer the following questions. Write down the answers.
a. Create a logical vector.
b. Negate this vector.
c. Explore arithmetic with logical and numeric values.
d. Compute the intersection of {1,2,...,10} and {5,6,...,15}.
e. There are many pre-defined system objects. Display the value of pi. Note that
pi is a built-in constant rather than a reserved word, so it can be masked by assignment.
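A brief sketch of the logical-vector steps in item 12. The vector contents and variable names are arbitrary choices for illustration.

```r
flags   <- c(TRUE, FALSE, TRUE, TRUE)   # a logical vector
negated <- !flags                       # element-wise negation

# In arithmetic, TRUE is treated as 1 and FALSE as 0
n_true <- sum(flags)
scaled <- flags * 2

common <- intersect(1:10, 5:15)         # elements present in both sets

print(pi)                               # built-in constant
```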
13 Use R to answer the following questions. Write down the answers.
a. Generate the numbers from 1 to 100.
b. If x = 5, y = 3 and z = -4, what is the value of x^2 - 2xy + 3z + z^2?
c. Find the mean value of the sample below:
14, 21, 23 , 21, 16, 19, 22, 25, 16, 16
14 The closing prices of 40 common stocks follow.
29.63 34.00 43.25 8.75 37.88 8.63 7.63 30.38 35.25 19.38
9.25 16.50 38.00 53.38 16.63 1.25 48.38 18.00 9.38 9.25
10.00 25.02 18.00 8.00 28.50 24.25 21.63 18.50 33.63 31.13
35.25 29.63 79.38 11.38 38.88 11.50 52.00 14.00 9.00 33.50
Create a file using the given data, then find the mean, median, standard deviation,
quartiles and interquartile range using separate commands.
15 Assume that we have registered the height and weight for four people: Heights in cm
are 180, 165, 160, 193; weights in kg are 87, 58, 65, 100.
a. Make two vectors,height and weight, with the data.
b. The body mass index (BMI) is defined as weight in kg / (height in m)^2. Make a
vector with the BMI values for the four people.
c. Make a vector with the natural logarithm of the BMI values.
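The BMI steps can be sketched with vectorised arithmetic; the only conversion needed is from centimetres to metres. The names bmi and log_bmi are illustrative.

```r
height <- c(180, 165, 160, 193)   # cm
weight <- c(87, 58, 65, 100)      # kg

bmi     <- weight / (height / 100)^2   # heights converted to metres first
log_bmi <- log(bmi)                    # natural logarithm of each BMI value
```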
16 The first rows of the data set faithful are shown below.
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
a. Find the interquartile range of eruption duration in the data set faithful.
b. Find the mean eruption duration in the data set faithful.
c. Find the quartiles of the eruption durations in the data set faithful.
17 Police records show the following number of daily crime reports for a sample of days
during the winter months and a sample of days during the summer months.

Winter Summer
18 28
20 18
15 24
16 32
21 18
20 29
12 23
16 38
19 28
20 18
a) Compute the range and interquartile range for each period.
b) Compute the variance and standard deviation for each period.
18 Use R to answer the following questions. Write down the answers.

a. Create a logical vector.
b. Negate this vector.
c. Use the scan command to enter the data 35 62 81 45 24 46.
d. Find the mean of these data.
e. Find the log values of the data, store them in another variable, and display the log values.
19 Following data shows marks of 10 students in two subjects SM and OS. Using
coefficient of variation, determine the subject in which students have consistent
performance.
SM 15 20 18 30 25 12 22 24 20 10
OS 12 18 20 25 20 15 25 20 15 15
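A sketch of the coefficient-of-variation comparison, taking the coefficient of variation as the sample standard deviation divided by the mean, in percent. R has no built-in function for it, so cv is a hypothetical helper.

```r
sm <- c(15, 20, 18, 30, 25, 12, 22, 24, 20, 10)
os <- c(12, 18, 20, 25, 20, 15, 25, 20, 15, 15)

# Hypothetical helper: coefficient of variation in percent
cv <- function(v) 100 * sd(v) / mean(v)

cv_sm <- cv(sm)
cv_os <- cv(os)

# The subject with the smaller coefficient of variation is the more consistent one
more_consistent <- if (cv_os < cv_sm) "OS" else "SM"
```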
20 A department of transportation's study on driving speed and mileage for midsize
automobiles resulted in the following data.
Driving Speed 30 50 40 55 30 25 60 25 50 55
Mileage 28 25 25 23 30 32 21 35 26 25
Compute and interpret the sample correlation coefficient.
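A minimal sketch for the driving-speed data using cor(), which returns the Pearson sample correlation by default.

```r
speed   <- c(30, 50, 40, 55, 30, 25, 60, 25, 50, 55)
mileage <- c(28, 25, 25, 23, 30, 32, 21, 35, 26, 25)

r <- cor(speed, mileage)   # Pearson sample correlation coefficient
print(r)
```

A value close to -1 indicates a strong negative linear relationship: higher driving speeds go with lower mileage in this sample.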


21 The following data are the monthly salaries y and GPA x for the students who obtained
the bachelors’ degree in business administration.
GPA 2.6 3.4 3.6 3.2 3.5 2.9
Y 3300 3600 4000 3500 3900 3600
a. Enter these data into two columns in R.
b. Find the correlation coefficient.
22 Assume that we have registered the height and weight for four people: Heights in cm
are 180, 165, 160, 193; weights in kg are 87, 58, 65, 100.
Calculate the correlation coefficient between height and weight. (2)
Write an interpretation of the correlation coefficient. (2)
23 Find the correlation coefficient of the eruption duration and waiting time in the file
faithful. Observe if there is any linear relationship between the variables. Interpret the
result.
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
24 Find the covariance of the eruption duration and waiting time in the data set faithful.
Observe if there is any linear relationship between the two variables and interpret the
result.
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
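The covariance and correlation items on faithful can be sketched as follows. The data set ships with base R (package datasets), so no file is needed; the variable names are illustrative.

```r
duration <- faithful$eruptions   # eruption duration in minutes
waiting  <- faithful$waiting     # waiting time to the next eruption in minutes

cov_dw <- cov(duration, waiting)   # sample covariance
cor_dw <- cor(duration, waiting)   # sample correlation coefficient

# The correlation is the covariance rescaled by both standard deviations
cor_check <- cov_dw / (sd(duration) * sd(waiting))
```

A positive covariance and a correlation close to 1 both point to a positive linear relationship between the two variables.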

25 Police records show the following number of daily crime reports for a sample of days
during the winter months and a sample of days during the summer months.

Winter Summer
18 28
20 18
15 24
16 32
21 18
20 29
12 23
16 38
19 28
20 18
Compute the coefficient of variation for each period.
Compare the variability of the two periods.
26 Draw a scatter plot of the eruption durations and waiting intervals in faithful. Does it
reveal any relationship between the variables?
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
27 The closing prices of 40 common stocks follow.
29.63 34.00 43.25 8.75 37.88 8.63 7.63 30.38 35.25 19.38
9.25 16.50 38.00 53.38 16.63 1.25 48.38 18.00 9.38 9.25
10.00 25.02 18.00 8.00 28.50 24.25 21.63 18.50 33.63 31.13
35.25 29.63 79.38 11.38 38.88 11.50 52.00 14.00 9.00 33.50
Create a file using the given data and obtain a histogram.
28 Use the scan command to enter the data 35 62 81 45 24 46. Obtain a box plot of the data.
29 The data below are the number of miles travelled to and from work each day by a
sample of 12 company employees.
3.7 14.3 11.0 26.5 5.2 4.8 24.2 16.9 8.2
26.5 40.7 5.3
Create a file using the given data and obtain a smooth curve through the histogram.
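One way to draw a smooth curve through a histogram is to overlay a kernel density estimate; this is an assumed reading of "smooth curve", sketched here with the values entered directly rather than read from a file.

```r
miles <- c(3.7, 14.3, 11.0, 26.5, 5.2, 4.8, 24.2, 16.9, 8.2,
           26.5, 40.7, 5.3)

# freq = FALSE puts the bars on the density scale so the curve is comparable
h <- hist(miles, freq = FALSE,
          main = "Miles travelled to and from work",
          xlab = "Miles per day")

lines(density(miles))   # kernel density estimate drawn over the bars
```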
30 The data below are the number of hours between charging on a particular type of
mobile phone
45.8 41.1 55.9 46.6 57.0 45.0 58.5 46.7
49.3 52.7 54.9 48.5 40.4 44.4 51.0 44.2
59.1 46.9 50.7 43.7 41.7 52.8 60.5 38.5
60.4 53.8 47.3 50.2 58.8 50.7
Create a file using the given data and obtain a histogram for each dataset.
31 The data below are the number of hours spent watching television per week for a
sample of 34 households.

23.1 15.9 21.0 26.0 25.1 14.7 24.2 16.6
18.2 16.5 20.7 15.3 17.7 19.1 22.7 21.9
14.6 26.3 25.8 9.4 17.0 21.2 17.9 24.7
21.1 17.2 19.1 22.7 24.0 24.7 22.5 8.3 2.5
30.4
Obtain a boxplot for each dataset.
32 Create file using following data
DM FOP ERP
120 120 140
130 150 140
160 140 160
140 150 160
190 220 260
Plot histogram reading file data.
33 In a study of job satisfaction, a series of tests was administered to 50 subjects. The
following data were obtained; higher scores represent greater dissatisfaction.
Create a .csv file from the data and construct a stem-and-leaf display.
87 67 92 41 90 76 58 59 50 75
80 70 69 88 85 81 73 61 46 97
50 81 75 65 77 47 87 60 92 71
70 53 61 84 70 74 43 89 83 46
84 78 69 78 74 76 64 76 67 64
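A sketch of the CSV round trip followed by the stem-and-leaf display. The column name score and the temporary file are assumptions made for illustration.

```r
scores <- c(87, 67, 92, 41, 90, 76, 58, 59, 50, 75,
            80, 70, 69, 88, 85, 81, 73, 61, 46, 97,
            50, 81, 75, 65, 77, 47, 87, 60, 92, 71,
            70, 53, 61, 84, 70, 74, 43, 89, 83, 46,
            84, 78, 69, 78, 74, 76, 64, 76, 67, 64)

# Write a one-column CSV file, then read it back (temporary file as a stand-in)
path <- tempfile(fileext = ".csv")
write.csv(data.frame(score = scores), path, row.names = FALSE)
dat <- read.csv(path)

stem(dat$score)   # prints the stem-and-leaf display to the console
```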

34 Plot a box plot for the following data.
8408 1374 1872 8879 2459 11413608 14138 6452 1850 2818 1356 10498 7478 4019
4341 739 2127
35 Create file using following data
SM FOC SS
233 159 316
285 138 319
277 152 139
315 235 243
250 344 238
Plot histogram reading file data.

36 b) Consider a Poisson distribution with μ = 3.


a. Compute f (2).
b. Compute f (1).
c. Compute P(x = 2).

37 Use ANOVA to analyse the data. Also write a summary.
Plant-1: 29,27,30,27,28
Plant-2 : 32,33,31,34,30
Plant-3 : 25, 24,24,25,26
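A minimal one-way ANOVA sketch for the three plants, assuming the fifteen observations are independent and the plant is the only factor. The names yield and plant are illustrative.

```r
yield <- c(29, 27, 30, 27, 28,    # Plant-1
           32, 33, 31, 34, 30,    # Plant-2
           25, 24, 24, 25, 26)    # Plant-3
plant <- factor(rep(c("Plant-1", "Plant-2", "Plant-3"), each = 5))

fit <- aov(yield ~ plant)   # one-way analysis of variance
tab <- summary(fit)
print(tab)

# Pull the F statistic and p-value for the plant factor out of the ANOVA table
f_value <- tab[[1]][["F value"]][1]
p_value <- tab[[1]][["Pr(>F)"]][1]
```

A p-value below the chosen significance level suggests that at least one plant mean differs from the others.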
38 Create file using following data
DM FOP ERP
12 12 14
13 15 14
16 14 16
14 15 16
19 22 26
Plot histogram reading file data.
39 Plot a stem-and-leaf display for the following data.
152 159 154 142
144 150 142 148
155 154 160 155
144 162 162 157
145 146 143 156

40 If n = 5 and p = 0.2 find p(x=2) using binomial distribution.


41 a) If n = 5 and p = 0.5 find p(x=3) using binomial distribution.
b) Consider a Poisson distribution with μ = 2.5.
a. Compute f (3).
b. Compute f (2).
c. Compute P(x ≤ 2).
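The binomial and Poisson items follow one pattern: the d-functions give point probabilities f(x) and the p-functions give cumulative probabilities. A sketch for item 41, with illustrative variable names:

```r
# Binomial: P(X = 3) with n = 5 trials and success probability 0.5
p_binom <- dbinom(3, size = 5, prob = 0.5)

# Poisson with mean 2.5
f3    <- dpois(3, lambda = 2.5)   # f(3) = P(X = 3)
f2    <- dpois(2, lambda = 2.5)   # f(2) = P(X = 2)
p_le2 <- ppois(2, lambda = 2.5)   # P(X <= 2)
```

For an upper tail such as P(X >= 4), pbinom(3, size, prob, lower.tail = FALSE) gives the probability of strictly more than 3 successes.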
42 a) If n = 5 and p = 0.4 find p(x≤2) using binomial distribution
b) Consider a Poisson distribution with μ = 2.
a. Compute f (3).
b. Compute f (2).
c. Compute P(x≤3).
43 a) If n = 7 and p = 0.7 find p(x=3) using binomial distribution.
b) Consider a Poisson distribution with μ = 0.5.
a. Compute f (0).
b. Compute f (3).
c. Compute xP(x = 2).
44 a)If n = 10 and p = 0.6 find p(x=4) using binomial distribution.
b) Consider a Poisson distribution with μ = 1.5.
a. Compute f (3).
b. Compute f (0).
c. Compute P(x ≤4).
45 a) If n = 5 and p = 0.5 find p(x>=4) using binomial distribution
b) Consider a Poisson distribution with μ = 4.
a. Compute f (3).
b. Compute f (2).
c. Compute P(x ≤ 2).
46 A coin is given 10 independent tosses, each of which lands heads with probability a. Let
the random variable N denote the total number of heads obtained.
a. Given a and any value (or vector of values!) of x, how can we calculate the
probability P(N = x) in R?
b. Set a = 0.3 (not a very fair coin!) and verify that the above equation and the R
command agree for x=2.
c. Generate the entire probability function of the random variable N
d. Use sum to verify that the sum of these probabilities is 1, and plot a bar chart to
show their distribution.
Calculate the mean and standard deviation of this distribution.
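A sketch of the coin-toss steps for 10 tosses with a = 0.3. The equation referred to in part b is not reproduced in this list, so the binomial probability formula is assumed here.

```r
n <- 10
a <- 0.3

# Part b: binomial formula (assumed) against the built-in function for x = 2
p2_formula <- choose(n, 2) * a^2 * (1 - a)^(n - 2)
p2_dbinom  <- dbinom(2, size = n, prob = a)

# Part c: the whole probability function of N
x     <- 0:n
probs <- dbinom(x, size = n, prob = a)

# Part d: the probabilities sum to 1; bar chart of the distribution
total <- sum(probs)
barplot(probs, names.arg = x, xlab = "Number of heads", ylab = "Probability")

# Mean and standard deviation computed directly from the distribution
mean_n <- sum(x * probs)
sd_n   <- sqrt(sum((x - mean_n)^2 * probs))
```

The results can be checked against the closed forms n * a for the mean and sqrt(n * a * (1 - a)) for the standard deviation.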
47 A coin is given 20 independent tosses, each of which lands heads with probability a. Let
the random variable N denote the total number of heads obtained.
a. Given a and any value (or vector of values!) of x, how can we calculate the
probability P(N = x) in R?
b. Set a = 0.5 (very fair coin!) and verify that the above equation and the R
command agree for x=4.
c. Generate the entire probability function of the random variable N
d. Use sum to verify that the sum of these probabilities is 1, and plot a bar chart to
show their distribution.
Calculate the mean and standard deviation of this distribution.
48 a. Assume that the test scores of a college entrance exam fits a normal
distribution. Furthermore, the mean test score is 72, and the standard
deviation is 15.2. What is the percentage of students scoring 84 or more in the
exam?
b. Suppose there are twelve multiple choice questions in an English class quiz.
Each question has five possible answers, and only one of them is correct. Find
the probability of having four correct answers if a student attempts to answer
every question at random.
49 If there are twelve cars crossing a bridge per minute on average, find the probability of
having seventeen or more cars crossing the bridge in a particular minute.
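Items 48 and 49 are tail and point probabilities of standard distributions. A sketch using the upper-tail option of the p-functions, with illustrative variable names:

```r
# 48a: P(score >= 84) for a normal distribution with mean 72 and sd 15.2
p_84_or_more <- pnorm(84, mean = 72, sd = 15.2, lower.tail = FALSE)

# 48b: exactly four correct out of twelve, each guess correct with probability 1/5
p_four_correct <- dbinom(4, size = 12, prob = 1 / 5)

# 49: seventeen or more cars, i.e. strictly more than sixteen, with mean 12 per minute
p_17_or_more <- ppois(16, lambda = 12, lower.tail = FALSE)
```

Multiplying p_84_or_more by 100 gives the percentage asked for in 48a.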
50 a. Select ten random numbers between one and three.
b. Suppose a coin toss turns up 12 heads out of 20 trials. At .05 significance level,
can one reject the null hypothesis that the coin toss is fair?
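A sketch of item 50 under two assumptions: "random numbers between one and three" means real numbers drawn uniformly, and the fairness question is answered with an exact two-sided binomial test. A normal-approximation test of the proportion would be an equally reasonable choice.

```r
set.seed(42)   # arbitrary seed so the draw is reproducible

# a. Ten uniform random numbers between 1 and 3
draws <- runif(10, min = 1, max = 3)

# b. Exact two-sided binomial test of H0: P(heads) = 0.5
coin   <- binom.test(12, 20, p = 0.5)
reject <- coin$p.value < 0.05
```

If whole numbers are wanted instead, sample(1:3, 10, replace = TRUE) draws ten integers from 1 to 3.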

51 Perform a one-sample t-test on the following data.
11 10 9 10 11 11 10 11 10 10 8 10 13 7 10 11 10 7 15 12
52 The data below are the number of hours spent watching television per week for a
sample of 34 households.

23.1 15.9 21.0 26.0 25.1 14.7 24.2 16.6
18.2 16.5 20.7 15.3 17.7 19.1 22.7 21.9
14.6 26.3 25.8 9.4 17.0 21.2 17.9 24.7
21.1 17.2 19.1 22.7 24.0 24.7 22.5 8.3 2.5
30.4
find a 95% confidence interval for the population mean for each sample. (3)
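A sketch of the confidence-interval items, assuming the population standard deviation is unknown so that a t interval is appropriate. It is shown on the television data; the other samples work the same way.

```r
tv <- c(23.1, 15.9, 21.0, 26.0, 25.1, 14.7, 24.2, 16.6,
        18.2, 16.5, 20.7, 15.3, 17.7, 19.1, 22.7, 21.9,
        14.6, 26.3, 25.8,  9.4, 17.0, 21.2, 17.9, 24.7,
        21.1, 17.2, 19.1, 22.7, 24.0, 24.7, 22.5,  8.3,  2.5,
        30.4)

# t.test() reports the 95% interval for the mean as part of its result
ci <- t.test(tv, conf.level = 0.95)$conf.int

# The same interval by hand: mean +/- t quantile * standard error
n          <- length(tv)
half_width <- qt(0.975, df = n - 1) * sd(tv) / sqrt(n)
manual     <- mean(tv) + c(-1, 1) * half_width
```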
53 Nine computer-components dealers in major metropolitan areas were asked for
their prices in $ on two similar colour inkjet printers. The results of this survey are
given below. At α = 0.05, is it reasonable to assert that, on average, the Apson
printer is less expensive than the Okaydata printer?
Apson 250 319 285 260 305 295 289 309 275
Okaydata 270 325 269 275 289 285 295 325 300
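Because each dealer quoted a price for both printers, a paired one-sided t-test is a natural fit; treating the data as paired is an assumption based on that description.

```r
apson    <- c(250, 319, 285, 260, 305, 295, 289, 309, 275)
okaydata <- c(270, 325, 269, 275, 289, 285, 295, 325, 300)

# H0: mean difference (Apson - Okaydata) is zero; H1: it is below zero
res <- t.test(apson, okaydata, paired = TRUE, alternative = "less")

mean_diff <- unname(res$estimate)   # average price difference per dealer
reject    <- res$p.value < 0.05
```

If reject is FALSE, the sample does not support the claim that the Apson printer is cheaper on average at the 0.05 level.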

54 The data below are the number of hours between charging on a particular type of
mobile phone
45.8 41.1 55.9 46.6 57.0 45.0 58.5 46.7
49.3 52.7 54.9 48.5 40.4 44.4 51.0 44.2 59.1
46.9 50.7 43.7 41.7 52.8 60.5 38.5 60.4
53.8 47.3 50.2 58.8 50.7
find a 95% confidence interval for the population mean for each sample. (3)
55 The data below are the number of miles travelled to and from work each day by a
sample of 12 company employees.
3.7 14.3 11.0 26.5 5.2 4.8 24.2 16.9 8.2 26.5 40.7 5.3
find a 95% confidence interval for the population mean for each sample. (3)
56 The closing prices of 40 common stocks follow.
29.63 34.00 43.25 8.75 37.88 8.63 7.63 30.38 35.25 19.38
9.25 16.50 38.00 53.38 16.63 1.25 48.38 18.00 9.38 9.25
10.00 25.02 18.00 8.00 28.50 24.25 21.63 18.50 33.63 31.13
35.25 29.63 79.38 11.38 38.88 11.50 52.00 14.00 9.00 33.50
create file using data given and find a 95% confidence interval for the population mean
for each sample.
57 Assume the population standard deviation σ of the student height in survey is 9.48.
Find the margin of error and interval estimate at 95% confidence level.
18.5 18.0 19.5 20.5 18.0 13.3
58 Suppose the mean weight of King Penguins found in an Antarctic colony last year was
15.4 kg. In a sample of 35 penguins same time this year in the same colony, the mean
penguin weight is 14.6 kg. Assume the population standard deviation is 2.5 kg. At .05
significance level, can we reject the null hypothesis that the mean penguin weight does
not differ from last year? (6)
Write an interpretation of the result. (4)
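With a known population standard deviation, the penguin question is a two-tailed z-test, which base R leaves to manual calculation. A sketch with illustrative variable names:

```r
xbar  <- 14.6   # sample mean this year
mu0   <- 15.4   # hypothesised mean (last year's value)
sigma <- 2.5    # population standard deviation
n     <- 35
alpha <- 0.05

z       <- (xbar - mu0) / (sigma / sqrt(n))   # test statistic
z_crit  <- qnorm(1 - alpha / 2)               # two-tailed critical value
p_value <- 2 * pnorm(-abs(z))

reject <- abs(z) > z_crit
```

The null hypothesis is rejected only if the test statistic falls outside the range from -z_crit to z_crit, or equivalently if the p-value is below alpha.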

59 Use ANOVA to analyse the data.
Plant-1: 19,17,30,17,18
Plant-2 : 22,23,21,24,20
Plant-3 : 15, 14,14,15,16
60 Use ANOVA to analyse the data.
Plant-1: 23,24,27,31, 29,27,30,27,28
Plant-2 : 33,34,36,38,32,33,31,34,30
Plant-3 : 34,37,38,39,25, 24,24,25,26
61 Use ANOVA to analyse the data.
Plant-1: 24,27,34,35,37,38,39
Plant-2 : 24,26,32,33,31,34,30
Plant-3 :29,25, 25, 24,24,25,26
62 Use ANOVA to analyse the data.
Plant-1: 23 , 24 , 25, 26 , 28
Plant-2 : 40, 34, 23, 26, 28
Plant-3 : 28, 14, 25, 26, 29
63 Create file using following data
DS FOP
126 124
137 154
168 144
144 153
198 112
Create .csv file and read that file to calculate regression coefficients.
64 Consider the results below that give data on highway traffic flow. It shows the
relationship between traffic flow, x (in thousands of vehicles per day) and lead content,
y (micrograms per gram) of the bark of nearby trees.

a. Enter these data into two columns in R. (2)
b. Plot the data with labelling the diagram appropriately. (2)
c. Find the regression line of y on x. This has the equation y= a + bx .print the
values a and b . (2)
d. Write down the equation of the best straight line. (2)
e. Find the residuals. (2)
65 The following data are the monthly salaries y and GPA x for the students who obtained
the bachelors’ degree in business administration.
GPA 2.6 3.4 3.6 3.2 3.5 2.9
Y 3300 3600 4000 3500 3900 3600

a. Enter these data into two columns in R. (2)
b. Plot the data, labelling the diagram appropriately. (2)
c. Find the regression line of y on x. This has the equation y = a + bx. Print the
values a and b. (2)
d. Write down the equation of the best straight line. (2)
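A sketch of the regression steps for the GPA and salary data using lm(). The data frame and column names are illustrative choices.

```r
# Enter the data as two columns of a data frame
grades <- data.frame(gpa    = c(2.6, 3.4, 3.6, 3.2, 3.5, 2.9),
                     salary = c(3300, 3600, 4000, 3500, 3900, 3600))

# Labelled scatter plot
plot(salary ~ gpa, data = grades,
     xlab = "GPA", ylab = "Monthly salary", main = "Salary against GPA")

# Least-squares regression line of y (salary) on x (GPA)
fit <- lm(salary ~ gpa, data = grades)
a <- unname(coef(fit)[1])   # intercept
b <- unname(coef(fit)[2])   # slope
abline(fit)                 # add the fitted line to the plot

res <- resid(fit)           # residuals: observed minus fitted values
```

The fitted equation is then written as y = a + bx with the printed values of a and b substituted in.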
66 Find the 95th percentile of the Chi-Squared distribution with 7 degrees of freedom.
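Percentiles of a distribution come from its quantile function; for the chi-squared distribution this is qchisq().

```r
# 95th percentile of the chi-squared distribution with 7 degrees of freedom
q95 <- qchisq(0.95, df = 7)
print(q95)
```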
67 The following data are the monthly salaries y and GPA x for the students who obtained
the bachelors’ degree in business administration.
X 2 3 5 1 8
Y 25 25 20 30 16

a. Enter these data into two columns in R. (2)
b. Plot the data, labelling the diagram appropriately. (2)
c. Find the regression line of y on x. This has the equation y = a + bx. Print the
values a and b. (2)
d. Write down the equation of the best straight line. (2)
e. Find the correlation coefficient. (2)
68 Suppose the following table represents the sales figures of the 3 new menu items in the
18 restaurants after a week of test marketing. At .05 level of significance, test whether
the mean sales volume for the 3 new menu items are all equal.
Copy and paste the sales figures below into a table file named "fastfood-1.txt" with a
text editor.
Item1 Item2 Item3
22 52 16
42 33 24
44 8 19
52 47 18
45 43 34
37 32 39
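A sketch of the menu-item comparison, assuming a one-way ANOVA with the item as the only factor. A temporary file stands in for "fastfood-1.txt" so the example is self-contained.

```r
# Temporary file standing in for "fastfood-1.txt"
path <- tempfile(fileext = ".txt")
writeLines(c("Item1 Item2 Item3",
             "22 52 16",
             "42 33 24",
             "44 8 19",
             "52 47 18",
             "45 43 34",
             "37 32 39"), path)

sales <- read.table(path, header = TRUE)

# stack() reshapes the three columns into one response column (values)
# and one factor column (ind) naming the menu item
long <- stack(sales)

fit <- aov(values ~ ind, data = long)
tab <- summary(fit)
print(tab)

f_value <- tab[[1]][["F value"]][1]
p_value <- tab[[1]][["Pr(>F)"]][1]
reject  <- p_value < 0.05
```

If reject is FALSE, the data do not provide evidence at the 0.05 level that the mean sales volumes differ.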

69 Find a point estimate of the female student proportion from survey


Sex Wr.Hnd NW.Hnd ...
Female 18.5 18.0 ...
Male 19.5 20.5 ...
Male 18.0 13.3 ...
Compute the margin of error and interval estimate for the female student proportion
in survey at the 95% confidence level.
70 Assume that out of 500 people, 280 are rice eaters and the rest are wheat eaters. Can we say
the proportions of rice and wheat eaters are the same at the 95% confidence level?
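Equal proportions of rice and wheat eaters means the rice proportion is 0.5, so one possible approach is a one-sample proportion test. prop.test() applies a continuity correction by default.

```r
# H0: the proportion of rice eaters is 0.5 (equal shares of rice and wheat eaters)
res <- prop.test(280, 500, p = 0.5, conf.level = 0.95)

p_hat  <- unname(res$estimate)   # observed proportion of rice eaters
reject <- res$p.value < 0.05
```

If reject is TRUE, the sample gives evidence that the two proportions are not the same.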
71 Find the 95th percentile of the Chi-Squared distribution .
Gender Smoke
Male Yes
female Yes
Male Yes
Male no
female No
Male Yes
female No
Male No
female No
Male Yes
