BUSI2045 Midterm Questions 2024 Spring

This document contains a midterm test question paper for the course BUSI 2045 DATA ANALYTICS FOR BUSINESS DECISION MAKING. It has two parts - multiple choice questions worth 32% and empirical questions worth 68%. For the multiple choice section, there are 16 questions testing concepts related to data visualization, sampling, distributions, and basic R operations. The empirical questions section involves loading datasets, conducting descriptive analysis, creating visualizations, and answering questions based on the analysis. Submission requires including name, ID, R code, and results in one Moodle file upload.

Uploaded by

rinniechan630

BUSI 2045 DATA ANALYTICS FOR BUSINESS DECISION MAKING

Midterm Test Question Paper

1. This is a 3-hour open-book test, which accounts for 20% of your final score.
2. The total score is 100.
   1. Part I: Multiple Choice Questions (32%)
   2. Part II: Empirical Questions (68%)
3. Submission format.
   1. Include your name and ID on the first line of your answer sheet.
   2. You can upload only one file via the Moodle submission link.
   3. You need to report both R code and results in the answer sheet.

Part I: Multiple Choice Questions (32 points)


There are 16 questions in total, and each question has only one correct answer. Please list your answers
one by one, giving the question number followed by your answer.

Q1. Which of the following plots is used to check whether a variable is normally distributed?
A. Pie chart
B. Error bars
C. Box plot
D. QQ plot

Q2. If the median value for a variable is larger than its mean, and its mode value is larger than its median,
then the distribution of values of this variable tends to be
A. Negatively skewed
B. Positively skewed
C. Symmetrically distributed
D. None of the above

Q3. Which of the following terms refers to the procedure of random sampling with replacement to create
multiple re-samples from sample data?
A. Central Limit Theorem
B. Random Sampling
C. Selection Bias
D. Bootstrapping

Q4. Which of the following statements is NOT correct?


A. The distribution of sample means tends to resemble a bell-shaped normal curve, if we draw multiple
samples (of the same size) from a population repeatedly.
B. Standard deviation measures the variability of individual data points in a sample, while standard error
measures the variability of a sample statistic (e.g., mean) from multiple samples.
C. Standard error would be a good estimate of the standard deviation of the population.
D. The distribution of sample means would be more normally distributed when sample size gets larger.

Q5. Given x1 = 1:4, x2 = 5:8 and x3 = 9:12, you want to create a matrix with 4 rows and 3 columns named
m1 by combining the three vectors. Which of the following statements is correct?
A. You can create m1 by m1 = rbind(x1, x2, x3).
B. You can create m1 by m1 = cbind(x1, x2, x3).
C. You can create m1 by m1 = matrix(x1, x2, x3).
D. The output of length(m1) is 4.
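
As a reference for the functions named in Q5, the following is a minimal sketch of how row-wise and column-wise binding differ in the shape of the result. The vectors a, b, and c3 are hypothetical and deliberately smaller than those in the question.

```r
# Hypothetical vectors, chosen only for illustration.
a  <- 1:2
b  <- 3:4
c3 <- 5:6

# rbind() stacks each vector as a row: 3 vectors of length 2 give 3 rows, 2 columns.
by_row <- rbind(a, b, c3)

# cbind() places each vector in its own column: 2 rows, 3 columns.
by_col <- cbind(a, b, c3)

print(dim(by_row))
print(dim(by_col))

# length() of a matrix counts all of its elements, not its rows.
print(length(by_col))
```

The same check with dim() and length() can be applied to any matrix built from vectors of equal length.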

Q6. Which of the following lines of code produces FALSE as its output?
A. is.numeric(1:5)
B. is.numeric(c(1,2,3))
C. is.numeric('123')
D. None of the above

Q7. Which of the following scenarios fulfils the principle of random sampling?
A. Estimating the average GPA of HKBU students with only BUSI2045 students in the sample.
B. Estimating the average income of all Hong Kong workers with only doctors included in the sample.
C. Estimating the average housing price in Hong Kong with only houses on Hong Kong Island in the sample.
D. None of the above.

Q8. If you would like to produce a scatter plot and add a small amount of random variation to the location of
each point, which of the following functions should you use?
A. geom_point()
B. geom_jitter()
C. geom_boxplot()
D. geom_pointrange()

Please answer Q9 to Q11 using the data frame df below.

Q9. Which of the following statements can produce a plot like the one below?

A. ggplot(df, aes(period, sales)) + geom_point()


B. ggplot(df, aes(period, sales, colour = "red")) + geom_point()
C. ggplot(mpg, aes(period, sales)) +geom_point(colour = "red")
D. ggplot(mpg, aes(period, sales)) +geom_point(aes(colour = "red"))

Q10. Which of the following statements can produce a plot like the one below?

A. ggplot(df, aes(period, sales)) + geom_line()


B. ggplot(df, aes(period, sales)) + geom_path()
C. ggplot(df, aes(period, sales)) + geom_smooth()
D. ggplot(df, aes(period, sales)) + geom_area()

Q11. What is the output of the code IQR(df$sales)?


A. 16
B. 16.25
C. 16.6
D. 17

Load the iris data from the package datasets, which is preinstalled with base R, and answer Q12 to Q14
accordingly. (Hint: you may simply run the code data(iris) to load the data.)
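
The hint above can be sketched as follows. This is only an illustrative outline: it loads iris, inspects its structure, and shows the general pattern for a median and a filtered subset using Sepal.Length, a variable the questions below do not ask about. The cutoff of 6 and the name long_sepal are hypothetical.

```r
# Load the built-in iris data frame (ships with base R in the datasets package).
data(iris)

# Inspect the structure: 150 observations of 5 variables.
str(iris)

# Median of one numeric column.
print(median(iris$Sepal.Length))

# Keep only rows that satisfy a condition (hypothetical cutoff of 6).
long_sepal <- subset(iris, Sepal.Length > 6)

# Count how often each Species value appears in the subset.
print(table(long_sepal$Species))

# Number of distinct Species values that actually occur in the subset.
print(length(unique(long_sepal$Species)))
```

Note that table() on a factor still lists levels with a count of zero, so unique() is the safer way to count the values that are present.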

Q12. What is the median value for the variable Petal.Length?


A. 3.66
B. 3.76
C. 3.81
D. 4.35

Q13. Create a subset of the iris data in which Sepal.Width values are larger than 2.5. For the variable
Species, how many times does the value ‘setosa’ appear in this subset?
A. 41
B. 49
C. 50
D. 139

Q14. Create another subset of the iris data in which Petal.Width values are larger than 1. How many unique
Species values are there in this subset?
A. 3
B. 2
C. 1
D. 0

Load the data set Duncan from the package carData, and answer Q15 and Q16. This dataset records information on
the prestige and other characteristics of 45 U.S. occupations in 1950, based on social survey data. Occupation
names are set as row names.
(Hint: you may need to install and load the package carData before loading its data Duncan into R)

Q15. The variable prestige records the percentage of respondents who rated the occupation as “good” or
better in prestige. What is the max prestige value and which occupation receives the highest prestige?
A. 97, physician
B. 17, engineer
C. 17, professor
D. 3, shoe.shiner

Q16. The variable type records the type of occupation, with “prof” representing professional and managerial,
“wc” representing white-collar, and “bc” representing blue-collar. Which occupation type occurs most often
in this dataset?
A. Professional and managerial
B. Blue-collar
C. White-collar
D. None of the above

Part II Empirical Questions (68 Points)

Question 1 Data Processing and Description (20 Points)


Read the file simulated_data.csv into R and answer the following questions.

(a) How many variables and observations are there? What are the data types for these variables?
(b) Find the mean, the 0.25 and 0.75 quantiles of the variable alpha.
(c) Construct a frequency table of the variable delta as shown below.

First    Second    Third
?        ?         ?

Create a subset named subset1 in which the variable beta contains no missing values, and answer the questions below.
(d) How many observations are there in subset1 ?
(e) Find the mean, the 0.25 and 0.75 quantiles of variable alpha in subset1.
(f) With subset1, visualize the average gamma value in each delta level. Your output should look similar
to the graph below. (Hints: pay attention to the axis labels, legend title, and plot title)

Question 2 Data Description and Visualization (24 Points)


Read the file marketing_campaign.csv into R and answer the following questions. The dataset records information
on 1560 customers; each row represents one customer.

(a) What are the unique values in the variable Marital_Status?

(b) Create a two-way table to show the number of customers separated by Education levels and
Marital_Status. How many customers are both “married” and with a master’s degree?

(c) Display the number of customers across different Marital_Status and Education levels with a bar plot.
The plot should look like the one below. (Hints: pay attention to the plot title, legend title, and axis labels)

(d) Create a subset named subset2 in which the “Divorced” people are excluded (variable
Marital_Status). How many customers are there in the subset?

(e) With subset2, visualize the distribution of the variable Income across different Marital_Status and
Education levels with a boxplot. Your result should look like the one below. (Hints: pay attention to the title
and the x-axis and y-axis labels)

(f) Based on the boxplot in step (e), answer the following two questions:
i. Which education level tends to have lower income in general? Explain your answer.
ii. Is the income of the customers with education level ‘Graduation’ higher after getting married in
general? Explain your answer.

Question 3 Data and Sampling Distribution (20 Points)


Run the following code to generate a random sample (named X) of 400 values from a normal distribution
with mean 172 and standard deviation 10. (Note: set the seed to 2024)

set.seed(2024)
X <- rnorm(400, mean=172, sd=10)

(a) Visualize the values in X with a density plot and mark their mean with a red vertical line. Your result should
look like the one below. (Hint: you may need to convert the vector X to a data frame before plotting)

(b) Assume X is a random sample representing the height of all residents in Hong Kong. If we collect
multiple random samples, each with the same sample size (i.e., n = 400), from the Hong Kong
population, will these sample means be normally distributed? Why?

(c) Calculate the standard error (of the mean) with the mathematical approximation based on sample standard
deviation and sample size. What does the standard error measure?
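
The usual mathematical approximation divides the sample standard deviation by the square root of the sample size, SE = s / sqrt(n). A minimal sketch of that formula is shown here on a small toy vector; the helper name std_error and the vector heights are hypothetical and not part of the test materials.

```r
# Hypothetical helper: standard error of the mean, SE = s / sqrt(n),
# where s is the sample standard deviation and n the sample size.
std_error <- function(x) {
  sd(x) / sqrt(length(x))
}

# Toy data, for illustration only.
heights <- c(2, 4, 4, 4, 5, 5, 7, 9)

se <- std_error(heights)
print(se)
```

Because n sits under a square root in the denominator, the sample size has to be quadrupled to halve the standard error.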

(d) Calculate the 95% confidence interval (of the mean) via bootstrapping with 5000 resamples. What does it
tell us? (Note: set the random seed to 2024)
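
One common way to obtain such an interval is the percentile bootstrap: resample the data with replacement, compute the mean of each resample, and take the 2.5% and 97.5% quantiles of those means. The sketch below assumes the percentile method and base R only. The course may expect a different approach (for example the boot package), and the helper name boot_ci_mean is hypothetical.

```r
# Hypothetical helper: percentile bootstrap confidence interval for the mean.
boot_ci_mean <- function(x, n_resamples = 5000, level = 0.95) {
  # Each resample has the same size as x and is drawn with replacement.
  boot_means <- replicate(
    n_resamples,
    mean(sample(x, size = length(x), replace = TRUE))
  )
  alpha <- 1 - level
  # Lower and upper percentile bounds of the bootstrap means.
  quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2))
}

# Regenerate the sample exactly as specified in the question.
set.seed(2024)
X <- rnorm(400, mean = 172, sd = 10)

# Reset the seed before resampling so the interval is reproducible.
set.seed(2024)
ci <- boot_ci_mean(X)
print(ci)
```

The exact bounds depend on where the seed is set relative to the resampling step, so results can differ slightly between otherwise equivalent scripts.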
