Data Collection and Analysis Methods
Part A
Existing sources
Statistical studies
The internet
Quantitative
Qualitative
Part B
Part C
1. T-test
2. Regression model
Part A
Advantage Disadvantage
Statistical studies
There are two main types of statistical study: experimental and observational. In an
experimental study, the researcher fixes a variable of interest and collects data on the
factors that affect it in a controlled environment. In an observational study, the
researcher studies the variability of a population by observing it as it occurs, without
applying any controls.
Experimental study
Advantages:
- Highest level of control
- Provides specific and relevant results
- Very good for identifying problems, causes, effects and connections
Disadvantages:
- Unlikely to reflect real-life situations accurately
- Results are difficult to replicate
- Conducted by humans, so there is a high chance of human error
The internet
With the rapid development of the Internet, this data source has become extremely
popular. Today companies of all sizes, both public and limited companies, have their
own websites to publish information. In addition, government agencies post information
resources on their own websites.
Advantage Disadvantage
Quantitative
a. Questionnaires and surveys
In these methods, the researcher writes a set of questions, usually closed-ended, and
gathers observations and information from the answers. The main advantage is that
the method is cheap and saves time. Thanks to the ubiquity of the internet, creating
pre-designed questionnaires and distributing them has become easy, fast and almost
free. This makes the method convenient for surveys with a large number of
observations and for selecting respondents at random. However, answers have to be
scored on a fixed scale, so it is difficult for respondents to give fully accurate
feedback. The responses are also often less detailed and complete than those from
methods such as interviews. Finally, respondents often decline to answer such
questionnaires because they take time and offer nothing in return.
Qualitative
a. Interviews
Researchers gather information through interviews by asking open-ended questions.
The strength of this approach is that it lets researchers explore specific areas of
interest in detail. Its weakness is that it requires the researcher to have good soft
skills, because an interview is a conversation between people. In addition,
organizing an interview is more expensive in terms of both time and money. To
reduce these disadvantages, researchers nowadays often use online interviews to
save time and travel costs.
b. Observation
Observation is a form of information collection that relies solely on what the
researcher observes, not on asking questions. It is used only where other methods
of information collection are too difficult or too complex to apply, because the
information collected this way is very subjective: the researcher applies his or her
own judgment to many observations. In a few cases the resulting bias is not too
significant.
c. Focus group
This method combines questioning, surveying and observation. Researchers use it
to collect data from a group of people who have something in common, gathering
individual views on a shared problem. It is a fairly comprehensive method, as it
gives the researcher a general understanding of the problem and allows follow-up
questions or interviews to dig deeper. However, gathering a large group of people
for interviews takes time and money, which is the biggest weakness of this method.
Data analysis methods
Data analysis methods are divided into two main types: descriptive statistics and
inferential statistics. Both turn raw data into meaningful, specific statistical
information.
Both descriptive and inferential statistics rely on the same kind of analysis of data
(Laerd, 2019). However, descriptive statistics only summarise and describe the group of
data that was actually measured, for example through its mean and standard deviation,
without drawing conclusions beyond that group. Inferential statistics is the technique of
analysing a small sample and then making inferences and interpretations about the whole
population (Frost, 2018).
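The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the `sample` values are hypothetical and are not taken from the dataset analysed in Part B, and the confidence interval uses the normal critical value 1.96 as an approximation.

```python
import math
import statistics

# Hypothetical sample of prices in EUR (illustrative values only).
sample = [95.0, 120.5, 150.0, 180.0, 210.0, 260.0, 310.0, 420.0]

# Descriptive statistics: summarise the data that were actually collected.
mean = statistics.mean(sample)
std_dev = statistics.stdev(sample)  # sample standard deviation (n - 1)

# Inferential statistics: use the sample to say something about the population.
# Approximate 95% confidence interval for the population mean, based on the
# normal critical value 1.96 (a rough approximation for a sample this small).
std_error = std_dev / math.sqrt(len(sample))
ci_low = mean - 1.96 * std_error
ci_high = mean + 1.96 * std_error

print(f"mean = {mean:.2f}, standard deviation = {std_dev:.2f}")
print(f"approximate 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The first two values describe the sample itself; the interval is a statement about the population the sample was drawn from.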
Part B
I. Summary for qualitative data
                      N     Minimum   Maximum   Mean   Std. Deviation
Number of bedrooms    501   0         4         1.12   .496
Valid N (listwise)    501

              Frequency   Percent   Valid Percent   Cumulative Percent
Valid 0       24          4.8       4.8             4.8
      1       403         80.4      80.4            85.2
      2       65          13.0      13.0            98.2
      3       8           1.6       1.6             99.8
      4       1           0.2       0.2             100.0
      Total   501         100.0     100.0
The number of bedrooms is numeric, but since it takes only a few distinct values, all of
which are valid and clearly defined, I treat it as a qualitative (categorical) variable.
Out of the total of 501 rooms, 403 have 1 bedroom, the largest share at 80.4%. 65 rooms
(13%) have 2 bedrooms and 24 rooms (4.8%) have no bedroom. Only 8 rooms (1.6%) have
3 bedrooms, and a single room (0.2%) has 4 bedrooms.
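As a cross-check, the percentages, cumulative percentages, mean and standard deviation in the tables above can be recomputed from the frequency counts alone. The sketch below assumes the standard deviation is the sample standard deviation (divisor N - 1).

```python
import math

# Frequency counts copied from the table above (bedrooms -> number of rooms).
counts = {0: 24, 1: 403, 2: 65, 3: 8, 4: 1}
total = sum(counts.values())

rows = []
cumulative = 0.0
for bedrooms in sorted(counts):
    percent = 100 * counts[bedrooms] / total
    cumulative += percent
    rows.append((bedrooms, counts[bedrooms], round(percent, 1), round(cumulative, 1)))

# Mean and sample standard deviation computed from grouped data.
mean = sum(value * n for value, n in counts.items()) / total
variance = sum(n * (value - mean) ** 2 for value, n in counts.items()) / (total - 1)
std_dev = math.sqrt(variance)

for row in rows:
    print(row)
print(f"mean = {mean:.2f}, standard deviation = {std_dev:.3f}")
```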
Full price of accommodation for two people and two nights in EUR
                                      N     Minimum   Maximum    Mean       Std. Deviation
Full price of accommodation for
two people and two nights in EUR      501   77.38     12886.24   386.1338   702.33537
Valid N (listwise)                    501
The average price of a room for two people for two nights is 386.13 euros. The lowest
price is 77.38 euros and the highest is 12886.24 euros. The price range is very large, with
a standard deviation of 702.34. Looking at the chart, most room rates lie between 0 and
2500 euros. The large price spikes account for a very small percentage and are outliers.
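One way to make the outlier remark concrete is to express the minimum and maximum prices as z-scores, that is, as distances from the mean measured in standard deviations. The sketch below uses only the summary figures from the table above; treating z-scores as an outlier indicator is an assumption added for illustration, not a step performed in the original analysis.

```python
# Summary statistics copied from the descriptive table above.
mean_price = 386.1338
sd_price = 702.33537
min_price = 77.38
max_price = 12886.24

# z-score: how many standard deviations a value lies from the mean.
z_min = (min_price - mean_price) / sd_price
z_max = (max_price - mean_price) / sd_price

print(f"z-score of the minimum price: {z_min:.2f}")
print(f"z-score of the maximum price: {z_max:.1f}")
```

The minimum lies less than half a standard deviation below the mean, while the maximum lies far above it, which is consistent with a strongly right-skewed price distribution.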
Cleanliness rating
N Minimum Maximum Mean Std. Deviation
Cleanliness rating 501 2 10 9.03 1.177
Valid N (listwise) 501
Overall rating
N Minimum Maximum Mean Std. Deviation
Overall rating 501 20 100 88.69 10.813
Valid N (listwise) 501
The mean overall rating is 88.69, with a lowest rating of 20 and a highest rating of 100.
The standard deviation of the overall rating is quite high at 10.813. The chart shows the
reason: most ratings are in the range 60-100, and the ratings outside this range can be
considered outliers.
Attraction index
                     N     Minimum   Maximum   Mean      Std. Deviation
Attraction index     501   7.610     76.335    21.9449   9.4889600
Valid N (listwise)   501
The average attraction index for this dataset is 21.94, with a minimum of 7.610 and a
maximum of 76.335. The standard deviation of 9.489 is fairly high relative to the mean,
although most attraction index values lie in the range 0-50.
Restaurant index
                     N     Minimum   Maximum   Mean      Std. Deviation
Restaurant index     501   4.171     38.666    11.5161   4.4891051
Valid N (listwise)   501
The average restaurant index is 11.516. The lowest index of 4.171 is about 34.5 points
below the highest restaurant index of 38.666. The standard deviation is 4.489. Looking at
the chart, most restaurant index values lie in the range 0-25 points.
III. Price and quantitative data
1. Number of bedrooms
                                                          Full price of accommodation
                                        Number of         for two people and two
                                        bedrooms          nights in EUR
Number of bedrooms
    Pearson Correlation                 1                 0.221**
    Sig. (2-tailed)                                       0.000
    N                                   501               501
Full price of accommodation for two people and two nights in EUR
    Pearson Correlation                 0.221**           1
    Sig. (2-tailed)                     0.000
    N                                   501               501
**. Correlation is significant at the 0.01 level (2-tailed).
The Pearson correlation between price and number of bedrooms is 0.221, which is a weak
positive correlation. The Sig. value of 0.000 is below 0.05 (the threshold for statistical
significance), so the correlation is statistically significant even though it is weak. The
scatter plot suggests that a high number of bedrooms has little effect on price.
2. Cleanliness rating
                                        Full price of accommodation
                                        for two people and two
                                        nights in EUR                  Cleanliness rating
Full price of accommodation for two people and two nights in EUR
    Pearson Correlation                 1                              -0.011
    Sig. (2-tailed)                                                    0.809
    N                                   501                            501
Cleanliness rating
    Pearson Correlation                 -0.011                         1
    Sig. (2-tailed)                     0.809
    N                                   501                            501
The Pearson correlation between price and cleanliness rating is -0.011, which is almost
zero, and the Sig. value of 0.809 is far above 0.05 (the threshold for statistical
significance). The correlation is therefore not statistically significant, and there is
essentially no linear relationship between these two variables. However, the scatter plot
suggests that a high cleanliness rating has a positive but weak effect on room rates.
3. Overall rating
                                        Full price of accommodation
                                        for two people and two
                                        nights in EUR                  Overall rating
Full price of accommodation for two people and two nights in EUR
    Pearson Correlation                 1                              0.011
    Sig. (2-tailed)                                                    0.814
    N                                   501                            501
Overall rating
    Pearson Correlation                 0.011                          1
    Sig. (2-tailed)                     0.814
    N                                   501                            501
The correlation coefficient between the room rate for two people for two nights in EUR
and the overall rating is 0.011, which is almost zero, and the Sig. value of 0.814 is far
above 0.05 (the threshold for statistical significance). The correlation is not
statistically significant, so it can be concluded that a change in overall rating has very
little impact on the price. The chart shows the same thing.
4. Distance from nearest metro station in km
                                        Full price of accommodation    Distance from
                                        for two people and two         nearest metro
                                        nights in EUR                  station in km
Full price of accommodation for two people and two nights in EUR
    Pearson Correlation                 1                              -0.095*
    Sig. (2-tailed)                                                    0.034
    N                                   501                            501
Distance from nearest metro station in km
    Pearson Correlation                 -0.095*                        1
    Sig. (2-tailed)                     0.034
    N                                   501                            501
The correlation coefficient between price and the distance to the nearest metro station
is -0.095, which is close to zero. This indicates only a very weak negative relationship:
price falls slightly as the distance to the nearest metro station increases. The scatter
plot does not show this clearly because some of the values are too high. The Sig. value
of 0.034 is below 0.05, so the correlation is statistically significant at the 0.05 level.
5. Attraction index
                                        Full price of accommodation
                                        for two people and two
                                        nights in EUR                  Attraction index
Full price of accommodation for two people and two nights in EUR
    Pearson Correlation                 1                              0.182**
    Sig. (2-tailed)                                                    0.000
    N                                   501                            501
Attraction index
    Pearson Correlation                 0.182**                        1
    Sig. (2-tailed)                     0.000
    N                                   501                            501
The correlation value of 0.182 is close to zero, so the attraction index has only a weak
positive relationship with the price, and the scatter plot also shows little effect.
However, the Sig. value is 0.000, which is below 0.01, so the correlation is statistically
significant.
6. Distance from city centre in km
                                        Full price of accommodation
                                        for two people and two         Distance from city
                                        nights in EUR                  centre in km
Full price of accommodation for two people and two nights in EUR
    Pearson Correlation                 1                              -0.119**
    Sig. (2-tailed)                                                    0.008
    N                                   501                            501
Distance from city centre in km
    Pearson Correlation                 -0.119**                       1
    Sig. (2-tailed)                     0.008
    N                                   501                            501
The correlation between the price of the room and the distance to the city centre is
-0.119, which is very low and shows only a weak negative relationship between the price
and the distance to the city centre. However, the Sig. value of 0.008 is below 0.01, so
this weak correlation is statistically significant. Looking at the scatter plot, there is
no clearly visible effect of the distance to the city centre on the room rate.
7. Restaurant index
                                        Full price of accommodation
                                        for two people and two
                                        nights in EUR                  Restaurant index
Full price of accommodation for two people and two nights in EUR
    Pearson Correlation                 1                              0.154**
    Sig. (2-tailed)                                                    0.001
    N                                   501                            501
Restaurant index
    Pearson Correlation                 0.154**                        1
    Sig. (2-tailed)                     0.001
    N                                   501                            501
Finally, the Pearson correlation between price and restaurant index is 0.154, which is
very low and shows only a weak positive relationship between these two variables.
However, the Sig. value of 0.001 is below 0.01, so the correlation is statistically
significant. The scatter plot shows no clearly visible relationship between the two
variables.
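Whether a Pearson correlation is statistically significant depends on the sample size as well as on the size of r. With N = 501, the test statistic t = r * sqrt(N - 2) / sqrt(1 - r^2) becomes large even for weak correlations. The sketch below recomputes approximate two-tailed p-values for the seven correlations reported above. It uses a normal approximation to the t distribution, which is an assumption made to keep the example dependency-free, so the results are close to, but not identical with, the SPSS Sig. values.

```python
import math

N = 501  # number of listings in every correlation table above

# Pearson r values as reported in the correlation tables above.
correlations = {
    "Number of bedrooms": 0.221,
    "Cleanliness rating": -0.011,
    "Overall rating": 0.011,
    "Distance from nearest metro station in km": -0.095,
    "Attraction index": 0.182,
    "Distance from city centre in km": -0.119,
    "Restaurant index": 0.154,
}


def approx_p_value(r, n):
    """Two-tailed p-value for Pearson r, using a normal approximation."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # erfc(|t| / sqrt(2)) is the two-sided tail area of the standard normal.
    return math.erfc(abs(t) / math.sqrt(2))


significant = []
for name, r in correlations.items():
    p = approx_p_value(r, N)
    if p < 0.05:
        significant.append(name)
    print(f"{name:45s} r = {r:6.3f}  approx. p = {p:.3f}")
```

Under this approximation only the cleanliness rating and the overall rating fail to reach significance at the 0.05 level, which matches the Sig. (2-tailed) values in the tables.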
IV. Price and qualitative data
                                             Full price of accommodation for
                                             two people and two nights in EUR
Type of the accommodation
    Entire home/apt     Mean                 609.82
                        Standard Deviation   605.55
    Private room        Mean                 265.32
                        Standard Deviation   727.63
    Shared room         Mean                 175.45
                        Standard Deviation   27.60
The average price of an entire home is 609.82 euros, about 345 euros higher than that of
a private room (265.32), and shared rooms have the lowest average price at 175.45. So
the type of accommodation appears to affect the price. The standard deviations for
entire homes and private rooms are very high, at 605.55 and 727.63 respectively, which
means that prices within these two types are spread widely around their means. Shared
rooms, with a standard deviation of 27.60, have prices close to their mean. It is
therefore likely that the entire home and private room groups contain several outliers
and that prices for these room types fluctuate strongly.
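Because the three groups have very different mean prices, the spread is easier to compare in relative terms. One common measure is the coefficient of variation (standard deviation divided by mean). It is not part of the original SPSS output and is introduced here only as an illustration, using the figures from the table above.

```python
# (mean, standard deviation) per accommodation type, copied from the table above.
groups = {
    "Entire home/apt": (609.82, 605.55),
    "Private room": (265.32, 727.63),
    "Shared room": (175.45, 27.60),
}

# Coefficient of variation: standard deviation relative to the mean.
cv = {name: sd / mean for name, (mean, sd) in groups.items()}

for name, value in cv.items():
    print(f"{name:16s} coefficient of variation = {value:.2f}")
```

A value near or above 1 means the standard deviation is as large as the mean itself, as is the case for entire homes and private rooms, while shared rooms show a much smaller relative spread.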
                                             Full price of accommodation for
                                             two people and two nights in EUR
Whether the room is shared or not
    Not shared          Mean                 388.26
                        Standard Deviation   705.55
    Shared              Mean                 175.45
                        Standard Deviation   27.60
Rooms that are not shared have an average price of 388.26, compared with 175.45 for
shared rooms, so whether a room is shared may affect the price. However, the standard
deviation for rooms that are not shared is very high at 705.55, so prices for this room
type fluctuate strongly, probably because of outliers far above the average price.
Shared rooms have a standard deviation of 27.60, so their prices lie much closer to the
mean.
                                             Full price of accommodation for
                                             two people and two nights in EUR
Whether the room is private or not
    Not private         Mean                 597.89
                        Standard Deviation   601.37
    Private             Mean                 265.32
                        Standard Deviation   727.63
The average price of a room that is not private is 597.89, higher than the average price
of a private room at 265.32. So whether a room is private may affect the price. However,
the standard deviations of both groups are very high, at 601.37 and 727.63 respectively,
showing that price volatility is very large and that outliers are likely.
The average price of a room without a superhost (380) is lower than the average price of
a room with a superhost (418), so whether the host is a superhost may affect the price.
However, the standard deviations of the two groups are very high, at 679.77 and 816.46
respectively. This means price swings are very large and there are likely to be outliers.
                                                    Full price of accommodation for
                                                    two people and two nights in EUR
Whether the listing is for multiple rooms or not
    Not multiple rooms         Mean                 390.50
                               Standard Deviation   779.48
    Multiple rooms             Mean                 371.31
                               Standard Deviation   326.89
Listings for multiple rooms and listings not for multiple rooms have similar average
prices, at 371.31 and 390.50 respectively. So it is unlikely that this variable affects
the price. However, the standard deviations of both groups are very large, at 326.89 and
779.48 respectively, so price fluctuation is very large, most likely due to outliers.
                                                    Full price of accommodation for
                                                    two people and two nights in EUR
Whether the listing is for business purposes or not
    Not for business purposes  Mean                 423.17
                               Standard Deviation   944.88
    For business purposes      Mean                 348.35
                               Standard Deviation   291.24
Non-business rooms have an average price of 423.17, higher than the business room
average of 348.35. The standard deviations of the two groups are very high, at 944.88
and 291.24 respectively, indicating very large price volatility, possibly due to
outliers.
Part C
1. T-test
Group Statistics
Full price of accommodation for two people and two nights in EUR
Whether the listing is for
business purposes or not      N     Mean      Std. Deviation   Std. Error Mean
Not for business purposes     253   423.171   944.87984        59.40409
For business purposes         248   348.349   291.24384        18.49400
Independent Samples Test
Full price of accommodation for two people and two nights in EUR

Levene's Test for Equality of Variances
                              F       Sig.
Equal variances assumed       1.803   .180

t-test for Equality of Means
                                                                                     95% Confidence Interval
                                               Sig.         Mean         Std. Error  of the Difference
                              t      df        (2-tailed)   Difference   Difference  Lower     Upper
Equal variances assumed       1.19   499       0.234        74.822       62.73       -48.42    198.07
Equal variances not assumed   1.20   300.338   .230         74.822       62.21       -47.61    197.25
The mean non-business room rate of 423.17 was higher than the mean business room rate
of 348.35. The standard deviations of both groups are very high, at 944.88 and 291.24.
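Levene's test: because Sig.(F) = 0.180 > α = 0.05, the null hypothesis of equal
variances is not rejected, so the "Equal variances assumed" row is used.
Application of T-test:
H0: µ1 = µ2
H1: µ1 ≠ µ2
Because Sig. (2-tailed) = 0.234 > α = 0.05, H0 is not rejected: the difference in mean
price between the two groups is not statistically significant.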
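The t statistics in the Independent Samples Test can be recomputed from the Group Statistics alone. The sketch below is an illustration using the standard pooled-variance and Welch formulas rather than a reproduction of SPSS internals, and small differences in the last decimal place are possible because of rounding.

```python
import math

# Group statistics as reported in the Group Statistics table.
n1, mean1, sd1 = 253, 423.171, 944.87984   # not for business purposes
n2, mean2, sd2 = 248, 348.349, 291.24384   # for business purposes

mean_diff = mean1 - mean2

# Equal variances assumed: pooled variance with n1 + n2 - 2 degrees of freedom.
df_pooled = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df_pooled
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = mean_diff / se_pooled

# Equal variances not assumed (Welch): separate variances and
# Welch-Satterthwaite degrees of freedom.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = mean_diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"mean difference = {mean_diff:.3f}")
print(f"equal variances assumed:     t = {t_pooled:.2f}, df = {df_pooled}")
print(f"equal variances not assumed: t = {t_welch:.2f}, df = {df_welch:.3f}")
```

Both t values are well below the critical value of roughly 1.96 for a two-tailed test at α = 0.05 with this many degrees of freedom, so H0 is not rejected under either assumption.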
The mean private room rate of 265.321 was lower than the mean not private room rate of
597.887. The standard errors of the two group means are quite similar, at 40.73932 and
44.57660 (the corresponding standard deviations reported in Part B are 727.63 and
601.37).
Levene's test: because Sig.(F) = 0.059 > α = 0.05, the null hypothesis of equal
variances is not rejected, so equal variances are assumed.
Application of T-test:
H0: µ1 = µ2
H1: µ1 ≠ µ2
2. Regression model
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       0.316a   0.100      0.085               671.76140
R Square = 0.100, so the model explains about 10% of the variation in price using all the
quantitative variables below.
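The Adjusted R Square in the Model Summary follows from R, the sample size and the number of predictors. The sketch below assumes n = 501 listings (the N reported throughout Part B) and k = 8 predictors (the variables listed in the coefficients table).

```python
r = 0.316   # multiple correlation R from the Model Summary
n = 501     # number of listings (assumed equal to N in Part B)
k = 8       # number of predictors in the coefficients table

r_squared = r ** 2
# Adjusted R Square penalises R Square for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"R Square          = {r_squared:.3f}")
print(f"Adjusted R Square = {adjusted_r_squared:.3f}")
```

The adjusted value is slightly lower than R Square because it corrects for the eight predictors in the model.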
Coefficientsa
                                                        Unstandardized            Standardized
                                                        Coefficients              Coefficients
Model                                                   B           Std. Error    Beta           t        Sig.
1   (Constant)                                          -1163.814   441.555                      -2.636   0.009
    Maximum number of guests that can stay in the room  69.864      32.049        0.122          2.180    0.030
    Cleanliness rating                                  -38.014     40.821        -0.064         -0.931   0.352
    Overall rating                                      4.307       4.502         0.066          0.957    0.339
    Number of bedrooms                                  191.143     78.159        0.135          2.446    0.015
    Distance from city centre in km                     106.258     38.940        0.297          2.729    0.007
    Distance from nearest metro station in km           -76.680     57.201        -0.068         -1.341   0.181
    Attraction index                                    25.966      8.199         0.351          3.167    0.002
    Restaurant index                                    7.636       15.900        0.049          0.480    0.631
Each coefficient B gives the change in the average room price for a one-unit increase in
that variable, holding the other variables constant:
B1: When the maximum number of guests increases by 1, the average room price increases
by 69.864 euros
B2: When the cleanliness rating increases by 1 point, the average room price decreases
by 38.014 euros
B3: When the overall rating increases by 1 point, the average room price increases by
4.307 euros
B4: When the number of bedrooms increases by 1, the average room price increases by
191.143 euros
B5: When the distance from the city centre increases by 1 km, the average room price
increases by 106.258 euros
B6: When the distance to the nearest metro station increases by 1 km, the average room
price decreases by 76.680 euros
B7: When the attraction index increases by 1, the average room price increases by 25.966
euros
B8: When the restaurant index increases by 1, the average room price increases by 7.636
euros
The maximum number of people in a room: Sig = 0.030 < 0.05 => Reject H0 => Significant
Cleanliness rating: Sig = 0.352 > 0.05 => Fail to reject H0 => Insignificant
Overall rating: Sig = 0.339 > 0.05 => Fail to reject H0 => Insignificant
The number of bedrooms: Sig = 0.015 < 0.05 => Reject H0 => Significant
The distance from the city center: Sig = 0.007 < 0.05 => Reject H0 => Significant
The distance to the nearest metro: Sig = 0.181 > 0.05 => Fail to reject H0 => Insignificant
Attraction index: Sig = 0.002 < 0.05 => Reject H0 => Significant
Restaurant index: Sig = 0.631 > 0.05 => Fail to reject H0 => Insignificant
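The decisions above can be checked mechanically. In the sketch below, each t value is recomputed as B divided by its standard error, and each Sig. value is compared with α = 0.05. The figures are copied from the first coefficients table, and the shortened variable labels are for display only.

```python
ALPHA = 0.05

# (B, Std. Error, Sig.) copied from the first coefficients table.
coefficients = {
    "Maximum number of guests": (69.864, 32.049, 0.030),
    "Cleanliness rating": (-38.014, 40.821, 0.352),
    "Overall rating": (4.307, 4.502, 0.339),
    "Number of bedrooms": (191.143, 78.159, 0.015),
    "Distance from city centre": (106.258, 38.940, 0.007),
    "Distance from nearest metro": (-76.680, 57.201, 0.181),
    "Attraction index": (25.966, 8.199, 0.002),
    "Restaurant index": (7.636, 15.900, 0.631),
}

t_values = {name: b / se for name, (b, se, _) in coefficients.items()}
significant = [name for name, (_, _, sig) in coefficients.items() if sig < ALPHA]

for name, (b, se, sig) in coefficients.items():
    decision = "significant" if sig < ALPHA else "not significant"
    print(f"{name:30s} t = {t_values[name]:7.3f}  Sig. = {sig:.3f}  {decision}")
```

Only the four significant variables are kept in the reduced model that follows.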
Coefficientsa
                                                        Unstandardized            Standardized
                                                        Coefficients              Coefficients
Model                                                   B           Std. Error    Beta           t        Sig.
1   (Constant)                                          -1034.529   324.138                      -3.192   .002
    Maximum number of guests that can stay in the room  67.680      31.732        .118           2.133    .033
    Number of bedrooms                                  191.087     77.243        .135           2.474    .014
    Distance from city centre in km                     87.617      35.076        .245           2.498    .013
    Attraction index                                    27.770      7.220         .375           3.846    .000
a. Dependent Variable: Full price of accommodation for two people and two nights in EUR
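The reduced model can be written as a prediction function. In the sketch below, the function name `predict_price` and the example listing (2 guests, 1 bedroom, 3 km from the centre, attraction index 20) are hypothetical and chosen only for illustration; the coefficients are the unstandardized B values from the table above.

```python
# Unstandardized coefficients (B) from the reduced model above.
INTERCEPT = -1034.529
B_GUESTS = 67.680
B_BEDROOMS = 191.087
B_DISTANCE_CENTRE = 87.617
B_ATTRACTION = 27.770


def predict_price(guests, bedrooms, distance_km, attraction_index):
    """Predicted full price (EUR, two people, two nights) under the reduced model."""
    return (
        INTERCEPT
        + B_GUESTS * guests
        + B_BEDROOMS * bedrooms
        + B_DISTANCE_CENTRE * distance_km
        + B_ATTRACTION * attraction_index
    )


# Hypothetical listing used only to illustrate the calculation.
base = predict_price(guests=2, bedrooms=1, distance_km=3.0, attraction_index=20.0)
one_more_bedroom = predict_price(guests=2, bedrooms=2, distance_km=3.0, attraction_index=20.0)

print(f"predicted price: {base:.2f} EUR")
print(f"effect of one extra bedroom: {one_more_bedroom - base:.3f} EUR")
```

Because the model is linear with a large negative constant, it can return unrealistic (even negative) prices for inputs outside the range of the observed data, so predictions should be read with caution.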