2023 Khan Thelwall Kousha Data Sharing and Reuse Disciplinary Differences and Improvements

Online Information Review

Data sharing and reuse practices: Disciplinary differences and improvements needed

Journal: Online Information Review
Manuscript ID: OIR-08-2021-0423.R2
Manuscript Type: Research Paper
Keywords: Data sharing, Data reuse, cross-sectional survey, Disciplinary difference

Abstract

Purpose
This study investigates differences and commonalities in data production, sharing and reuse across the widest range of disciplines yet, and identifies types of improvements needed to promote data sharing and reuse.

Design
The first authors of randomly selected publications from 2018 and 2019 in 20 Scopus disciplines were surveyed for their beliefs and experiences about data sharing and reuse.

Findings
From the 3,257 survey responses, data sharing and reuse are still increasing but not ubiquitous in any subject area and are more common among experienced researchers. Researchers with previous data reuse experience were more likely to share data than others. Types of data produced and systematic online data sharing varied substantially between subject areas. Although the use of institutional and journal-supported repositories for sharing data is increasing, personal websites are still frequently used. Combining multiple existing datasets to answer new research questions was the most common use. Proper documentation, openness, and information on the usability of data continue to be important when searching for existing datasets. However, researchers in most disciplines struggled to find datasets to reuse. Researcher feedback suggested 23 recommendations to promote data sharing and reuse, including improved data access and usability, formal data citations, new search features, and cultural and policy-related disciplinary changes to increase awareness and acceptance.

Originality
This study is the first to explore data sharing and reuse practices across the full range of academic discipline types. It expands and updates previous data sharing surveys and suggests new areas of improvement in terms of policy, guidance, and training programs.

Keywords
Data sharing, data reuse, cross-sectional survey.
Introduction
Collecting and producing new data is an integral part of research in many disciplines, and a good dataset can even count as a standard research output in the UK (REF, 2019). Subsequently sharing research data in a findable, accessible, and interoperable format (Wilkinson et al., 2016) supports reproducibility, efficiency, collaboration and interdisciplinarity (Borgman et al., 2019). Sharing research data also confers a citation advantage (Piwowar et al., 2007; Henneken & Accomazzi, 2011; Colavizza et al., 2020). Nevertheless, there are powerful reasons for not sharing data, such as the time needed to do it effectively and the perception or reality that shared data are rarely reused (Bezuidenhout, 2019; Hansson & Dahlgren, 2022).

Data sharing is increasingly mandated by funders (Kiley et al., 2017), but not always by journals (Wiley, 2018), allowing many researchers to avoid it unless encouraged by organisational or other

factors (Mason et al., 2020). Although accreditation can be an incentive for data sharing (Dorta-González et al., 2021), few studies have asked researchers what else would incentivise data sharing and promote data reuse (Whitty et al., 2015; Rowhani-Farid et al., 2017; Devriendt et al., 2021). Moreover, current suggestions sometimes have limited scope. For example, Whitty et al. (2015) suggested that journal editors can play a key role in incentivising data sharing in a public health emergency by only publishing data-driven research when the data had already been shared in a timely fashion with relevant authorities. Because of the incomplete uptake of data sharing, it is important to understand the enablers of and barriers to data sharing and reuse in different disciplines. This will help stakeholders and policy makers design effective, and possibly tailored, interventions to increase data sharing when and where relevant.

Whilst there have been many studies of data sharing and reuse in narrow contexts, the lack of substantial science-wide investigations is a problem because of likely sharp disciplinary differences. Previous studies are difficult to generalise because of their focus on one or a small number of disciplines (Piwowar, 2011; Wallis et al., 2013; Federer et al., 2015; Faniel & Yakel, 2017; Zenk-Möltgen et al., 2018; Sardanelli et al., 2018) or specific data repositories (Bishop & Kuula-Luumi, 2017; Coady et al., 2017; Borgman et al., 2019). In the few surveys of multiple disciplines, ad hoc participant recruitment via email and social media has led to few responses from disciplines where data sharing is apparently less common, such as Business and Economics, and Arts and Humanities (Tenopir et al., 2015), obscuring the general picture. Although Kim and Stanton's (2016) large-scale survey did not have this problem, it was limited to Science, Technology, Engineering, and Mathematics (STEM). In contrast, secondary analyses of previously collected questionnaires (Sayogo & Pardo, 2013; Curty et al., 2017; Kim et al., 2018) have struggled to give timely findings. Moreover, the data sharing environment is evolving rapidly as funder mandates take hold and journal data sharing requirements increase in some fields. Understanding disciplinary differences and updating prior surveys are therefore important to develop new national, international, and disciplinary research policies.

This study addresses the above gaps by comparing data production, sharing and reuse practices across 20 disciplines, including understudied research areas in Arts and Humanities, and Business and Economics, driven by the following research questions.

1. How do types and formats of data produced by researchers differ between disciplines?
2. How do researchers share data on the web? Does data sharing differ between disciplines and research experience?
3. How do researchers find repositories to share data and what factors influence their choice of repositories?
4. How frequently do researchers reuse existing data in different disciplines and for which purposes? How does it compare to data sharing in those disciplines?
5. How do researchers find datasets to reuse? Which factors are considered important when searching for existing datasets? How easy is it to find relevant datasets for reuse?
6. What can be improved in current systems to encourage and promote data sharing and reuse?

Background

More than three decades ago, Ceci (1988) proposed a scheme for mandatory data sharing between social scientists. In two surveys on the issue, most respondents (87%) were willing to share data, but 59% claimed that their colleagues were not, even for funded research. Today, most research data are produced in digital format, with infrastructures and standards available to support sharing of these data. Rapidly increasing numbers of data repositories now allow for effective curation, storage, and long-term access to data (Pampel et al., 2013). Nevertheless, disciplinary cultures, sizes and data types affect how and whether researchers share their data (Bell et al., 2009; Tenopir et al., 2015; Faniel & Yakel, 2017). For example, scholars in qualitative research fields are less prepared to openly share research data than those in data-intensive fields (Mozersky et al., 2020). Previous studies suggest that journal mandates, disciplinary norms, perceived career benefits and scholarly altruism are all important for data sharing (Kim & Stanton, 2016; Zenk-Möltgen et al., 2018). In comparison, perceived effort, trust in colleagues and a lack of incentives can all undermine it (Piwowar, 2011; Wallis et al., 2013; Sayogo & Pardo, 2013; Fecher et al., 2015).

In the last decade, sharing data directly with other researchers (Federer et al., 2015) or through personal data storage has been common, with only 11.3% using institutional repositories, 9.5% using disciplinary repositories, and 2.4% using publisher-related repositories (Tenopir et al., 2015). There have been significant differences between disciplines in terms of disciplinary repository usage: it has been more likely in Ecology (44.6%) than in Physical Sciences (7.1%) and Social Sciences (10.8%) (Tenopir et al., 2015). These numbers have subsequently increased slightly, but personal storage has remained commonly used for sharing (Tenopir et al., 2020). A recent study suggests that standard data repositories can be rarely used even in research fields with relatively strong data sharing norms, such as genomics (Thelwall et al., 2020). Even when data sharing statements are included in journal articles, repository links to meaningfully access the data are often missing (Federer et al., 2018). More comprehensive disciplinary information about repository uptake is therefore needed as a key step towards long-term sustainable data sharing.

When datasets are made easily accessible, researchers are generally willing to reuse others' data (Wallis et al., 2013; Bishop & Kuula-Luumi, 2017; Tenopir et al., 2020). However, where and how researchers find datasets to reuse is less well understood. The social media and web recruitment survey of Kratz and Strasser (2015) suggested that scholars use multiple search strategies to find data, including checking article references, searching discipline-specific databases, and using a general-purpose search engine. Their study did not report disciplinary differences, however. In a recent survey (using multiple online recruitment methods), 52% of 728 respondents self-reported reusing others' research data, but difficulties in finding appropriate data were common (Hrynaszkiewicz et al., 2021). Since finding data is critical to reusability, it is important to understand disciplinary differences in the core issues. Furthermore, there is a lack of understanding about what types of data are frequently reused across different disciplines and for what purposes.

Method
Consulting active researchers is the most direct method to get insights into how data are currently shared and reused. A survey is the only practical way to get large-scale evidence of attitudes and practices of data sharing across many different disciplines, and allows comparisons with previous studies (Fink, 2003). Two key assumptions are that respondents are representative of the

population of researchers and that they are able to accurately describe their experiences. To get a wide range of perspectives, the survey was international and targeted all career stages.

Questionnaire design
Fifteen questions (supplement 1) were designed to address the six research questions above, informed by the existing literature and previous surveys (Kratz & Strasser, 2015; Tenopir et al., 2015), but with some new questions for important omissions. An open-ended question was included on the type of data produced by researchers, since specific subject knowledge would be required to design a comprehensive list of options. Questions about data sharing methods were adapted from Kratz and Strasser (2015) and Tenopir et al. (2015), with additional questions on how researchers find repositories to share data and the factors that influence their choice of repositories. Questions about data reuse purposes used the typology of Pasquetto et al. (2019). A multiple-choice question on how researchers find datasets to reuse was adapted from Kratz and Strasser (2015), with additional questions on important factors when searching for existing datasets and ease of finding datasets to reuse. Finally, an open-ended question was designed to explore what can be improved in current systems to encourage and promote data reuse. Prior to circulating the survey, a pilot study was conducted with researchers at the University of Wolverhampton to test the questions and identify necessary adjustments.

Selection of subject areas
Since previous studies have focused on data-intensive STEM disciplines (Kim & Stanton, 2016) or specific repositories (Pasquetto et al., 2019), or had relatively small samples of less represented disciplines (Tenopir et al., 2015), an overview and comparison of different qualitative and quantitative disciplines was missing. In contrast, this study compared subject classifications in both Scopus¹ and Web of Science² (WoS) for pre-selecting a wide range of disciplines. Scopus was selected since its All Science Journal Classification (ASJC) subfield codes were more granular (333 disciplines within 27 subject areas) than the subject classifications in WoS. In total, 20 Scopus disciplines in nine Scopus subject areas were selected (Table I) to partly replicate disciplines and subject areas addressed in previous studies and to include new disciplines that have not been previously reported.

Data collection
a) Population and sampling
To ensure systematic coverage of researchers in the selected disciplines, the survey used direct email (Kim & Stanton, 2016) instead of soliciting responses from professional channels and social media (Kratz & Strasser, 2015; Tenopir et al., 2015). For each of the 20 selected Scopus disciplines, Scopus was searched using their ASJC code (Table I), limiting the results to journal articles published in 2018 and 2019 to focus on currently active researchers. Metadata from 8,000 randomly selected studies were collected, half from each year. First author email addresses were extracted, where available, resulting in 3,500, on average, per discipline. A total of 70,060 researchers were identified for the study. Due to the interdisciplinary nature of some disciplines and papers in Scopus, survey respondents were allowed to self-identify their discipline, if different from the one suggested by their article, by selecting 'Other' or by selecting only a broader subject area.

¹ https://www.scopus.com/
² https://clarivate.com/webofsciencegroup/solutions/web-of-science/

b) Survey data
Ethical approval for survey data collection was received from the University of Wolverhampton Life Sciences Ethics Committee (LSEC/201920/MT/125) on June 12, 2020. The Jisc Online Surveys platform was used to send individual survey invitations. The survey opened on July 14, 2020 and closed on August 17, 2020. In total, 70,060 invitations were emailed and 3,257 responses were received (response rate 4.65%) (Khan et al., 2022). 214 respondents selected only a broader subject area and did not report their specific disciplines. The survey platform does not record whether emails have been blocked or returned, so the underlying response rate may have been slightly higher.

Data analysis
Originally, 402 responses were reported under 'Other', outside of the nine categories defined. However, 149 were variations of the disciplines listed in the study and these were merged with the main categories, leaving 253 responses in 'Other'.

Four out of 20 disciplines received fewer than 30 responses: Organic Chemistry; Radiology, Nuclear Medicine and Imaging; Aerospace Engineering; and Biomedical Engineering (Table I). These disciplines were excluded when analysing disciplinary differences. The cut-off of 30 was chosen as a common statistical sample size threshold, in the absence of a theoretical reason to pick a given number.

The survey included single-choice and multiple-choice questions with an optional 'Other' field. These answers were tallied for different groups and content analysis was conducted on open-text answers in 'Other' fields. Free text from the open-ended question on data types was analysed to find term frequencies in broader subject areas. Chi-square tests were used to examine the independence between categorical variables. Binomial multiple logistic regression was used to explore the effect of research experience and disciplinary differences on data sharing and reuse experiences. The assumptions for binary logistic regression were met as follows: (1) the dependent variable is binary; (2) each observation is independent of the others; (3) there is no multicollinearity among the independent variables; and (4) the sample size is adequate, with a minimum of 10 cases per independent variable. The glm function³ in the stats package (version 3.6.2) in R was used to perform binomial logistic regression. A manifest content analysis with an inductive approach was used to analyse the final open-ended question (Bengtsson, 2016).
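The chi-square tests of independence used throughout this study compare observed contingency-table counts against the counts expected if the two variables were unrelated. As an illustrative sketch only (this is not the authors' analysis code, and the table counts below are invented for demonstration), the statistic can be computed from first principles:

```python
# Chi-square test of independence, computed from first principles.
# NOTE: the counts below are hypothetical, for illustration only.

def chi_square(table):
    """Return the chi-square statistic for a 2D contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical table: rows = experience group, columns = (shared, did not share)
table = [[200, 300],   # up to 10 years of experience
         [500, 500]]   # over 10 years of experience
print(round(chi_square(table), 2))  # → 13.39
```

The statistic is then compared against a chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom to obtain a p-value, which is how the χ² values quoted in the Results are interpreted.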
Results
The 3,257 respondents mostly had over 10 years of research experience (64.4%), followed by 6-9 years (15.5%), 3-6 years (13.8%) and 0-3 years (6.2%), with similar levels in all subject areas. More experienced researchers may be more familiar with the concepts of data sharing and data reuse, and therefore more inclined to respond.

³ https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/glm

The Social Sciences had the most responses (22.5%) within the broader subject areas, and Medicine had the fewest (5.2%). The percentage of responses in specific disciplines ranged from 5% (Organic Chemistry) to 60% (Astronomy and Astrophysics). Many selected 'Other' disciplines under a broader subject area (on average 39%), with the most in Engineering (63%) and the least in Environmental Sciences (17%) (Table I). The number of responses in previously underreported disciplines was significantly higher than in previous studies, including for education, linguistics, visual and performing arts, literature, business, and economics.

Table I: Selected subject areas and disciplines for the survey and the number of responses

Types and formats of data produced in different disciplines
Term frequency analyses suggested that surveys and observations were the most common types of data produced across all subject areas. Qualitative data, audio, and video were common in Social Sciences and Arts and Humanities. In comparison, samples, measurements, simulations, and images were common in Science and Engineering (Table A, supplement 2). Thus, unsurprisingly, there are substantial differences in the data types produced.
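Term frequency analysis of free-text answers like these can be approximated with simple token counting per subject area. A minimal sketch, assuming hypothetical responses (this is not the study's actual pipeline):

```python
# Count word frequencies in free-text answers about data types.
# NOTE: the responses below are invented for illustration.
from collections import Counter

responses = {
    "Social Sciences": ["survey data and interviews", "qualitative interviews",
                        "survey responses", "audio recordings of interviews"],
    "Engineering": ["simulation outputs", "sensor measurements",
                    "simulation results and images"],
}

def term_frequencies(texts, top=3):
    """Lowercase, split on whitespace, and return the most common terms."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts.most_common(top)

for area, texts in responses.items():
    print(area, term_frequencies(texts))
```

In practice, stop-words ("and", "of") and spelling variants would also need to be handled before counting.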
Data formats also varied between subject areas (Figure 1; participants could select multiple formats). Numerical data was popular across all subject areas except Arts and Humanities (25%). Text was the most common format in Social Sciences (74%) and Arts and Humanities (88%). These two research areas and Engineering were the top producers of multimedia (audio and video) data, notably Visual and Performing Arts (33%) and Linguistics and Language (38%). In contrast, Physical Sciences (45%), Engineering (40%), and Oceanography (45%) in Earth and Planetary Sciences (36%) generate many computer programs. Biomedical Sciences (55.3%) produces many images; this category was common in all subject areas except Social Sciences and Business and Economics.

Figure 1. Formats of data produced in different subject areas

Data sharing across disciplines and research experience levels
Nearly half (46.8%, n=1,523) of the participants reported sharing data online. Self-reported data sharing experience varied among researchers at different stages of their research career (χ²=36.85, p<0.001), and data sharing was more common with more research experience (from 39% during 0-3 years to 50% after 10+ years).

Differences in the prevalence of data sharing were significant between subject areas (χ²=200.17, p<0.001). Physical Sciences (73%) shared most, followed by Earth and Planetary Sciences (70%) (Figure 2). In comparison, data sharing was less common in Business and Economics (33%) and Medicine (38%). A binomial logistic regression explored the effect of subject area on data sharing behaviour while controlling for research experience. Compared to Arts and Humanities, being in Business and Economics or Medicine significantly decreased the odds of data sharing (odds ratios of 0.5 and 0.63, respectively). In contrast, the odds of data sharing were 2.9 times higher for researchers in Physical Sciences and 2.4 times higher for those in Earth and Planetary Sciences (Table B, supplement 2).
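Figures of the "2.9 times" kind above are odds ratios, i.e. exponentiated logistic regression coefficients. As a hedged illustration (the counts below are hypothetical and are not the survey's data), an unadjusted odds ratio can be read directly from a 2×2 table:

```python
# Unadjusted odds ratio from a 2x2 table of subject area vs. data sharing.
# NOTE: counts are hypothetical, for illustration only.
from math import exp, log

def odds_ratio(shared_a, not_shared_a, shared_b, not_shared_b):
    """Odds of sharing in group A relative to group B."""
    return (shared_a / not_shared_a) / (shared_b / not_shared_b)

# Hypothetical groups: A = Physical Sciences, B = Arts and Humanities
ratio = odds_ratio(73, 27, 48, 52)
print(round(ratio, 2))  # → 2.93

# In a logistic regression, the same quantity appears as exp(coefficient):
coefficient = log(ratio)
assert abs(exp(coefficient) - ratio) < 1e-9
```

The regression reported in the paper additionally controls for research experience, so its odds ratios are adjusted estimates rather than raw 2×2 values like this one.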

Figure 2. Data sharing by subject area (labels on bars represent number of responses)
Differences in the extent of data sharing exist within specific disciplines as well as between broad subject areas. For example, within the Business and Economics research area, only 26% (n=50) of 193 respondents in Business and International Management had previous data sharing experience, compared to 44% (n=110) of 251 respondents in Economics and Econometrics. Similarly, within the Social Sciences, data sharing was less common in Education (27%, n=31) than in Library and Information Sciences (46%, n=117) and Linguistics and Language (46%, n=33). Despite data sharing being less common in Medicine overall, researchers in Infectious Disease (45%, n=17) shared data more commonly than those in Radiology (26%, n=7).

Methods of sharing research data
Of the different methods of sharing data on the web, over half of the respondents mentioned institutional repositories (53.4%, n=813), followed by journal-supported repositories (30%, n=457) and personal websites (24.5%, n=373) (participants could select multiple methods). Chi-square tests confirm the statistical significance of differences between subject areas in the types of method used for sharing data (Table II).

Table II. Data sharing methods in different subject areas

The use of disciplinary repositories was common in most STEM fields except Engineering (1%) and was rare in Arts and Humanities (7%) and Business and Economics (8%). Interdisciplinary repository usage was relatively common in Biomedical Sciences (26%), Earth and Planetary Sciences (24%) and Social Sciences (23%), and least common in Medicine (6%). Sharing data on personal websites was most common in Physical Sciences (37%) and least common in Medicine (9%). Non-standard data deposit practices in the Social Sciences and Arts and Humanities include Academia.edu and Google Drive, which are not ideal solutions for long-term retrieval.

Choice of data repositories
When asked how they first found repositories to share data, most researchers responded that they were already aware of them, although this varied between disciplines (Table III). For example, in Physical Sciences and Biomedical Sciences, over 60% of respondents were already aware of relevant data repositories. Awareness was followed by consulting with colleagues and consulting with experts, which was common across all disciplines. General web searches for repositories were more common in Engineering (29%) and Arts and Humanities (20%). Searching re3data, the registry of research data repositories, was not a preferred method for researchers in any discipline (5% or less), so this is perhaps a professional librarian's tool.

Table III. How researchers first found repositories to share data

Ease of use (53.8%, n=820), repository reputation (46.9%, n=714), disciplinary norms (41.1%, n=626), and appropriateness for the data type (40.5%, n=617) were the top reasons for choosing a data repository (Table C, supplement 2). Other factors that influence researchers' choices are requirements from funding bodies, journals and institutions, accessibility, privacy, security, zero cost, digital object identifier (DOI) assignment, interdisciplinary research support, and international reputation for collaborative project support. Repository reputation, cost, and appropriateness for data type depended on disciplinary differences. Cost and appropriateness for data type were important factors in disciplines where disciplinary repositories were more commonly used.

Data reuse across disciplines and research experience
Overall, 54.3% (n=1,769) of the respondents had reused existing datasets. Data reuse frequency was dependent on researchers' experience (χ²=8.88, p=0.03), increasing with research experience: 47% in 0-3 years, 49% in 3-6 years, 53% in 6-9 years, and 56% after 10+ years.

Data reuse experience varied significantly between subject areas (χ²=152.03, p<0.001). Over 80% of respondents in Physical Sciences and Earth and Planetary Sciences, and 56-60% of respondents in Business and Economics, Environmental Sciences, and Engineering had reused existing data (Figure 3). This rate was lower in Arts and Humanities (42%), Medicine (44%), Social Sciences (47%), and Biomedical Sciences (49%). Only 1-3% of participants in all subject areas responded that their research does not use data; Arts and Humanities was an exception (16%). Outcomes of a binomial multivariable logistic regression (subject areas and research experience as predictors) indicate that, compared to Arts and Humanities, the odds of data reuse were 5.86 times higher in Earth and Planetary Sciences; 5.56 times in Physical Sciences; 1.76 times in Engineering; 1.72 times in Environmental Sciences; and 1.5 times in Business and Economics (Table D, supplement 2).

Data reuse also varied within specific disciplines in Business and Economics and in Environmental Sciences. In contrast to 72% (n=180) of researchers in Economics and Econometrics, only 38% (n=74) in Business and International Management reused secondary data. Within Environmental Sciences, data reuse was more common in Ecology (63%, n=93) than in Pollution

(49%, n=53). This trend is similar to data sharing behaviour in these fields, as data sharing was less common in the fields that predominantly rely on primary data.

Figure 3. Data reuse across subject areas (labels on bars represent number of responses)
Data reuse purposes
Overall, 63.1% (n=1,116) of researchers reported that they combine multiple existing datasets to answer novel research questions; 50.7% (n=897) reused data for comparing or ground-truthing (i.e., to calibrate, compare, or confirm); and 46.6% (n=825) analysed a single dataset to answer novel research questions. Data reuse types varied between subject areas (p<0.001 across all three types) (Figure 4). From the dotted lines in Figure 4 (the average in each category of data reuse), analysis of a single dataset was most common in Medicine (59%) and least common in Environmental Sciences (29%). Combining multiple datasets to answer new research questions was common overall, especially in Earth and Planetary Sciences (80%), Physical Sciences (77%), and Environmental Sciences (71%). Comparative data analysis was most common in Engineering (71%) and least common in Business and Economics (28%).

Figure 4. Data reuse types in different subject areas (dotted lines represent the average percentage in each area)
Other reuse types include testing and validating machine learning models, historical data analysis, teaching (e.g., student projects), evolution analysis, quantifying long-term climate conditions, applying new statistical methods to existing datasets, replicating findings in diverse populations, reusing existing linguistic corpora, systematic review and meta-analysis, using GIS data to correlate with image files, and discourse analysis.

Data sharing vs data reuse
Researchers who reuse data were more likely to share data (56.8%, n=1,004) than those who only used their own primary data (32.6%, n=396). In contrast, data reuse was more frequent in Engineering and Business and Economics than data sharing on the web. Overall, however, sharing and reuse of data were dependent (χ²=181.11, p<0.001). This held within all individual subject areas except Medicine and Engineering (Table E, supplement 2).

Data sharing among those who rely on their own data was relatively common in Earth and Planetary Sciences (50%), Arts and Humanities (44%), Physical Sciences (43%), and

1
2
3 Environmental Science (41%) (Figure B, supplement 3). It is possible that those who reuse data
4
5
shared by other researchers are more aware of data sharing practices in their field, but those who
6 only use their own primary data for research are less so. Alternatively, the common factor may be
7 the importance of data sharing for particular specialties.
Finding datasets to reuse

Among the researchers who had reused datasets, 60.9% (n=1,078) found datasets by reading relevant papers. Also popular were web searches, such as Google Dataset Search (46.1%, n=816), and disciplinary repository searches (45.6%, n=806). However, all methods of finding datasets varied in popularity between subject areas (Table F, supplement 2). Searching disciplinary repositories was more common in Physical Sciences (69.4%) and Earth and Planetary Sciences (50%), whereas searching interdisciplinary repositories was more common in Arts and Humanities (35%) and Social Sciences (28%). Similarly, web search was a majority choice in Engineering (63%), Arts and Humanities (60%), and Business and Economics (51.7%). These methods were not dependent on research experience.

The following factors were considered most important by researchers when searching for existing datasets to reuse: proper documentation (67%, n=2,195), open data (52%, n=1,678), and information on the usability of data (42%, n=1,375). Availability of data in a universal standard format (36%, n=1,163) and evidence that the dataset has an associated publication (34%, n=1,107) were of moderate importance. Evidence of prior reuse was rarely considered important (8.3%, n=270).

Despite evidence of increasing data reuse in all disciplines, most researchers reported difficulty finding datasets to reuse. Physical Sciences was an exception, where over 50% of researchers could easily find datasets to reuse. The percentage was also above average in Earth and Planetary Sciences (33%, n=57 out of 174) and Biomedical Sciences (29%, n=58 out of 199).

Finding datasets becomes slightly easier with experience: 24% (n=511 out of 2,090) of researchers with over 10 years of research experience found it difficult to find datasets to reuse, compared to 26-28% of those with less experience (Figures C and D, supplement 3).

Future improvements

1,831 open-text responses suggested future improvements to current systems to promote data sharing and reuse. A content analysis of these responses identified 23 recommendations in eight themes within three categories: 1. Issues around data, 2. Technological solutions, and 3. Cultural and policy changes (Table IV).

Table IV. Future improvements needed to promote data sharing and reuse
The most mentioned barrier to data reuse was a lack of knowledge about where and how to search for datasets. A single trusted portal or federated search system across disciplines is therefore needed to allow easy discovery of data:

"Perhaps more universal/federated searching mechanisms or portals--ArchiveGrid (https://researchworks.oclc.org/archivegrid/) was a game-changer for my research when it was released--now I no longer have to think "where might records about X person be?" and go to each individual institution and search."

A few responses pointed out that not all datasets can have multiple use cases, because some are created for a single use only. Information on the applicability of data can therefore be helpful to external users. Adequate contextual information is key to successful reuse of data, along with researchers' commitment to share data and proper data curation.

Streamlined institutional review board rules are critical to data sharing for reuse purposes in research involving human participants. Legal constraints around cultural data can be an impediment to data reuse in the Arts and Humanities. A response from a humanities researcher outlines different policy issues and the need for incentives:

"Before we can improve data reuse, we need to improve communications between disciplines, accept the resource costs of making data reusable, reward people who do make their data reusable, and of course work with legal systems and institutions (archives, libraries, publishers etc) who 'own' cultural data to make reuse for research more fluid".

Collaborations between data creators and reusers were recommended by multiple participants, as well as changes in research culture and policies. Participants mentioned that secondary data analysis may not be considered 'original enough' by journals to be published. Environmental Sciences researchers mentioned that their data are often very difficult to collect, and they can be reluctant to give away the fruits of their labour. Incentives such as data badges, data reuse indicators, more funding for secondary data analysis projects, and the rescue of historical data were recommended to promote more data sharing and reward data creators. As one participant suggested:

"…data work is nowadays high-quality scientific work as well, i.e., the reputation for data work needs to be increased (co-authorship for data work; establish "data"-chairs at universities and research institutes, etc.)"

Limitations

The precision of the results is affected by differing subgroup sample sizes. The sample sizes of researchers in different experience groups varied, with over 60% in the 10+ years' experience group. This could be due to the topic of the survey, since we found that those with more experience tend to share and reuse datasets more frequently. In addition, four disciplines had fewer than 30 responses; differences for these disciplines were not reported separately, as the results may not accurately represent those groups. The participant recruitment method may also have contributed to this (i.e., sample selection bias), as more experienced researchers tend to publish more and are listed as first author more frequently. The results also have an unknown survey self-selection bias related to the 4.65% response rate. Unlike similar studies (Unal et al., 2019; Tenopir et al., 2020), researchers' geographic location was not considered in this survey due to its focus on web-based data sharing and reuse.
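The effect of differing subgroup sizes on precision can be quantified with the usual normal-approximation margin of error for a sample proportion. A rough sketch, using the reported 10+ years group size (n=2,090) and the 30-response threshold mentioned above as illustrative sample sizes:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Precision differs sharply between a large and a small subgroup, e.g. a 50%
# estimate from n=2,090 (the 10+ years' experience group) versus n=30 (the
# smallest discipline subgroups in this survey).
for n in (2090, 30):
    print(f"n={n}: ±{margin_of_error(0.5, n):.1%}")
```

This is why percentages were not reported separately for the four disciplines with fewer than 30 responses: at that size the 95% margin of error approaches ±18 percentage points.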
Discussion

Data sharing is known to be increasing in some disciplines to comply with funding body and institutional requirements. However, research data are not always shared in a meaningful way that can lead to long-term accessibility and reuse. In this study, both data sharing and reuse were dependent on researchers' experience: those with more than 10 years of experience tended to share and reuse data more often. This supports the positive association between data sharing and a longer career reported by Gregory et al. (2020) and Dorta-González et al. (2021). Disciplinary differences exist in how researchers share data on the web, presumably driven by the culture of data sharing in a discipline: Physical Sciences, Earth and Planetary Sciences, and Environmental Sciences are more likely to share data, whereas Business and Economics, Medicine, and Engineering are less likely. Institutional repositories were frequently used in all disciplines, with journal-supported repositories also being quite popular. This could be because of the rapid growth of institutional repositories and research data services in higher education institutions to comply with funder mandates (Cragin et al., 2010; Cox et al., 2017). Many journals are also mandating data accessibility statements and have associated data repositories, such as Mendeley Data by Elsevier. These results extend the previously known patterns in Tenopir et al. (2015) to a wider range of disciplines (e.g., Business and Economics) and demonstrate increased use of such repositories in recent years.

Disciplinary repositories have emerged to support domain-specific data, such as in astronomy and astrophysics, zoology, and social science (Wallis et al., 2013; Faniel & Yakel, 2017). Data sharing and reuse are relatively common in these fields because researchers tend to be more aware of frequently used repositories in their specialty. This is in line with the good data practices reported by Tenopir et al. (2020) for Earth and Planetary Sciences and Environmental Sciences. However, the current results suggest that disciplinary repository usage has increased in Physical Sciences in recent years, compared to Tenopir et al. (2015). Personal websites were also frequently used for data sharing in many subject areas but not in Medicine, perhaps because of sensitive personal health data. This aligns with the findings of Tenopir et al. (2020). The examples of commonly used repositories reported by participants in this study demonstrate a lack of established data sharing methods in Engineering, Business and Economics, and Arts and Humanities, which could be one of the reasons for less data sharing in these subject areas. The Registry of Research Data Repositories (re3data.org) currently lists 951 repositories under Humanities and Social Sciences, including 207 for Economics, and 517 repositories for Engineering Sciences, among other disciplines, so the infrastructure for sharing seems to be available.

This study shows that the previously reported growing data reuse in most disciplines (Bishop & Kuula-Luumi, 2017; Borgman et al., 2019; Khan et al., 2021) has continued and is highest in Physical Sciences and Earth and Planetary Sciences. Self-reported data reuse was significantly more common than data sharing in Engineering and Business and Economics. In contrast to Curty et al.'s (2017) secondary analysis, the evidence here suggests that data sharing and reuse are dependent, except in Engineering and Medicine. This suggests that the relationship between data sharing and reuse has evolved, perhaps due to a greater accumulation of data sharing experience over time.
Despite high levels of data reuse, researchers in most disciplines except Physical Sciences usually struggle to find datasets to reuse. Hrynaszkiewicz et al. (2021) reported similar findings for their overall study population. The results also support the findings of Kratz and Strasser (2015) that most researchers read relevant papers to find reusable datasets, with web searches and disciplinary repository searches also being common. The present findings extend these results by showing disciplinary differences: searching disciplinary repositories was common in Physical Sciences and Earth and Planetary Sciences, compared to other disciplines. Even though reading papers is the method chosen by over 60% of researchers, recent studies report that only a small percentage of journal articles share data in a meaningful and accessible way (Federer et al., 2018; Thelwall et al., 2020), which may increase the difficulty of finding datasets from relevant articles. Formalizing data citation across all disciplines was suggested by the respondents. This would ensure that datasets are linked to associated articles, in turn increasing the visibility of data, as well as being an incentive for further data sharing (Dorta-González et al., 2021).

Conclusion

This study has revealed the extent to which data production, sharing and reuse vary between disciplines. While self-reported data sharing is increasing, significant disciplinary differences remain in the adoption of standard data sharing methods. Particularly for qualitative disciplines involving human participants, adequate guidance should be developed, and existing guidelines need to be reviewed by experts and funders to support best practices for de-identifying data. For example, in 2012 the U.S. Department of Health and Human Services published guidance on de-identification standards in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Such guidance is useful but may not fit all purposes and should be adapted to different country regulations and discipline-specific rules.

In contrast to previous studies, the widespread usage of institutional repositories in this study's sample indicates that institutional support can play an important role in data sharing. At the institutional level, research data management training programs and curated resources (e.g., lists of relevant data repositories) can help researchers in all disciplines adopt best practices for data production, management and sharing. Early career researchers will especially benefit from this, because data sharing and reuse were less common among less experienced respondents. Resources developed by the community of researchers, such as FAIRsharing.org (https://fairsharing.org/), can be used in training to help find a suitable data repository, as can guidance provided in existing studies (Alter & Gonzalez, 2018; Figueiredo, 2017).

Standard data sharing and citation make data findable and accessible in the long term and can help reduce the burden of finding data to reuse. Although the results show that sharing data as supplementary materials in a journal or on a personal website is still common across disciplines, this does not ensure better discoverability and accessibility, and journal editors can ensure that any related research data are deposited in a standard manner, adhering to FAIR principles. Since web search (e.g., Google Dataset Search, https://datasetsearch.research.google.com) was the second most common method used to find reusable datasets, data repository managers can help researchers by adopting Schema.org metadata standards so that their holdings are indexed by Google Dataset Search, making their datasets more easily discoverable and reusable (Patel, 2019). This will particularly help siloed institutional repositories, as the survey results show that researchers are more likely to search well-known disciplinary repositories to find datasets to reuse.
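To illustrate the Schema.org recommendation above, the sketch below prints the kind of JSON-LD "Dataset" description that a repository can embed in a dataset landing page so that Google Dataset Search can index it. All field values (name, description, DOI, creator) are invented for illustration:

```python
import json

# A minimal Schema.org "Dataset" description of the kind repositories embed
# as JSON-LD in dataset landing pages for Google Dataset Search indexing.
# All field values below are invented for illustration.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example survey responses on data sharing",
    "description": "Anonymised survey responses (illustrative record only).",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "identifier": "https://doi.org/10.1234/example",
    "creator": {"@type": "Person", "name": "A. Researcher"},
}

# The JSON-LD block would normally sit inside a <script type="application/ld+json"> tag.
print(json.dumps(dataset, indent=2))
```

Including at least a name, description, license and persistent identifier gives a dataset a reasonable chance of being discoverable through web search as well as through the repository's own interface.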
Future studies can examine researchers' attitudes and needs in Arts and Humanities, Business and Economics, and Engineering to further explore why data sharing is particularly low in these subject areas despite relatively frequent data reuse. This will help to identify areas that need new policies, guidance, and infrastructure development. Furthermore, based on the researchers' responses, this study makes 23 recommendations related to data sharing, technological solutions, and cultural and policy changes to support data sharing and promote data reuse. Incentives such as rewarding data creators in a formal manner similar to article publishing, and implementing data reuse indicators or data badges to visualize the impact of data sharing, seem particularly useful. However, there are few incentive-based studies of data sharing (Rowhani-Farid et al., 2017; Devriendt et al., 2021) and more are needed to properly assess the impact of incentives on data sharing.

Data availability statement
Data collected from this study are available on figshare: https://doi.org/10.6084/m9.figshare.19596967.v1

Conflicts of interest/Competing interests
Not applicable

References
1. Alter, G., & Gonzalez, R. (2018), "Responsible practices for data sharing", American Psychologist, 73(2), 146.
2. Bell, G., Hey, T., & Szalay, A. (2009), "Beyond the data deluge", Science, 323(5919), 1297-1298.
3. Bengtsson, M. (2016), "How to plan and perform a qualitative study using content analysis", NursingPlus Open, 2, 8-14.
4. Bezuidenhout, L. (2019), "To share or not to share: Incentivizing data sharing in life science communities", Developing World Bioethics, 19(1), 18-24.
5. Bishop, L., & Kuula-Luumi, A. (2017), "Revisiting qualitative data reuse: A decade on", Sage Open, 7(1), 2158244016685136.
6. Borgman, C. L., Scharnhorst, A., & Golshan, M. S. (2019), "Digital data archives as knowledge infrastructures: Mediating data sharing and reuse", Journal of the Association for Information Science and Technology, 70(8), 888-904.
7. Ceci, S. J. (1988), "Scientists' attitudes toward data sharing", Science, Technology, & Human Values, 13(1-2), 45-52.
8. Coady, S. A., Mensah, G. A., Wagner, E. L., Goldfarb, M. E., Hitchcock, D. M., & Giffen, C. A. (2017), "Use of the National Heart, Lung, and Blood Institute data repository", New England Journal of Medicine, 376(19), 1849-1858.
9. Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020), "The citation advantage of linking publications to research data", PLoS ONE, 15(4), e0230416.
10. Cox, A. M., Kennan, M. A., Lyon, L., & Pinfield, S. (2017), "Developments in research data management in academic libraries: Towards an understanding of research data service maturity", Journal of the Association for Information Science and Technology, 68(9), 2182-2200.
11. Cragin, M. H., Palmer, C. L., Carlson, J. R., & Witt, M. (2010), "Data sharing, small science and institutional repositories", Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 368(1926), 4023-4038.
12. Curty, R. G., Crowston, K., Specht, A., Grant, B. W., & Dalton, E. D. (2017), "Attitudes and norms affecting scientists' data reuse", PLoS ONE, 12(12), e0189288.
13. Devriendt, T., Shabani, M., & Borry, P. (2021), "Data sharing in biomedical sciences: a systematic review of incentives", Biopreservation and Biobanking, 19(3), 219-227.
14. Dorta-González, P., González-Betancor, S. M., & Dorta-González, M. I. (2021), "To what extent is researchers' data-sharing motivated by formal mechanisms of recognition and credit?", Scientometrics, 126(3), 2209-2225.
15. Faniel, I. M., & Yakel, E. (2017), "Practices do not make perfect: Disciplinary data sharing and reuse practices and their implications for repository data curation", Curating Research Data, Volume One: Practical Strategies for Your Digital Repository, 1, 103-126.
16. Fecher, B., Friesike, S., & Hebing, M. (2015), "What drives academic data sharing?", PLoS ONE, 10(2), e0118053.
17. Federer, L. M., Lu, Y.-L., Joubert, D. J., Welsh, J., & Brandys, B. (2015), "Biomedical data sharing and reuse: Attitudes and practices of clinical and scientific research staff", PLoS ONE, 10(6), e0129506. https://doi.org/10.1371/journal.pone.0129506
18. Federer, L. M., Belter, C. W., Joubert, D. J., Livinski, A., Lu, Y.-L., Snyders, L. N., & Thompson, H. (2018), "Data sharing in PLOS ONE: an analysis of data availability statements", PLoS ONE, 13(5), e0194768.
19. Figueiredo, A. S. (2017), "Data sharing: convert challenges into opportunities", Frontiers in Public Health, 5, 327.
20. Fink, A. (2003), How to Design Survey Studies, Sage.
21. Gregory, K., Groth, P., Scharnhorst, A., & Wyatt, S. (2020), "Lost or found? Discovering data needed for research", Harvard Data Science Review, 2(2).
22. Hansson, K., & Dahlgren, A. (2022), "Open research data repositories: Practices, norms, and metadata for sharing images", Journal of the Association for Information Science and Technology, 73(2), 303-316.
23. Henneken, E. A., & Accomazzi, A. (2011), "Linking to data - effect on citation rates in astronomy", arXiv preprint arXiv:1111.3618.
24. Hrynaszkiewicz, I., Harney, J., & Cadwallader, L. (2021), "A survey of researchers' needs and priorities for data sharing".
25. Khan, N., Thelwall, M., & Kousha, K. (2021), "Measuring the impact of biodiversity datasets: data reuse, citations and altmetrics", Scientometrics, 1-19.
26. Khan, N., Thelwall, M., & Kousha, K. (2022), "Survey data on disciplinary differences in data sharing and reuse practices", figshare, dataset. https://doi.org/10.6084/m9.figshare.19596967.v1
27. Kiley, R., Peatfield, T., Hansen, J., & Reddington, F. (2017), "Data sharing from clinical trials - a research funder's perspective", The New England Journal of Medicine, 377, 1990-1992.
28. Kim, J., Schuler, E. R., & Pechenina, A. (2018), "Predictors of data sharing and reuse behavior in academic communities", in Knowledge Discovery and Data Design Innovation: Proceedings of the International Conference on Knowledge Management (ICKM 2017), 1-25.
29. Kim, Y., & Stanton, J. M. (2016), "Institutional and individual factors affecting scientists' data-sharing behaviors: A multilevel analysis", Journal of the Association for Information Science and Technology, 67(4), 776-799.
30. Kim, Y., & Yoon, A. (2017), "Scientists' data reuse behaviors: A multilevel analysis", Journal of the Association for Information Science and Technology, 68(12), 2709-2719.
31. Kratz, J. E., & Strasser, C. (2015), "Making data count", Scientific Data, 2(1), 1-5.
32. Mason, C. M., Box, P. J., & Burns, S. M. (2020), "Research data sharing in the Australian national science agency: Understanding the relative importance of organisational, disciplinary and domain-specific influences", PLoS ONE, 15(8), e0238071.
33. Meystre, S. M., Lovis, C., Bürkle, T., Tognola, G., Budrionis, A., & Lehmann, C. U. (2017), "Clinical data reuse or secondary use: current status and potential future progress", Yearbook of Medical Informatics, 26(01), 38-52.
34. Mozersky, J., Walsh, H., Parsons, M., McIntosh, T., Baldwin, K., & DuBois, J. M. (2020), "Are we ready to share qualitative research data? Knowledge and preparedness among qualitative researchers, IRB members, and data repository curators", IASSIST Quarterly, 43(4).
35. Office for Civil Rights (2012), "Guidance regarding methods for de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule", available at: https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/understanding/coveredentities/De-identification/hhs_deid_guidance.pdf
36. Pasquetto, I. V., Borgman, C. L., & Wofford, M. F. (2019), "Uses and reuses of scientific data: The data creators' advantage", Harvard Data Science Review, 1(2).
37. Pampel, H., Vierkant, P., Scholze, F., Bertelmann, R., Kindling, M., Klump, J., Goebelbecker, H. J., Gundlach, J., Schirmbacher, P., & Dierolf, U. (2013), "Making research data repositories visible: the re3data.org registry", PLoS ONE, 8(11), e78080.
38. Patel, D. (2019), "How Google's Dataset Search Engine Work", available at: https://towardsdatascience.com/how-googles-dataset-search-engine-work-928fa5237787 (accessed 31 March 2021).
39. Pinfield, S., Cox, A. M., & Smith, J. (2014), "Research data management and libraries: Relationships, activities, drivers and influences", PLoS ONE, 9(12), e114734.
40. Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007), "Sharing detailed research data is associated with increased citation rate", PLoS ONE, 2(3), e308.
41. Piwowar, H. A. (2011), "Who shares? Who doesn't? Factors associated with openly archiving raw research data", PLoS ONE, 6(7), e18657.
42. re3data.org - Registry of Research Data Repositories, available at: https://doi.org/10.17616/R3D (accessed 17 November 2020).
43. REF (2019), "Guidance on submissions (2019/01) - REF 2021", available at: https://www.ref.ac.uk/publications/guidance-on-submissions-201901/ (accessed 13 July 2021).
44. Rowhani-Farid, A., Allen, M., & Barnett, A. G. (2017), "What incentives increase data sharing in health and medical research? A systematic review", Research Integrity and Peer Review, 2(1), 1-10.
45. Sayogo, D. S., & Pardo, T. A. (2013), "Exploring the determinants of scientific data sharing: Understanding the motivation to publish research data", Government Information Quarterly, 30, S19-S31.
46. Tenopir, C., Dalton, E. D., Allard, S., Frame, M., Pjesivac, I., Birch, B., Pollock, D., & Dorsett, K. (2015), "Changes in data sharing and data reuse practices and perceptions among scientists worldwide", PLoS ONE, 10(8), e0134826.
47. Tenopir, C., Rice, N. M., Allard, S., Baird, L., Borycz, J., Christian, L., Grant, B., Olendorf, R., & Sandusky, R. J. (2020), "Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide", PLoS ONE, 15(3), e0229003.
48. Thelwall, M., Munafò, M., Mas-Bleda, A., Stuart, E., Makita, M., Weigert, V., Keene, C., Khan, N., Drax, K., & Kousha, K. (2020), "Is useful research data usually shared? An investigation of genome-wide association study summary statistics", PLoS ONE, 15(2), e0229578. https://doi.org/10.1371/journal.pone.0229578
49. Unal, Y., Chowdhury, G., Kurbanoglu, S., Boustany, J., & Walton, G. (2019), "Research data management and data sharing behaviour of university researchers".
50. Wallis, J. C., Rolando, E., & Borgman, C. L. (2013), "If we share data, will anyone use them? Data sharing and reuse in the long tail of science and technology", PLoS ONE, 8(7), e67332.
51. Whitty, C. J., Mundel, T., Farrar, J., Heymann, D. L., Davies, S. C., & Walport, M. J. (2015), "Providing incentives to share data early in health emergencies: the role of journal editors", The Lancet, 386(10006), 1797-1798.
52. Wiley, C. (2018), "Data sharing and engineering faculty: An analysis of selected publications", Science & Technology Libraries, 37(4), 409-419.
53. Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., & Bouwman, J. (2016), "The FAIR Guiding Principles for scientific data management and stewardship", Scientific Data, 3(1), 1-9.
54. Yoon, A. (2016), "Red flags in data: Learning from failed data reuse experiences", Proceedings of the Association for Information Science and Technology, 53(1), 1-6.
55. Zenk-Möltgen, W., Akdeniz, E., Katsanidou, A., Naßhoven, V., & Balaban, E. (2018), "Factors influencing the data sharing behavior of researchers in sociology and political science", Journal of Documentation.
Figure 1. Formats of data produced in different subject areas

Figure 2. Data sharing by subject area (labels on bars represent number of responses)

Figure 3. Data reuse across subject areas (labels on bars represent number of responses)

Figure 4. Data reuse types in different subject areas (dotted lines represent the average percentage in each area)
Table I: Selected subject areas and disciplines for the survey and number of responses

| Subject area | Discipline | Scopus subject code | Number of responses in each discipline* | Percentage of responses within broader subject category (%) |
| Social Sciences (n=733, 22.51%) | Linguistics and Language | 3310 | 72 | 10% |
| Social Sciences | Education | 3304 | 114 | 16% |
| Social Sciences | Library and Information Sciences | 3309 | 252 | 34% |
| Social Sciences | 'Other' in Social Sciences | - | 211 | 29% |
| Arts and Humanities (n=334, 10.25%) | Visual and Performing Arts | 1213 | 64 | 19% |
| Arts and Humanities | Literature and Literary Theory | 1208 | 103 | 31% |
| Arts and Humanities | 'Other' in Arts and Humanities | - | 139 | 42% |
| Business and Economics (n=592, 18.18%) | Business and International Management | 1403 | 193 | 33% |
| Business and Economics | Economics and Econometrics | 2002 | 251 | 42% |
| Business and Economics | 'Other' in Business and Economics | - | 123 | 21% |
| Physical Sciences (n=220, 6.75%) | Astronomy and Astrophysics | 3103 | 133 | 60% |
| Physical Sciences | Organic Chemistry | 1602 | 11 | 5% |
| Physical Sciences | 'Other' in Physical Sciences | - | 63 | 29% |
| Biomedical Sciences (n=264, 8.11%) | Neurology | 2808 | 39 | 15% |
| Biomedical Sciences | Pharmacology | 3004 | 46 | 17% |
| Biomedical Sciences | 'Other' in Biomedical Sciences | - | 145 | 55% |
| Medicine (n=170, 5.22%) | Radiology, Nuclear Medicine and Imaging | 2741 | 27 | 16% |
| Medicine | Infectious Diseases | 2725 | 38 | 22% |
| Medicine | 'Other' in Medicine | - | 97 | 57% |
| Environmental Sciences (n=324, 9.95%) | Ecology | 2303 | 147 | 45% |
| Environmental Sciences | Pollution | 2310 | 108 | 33% |
| Environmental Sciences | 'Other' in Environmental Sciences | - | 56 | 17% |
| Earth and Planetary Sciences (n=188, 5.77%) | Geology | 1907 | 56 | 30% |
| Earth and Planetary Sciences | Oceanography | 1910 | 53 | 28% |
| Earth and Planetary Sciences | 'Other' in Earth and Planetary Sciences | - | 73 | 39% |
| Engineering (n=179, 5.5%) | Aerospace Engineering | 2202 | 14 | 8% |
| Engineering | Biomedical Engineering | 2204 | 19 | 11% |
| Engineering | Environmental Engineering | 2305 | 30 | 17% |
| Engineering | 'Other' in Engineering | - | 113 | 63% |
| Other | - | - | 253 | 8% |
Table II. Data sharing methods in different subject areas

| Subject category (n = previously shared data) | Institutional repository | Disciplinary repository | Interdisciplinary repository | Journal-supported repository | Personal website | Commonly used repositories |
| Social Sciences (n=312, 42.6%) | 169 (54.2%) | 42 (14%) | 71 (23%) | 66 (21%) | 75 (24%) | Academia.edu, Zenodo, DANS, ICPSR, Figshare |
| Arts and Humanities (n=157, 47%) | 105 (66.9%) | 11 (7%) | 21 (13%) | 34 (22%) | 53 (34%) | Academia.edu, Google Drive, Zenodo, Mendeley, OSF |
| Business and Economics (n=193, 32.6%) | 89 (46%) | 16 (8%) | 18 (9%) | 75 (39%) | 62 (32%) | Data in Brief, Figshare, ICPSR, Dataverse, American Economic Association |
| Physical Sciences (n=160, 72.7%) | 77 (48%) | 44 (28%) | 31 (19%) | 58 (36%) | 59 (37%) | Zenodo, CADC, GitHub, CCDC, NASA databases, SDSS (Sloan Digital Sky Survey), SciFinder |
| Biomedical Sciences (n=141, 53.4%) | 66 (47%) | 34 (24%) | 37 (26%) | 50 (36%) | 27 (19%) | DDBJ, OSF, Figshare, GenBank, MRI Image Consortium, NCBI, EMBL, PubMed, PubChem, The Cancer Imaging Archive, GitHub, GEO |
| Medicine (n=65, 38%) | 35 (54%) | 15 (23%) | 4 (6%) | 24 (37%) | 6 (9%) | dbGaP, NCBI, GEO, GenBank, Zenodo, Dryad, EGA, IADR, fMRI database, PLoS ONE |
| Environmental Sciences (n=176, 54.3%) | 90 (51%) | 40 (23%) | 30 (17%) | 64 (36%) | 22 (13%) | Dryad, GenBank, NCBI, PANGAEA, SeaBASS, GitHub, MorphoSource, ForestPlots.net, NASA, NSF Arctic Data Center |
| Earth and Planetary Sciences (n=132, 70.2%) | 86 (65%) | 32 (24%) | 32 (24%) | 33 (25%) | 22 (17%) | bioRxiv, arXiv, PANGAEA, GitHub, DeepBlue, GIRO, NASA, NOAA, NCAR, NSF Arctic Data Center, Zenodo |
| Engineering (n=75, 42%) | 44 (59%) | 1 (1%) | 11 (15%) | 23 (31%) | 19 (25%) | Elsevier, Zenodo, Figshare, OSF, GitHub, Mendeley |
| Chi-square test result | χ2 = 30.62, p = 0.00 | χ2 = 66.35, p = 0.00 | χ2 = 40.22, p = 0.00 | χ2 = 36.03, p = 0.00 | χ2 = 55.14, p = 0.00 | |
11
12
On
13
14
Table III. How researchers first found repositories to share data

Subject category                      Already aware   Searched       Web search    Consulted     Consulted
(n = previously shared)                               re3data.org                  colleagues    experts
Social Sciences (n=312)               174 (55.8%)     8 (3%)         57 (18%)      106 (34%)     80 (26%)
Arts and Humanities (n=157)           66 (42%)        1 (0.6%)       31 (20%)      52 (33%)      37 (24%)
Business and Economics (n=193)        87 (45%)        2 (1%)         31 (16%)      46 (24%)      32 (17%)
Physical Sciences (n=160)             108 (67.5%)     4 (3%)         17 (11%)      47 (29%)      25 (16%)
Biomedical Sciences (n=141)           86 (61%)        1 (0.7%)       26 (18%)      51 (36%)      24 (17%)
Medicine (n=65)                       26 (41%)        3 (5%)         11 (17%)      24 (38%)      15 (24%)
Environmental Sciences (n=176)        87 (49%)        2 (1%)         27 (15%)      62 (35%)      32 (18%)
Earth and Planetary Sciences (n=132)  63 (48%)        2 (2%)         18 (14%)      52 (39%)      28 (21%)
Engineering (n=75)                    34 (45%)        1 (1%)         22 (29%)      29 (39%)      12 (16%)
Chi-square test result                X² = 38.18,     X² = 8.13,     X² = 15.61,   X² = 13.35,   X² = 13.05,
                                      p < 0.001       p = 0.42       p = 0.048     p = 0.10      p = 0.11
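The chi-square values in Tables II and III are consistent with Pearson tests of independence on a 2 × 9 contingency table (behaviour reported vs. not reported, by subject area). The sketch below rebuilds such a test for the "already aware" column of Table III from the published counts; the use of scipy's `chi2_contingency` is our assumption rather than the authors' documented procedure, but it reproduces the reported X² = 38.18.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from Table III: respondents per subject area who reported being
# "already aware" of a repository, out of those who had previously shared data.
aware = np.array([174, 66, 87, 108, 86, 26, 87, 63, 34])
n     = np.array([312, 157, 193, 160, 141, 65, 176, 132, 75])

# 2 x 9 contingency table: aware vs. not aware, by subject area.
table = np.vstack([aware, n - aware])

# Pearson chi-square test of independence (scipy applies no continuity
# correction when df > 1, matching the reported statistic).
chi2, p, dof, expected = chi2_contingency(table)
print(f"X2 = {chi2:.2f}, df = {dof}")  # X2 = 38.18, df = 8
```

The same call applied to any other column's counts yields the corresponding test statistic for that column.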
Table IV. Future improvements needed to promote data sharing and reuse

Data-related issues
  Availability of data:
    - Increase data sharing with available code (where applicable)
    - Make data easily available and accessible with a DOI
  Handling of data:
    - Better data management during research lifecycles
  Data citation:
    - Formalize data citation to ensure datasets and associated articles are linked
  Usability of data:
    - Data quality: reliable data with adequate documentation, in a standard format supported in the individual's discipline
    - Publish data papers/data descriptor articles to enhance the usability of datasets
    - Provide information on the usability of data with case examples (some datasets are produced for a single use)

Technological solutions
  Search system:
    - A single trusted portal or federated search system to search across multiple repositories and disciplines
    - An enhanced search system with better tagging features
    - User-friendly data repository interfaces with fast data retrieval (for disciplines producing big data)
  New search system features:
    - A recommendation system for datasets
    - Data extraction and analysis-support tools available in the same platform used to access data
    - An alert system to notify users when relevant datasets are made publicly available

Cultural and policy changes
  Awareness and acceptance:
    - Readiness, awareness, and acceptance within the scientific community to support secondary data analysis and publish it in journals
    - Promotion of data and repositories within scientific communities via conferences, webinars, and training for early-career researchers
  Incentives:
    - Credit data creators/reward data sharing in a similar way to publishing journal articles
    - Create incentives such as data badges and data-reuse indicators to promote data reuse
    - Increase funding for secondary data analysis projects and for rescuing historical data
  Collaboration:
    - Form collaborations between data creators and users and their institutions (in some cases data are not reusable without contextual explanation)
  Guidelines and documentation:
    - Streamlined IRB rules on how to handle qualitative/medical data for sharing at the end of research
    - Adequate guidelines on how to anonymize qualitative and health data to ensure data privacy
    - Adequate legal and copyright information in place to access and reuse data
    - Reduced bureaucratic application procedures for data access, to avoid extended waiting periods
Supplementary material 1: Survey questionnaire
Data production

Q1: Please indicate your research experience in terms of years (a research career would normally start when a PhD starts).
1. 0-3 years
2. 3-6 years
3. 6-9 years
4. 10+ years

Q2: Please select your main current research area from the options below. If your research area is not included in the list, then please include it under ‘Other’.
1. Physical Sciences – a. Astronomy and Astrophysics, b. Organic Chemistry
2. Biomedical Sciences – a. Neurology, b. Pharmacology
3. Social Sciences – a. Linguistics and Language, b. Education, c. Library and Information Science
4. Arts and Humanities – a. Visual and Performing Arts, b. Literature and Literary Theory
5. Earth and Planetary Sciences – a. Geology, b. Oceanography
6. Engineering – a. Aerospace Engineering, b. Biomedical Engineering, c. Environmental Engineering
7. Environmental Science – a. Ecology, b. Pollution
8. Medicine – a. Radiology, Nuclear Medicine and Imaging, b. Infectious Diseases
9. Business and Economics – a. Business and International Management, b. Economics and Econometrics
10. Other (please specify)

Q3: What type of data do you produce in your research? [Please give specific examples, e.g., survey data, type of samples/observations] (open text)

Q4: What are the most important formats of data that you produce in your research? [Select all that apply]
1. Text
2. Images
3. Multimedia (audio/video)
4. Software/code
5. Numerical data (any type of quantitative measurement)
6. None/not sure
7. Other (please specify)

Data sharing

Q5: Have you ever shared your research data by posting it on the web (e.g., in a data repository)? [If no, then skip to data reuse question 7]
1. Yes
2. No
3. I don’t know/not sure
Q5a (if yes): How do you usually share your data? [Please select all that apply]
1. Institutional repository (e.g., university repository)
2. Discipline-specific repository (e.g., Inter-university Consortium for Political and Social Research (ICPSR), PANGAEA)
3. Interdisciplinary repository (e.g., Zenodo, UCLA Center for Embedded Networked Sensing (CENS))
4. A journal-supported repository (e.g., PLOS ONE)
5. Personal website
6. Other
   (i) Please specify repositories other than institutional
   (ii) Other repositories

Q6: How did you first find a repository to share your data?
1. I was already aware of the popular/relevant repositories in my field
2. Searched re3data.org (Registry of Research Data Repositories)
3. Web search
4. Consulted with colleagues or senior researchers
5. Consulted with experts in my institution, e.g., research data support services
6. Other (please specify)

Q7: Which of these factors influence your choice of repository for sharing your data? [Please select all that apply]
1. Discipline norms
2. Cost
3. Ease of use
4. Reputation of the repository
5. Appropriateness for the data type
6. Data curation services offered
7. None of the above
8. Other factors [please specify]

Data reuse

Q8: Have you ever reused existing datasets created by other people in your research?
1. Yes [Please select the best option that applies]
   a. I use my own primary data but sometimes combine it with data from existing data sources
   b. I never use my own primary data but only ever use data from existing sources (e.g., datasets published in repositories) to answer new research questions
2. I only ever use my own primary data for my research
3. My research doesn’t use data
4. I don’t know/not sure

(Those who answer ‘Yes’ proceed to the next questions; options 2, 3 and 4 skip to question 12.)

Q9: How do you find datasets to reuse? [Check all that apply]
1. Search disciplinary repositories
2. Search interdisciplinary repositories
3. Web search (e.g., Google Dataset Search)
4. Read relevant papers and then check whether the authors shared data
5. By accident – I noticed the dataset (e.g., in the original paper) and decided to use it
6. I don’t know/can’t remember
7. Other [please specify]

Q10: For which purposes do you reuse existing data? [Please select all that apply]
1. Ground truthing: calibrate, compare, confirm (comparative reuse)
2. Analyze a single existing dataset to answer novel research questions (integrative reuse)
3. Combine multiple existing datasets to answer novel research questions (integrative reuse)
4. I don’t know/can’t remember
5. Other (please specify)

Q10b (when options 1-3 are selected): Please describe the type of data and how you used it (open text)

Measuring data reuse

Q11: Would you like to know whether someone else has reused your published data?
1. Yes
2. No
3. Not sure

Q12: Do you ever actively promote your published datasets?
1. Yes
2. No
3. Not sure

Q12a (if yes): How do you promote your datasets?
1. In classrooms
2. Using social media platforms – (i) Twitter, (ii) Facebook, (iii) blog posts
3. Within research groups and collaborators’ channels
4. Other (please specify)

Incentives

We are investigating the types of incentives that can improve the search experience and usage of research data. The following questions identify factors that may assist in such decision-making.

Q13: When searching for existing datasets in a repository, which of the following factors do you consider important for the decision to use one? [Please select all that apply]
1. Proper documentation for the dataset
   a. Type of data
   b. Subject of data
   c. Data collection method
   d. Other (please specify)
2. The data is open (no application procedure)
3. Information on the usability of the data
4. Evidence that the data is from an associated publication
5. The data is in a universal standard format
6. Evidence that the data has been reused
7. Other (please specify)
8. Not applicable

Q14: How easy is it usually for you to find relevant datasets for reuse? [Likert scale]
1. Extremely easy, 2. Very easy, 3. Neutral, 4. Difficult, 5. Very difficult or often impossible, 6. I don’t know/does not apply

Q15: What can be improved in current systems to encourage and promote data sharing and reuse? (open-ended)

------------------------

The following definitions were provided for reference:

Research data: Any information that has been collected, observed, generated, or created to answer novel research questions and validate research findings. Data may include any form of raw data and multimedia files, such as images, audio, video, code, and software.

Dataset: A single file or a collection of data produced as part of research, together with its associated metadata, such as an abstract, license, and any other relevant information that enables understanding and legal usage of the data.

Data repository: A data repository or data archive is a web-based infrastructure that hosts data in a secure manner and provides long-term access to it. A repository can be part of an academic institution or hosted independently (e.g., Zenodo).

Data reuse: Any secondary use of data by users other than the data collectors.
Supplementary material 2: Tables

Table A: Top 10 types of data produced in different subject areas, with term frequencies

Subject category              Data types
Social Sciences               Survey (481), interviews (234), observations (113), qualitative (98),
                              transcripts (52), quantitative (44), audio (38), video (36),
                              recordings (34), experimental (34)
Arts and Humanities           Survey (60), observations (43), texts (37), images (33), interviews (33),
                              video (28), audio (26), qualitative (24), literary (23), historical (22)
Business and Economics        Survey (345), secondary (83), interviews (67), observations (31),
                              qualitative (26), experimental (18), financial (16), quantitative (16),
                              economic (15), time series (14)
Environmental Sciences        Survey (105), samples (54), observations (48), water (35), field data (30),
                              experimental (29), measurements (23), images (22), soil (22), species (19)
Earth and Planetary Sciences  Models (40), observations (32), samples (25), survey (25), measurements (24),
                              water (17), field data (16), numerical (16), chemical (15), temperature (12)
Biomedical Sciences           Survey (45), images (32), behavioral (30), samples (26), experimental (21),
                              imaging (19), recordings (13), clinical (12), EEG (12), brain (11)
Medicine                      Survey (60), clinical (35), observations (26), images (21), imaging (11),
                              qualitative (11), medical (10), samples (10), measures (9), trials (8)
Physical Sciences             Images (50), observations (48), simulations (45), spectra (36), survey (27),
                              software (24), astronomical (22), numerical (18), catalogues (13),
                              physical objects (13)
Engineering                   Survey (33), experimental (27), simulations (26), numerical (16), images (15),
                              samples (15), observations (12), software (12), measurements (10), system (8)
Table B: Logistic regression of data sharing outcomes (n=3,098)

Predictor                     Estimate (β)  Std. Error  z-value  p-value
Intercept                     -0.15238      0.18822     -0.810   0.4182
Biomedical Sciences            0.10910      0.17032      0.641   0.5218
Business and Economics        -0.69173      0.14524     -4.763   1.91e-06 ****
Earth and Planetary Sciences   0.89144      0.20112      4.432   9.32e-06 ****
Engineering                   -0.25182      0.19368     -1.300   0.1935
Environmental Sciences         0.27336      0.16406      1.666   0.0957 *
Medicine                      -0.46562      0.19683     -2.366   0.0180 **
Other                         -0.18376      0.17283     -1.063   0.2877
Physical Sciences              1.07162      0.19778      5.418   6.02e-08 ****
Social Sciences               -0.26114      0.13775     -1.896   0.0580 *
10+ years                      0.30144      0.16074      1.875   0.0607 .
3-6 years                     -0.08549      0.18425     -0.464   0.6426
6-9 years                      0.11700      0.17991      0.650   0.5155
Overall model evaluation
Likelihood ratio test          X² = 15.225  df = 13     p = 0.00163**
Significance codes: 0 ‘****’; 0.001 ‘***’; 0.01 ‘**’; 0.05 ‘*’
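Table B reports a dummy-coded logistic regression of the binary "previously shared data" outcome on subject area and experience band; Arts and Humanities and 0-3 years appear to be the reference levels, since those are the categories absent from the table. A minimal sketch of such a model on synthetic responses (illustrative only, not the study's data or exact specification):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic stand-in for the survey responses: one row per respondent, with
# a binary "shared" outcome, a subject area, and an experience band.
n = 500
df = pd.DataFrame({
    "subject": rng.choice(
        ["Arts and Humanities", "Physical Sciences", "Medicine"], size=n),
    "experience": rng.choice(["0-3 years", "3-6 years", "10+ years"], size=n),
})
# Make sharing more likely in Physical Sciences, as in Table B.
base = np.where(df["subject"] == "Physical Sciences", 0.7, 0.4)
df["shared"] = (rng.random(n) < base).astype(int)

# Dummy-coded logistic regression; patsy takes the first level in sorted
# order as the reference category, here "Arts and Humanities" and
# "0-3 years", matching the levels absent from Table B.
model = smf.logit("shared ~ C(subject) + C(experience)", data=df).fit(disp=0)
print(model.summary())
```

With real responses, `model.summary()` reports the same columns as Table B (coefficient estimates, standard errors, z-values, p-values) plus a likelihood-ratio test against the null model.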
Table C: Factors that influence choice of repositories in different subject areas

Subject category                      Disciplinary  Cost         Ease of      Reputation of  Appropriateness  Data curation
(n = previously shared)               norms                      use          a repository   for data type    services offered
Social Sciences (n=312)               145 (46.5%)   104 (33.3%)  178 (57.1%)  156 (50%)      124 (39.7%)      48 (15%)
Arts and Humanities (n=157)           67 (43%)      54 (34%)     86 (55%)     73 (46%)       57 (36%)         18 (11%)
Business and Economics (n=193)        79 (41%)      56 (29%)     87 (45%)     79 (41%)       54 (28%)         17 (9%)
Physical Sciences (n=160)             66 (41%)      65 (41%)     98 (61%)     74 (46%)       73 (46%)         27 (17%)
Biomedical Sciences (n=141)           60 (43%)      61 (43%)     79 (56%)     77 (55%)       78 (55%)         23 (16%)
Medicine (n=65)                       21 (32%)      23 (35%)     28 (43%)     35 (54%)       27 (42%)         9 (14%)
Environmental Sciences (n=176)        63 (36%)      66 (38%)     87 (49%)     73 (41%)       71 (40%)         27 (15%)
Earth and Planetary Sciences (n=132)  49 (37%)      57 (43%)     68 (52%)     50 (38%)       53 (40%)         20 (15%)
Engineering (n=75)                    24 (32%)      26 (35%)     41 (55%)     41 (55%)       27 (36%)         5 (7%)
Chi-square test result                X² = 12.87,   X² = 18.33,  X² = 16.38,  X² = 17.27,    X² = 31.2,       X² = 11.59,
                                      p = 0.16      p = 0.03     p = 0.06     p = 0.04       p < 0.001        p = 0.24
Table D: Logistic regression of data reuse outcomes (n=3,095)

Predictor                     Estimate (β)  Std. Error  z-value  p-value
Intercept                     -0.08281      0.11754     -0.704   0.48114
Biomedical Sciences            0.12203      0.17178      0.710   0.47748
Business and Economics         0.40240      0.14505      2.774   0.00553 ***
Earth and Planetary Sciences   1.76788      0.23697      7.460   8.62e-14 ****
Engineering                    0.56598      0.19368      2.891   0.00383 ***
Environmental Sciences         0.54451      0.16520      3.296   0.00098 ****
Medicine                      -0.08834      0.19592     -0.451   0.65206
Other                          0.14091      0.17444      0.808   0.41919
Physical Sciences              1.71484      0.21903      7.829   4.91e-15 ****
Social Sciences                0.04842      0.13982      0.346   0.72912
Overall model evaluation
Likelihood ratio test          X² = 5.96    df = 10     p = 0.1136
Significance codes: 0 ‘****’; 0.001 ‘***’; 0.01 ‘**’; 0.05 ‘*’
Table E: Comparison between data sharing and reuse across different subject areas

Subject category              Previously    Previously    Chi-square test results
                              shared data   reused data   (data sharing vs reuse)
Social Sciences               312 (43%)     343 (47%)     X² = 34.594, p < 0.001
Arts and Humanities           157 (47%)     139 (42%)     X² = 7.839, p = 0.02
Business and Economics        193 (33%)     329 (56%)     X² = 19.175, p < 0.001
Physical Sciences             160 (73%)     179 (81%)     X² = 13.559, p < 0.001
Biomedical Sciences           141 (53%)     130 (49%)     X² = 35.29, p < 0.001
Medicine                      65 (38%)      75 (44%)      X² = 0.008, p = 0.93
Environmental Sciences        176 (54%)     192 (59%)     X² = 10.737, p = 0.001
Earth and Planetary Sciences  132 (70%)     151 (80%)     X² = 4.813, p = 0.03
Engineering                   75 (42%)      107 (60%)     X² = 2.082, p = 0.149
Table F: How researchers find datasets to reuse in different subject areas

Subject category                      Search        Search inter-  Web search       Read         By accident
(n = previously reused)               disciplinary  disciplinary   (e.g., Google    relevant
                                      repositories  repositories   Dataset Search)  papers
Social Sciences (n=343)               148 (43.1%)   96 (28%)       164 (47.8%)      177 (51.6%)  66 (19%)
Arts and Humanities (n=139)           66 (47%)      49 (35%)       84 (60%)         81 (58%)     38 (27%)
Business and Economics (n=329)        142 (43.2%)   76 (23%)       170 (51.7%)      181 (55%)    54 (16%)
Physical Sciences (n=170)             118 (69.4%)   22 (13%)       62 (36%)         142 (83.5%)  30 (18%)
Biomedical Sciences (n=130)           58 (45%)      35 (27%)       44 (34%)         79 (61%)     28 (22%)
Medicine (n=75)                       28 (37%)      13 (17%)       23 (31%)         39 (52%)     9 (12%)
Environmental Sciences (n=192)        69 (36%)      40 (21%)       77 (40%)         114 (59.4%)  24 (13%)
Earth and Planetary Sciences (n=151)  75 (50%)      33 (22%)       71 (47%)         115 (76.2%)  17 (11%)
Engineering (n=107)                   39 (36%)      25 (23%)       67 (63%)         73 (68%)     17 (16%)
Chi-square test result                X² = 46.94,   X² = 31.38,    X² = 55.41,      X² = 63.12,  X² = 21.36,
                                      p < 0.001     p < 0.001      p < 0.001        p < 0.001    p = 0.01
Supplementary material 3: Figures
Figure A: Data sharing in groups with different research experiences across subject areas (labels on bars represent numbers of responses)
Figure B: Data sharing among those who only use own primary data (labels on bars represent numbers of responses)
Figure C: Ease of finding datasets to reuse by subject areas
Figure D: Ease of finding datasets to reuse by research experience