1 Research Centre of the Slovenian Academy of Sciences and Arts, 1000 Ljubljana, Slovenia;
[email protected]
* Correspondence: [email protected]
Abstract: Historical emphasis on writing mastery has shifted with advances in generative AI, especially in scientific writing. This study analysed six AI chatbots for scholarly writing in humanities and archaeology. Using methods that assessed factual correctness and scientific contribution, ChatGPT-4 showed the highest quantitative accuracy, closely followed by ChatGPT-3.5, Bing, and Bard. However, Claude 2 and Aria scored considerably lower. Qualitatively, all AIs exhibited proficiency in merging existing knowledge, but none produced original scientific content. Interestingly, our findings suggest ChatGPT-4 might represent a plateau in large language model size. This research emphasizes the unique, intricate nature of human research, suggesting that AI's emulation of human originality in scientific writing is challenging. As of 2023, while AI has transformed content generation, it struggles with original contributions in humanities. This may change as AI chatbots continue to evolve into LLM-powered software.
Keywords: generative AI, large language model (LLM), ChatGPT, Bard, Bing, scientific writing,
digital humanities, archaeology
[Graphical abstract: ChatGPT-4 achieves the highest accuracy but is unable to "pass an undergraduate exam" in the humanities. The race for parameters: LLMs grow exponentially, but after the GPT-3 "jump" the content is only marginally improved.]
In recent human history, the ability to write well was considered essential to human progress and professionalism. Creative expression was traditionally seen as a defining characteristic of humanity and the pinnacle of human achievement. This view is still reflected in the way universities cultivate their students' writing skills. Until recently, novel artifacts such as literary works, scientific texts, art, and music were difficult to create and only attainable by talented experts [1–3].
We must now reckon with this changing.
The current generation of openly available generative AI has rightly been called AI's great inflection point. Generative AI is shaping up to become a general-purpose technology [4], a "fundamental, horizontal technology that will touch everything in our lives" (Tim Cook, Apple CEO, speaking at Università degli Studi di Napoli Federico II in Naples, Italy, 29 September 2022).
The most recent and disruptive advance in generative AI has been a leap-frog development in the field of large language models (hereafter LLMs). LLMs are based on deep neural networks and self-supervised learning, both of which have been around for decades, but the amount of data that the current models were trained on led to an unprecedented and, to some extent, unexpected performance leap. Current LLMs belong to the foundation models, which are pre-trained on large datasets using self-supervision at scale and then adapted to a wide range of downstream tasks. This centralisation is crucial for harnessing the enormous computing power required to create them, but it also replicates all their potential problems, such as security risks and biases [3,5].
Currently, the most powerful LLMs are generative pretrained transformers (hereafter GPTs), which are based on the Transformer, a type of neural network architecture. The Transformer uses a mechanism called attention to weigh the influence of different input words on each output word. As a result, instead of processing the words in a sentence sequentially, it constructs relationships between all words in a sentence at once [6]. An additional key advantage of GPTs over earlier models is that the learning process can be parallelised, so the models can be trained on an unprecedented scale.
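For reference, the attention mechanism introduced in [6] can be written compactly as scaled dot-product attention, where the query, key, and value matrices Q, K, and V are linear projections of the input word embeddings and d_k is the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V
```

The softmax weights quantify how strongly each input word influences each output position, which is the "weighing" described above.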
The scale of an LLM depends on the size of ingested datasets, the amount of training
compute, and the number of parameters it can support [7,8]. Parameters are numerical
values that determine how a neural network processes and generates natural language.
The more parameters a model has, the more data it can learn from and the more complex
tasks it can perform. GPT-3 from 2020, for example, supports 175 billion parameters and
has been trained on 45 TB of text data, including almost the entire public web [9]. PaLM,
the 2022 LLM from Google Research, is a 540-billion-parameter GPT model trained with
the Pathways system [10], and GPT-4 launched in 2023 supports an estimated 1.8 trillion
parameters [11].
That is 1,800,000,000,000 parameters with which the model interacts to generate each individual token (a word or a part of a word). Multiplied by ChatGPT's 100,000,000 monthly users, each processing just one prompt of 100 tokens daily, this brings us to a staggering 18,000,000,000,000,000,000,000 or 18 × 10²¹ computations, which explains the daily cost of running ChatGPT at $700,000 [12]. One can only imagine the environmental costs of the operation.
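As a back-of-the-envelope check of this estimate (all inputs are the figures quoted above, not measured values):

```python
parameters = 1.8e12          # estimated GPT-4 parameter count [11]
users = 100_000_000          # ChatGPT monthly users
tokens_per_user_daily = 100  # one 100-token prompt per user per day

# parameter interactions per day across all users
daily_interactions = parameters * users * tokens_per_user_daily
print(f"{daily_interactions:.1e}")  # 1.8e+22, i.e. 18 * 10^21
```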
When given an input or prompt, GPT LLMs predict the next word in the context of all previous content and can thus generate creative outputs such as complete sentences and answers or even essays and poems. In essence, they generate a pattern of words based on the word patterns they have been trained on, applying attention to context and a controlled amount of randomness. But because they have been trained on such a large amount of text, the quality of the output is such that GPT-4, for example, has been able to pass or even ace some standardised academic and professional tests [13].
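The "controlled amount of randomness" mentioned above is typically implemented as temperature sampling over the model's next-token probabilities; a minimal sketch with toy scores, not a real model:

```python
import math
import random

def sample_next_token(logits, temperature=0.8):
    """Pick the next token from a model's scores; lower temperature means
    less randomness, higher temperature means more varied output."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}  # stable softmax
    total = sum(weights.values())
    tokens = list(weights)
    probs = [weights[tok] / total for tok in tokens]
    return random.choices(tokens, weights=probs, k=1)[0]

# toy scores for the continuation of "The Slavs migrated to the ..."
print(sample_next_token({"Balkans": 3.1, "Alps": 2.2, "moon": -4.0}))
```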
And how do academia and the way we create and write research and scholarly articles fit in? AI chatbots have the potential to revolutionise academia and scholarly publishing [49]. In fact, it seems that academia will be among the first industries to go through this process, since academics and students represented two of the top three occupational groups among the early adopters of ChatGPT [50].
AI chatbots—most of the attention to date has been directed at ChatGPT—have already been recognised as a powerful tool for scientific writing that can help organise material, proofread, draft, and generate summaries [51–53]. The scientific community is also actively testing their ability to generate entire papers with minimal human input. The consensus is that AI chatbots are able to create scientific essays and reports of scientific experiments that appear credible but are a combination of true and entirely fabricated information [49,51,54–63].
Unfortunately, there is a public perception that ChatGPT is already capable of generating academic papers that get peer-reviewed and published, e.g., [64,65], which may add to public scepticism about science. This is not the case. For example, the arXiv repository of pre-prints (https://arxiv.org), the AI community's most popular publication forum, shows no results for ChatGPT (in any variation) as a (co-)author (tested on 16 August 2023). We are aware of a single such attempt that attracted a lot of public attention but was not peer-reviewed and has no notable scientific value [55].
Regardless, there is a clear consensus among researchers that AI chatbots will be
widely adopted in scientific writing in the near future, and it is thus crucial to reach an
accord on how to regulate their use [63,66].
However, to successfully discuss regulation, a better understanding of AI chatbots is needed. This requires more systematic testing of their capabilities, which will provide a more robust understanding of the strengths and weaknesses of the technology. This process has been likened to the approval process that drugs go through. Assessment of AI systems could allow them to be deemed safe for certain applications and explain to users where they might fail [66].
To this end, there is a growing body of testing and benchmarking of generative AI models, e.g., [13,67,68]. The standard methodology in machine learning is to evaluate the system against a set of standard benchmark datasets, ensuring that these are independent of the training data and span a range of tasks and domains. This strategy aims to distinguish real learning from mere memorisation. However, this approach is not ideally suited to our needs and to the study of LLM-based AI chatbots, for three reasons. First, only the creators of proprietary LLMs have access to all the training details needed for detailed benchmark results. Second, one of the key aspects of the intelligence of LLMs is their generality and ability to perform tasks that go beyond the typical scope of narrow AI systems; metrics and evaluation benchmarks designed for such generative or interactive tasks remain a challenge. The third and perhaps most important reason is that in this article we are interested in how well AI chatbots perform in human tasks. To evaluate this, methods closer to traditional psychology, leveraging human creativity and curiosity, are needed [69].
Such an approach has already been taken, and there are several evaluations of the performance of AI chatbots in scientific writing, but most of them focus on medicine and similar fields [49,51,56–63,70–72]. We are not aware of any such test designed specifically for the humanities. Therefore, more tests and, we believe, more types of tests on the performance of AI chatbots in scientific writing are urgently needed.
With this in mind, the aim of this article was to design and conduct a test of AI chatbots' abilities in scientific writing in the humanities. First, we were interested in their ability to generate correct answers to complex scientific questions. Second, we tested their capacity to generate original scientific contributions in humanities research. Since AI chatbots are developing at a staggering pace, our results apply to the state of affairs in the third quarter of 2023 (23Q3).
2.1 AI Chatbots
This section describes the AI chatbots that were tested. As they are all proprietary commercial products, there is often not much detail available, let alone in the form of peer-reviewed articles. Our descriptions are therefore based on various sources such as blogs, social media, and help pages. We do not go into the specifics of the underlying LLMs, as this is a specialised topic on which there is an extensive literature, e.g., [73,74].
First, the criteria for selecting the six AI chatbots should be elucidated. As mentioned earlier, most of the previous studies have only analysed ChatGPT-3.5. Its inclusion, as well as that of its successor ChatGPT-4, was therefore a given. The Bing Chatbot was included because it was arguably the most advanced freely available AI chatbot at the time of the test. Bard was included because it is seen by many as the only challenger to ChatGPT's hegemony. We also wanted to include two chatbots that use an application programming interface (hereafter API) to access an LLM. APIs are the only available means for "smaller" developers, i.e., anyone other than OpenAI/Microsoft or Google, to access state-of-the-art LLMs. We chose Aria and Claude 2, which use APIs from OpenAI and Google, respectively. If Aria and Claude 2 performed on par with ChatGPT and Bard, it would signal that generative AI technology is indeed being developed openly "for all humanity", and vice versa. The two plugins, ChatGPT with Bing and ScholarAI, were chosen from the rapidly growing selection as the two most relevant to the task of scientific writing. Baidu's ERNIE Bot (https://yiyan.baidu.com), on the other hand, was not considered because at the time it was only available with a Chinese interface and required a Baidu login and the Baidu app (also only available in Chinese) to use.
ChatGPT-3.5, sometimes called ChatGPT or GPT-3.5, is an AI chatbot offered as a free service by OpenAI (https://chat.openai.com; accessed on 11 October 2023). It was fine-tuned from a model in the GPT-3.5 series, more specifically gpt-3.5-turbo, in 2022. This autoregressive language model has the same number of parameters as the largest model from the 2020 GPT-3 series, namely 175 billion. The model was trained using the same methods as InstructGPT-3, but with slight differences in the data collection setup and by using supervised fine-tuning. To predict the next token in the text, it was pre-trained on approximately half a trillion words and improved with task-specific fine-tuning datasets of thousands or tens of thousands of examples, primarily using reinforcement learning from human feedback [75]. ChatGPT-3.5 achieved strong performance on many NLP datasets, including translation and question answering, as well as on several tasks requiring on-the-fly reasoning or domain adaptation. Its breakthrough results paved the way for a new generation of LLMs by demonstrating that exponentially scaling language models increases performance. GPT-3.5 is also available as an API [9]. It may come as a surprise that the core team that developed ChatGPT initially consisted of only around 100 experts, although crowdworkers were also involved as a so-called short-term alignment team [76].
Fig. 2. Screengrab of the ScholarAI plugin for ChatGPT Plus demonstrating its inner workings.
ChatGPT-4 with the ScholarAI plugin (henceforth ScholarAI) was also under active development during our test. No documentation was available, but the plugin provided metadata about how it processed the prompt (Appendix A: L. 588, footnote 24; Fig. 2), and ScholarAI provided additional information (personal communication with Lakshya Bakshi, CTO and co-founder of ScholarAI). First, it extracted keywords from the prompt, which were then recombined into a query. Based on this query, it returned the top results from the ScholarAI database of "40M+ peer-reviewed papers". The user then either confirmed the selection or requested a further search. When the user was satisfied with the selection of articles, ScholarAI fed the ChatGPT-4 LLM with their content. ScholarAI ensures that the LLM receives the source data and tries to get ChatGPT to discuss only the content provided to it, but this is a work in progress.
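Put together, the disclosed workflow can be sketched as follows. This is a speculative reconstruction, and every name in it (Paper, search_db, ask_llm, etc.) is a hypothetical stand-in rather than the plugin's actual interface:

```python
from dataclasses import dataclass

# Hypothetical reconstruction of the workflow described above; none of
# these names are ScholarAI's real API.

@dataclass
class Paper:
    title: str
    abstract: str
    citations: int

def rank_by_citations(results, top_k=5):
    # ScholarAI discloses that it ranks references solely by citation count
    return sorted(results, key=lambda p: p.citations, reverse=True)[:top_k]

def scholarai_pipeline(prompt, extract_keywords, search_db, user_confirms, ask_llm):
    query = " ".join(extract_keywords(prompt))    # 1-2. keywords -> query
    papers = rank_by_citations(search_db(query))  # 3. top database hits
    while not user_confirms(papers):              # 4. confirm or search further
        query = input("Refine the query: ")
        papers = rank_by_citations(search_db(query))
    # 5. the LLM is fed the source content and asked to stay within it
    context = "\n\n".join(p.abstract for p in papers)
    return ask_llm(prompt, context=context)
```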
The Bing Chatbot is available either as a Bing Chat or a Bing Compose service and can be used free of charge in the Microsoft Edge browser and in the Bing app (https://www.bing.com/new). The Bing Chatbot is based on a proprietary technology called Prometheus, an AI model that combines Bing's search engine with ChatGPT-4. When prompted by a user, it iteratively generates a series of internal queries through a component called the Bing Orchestrator. By selecting the relevant internal queries and leveraging the corresponding Bing search results, the model receives up-to-date information so that it can answer topical questions and reduce inaccuracies. For each search, it ingests about 128,000 words from Bing results before generating a response for the user. In the final step, Prometheus adds relevant Bing search responses and is also able to integrate citations into the generated content. This is how Prometheus grounds ChatGPT-4. However, for prompts that the system considers to be simple, it generates responses using …
[Figure: schematic of Prometheus, in which the prompt is routed through the Bing Orchestrator to the Bing index, ranking and answers, and combined with GPT-4 to produce the generated content.]
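The grounding loop described above corresponds to the general retrieval-augmented generation pattern. A minimal, hypothetical sketch follows; this is not Microsoft's implementation, and generate_queries, bing_search, and llm are assumed stand-ins:

```python
def grounded_answer(prompt, generate_queries, bing_search, llm, max_words=128_000):
    """Ground an LLM response in fresh search results, as Prometheus is
    described to do. All callables are hypothetical stand-ins."""
    context, citations, words = [], [], 0
    for query in generate_queries(prompt):  # internal queries (Bing Orchestrator)
        if words >= max_words:              # ~128,000 ingested words per search
            break
        for hit in bing_search(query):
            context.append(hit.text)
            citations.append(hit.url)
            words += len(hit.text.split())
    # the LLM answers from the prompt plus up-to-date search context;
    # the collected citations can be woven into the generated content
    return llm(prompt, context=context), citations
```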
… Chat, Aria), and tools powered by ChatGPT (Bing Compose, ScholarAI). Despite these differences, all except Bing Chat were able to comply with our prompts and were therefore suitable for our test.
It must be emphasized that the tested AI chatbots were designed as general-purpose AI chatbots capable of human-like conversation, and not to "do science". Such downstream applications can be expected in the near future.
The case study chosen for testing AI chatbots was the migration of the South Slavs, with a follow-up prompt on the Alpine Slavs, a subgroup of the South Slavs. The authors' thematic explanation of the case study can be found in the appendix (Appendix A: L. 383–439 and 799–855).
The migration of the Slavs, including the South Slavs, has been a research topic for
almost a century. Notwithstanding this, the rapid spread of the Slavic language in the
second half of the first millennium CE remains a controversial topic [84–92]. It is part of
the grand narrative of the “dawn of European civilisation”. The Europe we live in today
emerged in the centuries after the decline of the Roman Empire and was importantly
shaped by the ancient Slavs, among others.
The current scientific debate on this issue revolves around the gene pool landscape on the one hand and the so-called ethnic landscape on the other. Until the 1950s, migration was assumed to be the main process of change, e.g., [93], and peoples and tribes were understood as caroming around the continent like culture-bearing billiard balls [94]. It was during this period that the term Migration Period was coined. Since the 1960s, the understanding of ethnic identity has shifted to the concept of dispersed identities, which states that people fluidly adopt different identities as changing social circumstances dictate, e.g., [95]. Today, most assume that hardly any physical migration took place, but rather that ideas and knowledge were passed on, e.g., [96]. However, recent research in the field of DNA, ancient DNA, and deep data analysis supported by machine learning is providing increasingly compelling evidence that, at least in the case of the South Slavs, physical migrations of people and peoples took place [92].
Our experiment was based on asking generative AI models two specific scientific
questions. We designed two text prompts that were precise enough to produce the desired
result without follow-up prompts.
The selected case study spans several academic fields, one of which can be considered a natural science (DNA analysis), one a humanities discipline (historiography), and two humanities disciplines with links to natural science (archaeology, linguistics). In the USA, archaeology is also considered a social science in certain contexts.
The two text prompts were:
• Q1: What is scientific explanation for migration of South Slavs in Early Middle Ages? Write 500 words using formal language and provide references where possible.
• Q2: What is scientific explanation for the settlement of Alpine Slavs in Early Middle Ages? Write 500 words using formal language and provide references where possible.
Q1. The first prompt is a complex scientific question on the subject of Early Medieval studies. Discussing it requires knowledge of archaeology, historiography, linguistics, and DNA studies. However, the topic is relatively broad. Spatially, it covers an entire European region, the Balkans. Its scientific background, the migration of the Slavs, is relevant to over 200 million modern Europeans. In short, although it is not one of the foremost topics in the humanities or even for Early Medieval scholars, there are numerous researchers working on this topic, and dozens of relevant scientific papers are published every year.
Q2. At first glance, the second prompt is almost exactly the same as the first, except for the target group, the Alpine Slavs. However, the added complexity of this prompt comes from the fact that it addresses a very narrow and specific topic. In fact, the only scholarly content on this topic is either more than half a century old, e.g., [97], and not available online, or comes from a 2022 paper [92], which is too recent to be included in the datasets used for training ChatGPTs.
However, the key term "Alpine Slavs" is very specific. In response to the search term
"settlement of the Alpine Slavs", the search engines Bing, Google and DuckDuckGo as
well as Google Scholar return the mentioned article as the top hit after Wikipedia or the
Encyclopaedia Britannica.
We therefore expected ChatGPT to respond well to Q1 but to have more problems with Q2. On the other hand, AI chatbots with access to current online content (Bing Chatbot, GPT w/ Bing) were expected to generate high-quality content for Q2 by sourcing it directly from the relevant article.
Our scientific questions are therefore so-called one-shot prompts, where the user provides the AI chatbot with a single example of the desired task and then asks it to perform a similar task. It is well known that GPT LLMs are "few-shot learners" [9], i.e., they are much better at generating content when given more examples of the expected content. When using one-shot prompts, multiple refinement prompts are expected to improve results, e.g., [98].
However, few-shot prompts were not suitable for our testing purpose because they do not mimic a scientific question, and a series of prompts would reduce the value of a direct comparison between different AI chatbots and introduce subjectivity. Therefore, our one-shot prompts were optimised for comparison rather than for generating the best possible content.
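To illustrate the distinction, the prompt types differ only in the number of worked examples provided. The prompts below are hypothetical illustrations, not our test prompts:

```python
# zero-shot: the bare task, no examples
zero_shot = "Explain the spread of the Slavic language in 500 words."

# one-shot: a single example of the desired task, then a similar task
one_shot = (
    "Q: What is the scientific explanation for the migration of the Goths?\n"
    "A: <expert-style answer>\n\n"
    "Q: What is the scientific explanation for the migration of the South Slavs?\n"
    "A:"
)

# few-shot: several examples; GPT LLMs are "few-shot learners" [9] and
# generally produce better content the more examples they are given
few_shot = (
    "Q: <question 1>\nA: <answer 1>\n\n"
    "Q: <question 2>\nA: <answer 2>\n\n"
    "Q: What is the scientific explanation for the migration of the South Slavs?\n"
    "A:"
)
```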
There are several existing studies similar to ours, but they refer to other fields of science. Regardless, a brief overview of the methods used is in order. Altmäe and colleagues [51], for example, provided prompts and content generated by ChatGPT and then discussed the quality of the content. Petiška [56] focused only on references and analysed factors such as the number of citations, the date of publication, and the journal in which the paper was published. Májovský and colleagues [57] posed questions and prompts to the model and refined them iteratively to produce a complete article, which was then reviewed by relevant experts for accuracy and coherence. Buholayka and colleagues [59] tasked ChatGPT with writing a case report based on a draft report and evaluated its performance by comparing it to a case report written by human experts.
Our approach is similar to that of Májovský and colleagues [57], but there are three significant differences. First, we did not use iterative refinement, to ensure comparability between different AI chatbots. Second, we did not generate a complete paper. Third, our review was both qualitative and quantitative, not just qualitative. This was achieved by tagging the content. The aim was to provide what is, to our knowledge, the first quantitative comparison of different AI chatbots on the subject of scientific writing.
The content generated by each of the tested AI chatbots was tagged for quantitative
accuracy and qualitative precision.
Quantitative accuracy describes how accurate the content is in the opinion of the human experts (the authors). It was graded into five classes:
• Correct: Factually correct and on par with the content created by human experts.
• Inadequate: Factually correct but falls short of the content created by human experts.
• Unverifiable: The statement cannot be verified or there is no expert consensus.
• w/ Errors: Mostly factually correct, but with important errors that change the meaning.
• Incorrect: Factually incorrect.
Quantitative accuracy is the measurement most commonly applied to AI-generated content, providing a quantifiable index of how trustworthy the tested AI chatbot is for the task at hand. From the perspective of academia, it can be understood as similar to the grading of a student's work. The questions in this case study correspond to the level expected of a senior undergraduate student attending a class on Early Medieval archaeology.
Qualitative precision describes how "good" the content is in the opinion of a human expert. In other words, how it compares to a human-generated response in the context of scientific writing. It was graded into four classes:
• Original scientific contribution.
• Derivative scientific contribution.
• Generic content not directly related to the question.
• Incorrect: factually incorrect, containing errors that change the meaning, or disputed (the last three classes of the above quantitative tagging combined).
Qualitative precision, as we have defined it, is specific to testing AI-generated content for the purpose of scientific writing and, to our knowledge, has not yet been used. The main reason for the insufficient development of such indices is that the current generation of AI chatbots is not expected to generate original scientific content. However, in the near future AGI will be expected to produce original scientific content. The qualitative precision index was therefore developed with an eye to the future, as a measure of how close the tested AI chatbots are to AGI.
From an academic perspective, qualitative precision can be understood in a similar way to the peer review of a scientific paper. As with any peer review, e.g., [99], it is a combination of objective and subjective evaluation.
It should be mentioned in passing that in the humanities an article usually consists of both derivative and original content. Typically, the introduction and method are predominantly derivative, while the results, discussion, and conclusion are predominantly original. The ratio of derivative to original content varies widely, depending on the discipline, topic, type of article, etc. Thus, the expected "perfect score" is not 100%, but in the order of 50+% original scientific contribution. To establish the baseline for our case study, we tagged the responses generated by human experts (see Results section).
Both quantitative accuracy and qualitative precision tagging were performed by the two co-authors. To mimic the standard process used in grading students or reviewing scholarly articles, we each made our own assessment and then consulted to arrive at a unanimous decision. The results are shown in the appendices (quantitative accuracy: Appendix A; qualitative precision: Appendix B). Both co-authors have experience with both tasks in a professional capacity, and both are experts on the topic, e.g., [92,100,101].
The tagged content was quantified, and the results are discussed in the next section. Given the small amount of data generated by tagging, the observational method of analysis was sufficient. To rank the different AI chatbots according to the result, we calculated an accuracy score using the following formula: Correct% − (2 × Incorrect%). The higher the score, the better; students would need a positive score for a passing grade.
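A minimal implementation of this score; the tag counts used in the example are made up for illustration:

```python
def accuracy_score(tags):
    """Accuracy score = Correct% - (2 x Incorrect%); a positive score
    roughly corresponds to a passing grade."""
    total = sum(tags.values())
    correct_pct = 100 * tags.get("correct", 0) / total
    incorrect_pct = 100 * tags.get("incorrect", 0) / total
    return correct_pct - 2 * incorrect_pct

# made-up tag counts for one chatbot's answers
tags = {"correct": 9, "inadequate": 4, "unverifiable": 2,
        "w/ errors": 1, "incorrect": 4}
print(accuracy_score(tags))  # 45.0 - 2 * 20.0 = 5.0
```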
As the amount of tagging data increases in future projects, more sophisticated statistical methods will be used.
There are two limitations to our method. First, the case study is limited to a single (interdisciplinary) field in the humanities and cannot address the differences between, for example, philosophy and geography. Second, only a limited number of human experts were involved.
For better results, we plan to extend this study to a series of field-specific studies and
involve more human experts. However, in the current structure of public science, it takes
years to accomplish such an organisational feat. Therefore, this article can be understood
as an interim measure taken in response to the incredible speed at which AI chatbots are
developing.
3. Results
The quantitative accuracy tagging was intended to objectively determine how correct the answers generated by the AI chatbots were (Fig. 4; Appendix A).
The highest accuracy score was achieved by ChatGPT-4, which also generated the highest percentage of correct content. On average, about half of the content it provided was correct and about one fifth was incorrect, with errors, or unverifiable. However, as expected (see section 2.3), all of the incorrect content belonged to Q2, for which it could not source the relevant content from the 2022 article. Considering the complexity of the questions, the results were impressive, but far below what would be expected of, for example, a senior undergraduate student.
Accuracy score* by chatbot:
GPT-4: −5
GPT-3.5: −18
Bing: −21
GPT-4 w/ Bing: −31
Bard: −32
GPT-4 w/ ScholarAI: −58
Claude 2: −75
Aria: −80
* Score is calculated as: Correct% − (2 × Incorrect%). Higher is better; students would be expected to have a positive score.
[Figure: detailed quantitative accuracy per chatbot and question (Q1, Q2), tagged as Correct, Inadequate, Not verifiable, w/ Mistakes, or Incorrect.]
Fig. 4. Quantitative test results, generalized accuracy score above and detailed quantitative accuracy below.
The main focus of our article was on whether the tested AI chatbots are able to generate an original scientific contribution. The short and expected answer is no. A more detailed answer can be found below (Fig. 5; Appendix B).
As mentioned earlier, human-generated scientific articles in the humanities are typically a combination of derivative and original scientific contributions. In our case study, the human-generated content comprised ½ original scientific contribution for Q1 and ¾ for Q2.
The AI chatbots fell far short of this level. The only discernible original scientific contribution was the 11% generated by ChatGPT-4, which aptly inferred in Q1 that the migration of the South Slavs was not a singular event (Appendix B: L. 91—93) and whose introductory paragraph in Q2 was extremely apt, profound, and on the cutting edge of science (Appendix B: L. 478—483). Similarly, ChatGPT-3.5 summarised the settlement of the Alpine Slavs very astutely, if repetitively (Appendix B: L. 458—461 and 646—467).
Claude 2 correctly pointed out that the fact that Christian missionaries had to preach
in Slavic languages proves the demographic dominance of the Slavs. This is an established
historical fact, but not commonly referred to in the context of migration, and was therefore
tagged as an original scientific contribution.
ScholarAI generated what at first sight appeared to be very exciting original scientific content. It established a direct link between the process of settlement of the Alpine Slavs and their cultural practices and beliefs (Appendix B: L. 578—580 and 585—588). The discussion of beliefs in the context of the migrations is far from the norm and, to our knowledge, has never been brought forward for the migrations of the Alpine Slavs. However, ScholarAI's argumentation was flawed because it was based on irrelevant knowledge pertaining to the Baltic Slavs [102], who dwelt about 1000 km northeast of the Alpine Slavs. Interestingly, the same hypothesis could have been argued with another freely available scientific text [103], but this is a book rather than an article and is therefore not in the ScholarAI database.
The other AI chatbots did not generate original scientific contributions.
In conclusion, ChatGPT-4 was once again the best among the AI chatbots tested, but not on the same scale as the human-generated content (Fig. 5).
Original scientific contribution by source:
Human-generated: 64%
GPT-4: 11%
GPT-3.5: 6%
Claude 2: 2%
Bing, Bard, Aria: 0%
[Figure: detailed qualitative precision per chatbot and question (Q1, Q2).]
Fig. 5. Qualitative test results, original scientific contribution above and detailed precision below (*
false argumentation).
The most commonly cited shortcomings of AI chatbots are reasoning errors, hallucinations, and biases, e.g., [9,13]. The terms themselves are not the best choice, because they inappropriately anthropomorphise AI chatbots. However, they are widely used, and we have used them for clarity.
In the quantitative analysis above, these shortcomings were interchangeably tagged as 'incorrect', 'with errors', or 'unverifiable' (Appendix A). Here we address them qualitatively, on a case-by-case basis.
Reasoning errors, also termed lack of on-the-fly reasoning or lack of critical thinking, are the kind of incorrect content in causal statements where cause and effect do not match. Critical thinking is one of the most important qualities for humanities scholars and knowledge workers in general. However, AI chatbots based on LLMs are not designed for this task.
The most obvious example of a reasoning error in our case study was the content generated by ChatGPT-4 and Bard, which causally linked the migration of the Slavs into the Alps to a period of climate cooling (Appendix A: L. 512—515 and 703—704). Similarly, ChatGPT-3.5 linked the settlement of the Alpine areas to "fertile lands" (Appendix A: L. 456—459). For most Europeans and most people with formal education worldwide, the Alps are synonymous with mountains and hence with a cold climate and harsh agricultural conditions. Most people would therefore reason that a cooling climate and the search for fertile land would not expedite migration into the Alps, but rather impede it.
Another example of a reasoning error was that almost all tested AI chatbots listed the decline of the (Western) Roman Empire as one of the attractors for the migration of the South Slavs to the Balkans (Western Roman Empire: Appendix A, L. 116—117, 135—136, 144—145, 278—281, 322—323, 454—455, 683—684, 736—738; Roman Empire: Appendix A, L. 124—125, 506—507, 516—517, 548—550, 584—585, 593—595). However, as we learn in high school history classes, the fall of the (Western) Roman Empire preceded the migration of the South Slavs by at least a century. In fact, the Byzantine Empire was the declining superpower whose retreat created the power vacuum for the immigration of the South Slavs to the Balkans.
The fact that both LaMDA (Bard) and GPT-4 (ChatGPT-4) generated almost identical incorrect content suggests that such behaviour is inherent in the current generation of GPT LLMs.
The underlying issue regarding the lack of critical thinking was that none of the tested AI chatbots made any attempt to critically compare different sources. For example, the most important component of a human-generated response to Q1 was: "Currently, there are three main hypotheses..." (Appendix A: L. 394), which was continued by comparing several different sources. No such attempt was detected in the content generated by the AI chatbots. Anecdotally, the majority of randomly selected human users were able to distinguish the critical thinking of the human expert from the content generated by ChatGPT based solely on the 24-character snippet "There are 3 hypotheses…", without further context (Fig. 6).
Critical comparison of different sources is typical and vital not just in any kind of scientific reasoning, but also in everyday life. The one-sided approach of the tested AI chatbots amplifies "the loudest voice" (the highest-ranking search engine result), which is not only bad science but also a grave danger for balanced news reporting, democracy, minority rights, etc.
Hallucinations or confabulations of AI chatbots are confident responses by an AI that are not justified by its training data. This is not typical of AI systems in general, but it is relatively common in LLMs, as the pre-training is unsupervised [104].
The most obvious hallucinations in our case study were invented references (ChatGPT-4, Appendix C: L. 36; ChatGPT-3.5, Appendix A: L. 495—498; ScholarAI, Appendix A: L. 189—197). Similarly, attempts at inline citations by Bing (Appendix A: L. 226—…
Fig. 6. Twitter (now X) poll asking human users to differentiate between ChatGPT and human-
generated content with almost no context. Most respondents answered correctly.
Biases are often exhibited by AI chatbots. They are based on the training data but, according to recent research, can be amplified beyond existing perceptions in society. Biases generated by AI chatbots can therefore be informative about the underlying data, but they can also be misleading if the AI-generated content is used uncritically. The most researched biases to date are those related to gender, race, ethnicity, and disability status, e.g., [29,49,104–106].
In our test we detected three different types of biases: language bias, neo-colonial bias, and citation bias.
First, language bias. Although there is far more relevant scholarly content written in Balkan languages than in English, 92% of the references generated by the AI chatbots in our test were to English publications and none to Balkan-language publications (Fig. 7). This can only be partially explained by the fact that the prompts were in English; namely, three (8%) German references prove that English was not the only criterion for selection. When question Q2 was asked with a prompt in Slovenian, two references were again in English, and the third, in Slovenian, was a hallucination (Appendix C: L. 36).
The detected language bias is most likely due in large part to the language bias of online search engine ranking algorithms, which favour English publications [107]. This bias seems to be a wasted opportunity, because all tested AI chatbots "understand" many languages, e.g., [13].
[Fig. 7. Number of references generated per publication period (1971–1999, 2000–2004, 2005–2009, 2010–2014, 2015–2016), in English (EN) and German (GER).]
Second, neo-colonial bias. 88% of the references are by authors from the global West. Only a minority (12%) are English translations of texts originally written in Slavic languages [102,108,109], although there are several other (more) relevant translations, e.g., [88,110–117]. This reflects a scholarly hierarchy created by colonialism (until the 1910s, the Balkans were largely divided between the Austro-Hungarian and Ottoman Empires), sometimes referred to as a neo-colonial pattern in the global network of science, in which the intellectual dominance of the global West is growing [118,119]. To our knowledge, the neo-colonial bias in the study of the medieval Slavs has not yet been explicitly analysed or even discovered, as it has never been revealed as clearly as through the use of AI chatbots in this case study.
Third, citation bias. 75% of the references are from before 2005, and the oldest was originally published in 1895 [108]. This shows a very clear bias against new and up-to-date publications. For example, by far the most referenced publication in our case study is Curta [85]. While this is still a seminal work, it is outdated and has often been criticised, e.g., [91], and the critiques have been responded to [86]. Therefore, in a modern scientific text created by a human expert, a reference to Curta is always followed by either its critique or an up-to-date response to that critique. In AI-generated content, however, Curta is always referenced as the primary source.
This bias is in line with the growing trend to cite old documents, caused at least in part by the "first page results syndrome" combined with the fact that all search engines favour the most cited documents [120,121]. The ScholarAI plugin, for example, transparently discloses that it ranks references solely by the number of citations (Appendix A: L. 622, footnote 25).
These findings are consistent with a recent study looking at references generated by ChatGPT-3.5. It revealed that ChatGPT-3.5 tends to reference highly cited publications, shows a preference for older publications, and refers predominantly to reputable journals. The study concluded that ChatGPT-3.5 appears to rely exclusively on Google Scholar as a source of scholarly references [56].
All three of the listed biases are abundant in the training material, which is, of course, mostly the public web. In other words, uncritical online research by a human using one of the major search engines, Google Scholar, or Wikipedia would yield comparable results, with language bias, citation bias, and neo-colonial bias. However, proper research by human expert(s) would avoid citation bias and at least reduce language and neo-colonial bias.
It was not the intention of this article to gain insights into the efficiency of LLMs, but as a side note, the current trend of upsizing LLMs, sometimes referred to as an AI arms race, can be addressed. Currently, numerous well-funded startups, including Anthropic, AI21, Cohere, and Character.AI, are putting enormous resources into developing ever larger algorithms, and the number of LLM parameters exceeds the exponential growth trend line (Fig. 8). However, it is expected that the returns on scaling up model size will diminish [13].
In our data, we observed the impact of the exponential growth of LLMs on the generated content. ChatGPT-4 is approximately 10 times larger than ChatGPT-3.5. LaMDA has a similar number of parameters to ChatGPT-3.5 (137 and 175 billion, respectively), but the AI chatbot tested, Bard, only uses a "lightweight" version of LaMDA. Assuming that the "lightweight" version uses only half the parameters, the size ratio between LaMDA (Bard), ChatGPT-3.5, and ChatGPT-4 is roughly 1:2:20.
Fig. 8. The race for parameters. The increase in the number of parameters (red dotted line) exceeds
the exponential trend (orange line); the in-context learning as detected by our test (blue columns;
see section 3.1) only improves with a linear trend (blue line; adapted after [122] and updated with
sources cited in section 2.1).
The results of our analysis show that ChatGPT, while currently the most advanced AI chatbot, is not capable of producing original scientific content. This is a zero-shot learning problem, but it is also much more than that. Namely, LLM-based AI chatbots are not designed to generate content in the same way that human researchers produce new knowledge.
A typical process for producing an original scientific contribution, which was also used in our case study, is an intricate process that is perhaps best explained using Ackoff's Data-Information-Knowledge-Wisdom (DIKW) hierarchy. According to this hierarchy, data is the raw input, simple facts without context; information is data that has been given context and is meaningful in some way; knowledge is the understanding and interpretation of that information gained through experience or learning; and wisdom, the highest level of the pyramid, is the ability to apply knowledge judiciously and appropriately in different contexts, understand the broader implications, and make insightful decisions based on accumulated knowledge and experience [123].
In our case study, the data was the documentation of the excavations of 1106 archaeological sites. The information was the summaries of the excavation reports and scientific publications on these 1106 sites, curated in the structured database Zbiva, available on the public web since 2016 [124]. The knowledge was the scholarly articles discussing and/or analysing the migration of the Alpine Slavs, e.g., [97,114,115]. Wisdom is what happens (rarely) after the publication of the scientific articles and after the generation of AI chatbot content, and does not concern us here.
Human researchers approached Q2 by first obtaining and refining the data, then analysing the information using appropriate software, formulating knowledge based on the results, and finally disseminating the knowledge in the form of an original scientific article.
In the real world, archaeological practice (and most humanities research) is messy, and this process is therefore fuzzy and recursive. It takes the form of a hermeneutic spiral [125], which, from the perspective of computational algorithms, consists of loops between data, information, and knowledge. These loops involve solving computationally irreducible processes that cannot be handled by LLM-based AI chatbots alone (Fig. 1: I).
In other words, generating original scientific content requires not only access to curated data/information but also the ability to analyse it.
LLMs, on the other hand, are pre-trained on existing knowledge (texts, books) and are only able to recombine it in new permutations. Occasionally this can lead to modest original scientific content. For instance, an AI chatbot could be used to summarise what a historical text verbatim says about a certain subject, but not to interpret it. This is therefore a limited and finite avenue to original scientific contributions. Regardless of the future improvement of LLMs, LLM-based AI chatbots will not be able to replicate the workflow of human researchers, as they are only trained on existing knowledge.
Purpose-built LLM-based software, on the other hand, could handle such a workflow: searching for data/information, performing the relevant analysis, generating textual and graphical information, and summarising it into new knowledge (Fig. 1: K, L, M; G, J). Such LLM-based software would have several qualities of an AGI and is in fact already feasible with existing technology, for example by combining Prometheus, relevant cloud-based software connected through a ChatGPT API, and ChatGPT-4.
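Schematically, such software would close the data-information-knowledge loop that a chatbot alone cannot. A speculative sketch follows, in which every callable (search_data, run_analysis, llm) is a hypothetical stand-in:

```python
def research_assistant(question, search_data, run_analysis, llm):
    """Speculative LLM-powered workflow: search -> analyse -> summarise.
    The LLM orchestrates, but the analysis runs outside the model."""
    plan = llm(f"Which datasets and analyses could answer: {question}")
    data = search_data(plan)      # curated data/information
    results = run_analysis(data)  # computation the LLM cannot do itself
    # the results (textual and graphical information) are summarised
    # into new knowledge, grounded in the retrieved data
    return llm(f"Summarise these results as new knowledge: {results}",
               context=data)
```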
In a nutshell, LLM-based AI chatbots are not, and probably never will be, able to generate new knowledge from data in the same way as human researchers (in the humanities), but appropriate LLM-powered software might be.
Most commentators on generative AI, including the authors of this article, agree that the current generation of AI chatbots represents AI's inflection point. It is very likely that historiography will record ChatGPT-3 as the eureka moment that ushered in a new era for humanity. What this new era will bring is currently unknown, but it seems that it will, for better or worse and probably both, change the world. To maximize the positive and mitigate the negative as much as possible, making AI safe and fair is necessary. And a significant part of making AI safe is testing.
The aim of this article was to test the current AI chatbots from a human(ities) perspective, specifically their scientific writing abilities. We compared six AI chatbots: ChatGPT-3.5, ChatGPT-4, Bard, Bing Chatbot, Aria, and Claude 2.
In accordance with expectations, ChatGPT-4 was the most capable among the AI chatbots tested. In our quantitative test, we used a method similar to grading undergraduate students. Bing Chatbot and ChatGPT-4 were nearing the passing grade, and ChatGPT-3.5 and Bard were not far behind. Claude 2 and Aria produced much weaker results. The ChatGPT-4 plugins were not yet up to the task.
In our qualitative test, we used a method similar to peer reviewing a scientific article. ChatGPT-4 was again the best performer, but it did not generate any notable original scientific content.
Additional shortcomings of the AI-generated content that we found in our test include reasoning errors, hallucinations, and biases. Reasoning errors refer to the chatbots' inability to critically evaluate and link cause-and-effect relationships, as evidenced by several historical inaccuracies regarding the migration patterns and historical periods of the Slavs. Hallucinations denote confident but unsubstantiated claims by the AI, such as invented references and inaccurate dates. Our test also revealed significant biases in the AI-generated content. These biases manifest as language bias, favouring English sources over other relevant languages; neo-colonial bias, displaying a preference for Western authors; and citation bias, skewing towards older and more highly cited publications. These findings highlight that despite their technological prowess, AI chatbots remain reliant on their training data, echoing or even amplifying existing biases and knowledge gaps. Because they veer towards past data, they are likely to be too conservative in their recommendations. Since the listed deficiencies are relatively inconspicuous compared to, for example, gender or racial biases, it is unlikely that they will be remedied in the foreseeable future by resource-intensive reinforcement learning from human feedback. These biases are among the key concerns with the use of AI chatbots in scientific writing, as they are less likely to be highlighted in the review processes.
Our results also highlight possible future trends in the development of AI chatbots. The large discrepancy between almost passing an undergraduate exam and not producing any notable scientific contribution may seem surprising at first glance. On closer inspection, however, this was to be expected. "Doing science", i.e., making an original scientific contribution, is much more complex than just doing very well in exams. It is based on proactively acquiring and analysing data and information to generate new knowledge, whereas "passing an exam" is based on accumulating existing knowledge and passing it on upon request.
"Passing an exam" will further improve when AI chatbots are given access to curated
data. An AI chatbot with access to selected datasets would be a typical downstream task
developed around an existing LLM. However, without access to external tools, LLM-
based AI chatbots will never be suitable for "doing science". Therefore, in the near future
an evolution of current LLM-based AI chatbots towards LLM-powered software capable
of, among other things, "doing science" seems likely. This assertion is in line both with the
fact that the growth of LLMs seems to have plateaued and that the industry is turning to
other solutions.
In conclusion, we agree with previous commentators that AI chatbots are already widely used for various tasks in scientific writing due to their applicability to a wide range of tasks. However, AI chatbots are not capable of generating a full scientific article that would make a notable scientific contribution to the humanities or, we suspect, to science in general. If left unsupervised, AI chatbots generate content that is fluent but not factual, meaning that the errors are not only many but also easily overlooked, cf. [126]. Therefore, peer review processes need to be rapidly adapted to compensate for this, and the academic community needs to establish clear rules for the use of AI-generated content in scientific writing. The discussion about what is acceptable and what is not must be based on objective data. Articles like this one are necessary to support those decisions, and we suspect that many more will follow.
As for the future beyond the immediately foreseeable, when LLM-powered software and/or AGI will be able to generate original scientific contributions, we agree that questions about explainable AI are likely to come to the fore. Understanding our world, a fundamental aspiration of the humanities, will only be partially achieved through the use of black-box AI. Since the humanities, just like justice, for example, are as much about process as outcome, humanities scholars are unlikely to settle for uninterpretable AI-generated predictions. We will want human-interpretable understanding, which is likely to be the remaining task of human researchers in the humanities in the future, cf. [127].
Supplementary Materials: Appendix A, Appendix B, and Appendix C are available in the open access repository Zenodo under a CC-BY 4.0 licence: https://doi.org/10.5281/zenodo.8345088.
Author Contributions: Both authors, E.L. and B.Š., contributed equally to the article. Conceptualization, E.L. and B.Š.; methodology, E.L. and B.Š.; validation, E.L. and B.Š.; formal analysis, E.L. and B.Š.; writing—original draft preparation, E.L. and B.Š.; writing—review and editing, E.L. and B.Š.; visualization, E.L. and B.Š.; project administration, E.L. and B.Š.; funding acquisition, E.L. and B.Š. All authors have read and agreed to the published version of the manuscript.
Funding: This research was part of the AI4Europe project, which has received funding from the European Union's Horizon Europe research and innovation programme under Grant Agreement nº 101070000.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: All data used in and produced by the research are available in the
appendices.
Acknowledgments: The authors thank Dr. Zoran Čučković for introducing them to ChatGPT in December 2022.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
1. Elam, M. Poetry Will Not Optimize: Creativity in the Age of AI. In Generative AI: Perspectives from Stanford HAI. How do you think generative AI will affect your field and society going forward?; F.-F. Li, A. Russ, C. Langlotz, S. Ganguli, J. Landay, E. Michele, D. E. Ho, P. Liang, E. Brynjolfsson, C. D. Manning, R. Reich, P. Norvig, Eds.; HAI, Stanford University, Human-Centered Artificial Intelligence: Palo Alto, 2023; pp. 11–12.
2. Manning, C.D. The Reinvention of Work. In Generative AI: Perspectives from Stanford HAI. How do you think generative AI will affect your field and society going forward?; F.-F. Li, A. Russ, C. Langlotz, S. Ganguli, J. Landay, E. Michele, D. E. Ho, P. Liang, E. Brynjolfsson, C. D. Manning, R. Reich, P. Norvig, Eds.; HAI, Stanford University, Human-Centered Artificial Intelligence: Palo Alto, 2023; pp. 18–19.
3. Liang, P. The New Cambrian Era: "Scientific Excitement, Anxiety." In Generative AI: Perspectives from Stanford HAI. How do you think generative AI will affect your field and society going forward?; F.-F. Li, A. Russ, C. Langlotz, S. Ganguli, J. Landay, E. Michele, D. E. Ho, P. Liang, E. Brynjolfsson, C. D. Manning, R. Reich, P. Norvig, Eds.; HAI, Stanford University, Human-Centered Artificial Intelligence: Palo Alto, 2023; p. 15.
4. Eloundou, T.; Manning, S.; Mishkin, P.; Rock, D. GPTs Are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models. arXiv preprint 2023, 2303.10130, doi:10.48550/ARXIV.2303.10130.
5. Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill,
E.; et al. On the Opportunities and Risks of Foundation Models; Center for Research on Foundation Models, Stanford
University: Palo Alto, 2023.
6. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc., 2017; Vol. 30.
7. Kaplan, J.; McCandlish, S.; Henighan, T.; Brown, T.B.; Chess, B.; Child, R.; Gray, S.; Radford, A.; Wu, J.; Amodei, D. Scaling
Laws for Neural Language Models. arXiv preprint 2020, 2001.08361, doi:10.48550/ARXIV.2001.08361.
8. Hoffmann, J.; Borgeaud, S.; Mensch, A.; Buchatskaya, E.; Cai, T.; Rutherford, E.; Casas, D. de L.; Hendricks, L.A.; Welbl, J.;
Clark, A.; et al. Training Compute-Optimal Large Language Models. arXiv preprint 2022, 2203.15556,
doi:10.48550/ARXIV.2203.15556.
9. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et
al. Language Models Are Few-Shot Learners. In Advances in Neural Information Processing Systems 33; Larochelle, H.,
Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc., 2020; pp. 1877–1901.
10. Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann,
S.; et al. PaLM: Scaling Language Modeling with Pathways. arXiv preprint 2022, 2204.02311,
doi:10.48550/arXiv.2204.02311.
11. Patel, D.; Wong, G. GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE. Demystifying GPT-4: The
Engineering Tradeoffs That Led OpenAI to Their Architecture. SemiAnalysis 2023, 10 July.
https://www.semianalysis.com/p/gpt-4-architecture-infrastructure (accessed on 12 September 2023).
12. Gardizy, A.; Ma, W. Microsoft Readies AI Chip as Machine Learning Costs Surge. The Information 2023, April 18.
https://www.theinformation.com/articles/microsoft-readies-ai-chip-as-machine-learning-costs-surge (accessed on 12
September 2023).
13. OpenAI GPT-4 Technical Report. arXiv preprint 2023, 2303.08774, doi:10.48550/ARXIV.2303.08774.
14. Wolfram, S. Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT. Stephen Wolfram
Writings 2023. https://writings.stephenwolfram.com/2023/01/wolframalpha-as-the-way-to-bring-computational-
knowledge-superpowers-to-chatgpt (accessed on 12 September 2023).
15. Wolfram, S. What Is ChatGPT Doing … and Why Does It Work? Stephen Wolfram Writings 2023.
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/ (accessed on 12
September 2023).
16. OpenAI Introducing ChatGPT. OpenAI blog 2022, November 30. https://openai.com/blog/chatgpt (accessed on 12 September
2023).
17. Bryant, A. AI Chatbots: Threat or Opportunity? Informatics 2023, 10, 49, doi:10.3390/informatics10020049.
18. Hsiao, S.; Collins, E. Try Bard and Share Your Feedback. Blog Google 2023. https://blog.google/technology/ai/try-bard
(accessed on 12 September 2023).
19. Mehdi, Y. Reinventing Search with a New AI-Powered Microsoft Bing and Edge, Your Copilot for the Web. Official Microsoft
Blog 2023, 7 February. https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-
microsoft-bing-and-edge-your-copilot-for-the-web (accessed on 12 September 2023).
20. OpenAI Introducing ChatGPT Plus. OpenAI blog 2023, February 1. https://openai.com/blog/chatgpt-plus (accessed on 12
September 2023).
21. Opera Your Comprehensive Guide to Aria: Opera’s Native Browser AI. Blogs Opera 2023.
https://blogs.opera.com/desktop/2023/06/introducing-aria/ (accessed on 12 September 2023).
22. Anthropic Introducing Claude. Anthropic Blog 2023, March 14. https://www.anthropic.com/index/introducing-claude
(accessed on 12 September 2023).
23. Anthropic Claude 2. Anthropic Blog 2023, July 11. https://www.anthropic.com/index/claude-2 (accessed on 12 September
2023).
24. Brynjolfsson, E. A Call to Augment - Not Automate - Workers. In Generative AI: Perspectives from Stanford HAI. How do you
think generative AI will affect your field and society going forward?; Li, F.-F., Altman, R., Langlotz, C., Ganguli, S., Landay, J.,
Elam, M., Ho, D.E., Liang, P., Brynjolfsson, E., Manning, C.D., Reich, R., Norvig, P., Eds.; HAI, Stanford University,
Human-Centered Artificial Intelligence: Palo Alto, 2023; pp. 16–17.
25. Li, F.-F. AI’s Great Inflection Point. In Generative AI: Perspectives from Stanford HAI. How do you think generative AI will affect
your field and society going forward?; Li, F.-F., Altman, R., Langlotz, C., Ganguli, S., Landay, J., Elam, M., Ho, D.E., Liang,
P., Brynjolfsson, E., Manning, C.D., Reich, R., Norvig, P., Eds.; HAI, Stanford University, Human-Centered Artificial
Intelligence: Palo Alto, 2023; pp. 4–5.
26. Microsoft Will AI Fix Work? 2023 Work Trend Index: Annual Report; Redmond, 2023. Available online:
https://assets.ctfassets.net/y8fb0rhks3b3/5eyZc6gDu1bzftdY6w3ZVV/93190f5a8c7241ecf2d6861bdc7fe3ca/WTI_Will_A
I_Fix_Work_060723.pdf (accessed on 12 September 2023).
27. Donelan, M. Government Commits up to £3.5 Billion to Future of Tech and Science. UK Government News 2023, 16 March.
https://www.gov.uk/government/news/government-commits-up-to-35-billion-to-future-of-tech-and-science (accessed
on 12 September 2023).
28. Hinton, G. et al. Statement on AI Risk. Center for AI safety 2023. https://www.safe.ai/statement-on-ai-risk (accessed on 12
September 2023).
29. Okerlund, J.; Klasky, E.; Middha, A.; Kim, S.; Rosenfeld, H.; Kleinman, M.; Parthasarathy, S. What’s in the Chatterbox? Large
Language Models, Why They Matter, and What We Should Do About Them; University of Michigan: Ann Arbor, 2022.
Available online: https://stpp.fordschool.umich.edu/sites/stpp/files/2022-05/large-language-models-TAP-2022-final-
051622.pdf (accessed on 12 September 2023).
30. Hendrycks, D.; Mazeika, M. X-Risk Analysis for AI Research. arXiv preprint 2022, 2206.05862,
doi:10.48550/ARXIV.2206.05862.
31. Bucknall, B.S.; Dori-Hacohen, S. Current and Near-Term AI as a Potential Existential Risk Factor. Proceedings of the 2022
AAAI/ACM Conference on AI, Ethics, and Society 2022, doi:10.1145/3514094.3534146.
32. Cohen, M.K.; Hutter, M.; Osborne, M.A. Advanced Artificial Agents Intervene in the Provision of Reward. AI Magazine
2022, 43, 282–293, doi:10.1002/aaai.12064.
33. Ngo, R.; Chan, L.; Mindermann, S. The Alignment Problem from a Deep Learning Perspective. arXiv preprint 2022,
2209.00626, doi:10.48550/ARXIV.2209.00626.
34. Kleinman, Z.; Wain, P. Why Making AI Safe Isn’t as Easy as You Might Think. BBC News website 2023, 13 June.
https://www.bbc.com/news/technology-65850668 (accessed on 12 September 2023).
35. Carlsmith, J. Is Power-Seeking AI an Existential Risk? arXiv preprint 2022, 2206.13353, doi:10.48550/arXiv.2206.13353.
36. Kleinman, Z.; Gerken, T. Nick Clegg: AI Language Systems Are “Quite Stupid.” BBC News website 2023, 19 July.
https://www.bbc.com/news/technology-66238004 (accessed on 12 September 2023).
37. Vallance, C. AI Could Replace Equivalent of 300 Million Jobs - Report. BBC News website 2023, 28 March.
https://www.bbc.com/news/technology-65102150 (accessed on 12 September 2023).
38. Vallance, C. Sarah Silverman Sues OpenAI and Meta. BBC News website 2023, 12 July.
https://www.bbc.com/news/technology-66164228 (accessed on 12 September 2023).
39. Bearne, S. New AI Systems Collide with Copyright Law. BBC News website 2023, 1 August.
https://www.bbc.com/news/business-66231268 (accessed on 12 September 2023).
40. Future of Life Institute Pause Giant AI Experiments: An Open Letter. 2023. https://futureoflife.org/open-letter/pause-giant-
ai-experiments (accessed on 12 September 2023).
41. Vallance, C. Powerful Artificial Intelligence Ban Possible, Government Adviser Warns. BBC News website 2023, 1 June
(accessed on 12 September 2023).
42. Generative AI: Perspectives from Stanford HAI. How Do You Think Generative AI Will Affect Your Field and Society Going Forward?;
Li, F.-F., Altman, R., Langlotz, C., Ganguli, S., Landay, J., Elam, M., Ho, D.E., Liang, P., Brynjolfsson, E., Manning, C.D.,
Reich, R., Norvig, P., Eds.; HAI, Stanford University, Human-Centered Artificial Intelligence: Palo Alto, 2023.
43. McCallum, S. Seven AI Companies Agree to Safeguards in the US. BBC News website 2023, 22 July.
https://www.bbc.com/news/technology-66271429 (accessed on 12 September 2023).
45. Imafidon, A.-M. et al. AI Open Letter to UK Government and Industry; BCS, The Chartered Institute for IT, 2023.
https://www.bcs.org/sign-our-open-letter-on-the-future-of-ai (accessed on 12 September 2023).
46. Simons, J. The Creator of ChatGPT Thinks AI Should Be Regulated. Time 2023, February 5. https://time.com/6252404/mira-
murati-chatgpt-openai-interview/ (accessed on 12 September 2023).
47. White House FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial
Intelligence Companies to Manage the Risks Posed by AI. The White House Brief 2023, July 21.
https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-
secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/
(accessed on 12 September 2023).
48. Altman, S. Planning for AGI and Beyond. OpenAI blog 2023. https://openai.com/blog/planning-for-agi-and-beyond
(accessed on 12 September 2023).
49. Lund, B.D.; Wang, T.; Mannuru, N.R.; Nie, B.; Shimray, S.; Wang, Z. ChatGPT and a New Academic Reality: AI-Written
Research Papers and the Ethics of the Large Language Models in Scholarly Publishing. Journal of the Association for
Information Science and Technology 2023, 74, 570–581, doi:10.1002/asi.24750.
50. Haque, M.U.; Dharmadasa, I.; Sworna, Z.T.; Rajapakse, R.N.; Ahmad, H. “I Think This Is the Most Disruptive Technology”:
Exploring Sentiments of ChatGPT Early Adopters Using Twitter Data. arXiv preprint 2022, 2212.05856,
doi:10.48550/arXiv.2212.05856.
51. Altmäe, S.; Sola-Leyva, A.; Salumets, A. Artificial Intelligence in Scientific Writing: A Friend or a Foe? Reproductive
BioMedicine Online 2023, 47, 3–9, doi:10.1016/j.rbmo.2023.04.009.
52. Else, H. Abstracts Written by ChatGPT Fool Scientists. Nature 2023, 613, 423–423, doi:10.1038/d41586-023-00056-7.
53. Gao, C.A.; Howard, F.M.; Markov, N.S.; Dyer, E.C.; Ramesh, S.; Luo, Y.; Pearson, A.T. Comparing Scientific Abstracts
Generated by ChatGPT to Original Abstracts Using an Artificial Intelligence Output Detector, Plagiarism Detector, and
Blinded Human Reviewers. Digital Medicine 2023, 6, doi:10.1038/s41746-023-00819-6.
54. Osmanovic Thunström, A. We Asked GPT-3 to Write an Academic Paper about Itself - Then We Tried to Get It Published.
An Artificially Intelligent First Author Presents Many Ethical Questions - and Could Upend the Publishing Process.
Scientific American 2022, June 30. Available online: https://www.scientificamerican.com/article/we-asked-gpt-3-to-write-
an-academic-paper-about-itself-mdash-then-we-tried-to-get-it-published/ (accessed on 12 September 2023).
55. Generative Pretrained Transformer, G.; Osmanovic Thunström, A.; Steingrimsson, S. Can GPT-3 Write an Academic Paper
on Itself, with Minimal Human Input? HAL Preprint 2022, hal-03701250.
56. Petiška, E. ChatGPT Cites the Most-Cited Articles and Journals, Relying Solely on Google Scholar’s Citation Counts. As a
Result, AI May Amplify the Matthew Effect in Environmental Science. arXiv preprint 2023, 2304.06794,
doi:10.48550/arXiv.2304.06794.
57. Májovský, M.; Černý, M.; Kasal, M.; Komarc, M.; Netuka, D. Artificial Intelligence Can Generate Fraudulent but Authentic-
Looking Scientific Medical Articles: Pandora’s Box Has Been Opened. Journal of Medical Internet Research 2023, 25, e46924,
doi:10.2196/46924.
58. Bhattacharyya, M.; Miller, V.M.; Bhattacharyya, D.; Miller, L.E. High Rates of Fabricated and Inaccurate References in
ChatGPT-Generated Medical Content. Cureus 2023, 15, e39238, doi:10.7759/cureus.39238.
59. Buholayka, M.; Zouabi, R.; Tadinada, A. Is ChatGPT Ready to Write Scientific Case Reports Independently? A Comparative
Evaluation Between Human and Artificial Intelligence. Cureus 2023, 15, e39386, doi:10.7759/cureus.39386.
60. Lund, B.D.; Wang, T. Chatting about ChatGPT: How May AI and GPT Impact Academia and Libraries? Library Hi Tech
News 2023, 40, 26–29, doi:10.1108/lhtn-01-2023-0009.
61. Salvagno, M.; Taccone, F.S.; Gerli, A.G. Can Artificial Intelligence Help for Scientific Writing? Critical Care 2023, 27, 75,
doi:10.1186/s13054-023-04380-2.
62. Pividori, M.; Greene, C.S. A Publishing Infrastructure for AI-Assisted Academic Authoring. bioRxiv preprint 2023,
doi:10.1101/2023.01.21.525030.
63. Venema, L.; Jerde, T.; Huth, J.; Pieropan, M.; Matusevych, Y. The AI Writing on the Wall. Nature Machine Intelligence 2023,
5, 1, doi:10.1038/s42256-023-00613-9.
64. Getahun, H. After an AI Bot Wrote a Scientific Paper on Itself, the Researcher behind the Experiment Says She Hopes She
Didn’t Open a “Pandora’s Box.” Business Insider Nederland 2022.
66. Understanding ChatGPT Is a Bold New Challenge for Science. Nature 2023, 619.
67. Moskvichev, A.; Odouard, V.V.; Mitchell, M. The ConceptARC Benchmark: Evaluating Understanding and Generalization
in the ARC Domain. arXiv preprint 2023, 2305.07141, doi:10.48550/ARXIV.2305.07141.
68. Kocoń, J.; Cichecki, I.; Kaszyca, O.; Kochanek, M.; Szydło, D.; Baran, J.; Bielaniewicz, J.; Gruza, M.; Janz, A.; Kanclerz, K.; et
al. ChatGPT: Jack of All Trades, Master of None. Information Fusion 2023, 99, 101861, doi:10.1016/j.inffus.2023.101861.
69. Bubeck, S.; Chandrasekaran, V.; Eldan, R.; Gehrke, J.; Horvitz, E.; Kamar, E.; Lee, P.; Lee, Y.T.; Li, Y.; Lundberg, S.; et al.
Sparks of Artificial General Intelligence: Early Experiments with GPT-4. arXiv preprint 2023, 2303.12712,
doi:10.48550/ARXIV.2303.12712.
70. Koo, M. The Importance of Proper Use of ChatGPT in Medical Writing. Radiology 2023, 307, doi:10.1148/radiol.230312.
71. Liebrenz, M.; Schleifer, R.; Buadze, A.; Bhugra, D.; Smith, A. Generating Scholarly Content with ChatGPT: Ethical
Challenges for Medical Publishing. The Lancet Digital Health 2023, 5, e105–e106, doi:10.1016/s2589-7500(23)00019-5.
72. Hill-Yardin, E.L.; Hutchinson, M.R.; Laycock, R.; Spencer, S.J. A Chat(GPT) about the Future of Scientific Publishing. Brain,
Behavior, and Immunity 2023, 110, 152–154, doi:10.1016/j.bbi.2023.02.022.
73. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large
Language Models. arXiv preprint 2023, 2303.18223, doi:10.48550/ARXIV.2303.18223.
74. Fan, L.; Li, L.; Ma, Z.; Lee, S.; Yu, H.; Hemphill, L. A Bibliometric Review of Large Language Models Research from 2017
to 2023. arXiv preprint 2023, 2304.02020, doi:10.48550/arXiv.2304.02020.
75. Christiano, P.; Leike, J.; Brown, T.B.; Martic, M.; Legg, S.; Amodei, D. Deep Reinforcement Learning from Human
Preferences (V4). arXiv preprint 2023, 1706.03741, doi:10.48550/ARXIV.1706.03741.
76. Perrigo, B. Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic. Time 2023,
January 18. Available online: https://time.com/6247678/openai-chatgpt-kenya-workers (accessed on 12 September 2023).
77. OpenAI ChatGPT Plugins. OpenAI blog 2023. https://openai.com/blog/chatgpt-plugins (accessed on 12 September 2023).
78. Opera Opera Browser AI - Aria FAQ. Opera Help 2023. https://help.opera.com/en/browser-ai-faq (accessed on 12 September
2023).
79. Manyika, J. An Overview of Bard: An Early Experiment with Generative AI. ai.google static documents 2023.
https://ai.google/static/documents/google-about-bard.pdf (accessed on 12 September 2023).
80. Collins, E.; Ghahramani, Z. LaMDA: Our Breakthrough Conversation Technology. Blog Google 2021.
https://blog.google/technology/ai/lamda/ (accessed on 12 September 2023).
81. Thoppilan, R.; De Freitas, D.; Hall, J.; Shazeer, N.; Kulshreshtha, A.; Cheng, H.-T.; Jin, A.; Bos, T.; Baker, L.; Du, Y.; et al.
LaMDA: Language Models for Dialog Applications. arXiv preprint 2022, 2201.08239, doi:10.48550/ARXIV.2201.08239.
82. Bai, Y.; Kadavath, S.; Kundu, S.; Askell, A.; Kernion, J.; Jones, A.; Chen, A.; Goldie, A.; Mirhoseini, A.; McKinnon, C.; et al.
Constitutional AI: Harmlessness from AI Feedback. arXiv preprint 2022, 2212.08073, doi:10.48550/ARXIV.2212.08073.
83. Bai, Y.; Jones, A.; Ndousse, K.; Askell, A.; Chen, A.; DasSarma, N.; Drain, D.; Fort, S.; Ganguli, D.; Henighan, T.; et al.
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv preprint 2022,
2204.05862, doi:10.48550/ARXIV.2204.05862.
84. Lunt, H.G. Common Slavic, Proto-Slavic, Pan-Slavic: What Are We Talking About? International Journal of Slavic Linguistics
and Poetics 1997, 41, 7–67.
85. Curta, F. The Making of the Slavs: History and Archaeology of the Lower Danube Region, c. 500-700; Cambridge University Press,
2001; ISBN 978-0-511-49629-5.
86. Curta, F. Migrations in the Archaeology of Eastern and Southeastern Europe in the Early Middle Ages (Some Comments
on the Current State of Research). In Migration Histories of the Medieval Afro-Eurasian Transition Zone: Aspects of mobility
between Africa, Asia and Europe, 300-1500 CE; Preiser-Kapeller, J., Reinfandt, L., Stouraitis, Y., Eds.; Brill: Leiden, Boston,
2020; pp. 101–140.
87. Pritsak, O. The Slavs and the Avars. In Gli slavi occidentali e meridionali nell’alto medioevo. 15 - 21 Aprile 1982. Settimane di
Studio del Centro Italiano di Studi Sull’Alto Medioevo, XXX; Fondazione CISAM, 1983; pp. 353–435; ISBN 88-7988-029-2.
88. Kazanski, M. Archaeology of the Slavic Migrations. In Encyclopedia of Slavic Languages and Linguistics; Greenberg, M.L.,
Grenoble, L.A., Eds.; Brill: Leiden, Boston, 2020.
89. Pohl, W. The Avars: A Steppe Empire in Europe, 567-822; Cornell University Press, 2018; ISBN 978-1-5017-2940-9.
90. Heather, P. Empires and Barbarians: The Fall of Rome and the Birth of Europe; Oxford University Press, 2010.
91. Lindstedt, J.; Salmela, E. Language Contact and the Early Slavs. In New perspectives on the Early Slavs and the rise of Slavic;
Klír, T., Boček, V., Jansens, N., Eds.; Empirie und Theorie der Sprachwissenschaft; Universitätsverlag Winter GmbH,
2020; pp. 275–299; ISBN 978-3-8253-4707-9.
92. Štular, B.; Lozić, E.; Belak, M.; Rihter, J.; Koch, I.; Modrijan, Z.; Magdič, A.; Karl, S.; Lehner, M.; Gutjahr, C. Migration of
Alpine Slavs and Machine Learning: Space-Time Pattern Mining of an Archaeological Data Set. PLOS ONE 2022, 17,
e0274687, doi:10.1371/journal.pone.0274687.
93. Ratzel, F. Anthropogeographie. Bd. 1. Grundzüge Der Anwendung Der Erdkunde; J. Engelhorns Nach.: Berlin, 1909.
94. MacEachern, S. Genes, Tribes, and African History. Current Anthropology 2000, 41, 357–384.
95. Bentley, C.G. Ethnicity and Practice. Comparative Studies in Society and History 1987, 29, 24–55.
96. Knapp, B.A. Prehistoric and Protohistoric Cyprus. Identity, Insularity, and Connectivity; Oxford University Press, 2008.
97. Grafenauer, B. Zgodovina Slovenskega Naroda, I. Zvezek: Od Naselitve Do Uveljavljenja Frankovskega Fevdalnega Reda; Kmečka
knjiga, 1954.
98. Woods, R. AI + You: How to Use ChatGPT for Content Creation. Microsoft Learn Articles 2023, July 11.
https://create.microsoft.com/en-us/learn/articles/how-to-use-chatgpt-for-content-creation (accessed on 12 September
2023).
99. PLOS How to Write a Peer Review. PLOS 2023. https://plos.org/resource/how-to-write-a-peer-review/ (accessed on 12
September 2023).
100. Lozić, E. Application of Airborne LiDAR Data to the Archaeology of Agrarian Land Use: The Case Study of the Early
Medieval Microregion of Bled (Slovenia). Remote Sensing 2021, 13, 3228, doi:10.3390/rs13163228.
101. Štular, B. Grave Orientation in the Middle Ages. A Case Study from Bled Island; E-Monographiae Instituti Archaeologici
Sloveniae; Založba ZRC, 2022; Vol. 14; ISBN 978-961-05-0633-1; doi:10.3986/9789610506331.
102. Kajkowski, K. The Boar in the Symbolic and Religious System of Baltic Slavs in the Early Middle Ages = Dzik w Systemie
Symboliczno-Religijnym Słowian Nadbałtyckich Wczesnego Średniowiecza. Studia Mythologica Slavica 2012, 15, 201–215.
103. Pleterski, A. Kulturni Genom: Prostor in Njegovi Ideogrami Mitične Zgodbe; Založba ZRC, 2014; ISBN 978-961-254-736-3.
104. Lucy, L.; Bamman, D. Gender and Representation Bias in GPT-3 Generated Stories. In Proceedings of the Third Workshop
on Narrative Understanding; Akoury, N., Brahman, F., Chaturvedi, S., Clark, E., Iyyer, M., Martin, L.J., Eds.; Association
for Computational Linguistics, 2021; pp. 48–55.
105. Hutchinson, B.; Prabhakaran, V.; Denton, E.; Webster, K.; Zhong, Y.; Denuyl, S. Social Biases in NLP Models as Barriers
for Persons with Disabilities. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics;
Jurafsky, D., Chai, J., Schluter, N., Tetreault, J., Eds.; Association for Computational Linguistics, 2020; pp. 5491–5501.
106. Kotek, H.; Dockum, R.; Sun, D.Q. Gender Bias and Stereotypes in Large Language Models. arXiv preprint 2023, 2308.14921,
doi:10.48550/ARXIV.2308.14921.
107. Rovira, C.; Codina, L.; Lopezosa, C. Language Bias in the Google Scholar Ranking Algorithm. Future Internet 2021, 13, 31,
doi:10.3390/fi13020031.
108. Hrushevsky, M. History of Ukraine-Rus’. Volume 1: From Prehistory to the Eleventh Century; University of Alberta Press,
1997; Vol. 1; ISBN 1-895571-19-7.
109. Obolensky, D. The Byzantine Commonwealth: Eastern Europe, 500-1453; History of Civilization Series; Weidenfeld and
Nicolson: London, 1971.
110. Dolukhanov, P. The Early Slavs: Eastern Europe from the Initial Settlement to the Kievan Rus; Routledge, 1996; ISBN 978-1-315-
84355-1.
112. Parczewski, M. Origins of Early Slav Culture in Poland. Antiquity 1991, 65, 676–683.
113. Dvornik, F. The Slavs in European History and Civilization; Rutgers University Press, 1962.
114. Pleterski, A. The Slavs on the Danube. Homeland Found. In Stratum Plus. Archaeology and Cultural Anthropology 5;
Rabinovich, R.A., Gavritukhin, I.O., Eds.; 2015; Vol. 5, pp. 117–150.
115. Pavlovič, D. Začetki Zgodnjeslovanske Poselitve Prekmurja = Beginnings of the Early Slavic Settlement in the Prekmurje
Region, Slovenia. Arheološki vestnik 2017, 68, 349–386.
116. Malyarchuk, B.A.; Derenko, M.V. Diversity and Structure of Mitochondrial Gene Pools of Slavs in the Ethnogenetic Aspect.
Biology Bulletin Reviews 2021, 11, 122–133.
117. Živković, T.; Crnčević, D.; Bulić, D.; Petrović, V.; Cvijanović, I.; Radovanović, B. The World of the Slavs. Studies on the East,
West and South Slavs: Civitas, Oppidas, Villas and Archeological Evidence (7th to 11th Centuries AD); The Institute of History,
2013.
118. Nagtegaal, L.W.; de Bruin, R.E. The French Connection and Other Neo-Colonial Patterns in the Global Network of Science.
Research Evaluation 1994, 4, 119–127, doi:10.1093/rev/4.2.119.
119. Schürer, Y. Decolonising the Library - in Deutschland? Bibliothek Forschung und Praxis 2023, 47, doi:10.1515/bfp-2022-0029.
120. Beel, J.; Gipp, B. Google Scholar’s Ranking Algorithm: An Introductory Overview. In Proceedings of the 12th International
Conference on Scientometrics and Informetrics (ISSI ’09), Volume 1; Larsen, B., Leta, J., Eds.; International Society for
Scientometrics and Informetrics, 2009; pp. 230–241.
121. Martín-Martín, A.; Orduna-Malea, E.; Ayllón, J.M.; López-Cózar, E.D. Back to the Past: On the Shoulders of an Academic
Search Engine Giant. Scientometrics 2016, 107, 1477–1487.
122. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter.
arXiv preprint 2019, 1910.01108, doi:10.48550/ARXIV.1910.01108.
123. Ackoff, R.L. From Data to Wisdom. Journal of Applied Systems Analysis 1989, 16, 3–9.
124. Štular, B.; Belak, M. Deep Data Example: Zbiva, Early Medieval Data Set for the Eastern Alps. Research Data Journal for the
Humanities and Social Sciences 2022, 1–13, doi:10.1163/24523666-bja10024.
125. Hodder, I. The Archaeological Process: An Introduction; Wiley-Blackwell, 1999; ISBN 978-0-631-19885-7.
126. Stokel-Walker, C. AI Bot ChatGPT Writes Smart Essays - Should Professors Worry? Nature 2022, Dec, doi:10.1038/d41586-
022-04397-7.
127. Ganguli, S. In Generative AI: Perspectives from Stanford HAI. How do you think generative AI will affect your field and society
going forward?; Li, F.-F., Altman, R., Langlotz, C., Ganguli, S., Landay, J., Elam, M., Ho, D.E., Liang, P., Brynjolfsson, E.,
Manning, C.D., Reich, R., Norvig, P., Eds.; HAI, Stanford University, Human-Centered Artificial Intelligence: Palo Alto,
2023; pp. 8–9.