Semantic Similarity Between Medium-Sized Texts

Jacobo Farray Rodríguez, Antonio Jesús Fernández-García, and Elena Verdú

Universidad Internacional de La Rioja, Logroño, Spain


[email protected]
https://gruposinvestigacion.unir.net/dds/

Abstract. Semantically comparing texts is a task that is useful in various fields, such as the automatic correction of exams and/or activities. Using Natural Language Processing (NLP) and deep learning techniques, the correction task can be facilitated for the teacher, so that a greater number of knowledge tests can be offered to students. The objective of this work is to semantically compare texts in order to evaluate a student's knowledge automatically. To this end, models are built based on Transformer architectures, specialized in the Spanish language and in two subjects. These models can be used and evaluated through an application. After using the different models to measure the similarity between a set of student answers and the ideal answer provided by the teacher, a Pearson correlation coefficient greater than 80% is obtained when comparing the measured similarity with the teacher's grade. Given the Pearson correlation obtained, an MAE of 0.13 and an RMSE of 0.17, it is concluded that the models obtained would serve as an evaluation guide for both the teaching team and the students in the medium term, opening the door to further research towards an autonomous system in the long term.

Keywords: deep learning · automatic correction · automatic short answer grading · semantic comparison · semantic similarity · transformers

1 Introduction
Semantically comparing texts involves analyzing their meaning in different contexts and can be very useful for a variety of applications, such as sentiment analysis, plagiarism detection, or content analysis, among others. Education is no stranger to this field: exams and/or activities must be corrected, a necessary task to validate the knowledge of the examinees.
Current natural language processing techniques allow us to obtain the semantic meaning of sentences, in such a way that we can find similarities between sentences written differently but with similar semantics. While English may be more advanced
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Chbeir et al. (Eds.): MEDES 2023, CCIS 2022, pp. 361–373, 2024.
https://doi.org/10.1007/978-3-031-51643-6_26
in terms of Natural Language Processing (NLP) technologies, due to the availability of large amounts of data used to create large models, there is a growing body of work in other languages such as Spanish and Portuguese, and we can expect to see continued progress in this area in the coming years [4,6,10].
This study aims to contribute to the fields of Education and NLP in the Spanish language, the second most spoken language in the world. Specifically, we focus on the automatic correction of exams/activities. The purpose of this research is to investigate the effectiveness of using NLP for the correction of questions whose answers are around 200-250 words. These answers are compared with the ideal answer provided by the teacher. To perform the comparison task, our proposal is based on an architecture of Siamese Transformer models [9], an architecture that reduces dimensionality and makes efficient use of resources when training the model.
This context led us to pose the following research question:

RQ How accurate can current models based on Transformer architectures be for semantically comparing Spanish-language texts in the context of educational assessment?

For this study, a question/answer dataset from two subjects delivered at Universidad Internacional de La Rioja was used. With it, specialized NLP models were created, which were benchmarked both quantitatively and qualitatively.
The rest of this communication is organized as follows. Section 2 reviews
some related projects that involve measuring the similarity of texts. Section 3
provides an overview of the approach methodology. Section 4 shows the quanti-
tative and qualitative results obtained, which are discussed in Sect. 5. Finally,
some conclusions and further considerations are summarised in Sect. 6.

2 State of the Art
This section presents recent progress in the calculation of semantic similarity between texts, from the more classical approaches, such as those based on the similarity of vectors created from probabilistic latent semantic analysis, generally using the cosine distance as a similarity metric [13], or variants of word embedding models [3], to newer approaches that arrived with machine learning. With machine learning, much work has been done that takes into account the context of the text when carrying out NLP tasks, and the calculation of semantic similarity is no exception to this trend. The first such approaches applied recurrent networks, such as long short-term memory (LSTM) ones [13], which were able to learn long-term dependencies in the text, until the presentation of the Transformer architecture [11]. The Transformer architecture has been a turning point in NLP: its novelty is the replacement of the recurrent layers by so-called attention layers. These layers remove the recursion that LSTMs have, so sentences are processed as a whole (with positional encoding) instead of word by word. This reduces complexity and allows parallelization, thus improving computational efficiency.
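Most of the approaches above ultimately score a pair of texts with the cosine of the angle between their embedding vectors. As a minimal illustration (the three-dimensional vectors here are toy values, not real embeddings):

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (||u|| * ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Parallel vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0 (up to rounding)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```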

In a study [12] on the semantic comparison of English texts in the clinical domain using Transformers, good results were obtained with pre-trained models such as BERT [5] and RoBERTa [7], the latter reaching a Pearson coefficient of 0.9010 when comparing clinical texts, making them suitable for clinical applications such as deduplication and summarization of clinical texts.
Nowadays, most models are trained for the English language, but more and more multilingual models, or models trained for different tasks in other languages, have appeared recently [8]. Below, we list some of the NLP models that are used in this study:

– BERT [5]: A model that represented a turning point in NLP [3], in which the Transformer architecture was used to obtain a better semantic representation of the whole text.
– mBERT [5]: An extension of the initial BERT with multilanguage support for a total of 104 languages.
– BETO [4]: An NLP model prepared exclusively for Spanish. In the study by Cañete et al. [4], it obtained better results than mBERT [5].

Continuing this progress, Reimers and Gurevych [9] presented a more advanced architecture based on Transformers, which uses two conjoined Transformer models. Its main advantage is being able to work with larger texts when comparing them semantically, as it is not necessary to concatenate the two texts as is done with a classic Transformer model. This architecture reduces dimensionality, making efficient use of resources when training the model. Given these advantages and the size of the texts we work with, this architecture was selected for the present study. Figure 1 shows the architecture, based on two Transformer models instead of one.

Fig. 1. SBERT architectures for classification tasks (left) and semantic comparison tasks (right). Source: [9]

The aforementioned models are open source, but there are also commercial applications on the market for calculating semantic similarity. We have selected two for comparison:

– Retina API [2]: An API from the company Cortical with support for 50 languages, which provides different similarity measures, such as cosine, Euclidean, Jaccard, and some proprietary distance metrics.
– Dandelion API [1]: A product of a startup based in Italy, which offers different APIs for natural language processing, such as language detection and semantic similarity.

3 Methodology

In order to achieve the objective of automatically grading open responses of no more than 250 words by measuring their semantic similarity to an ideal response, a SCRUM methodology was followed. The main tasks carried out were:

1. Analysis of the data source used in the project. This is a set of questions/answers from two subjects delivered at Universidad Internacional de La Rioja (UNIR). The dataset contains 240 records in total, with an average teacher response length of 130 words and an average student response length of 150 words, with cases in which the student's response reaches 405 words.
2. Identification of the NLP Transformer models to use. Although the architecture used to specialize the models has always been that of sentence transformers, we used four existing conjoined transformer models and two instances that create the Siamese architecture from scratch, as shown in Fig. 2. A maximum input size of 256 words was defined.
For the choice of these models, support for Spanish or multiple languages and/or the number of tokens the models allow was taken into account. Table 1 shows the selected models, where type S-ST means a new Siamese transformer built from scratch and E-ST means an existing Siamese Transformer model. Although in our dataset the average student response exceeds 128 words, models with this token limit were still chosen, since they are multilingual and the teacher's average response is 130 words. It is also worth mentioning that if a sentence exceeds the maximum input size allowed by the model, it is truncated, which affects the training of the models. In this study we prioritized the average size of the teacher's response, understanding that a correct response should have approximately that size.
3. Specialization of the NLP models, to which a fine-tuning phase was applied. For the training process, the open-source library Hugging Face (https://huggingface.com) was used. Each of the Siamese models described above was trained for 1, 5, 10, 30, 50 and 100 epochs. In addition to the number of epochs, the other parameters used were:
– train loss: as the loss function we selected cosine similarity.

Table 1. Selected Transformer Models.

Id Model Type No. Tokens
A sentence-transformers/all-distilroberta-v1 E-ST 512
B sentence-transformers/distiluse-base-multilingual-cased-v1 E-ST 128
C sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 E-ST 128
D sentence-transformers/paraphrase-multilingual-mpnet-base-v2 E-ST 128
E bert-base-multilingual-uncased [5] S-ST 256
F dccuchile/bert-base-spanish-wwm-uncased [4] S-ST 256

Fig. 2. Architecture followed in Siamese Transformers

– train dataloader: in order to make better use of memory, we trained in batches, defining the size of each batch as 10% of the size of the training dataset.
– steps per epoch: the default value was taken, so the number of training steps per epoch equals the size of the training dataset.
– warmup steps: the number of training steps in which the learning rate is at its maximum, after which it decreases linearly until it reaches zero. It is defined as the number of steps until reaching 10% of the training data.
– evaluation steps: set to 10% of the size of the training dataset.
– save best model: set to True, in order to save the best evaluated model.

4. Application development. To facilitate the task, an application was developed using the Gradio SDK (https://gradio.app), whose main purpose is to make use of the obtained models. The application allows comparing a single pair of texts or a battery of texts; the latter serves as a support tool for evaluation, for both teachers and students.
5. Evaluation and comparison of the different NLP models.
– Quantitative comparative evaluation. On the one hand, we chose the Pearson, Spearman, and Kendall correlations to measure the correlation between the teacher's grade and the grade awarded (semantic similarity) by the model. The purpose is to measure the relationship between the two, regardless of the error of the generated models. On the other hand, the MAE, MSE, RMSE, RMSLE and R2 metrics are used to evaluate the error with respect to the teacher's grade. See Evaluations 1 and 2 in Sect. 4.
– Qualitative evaluation with random examples, in which we also test extreme cases such as negations and antonyms. See Evaluation 3 in Sect. 4.
– Comparative evaluation between the best model obtained in this study and two existing applications on the market (Retina API [2] and Dandelion API [1]). See Evaluation 4 in Sect. 4.

4 Results
Evaluation 1. Considering the Pearson correlation (see Table 2), better results are obtained with the models previously trained for the Spanish language, and even better with the BETO-based model, which was designed specifically for Spanish rather than being multilingual.
Observing the Spearman correlation (see Table 3), although the coefficients are not as close to 1, a behavior similar to that of the Pearson coefficient is observed, with better results when we use multilingual models and even better ones for the BETO-based model.

Table 2. Pearson correlation of the models obtained.

Epochs A B C D E F
1 0.76 0.72 0.67 0.76 0.77 0.80
5 0.75 0.81 0.66 0.76 0.77 0.81
10 0.76 0.79 0.69 0.81 0.77 0.78
30 0.77 0.81 0.72 0.78 0.80 0.80
50 0.76 0.78 0.73 0.75 0.82 0.81
100 0.77 0.81 0.74 0.77 0.77 0.82

Table 3. Spearman correlation of the models obtained.

Epochs A B C D E F
1 0.39 0.32 0.13 0.31 0.44 0.52
5 0.35 0.56 0.15 0.45 0.43 0.50
10 0.36 0.49 0.15 0.59 0.51 0.43
30 0.43 0.63 0.32 0.52 0.59 0.51
50 0.37 0.48 0.33 0.44 0.63 0.56
100 0.38 0.66 0.37 0.48 0.43 0.54

Considering the models trained with 50 epochs as those that offer a better
balance between results and computational cost, the correlation plots including
Pearson, Spearman and Kendall coefficients for these models are shown in Fig. 3.

Evaluation 2. MAE, MSE, RMSE, RMSLE, R2
It is observed that the models presenting the best metrics are those trained with 50 epochs (Table 4), with the models created with the Siamese architecture from scratch behaving better. The model with the best metrics is the one based on BETO [4], followed by mBERT [5].
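The error and correlation metrics reported in Tables 2-4 can be reproduced in a few lines; a plain-Python sketch with toy grade/similarity values (not the study's data):

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error between teacher grades and model similarities.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error; penalizes large deviations more than MAE.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson(y_true, y_pred):
    # Pearson correlation: linear dependence between grades and similarities.
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

grades = [1.0, 0.8, 0.5, 0.2]            # teacher grades (toy values)
similarities = [0.95, 0.85, 0.60, 0.35]  # model similarities (toy values)
print(mae(grades, similarities), rmse(grades, similarities),
      pearson(grades, similarities))
```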

Evaluation 3. Negations
We studied semantic similarity in cases where one of the texts to be compared is the negation of the other. Since the goal is to grade an answer, a negation can mean a complete change in the grade. For example, for the question "Is Spain on the European continent?", the student's answer could be "Spain is on the European continent" or "Spain is not on the European continent". Both sentences are very similar but mean the complete opposite. Analyzing their semantic similarity with the trained models based on BETO returns a similarity of 0.783, a value that would tell us that these texts have a lot in common in terms of semantic meaning.
As an extension of this point, we can also include an affirmation and a negation in the same sentence.

Evaluation 4. Although in the quantitative evaluation models such as mBERT or BETO had better results, the qualitative perception has been that the paraphrase-multilingual-MiniLM-L12-v2 model worked better. That is why this model was chosen for carrying out certain random tests. In Table 5 we show some cases tested with the paraphrase-multilingual-MiniLM-L12-v2 model, comparing its results with two existing applications on the market, Retina API [2] and Dandelion API [1]. In case 3, we wanted to test how the models would behave with antonymous words.

Fig. 3. Pearson, Spearman, and Kendall correlation for models trained with 50 Epochs

Table 4. Metrics MAE, MSE, RMSE, RMSLE and R2 of the models obtained.

all-distilroberta-v1
epochs MAE MSE RMSE RMSLE R2
1 0.149 0.033 0.182 0.012 0.575
5 0.156 0.036 0.190 0.012 0.537
10 0.151 0.035 0.186 - 0.555
30 0.142 0.032 0.178 0.011 0.592
50 0.149 0.033 0.182 - 0.573
100 0.145 0.032 0.180 0.011 0.585
distiluse-base-multilingual-cased-v1
epochs MAE MSE RMSE RMSLE R2
1 0.177 0.044 0.209 - 0.438
5 0.137 0.028 0.167 - 0.642
10 0.144 0.030 0.173 0.011 0.616
30 0.129 0.027 0.165 - 0.652
50 0.141 0.030 0.175 0.011 0.609
100 0.129 0.027 0.164 - 0.653
paraphrase-multilingual-MiniLM-L12-v2
epochs MAE MSE RMSE RMSLE R2
1 0.170 0.045 0.213 0.016 0.418
5 0.172 0.046 0.213 0.016 0.415
10 0.165 0.043 0.207 0.015 0.450
30 0.153 0.039 0.197 - 0.503
50 0.149 0.037 0.193 0.013 0.524
100 0.152 0.037 0.191 - 0.529
paraphrase-multilingual-mpnet-base-v2
epochs MAE MSE RMSE RMSLE R2
1 0.153 0.037 0.191 0.014 0.531
5 0.158 0.035 0.186 0.012 0.556
10 0.141 0.029 0.169 - 0.633
30 0.148 0.033 0.182 - 0.575
50 0.154 0.037 0.193 0.013 0.520
100 0.143 0.033 0.182 - 0.576
mbert
epochs MAE MSE RMSE RMSLE R2
1 0.183 0.060 0.244 - 0.235
5 0.188 0.064 0.252 - 0.182
10 0.191 0.062 0.249 - 0.201
30 0.134 0.029 0.171 0.010 0.626
50 0.133 0.028 0.169 - 0.635
100 0.186 0.063 0.251 0.021 0.190
beto
epochs MAE MSE RMSE RMSLE R2
1 0.154 0.033 0.182 0.012 0.574
5 0.139 0.027 0.165 0.009 0.652
10 0.139 0.032 0.179 0.012 0.586
30 0.139 0.030 0.173 - 0.614
50 0.130 0.027 0.164 0.010 0.654
100 0.154 0.034 0.185 0.011 0.562

Table 5. Cases tested at random. Columns: semantic similarity, Cortical IO, Dandelion.

CASE 1
Reference: Colón discovered America
Colón discovered Japan  0.645  0.71  0.64
Colón discovered India  0.749  1  0.69
Colón discovered the American continent  0.966  0.79  1
Colón found the American continent  0.952  0.58  1
CASE 2
Reference: The numbers being multiplied are known as factors, while the result of the multiplication is known as the product
The factors are the numbers that are multiplied and the result is the product  0.87  0.79  1
The factors are not the numbers being multiplied and the result is not the product  0.659  0.79  1
The factors are the numbers that are multiplied and the result is the dividend.  0.639  0.59  0.92
CASE 3 (ANTONYMS)
Reference: Solving algorithm problems is easy
Solving algorithmic problems is not difficult  0.847  0.71  1
Solving algorithmic problems is easy  0.972  0.66  0.85
Solving algorithmic problems is difficult  0.686  0.71  1
CASE 4
Reference: The cell is the morphological and functional unit of all living things. The cell is the smallest element that can be considered alive. Living organisms can be classified according to the number of cells they have: if they only have one, they are called unicellular; if they have more, they are called multicellular
The cell is the morphological and functional unit of living beings. It is the smallest living element. Living organisms are classified according to the number of cells into unicellular, if they only have one, or multicellular, if they have more.  0.973  0.78  0.95
The cell is the functional unit of living beings. Living organisms are classified as unicellular, when they have one cell, or multicellular, when they have more.  0.917  0.65  0.82
The cell is the functional unit of living beings. Living organisms are classified as unicellular, when they have one cell, or multicellular, when they have more.  0.92  0.65  0.82
The cell is the smallest unit of living things. Cells are classified into univocal and multicellular.  0.878  0.59  0.68
The cell is the morphological and functional unit of all living things. The cell is the largest element that can be considered alive.  0.832  0.59  1

5 Discussion

The models trained with 50 epochs are the ones with the best metrics. Among these, the best are the Siamese models built from scratch, first the one based on BETO [4], followed by mBERT [5]. This may be because they were built with a more complex architecture, adding an extra dense layer (see Fig. 2). In Fig. 3 we show the Pearson, Spearman, and Kendall coefficients for these models. For the Pearson coefficient, a linear dependence between the teacher's grade and the semantic similarity of at least 0.81 is obtained. Considering rank correlation, values of at least 0.56 are obtained for Spearman and 0.49 for Kendall. This Kendall value is obtained for the Siamese model based on mBERT, so we can conclude that if we order the students' answers according to the grade given by the teacher, this order is respected in 49% of the cases.
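The Kendall coefficient used here can be made concrete: it is the normalized excess of concordantly ordered answer pairs over discordantly ordered ones. A minimal tie-free sketch:

```python
from itertools import combinations

def kendall_tau(grades, similarities):
    # Tau-a: (concordant pairs - discordant pairs) / total pairs.
    # A pair of answers is concordant when the teacher's grades and the
    # model's similarities order the two answers the same way.
    concordant = discordant = 0
    for i, j in combinations(range(len(grades)), 2):
        sign = (grades[i] - grades[j]) * (similarities[i] - similarities[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = len(grades) * (len(grades) - 1) // 2
    return (concordant - discordant) / total

# Identical ordering gives 1.0; a fully reversed ordering gives -1.0.
print(kendall_tau([1.0, 0.8, 0.5], [0.9, 0.7, 0.4]))  # 1.0
print(kendall_tau([1.0, 0.8, 0.5], [0.4, 0.7, 0.9]))  # -1.0
```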

Analyzing the correlations in Fig. 3, we see two important points:

– The models are not able to give high similarities even when the teacher's grade is high. This is mainly because, for the model to give a semantic similarity of 1, the two texts must be exactly the same.
– We see quite a few cases in the intermediate zone of the plots in which the models give a higher semantic similarity, or "mark", than the one assigned by the teacher.
– In addition, we observe that better results are obtained with the mBERT model than with the BETO model (trained specifically for Spanish), although this may be due to the size of the corpora used to train these models.

For these reasons, the translation from semantic similarity to grade cannot be done directly, so it would be necessary to investigate a mechanism to carry out this translation. Such a mechanism must also take into account other factors that may influence the grade, such as misspellings.
It is worth mentioning that the qualitative evaluation carried out by the human expert shows that our best model is paraphrase-multilingual-MiniLM-L12-v2, with performance comparable to that of commercial applications. In the qualitative test, we worked with short texts, with no truncation. The model chosen by the human evaluator is one of the models that work with 128 tokens, so truncation could have affected these models and led to their lower performance in the quantitative evaluation.

6 Conclusion
Starting from an architecture of Siamese Transformer models, a relatively modern architecture that is very useful for the case at hand, where we want to measure the similarity of two medium-sized text inputs, this study delves into:
– Dealing with medium-sized texts, which leads to greater complexity and dimensionality of the models; this is why the adopted architecture is very important, as it directly impacts the performance of the models and especially their training.
– Working with texts in Spanish, since most research work is in English.
Placing emphasis on these two challenges, the relevance of the study lies in their union, that is, working with medium-sized texts in Spanish for their semantic comparison. Analyzing the results in detail, we see that the models obtained, although they have acceptable performance (a Pearson correlation of around 82% for the best two), are far from being a solution that can be used autonomously without human review. In this regard, the volume of data used to train the models must be taken into account: a total of 240 labeled question-answer pairs. With this volume of data, it has been possible to assess, to a certain extent, whether the proposed architecture would be valid, but it would be advisable to train the models with a larger volume of labeled data. In addition to starting with a larger volume
of data, it would be interesting to also have, alongside the teacher's response, equally valid alternatives to it. This could help to better calibrate the model during training.
Although we still cannot use the models autonomously, this work has been a starting point to detect and delve into possible branches of future work, such as:
– Studying the possibility of comparing medium-sized texts by breaking the text into smaller sentences and using these to obtain semantic similarity.
– Deepening into how the truncation of texts affects the calculation of semantic similarity.
– Deepening into how negations and antonyms affect the calculation of semantic similarity.
– Integrating semantic similarity models with other models, each with the purpose of evaluating a specific aspect (spelling mistakes, semantic similarity, writing style, negations, etc.).
– Investigating the possibility of not only giving a mark on an exam but also giving feedback to the student, with which they can know where they have failed and how to improve both their exam performance and their study techniques.

Acknowledgements. This work is partially funded by the PLeNTaS project, "Proyectos I+D+i 2019", PID2019-111430RB-I00/AEI/10.13039/501100011033, and by the EVATECO-PLN project, Proyecto PROPIO UNIR, projectId B0036.

References
1. Dandelion API. https://dandelion.eu/semantic-text/text-similarity-demo. Accessed 28 Feb 2023
2. Retina API. https://www.cortical.io/retina-api-documentation. Accessed 28 Feb 2023
3. Babić, K., Guerra, F., Martinčić-Ipšić, S., Meštrović, A.: A comparison of approaches for measuring the semantic similarity of short texts based on word embeddings. J. Inf. Organ. Sci. 44(2) (2020). https://doi.org/10.31341/jios.44.2.2, https://jios.foi.hr/index.php/jios/article/view/142
4. Cañete, J., Chaperon, G., Fuentes, R., Ho, J.H., Kang, H., Pérez, J.: Spanish pre-trained BERT model and evaluation data. PML4DC at ICLR 2020, pp. 1–10 (2020)
5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding (2018). https://doi.org/10.48550/ARXIV.1810.04805, https://arxiv.org/abs/1810.04805
6. Gonçalo Oliveira, H., Sousa, T., Alves, A.: Assessing lexical-semantic regularities in Portuguese word embeddings. Int. J. Interact. Multimed. Artif. Intell. 6, 34 (2021). https://doi.org/10.9781/ijimai.2021.02.006
7. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach (2019). https://doi.org/10.48550/ARXIV.1907.11692, https://arxiv.org/abs/1907.11692
8. Qiu, X., Sun, T., Xu, Y., et al.: Pre-trained models for natural language processing: a survey. Sci. China Technol. Sci. 63, 1872–1897 (2020). https://doi.org/10.1007/s11431-020-1647-3
9. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3982–3992. Association for Computational Linguistics, Hong Kong (2019). https://doi.org/10.18653/v1/D19-1410, https://aclanthology.org/D19-1410
10. de la Rosa, J., Ponferrada, E., Villegas, P., González de Prado Salas, P., Romero, M., Grandury, M.: BERTIN: efficient pre-training of a Spanish language model using perplexity sampling. Procesamiento del Lenguaje Natural 68, 13–23 (2022). http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6403
11. Vaswani, A., et al.: Attention is all you need. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017). https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
12. Yang, X., He, X., Zhang, H., Ma, Y., Bian, J., Wu, Y.: Measurement of semantic textual similarity in clinical texts: comparison of transformer-based models. JMIR Med. Inform. 8(11), e19735 (2020). https://doi.org/10.2196/19735, http://medinform.jmir.org/2020/11/e19735/
13. Zhang, L., Huang, Y., Yang, X., Yu, S., Zhuang, F.: An automatic short-answer grading model for semi-open-ended questions. Interact. Learn. Environ. 30(1), 177–190 (2022). https://doi.org/10.1080/10494820.2019.1648300
