Semantic Similarity Between Medium-Sized Texts
1 Introduction
Semantically comparing texts involves analyzing their meaning in context and is useful for a variety of applications, such as sentiment analysis, plagiarism detection, or content analysis. Education is no stranger to this field: exams and other activities must be graded, a necessary task to validate the knowledge of the examinees.
Current natural language processing techniques allow us to capture the semantic meaning of sentences, so that we can detect similarity between sentences that are written differently but share similar semantics. While English may be more advanced in this regard, work on other languages such as Spanish remains comparatively scarce.
2 State of the Art
This section reviews recent progress in the computation of semantic similarity between texts, from classical approaches, such as those based on the similarity of vectors produced by probabilistic latent semantic analysis, generally using cosine distance as the similarity metric [13], or variants of word-embedding models [3], to newer approaches enabled by machine learning. With machine learning, much work has taken the context of the text into account when performing NLP tasks, and the computation of semantic similarity is no exception to this trend. The first such approaches applied recurrent networks, such as long short-term memory (LSTM) networks [13], which were able to learn long-term dependencies in text, until the introduction of the Transformer architecture [11]. The Transformer marked a turning point in NLP: its novelty is the replacement of recurrent layers with so-called attention layers. These layers remove the recursion that LSTMs rely on, so sentences are processed as a whole (with positional encoding) instead of word by word. This reduces complexity and allows parallelization, improving computational efficiency.
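As an illustration of the cosine metric mentioned above, the following minimal Python sketch (generic, not tied to any particular embedding model) computes the cosine similarity between two embedding vectors:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for sentence embeddings.
u = np.array([0.2, 0.7, 0.1])
v = np.array([0.25, 0.6, 0.2])
print(cosine_similarity(u, v))  # values close to 1 indicate high similarity
```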
Among the Transformer-based models most relevant to this work are:
– BERT [5]: a model that represented a turning point in NLP [3], using the Transformer architecture to obtain a better semantic representation of the whole text.
– mBERT [5]: an extension of the original BERT with multilingual support for a total of 104 languages.
– BETO [4]: an NLP model trained exclusively for Spanish. In the study by Cañete et al. [4], it obtained better results than mBERT [5].
Fig. 1. SBERT architectures for classification tasks (left) and semantic comparison tasks (right). Source: [9]
Several commercial APIs also offer semantic-similarity services:
– Retina API [2]: an API from the company Cortical.io with support for 50 languages, which provides different similarity measures, such as cosine, Euclidean, and Jaccard distances, as well as some proprietary metrics.
– Dandelion API [1]: a product of an Italian startup offering several natural language processing APIs, including language detection and semantic similarity.
3 Methodology
1. Analysis of the data source used in the project. This is a set of questions and answers from two subjects taught at Universidad Internacional de La Rioja (UNIR). The dataset contains 240 records, with an average teacher response length of 130 words and an average student response length of 150 words; in some cases the student's response reaches 405 words.
2. Identification of the NLP Transformer models to use. Although the architecture used to specialize the models has always been that of Sentence Transformers, we used four existing Siamese Transformer models and two models whose Siamese architecture was built from scratch, as shown in Fig. 2. A maximum input size of 256 words was defined.
These models were chosen based on their support for Spanish or multiple languages and/or the number of tokens they allow. Table 1 shows the selected models, where type S-ST denotes a new Siamese Transformer built from scratch and E-ST denotes an existing Siamese Transformer model. Although the average student response in our dataset exceeds 128 words, models with this token limit were still chosen, since they are multilingual and the teacher's average response is 130 words. It is also worth mentioning that when a sentence exceeds the maximum input size allowed by the model, it is truncated, which affects the training of the models. In this study we prioritized the average size of the teacher's response, on the understanding that a correct response should have approximately that size.
3. Specialization of the NLP models through a fine-tuning phase. For the training process, the open-source library Hugging Face (https://huggingface.com) was used. Each of the Siamese models described above was trained for 1, 5, 10, 30, 50, and 100 epochs. In addition to the number of epochs, the other parameters used are:
– train loss: as the loss function we selected cosine similarity. A sketch of this setup is shown after this list.
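The following is a minimal sketch, assuming the sentence-transformers library and the public BETO checkpoint dccuchile/bert-base-spanish-wwm-cased, of how a from-scratch Siamese model with an extra dense layer can be assembled and fine-tuned with a cosine-similarity loss. Paths, example texts, and labels are illustrative, not the authors' actual code:

```python
# A minimal sketch (not the authors' exact code) of assembling a Siamese
# sentence-transformer from scratch and fine-tuning it with a cosine loss.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses, InputExample

# Base encoder with the paper's maximum input size of 256 tokens.
word_embedding = models.Transformer("dccuchile/bert-base-spanish-wwm-cased",
                                    max_seq_length=256)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension())
# Extra dense layer, as in the from-scratch Siamese models (see Fig. 2).
dense = models.Dense(in_features=pooling.get_sentence_embedding_dimension(),
                     out_features=256)
model = SentenceTransformer(modules=[word_embedding, pooling, dense])

# Hypothetical training pair: (teacher answer, student answer) with a
# similarity label in [0, 1] derived from the teacher's grade.
train_examples = [
    InputExample(texts=["respuesta del profesor", "respuesta del alumno"],
                 label=0.9),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# CosineSimilarityLoss realizes the Siamese setup: both texts pass through
# the same network and their cosine similarity is regressed onto the label.
train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=50)
```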
4 Results
Evaluation 1. Considering the Pearson correlation (see Table 2), better results are obtained with the models pretrained for the Spanish language, and better still with the BETO-based model, which was designed specifically for Spanish rather than being multilingual.
Observing the Spearman correlation (see Table 3), although its coefficients are not as close to 1, the behavior is similar to that seen with the Pearson coefficient: results improve when we use multilingual models and are best for the BETO-based model.
Table 2. Pearson correlation coefficients for the six models (columns A–F) by number of training epochs.
Epochs  A     B     C     D     E     F
1       0.76  0.72  0.67  0.76  0.77  0.80
5       0.75  0.81  0.66  0.76  0.77  0.81
10      0.76  0.79  0.69  0.81  0.77  0.78
30      0.77  0.81  0.72  0.78  0.80  0.80
50      0.76  0.78  0.73  0.75  0.82  0.81
100     0.77  0.81  0.74  0.77  0.77  0.82
Table 3. Spearman correlation coefficients for the six models (columns A–F) by number of training epochs.
Epochs  A     B     C     D     E     F
1       0.39  0.32  0.13  0.31  0.44  0.52
5       0.35  0.56  0.15  0.45  0.43  0.50
10      0.36  0.49  0.15  0.59  0.51  0.43
30      0.43  0.63  0.32  0.52  0.59  0.51
50      0.37  0.48  0.33  0.44  0.63  0.56
100     0.38  0.66  0.37  0.48  0.43  0.54
Considering the models trained with 50 epochs as those offering the best balance between results and computational cost, the correlation plots including the Pearson, Spearman, and Kendall coefficients for these models are shown in Fig. 3.
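The three coefficients can be computed with SciPy as in the following sketch (the grade and similarity arrays are illustrative placeholders, not the paper's data):

```python
# Sketch: computing the three correlation coefficients with SciPy.
# teacher_grades and model_similarities stand in for the normalized
# teacher grades and the model's similarity scores.
from scipy.stats import pearsonr, spearmanr, kendalltau

teacher_grades = [0.9, 0.5, 0.7, 1.0, 0.3]
model_similarities = [0.85, 0.55, 0.60, 0.95, 0.40]

print("Pearson: ", pearsonr(teacher_grades, model_similarities)[0])
print("Spearman:", spearmanr(teacher_grades, model_similarities)[0])
print("Kendall: ", kendalltau(teacher_grades, model_similarities)[0])
```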
Evaluation 3. Negations
We studied semantic similarity in those cases where one of the texts to be compared is the negation of the other. Since the goal is to grade an answer, a negation can completely change the grade. For example, for the question "Is Spain on the European continent?", the student's answer could be "Spain is on the European continent" or "Spain is not on the European continent". Both sentences are very similar in form but mean the complete opposite. Analyzing their semantic similarity with the trained BETO-based models returns a score of 0.783, a value suggesting that these texts have much in common in terms of semantic meaning.
As an extension of this point, an affirmation and a negation can also appear within the same sentence.
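A sketch of this negation check, assuming a locally saved fine-tuned model (the path and the use of the example sentences in Spanish are hypothetical):

```python
# Sketch of the negation experiment; the model path is hypothetical and
# stands for one of the fine-tuned BETO-based Siamese models.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("./beto-siamese-finetuned")  # hypothetical path

affirmation = "España está en el continente europeo"
negation = "España no está en el continente europeo"

embeddings = model.encode([affirmation, negation], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"Cosine similarity: {score:.3f}")  # the paper reports approx. 0.783
```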
Fig. 3. Pearson, Spearman, and Kendall correlation for models trained with 50 Epochs
Table 4. MAE, MSE, RMSE, RMSLE, and R2 metrics of the models obtained.
all-distilroberta-v1
epochs MAE MSE RMSE RMSLE R2
1 0.149 0.033 0.182 0.012 0.575
5 0.156 0.036 0.190 0.012 0.537
10 0.151 0.035 0.186 - 0.555
30 0.142 0.032 0.178 0.011 0.592
50 0.149 0.033 0.182 - 0.573
100 0.145 0.032 0.180 0.011 0.585
distiluse-base-multilingual-cased-v1
epochs MAE MSE RMSE RMSLE R2
1 0.177 0.044 0.209 - 0.438
5 0.137 0.028 0.167 - 0.642
10 0.144 0.030 0.173 0.011 0.616
30 0.129 0.027 0.165 - 0.652
50 0.141 0.030 0.175 0.011 0.609
100 0.129 0.027 0.164 - 0.653
paraphrase-multilingual-MiniLM-L12-v2
epochs MAE MSE RMSE RMSLE R2
1 0.170 0.045 0.213 0.016 0.418
5 0.172 0.046 0.213 0.016 0.415
10 0.165 0.043 0.207 0.015 0.450
30 0.153 0.039 0.197 - 0.503
50 0.149 0.037 0.193 0.013 0.524
100 0.152 0.037 0.191 - 0.529
paraphrase-multilingual-mpnet-base-v2
epochs MAE MSE RMSE RMSLE R2
1 0.153 0.037 0.191 0.014 0.531
5 0.158 0.035 0.186 0.012 0.556
10 0.141 0.029 0.169 - 0.633
30 0.148 0.033 0.182 - 0.575
50 0.154 0.037 0.193 0.013 0.520
100 0.143 0.033 0.182 - 0.576
mbert
epochs MAE MSE RMSE RMSLE R2
1 0.183 0.060 0.244 - 0.235
5 0.188 0.064 0.252 - 0.182
10 0.191 0.062 0.249 - 0.201
30 0.134 0.029 0.171 0.010 0.626
50 0.133 0.028 0.169 - 0.635
100 0.186 0.063 0.251 0.021 0.190
beto
epochs MAE MSE RMSE RMSLE R2
1 0.154 0.033 0.182 0.012 0.574
5 0.139 0.027 0.165 0.009 0.652
10 0.139 0.032 0.179 0.012 0.586
30 0.139 0.030 0.173 - 0.614
50 0.130 0.027 0.164 0.010 0.654
100 0.154 0.034 0.185 0.011 0.562
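For reference, the regression metrics reported in Table 4 can be computed with scikit-learn as in the following sketch (y_true and y_pred are illustrative placeholders for teacher grades and model similarities):

```python
# Sketch: the regression metrics of Table 4 via scikit-learn. Both arrays
# are assumed to lie in [0, 1], so the log-based RMSLE is well defined.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_squared_log_error, r2_score)

y_true = np.array([0.9, 0.5, 0.7, 1.0, 0.3])    # illustrative grades
y_pred = np.array([0.85, 0.55, 0.60, 0.95, 0.40])  # illustrative similarities

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  "
      f"RMSLE={rmsle:.3f}  R2={r2:.3f}")
```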
5 Discussion
The models trained with 50 epochs are the ones with the best metrics. Among these, the best are the Siamese models built from scratch, first the one based on BETO [4], followed by mBERT [5]. This may be because they were built with a more complex architecture that adds an extra dense layer (see Fig. 2). Fig. 3 shows the Pearson, Spearman, and Kendall coefficients for these models. For the Pearson coefficient, a linear dependence of at least 0.81 between the teacher's grade and the semantic similarity is obtained. Considering rank correlation, values of at least 0.56 are obtained for Spearman and 0.49 for Kendall. This Kendall value is obtained for the Siamese model based on mBERT; it means that, if we order the students' answers by the model's similarity score, concordant pairs outnumber discordant ones by 49% of all pairs, i.e., roughly three quarters of answer pairs are ranked consistently with the teacher's grading (see the identity below).
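For a sample without ties, Kendall's τ relates the numbers of concordant (C) and discordant (D) pairs as follows; this is a standard identity stated here for clarity, not taken from the paper:

$$\tau = \frac{C - D}{C + D}, \qquad \frac{C}{C + D} = \frac{1 + \tau}{2} = \frac{1 + 0.49}{2} \approx 0.745,$$

so a Kendall coefficient of 0.49 corresponds to roughly 74.5% of answer pairs being ordered consistently with the teacher's grading.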
6 Conclusion
Starting from a Siamese Transformer architecture, a relatively modern architecture that is well suited to the case at hand, where we want to measure the similarity of two medium-sized text inputs, this study delves into:
– Dealing with medium-sized texts, which leads to greater complexity and dimensionality of the models; the adopted architecture is therefore very important, directly impacting the performance of the models and, above all, their training.
– Working with texts in Spanish, since most research work is in English.
Placing the emphasis on these two challenges, the relevance of the study lies in their union, that is, working with medium-sized texts in Spanish for semantic comparison between them. Analyzing the results in detail, we see that the models obtained, although they achieve acceptable performance (Pearson correlation around 82% for the best two), are far from being a solution that can be used autonomously, without human review. In this regard, the volume of data used to train the models must be taken into account: a total of 240 labeled question-answer pairs. With this volume of data it has been possible to assess, to a certain extent, whether the proposed architecture would be valid, but it would be advisable to train the models with a larger volume of labeled data. In addition to starting with a larger volume
References
1. Dandelion API. https://dandelion.eu/semantic-text/text-similarity-demo. Accessed 28 Feb 2023
2. Retina API. https://www.cortical.io/retina-api-documentation. Accessed 28 Feb 2023
3. Babić, K., Guerra, F., Martinčić-Ipšić, S., Meštrović, A.: A comparison of approaches for measuring the semantic similarity of short texts based on word embeddings. J. Inf. Organ. Sci. 44(2) (2020). https://doi.org/10.31341/jios.44.2.2, https://jios.foi.hr/index.php/jios/article/view/142
4. Cañete, J., Chaperon, G., Fuentes, R., Ho, J.H., Kang, H., Pérez, J.: Spanish pre-trained BERT model and evaluation data. In: PML4DC at ICLR 2020, pp. 1–10 (2020)
5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding (2018). https://doi.org/10.48550/ARXIV.1810.04805, https://arxiv.org/abs/1810.04805
6. Gonçalo Oliveira, H., Sousa, T., Alves, A.: Assessing lexical-semantic regularities in Portuguese word embeddings. Int. J. Interact. Multimed. Artif. Intell. 6, 34 (2021). https://doi.org/10.9781/ijimai.2021.02.006
7. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach (2019). https://doi.org/10.48550/ARXIV.1907.11692, https://arxiv.org/abs/1907.11692
8. Qiu, X., Sun, T., Xu, Y., et al.: Pre-trained models for natural language processing: a survey. Sci. China Technol. Sci. 63, 1872–1897 (2020). https://doi.org/10.1007/s11431-020-1647-3