
Towards Automatically Extracting UML Class Diagrams from Natural Language Specifications

Song Yang, Université de Montréal, Montreal, Canada ([email protected])
Houari Sahraoui, Université de Montréal, Montreal, Canada ([email protected])

arXiv:2210.14441v1 [cs.SE] 26 Oct 2022

ABSTRACT

In model-driven engineering (MDE), UML class diagrams serve as a way to plan and communicate between developers. However, creating them is complex and resource-consuming. We propose an automated approach for the extraction of UML class diagrams from natural language software specifications. To develop our approach, we create a dataset of UML class diagrams and their English specifications with the help of volunteers. Our approach is a pipeline of steps consisting of the segmentation of the input into sentences, the classification of the sentences, the generation of UML class diagram fragments from sentences, and the composition of these fragments into one UML class diagram. We develop a quantitative testing framework specific to UML class diagram extraction. Our approach yields low precision and recall but serves as a benchmark for future research.

CCS CONCEPTS

• Software and its engineering → Software design engineering; • Computing methodologies → Information extraction; Classification and regression trees.

KEYWORDS

Model-driven engineering, Machine learning, Natural language processing, Domain modeling

ACM Reference Format:
Song Yang and Houari Sahraoui. 2022. Towards Automatically Extracting UML Class Diagrams from Natural Language Specifications. In ACM/IEEE 25th International Conference on Model Driven Engineering Languages and Systems (MODELS ’22 Companion), October 23–28, 2022, Montreal, QC, Canada. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3550356.3561592

1 INTRODUCTION

Software development is a complex and error-prone process. Part of this complexity comes from the gap between domain experts, who are familiar with the domain knowledge but have limited expertise with development tools, and software specialists, who master the development environments but are unfamiliar with the target application domain. To fill that gap, the model-driven engineering paradigm aims at raising the level of abstraction in development activities by considering domain models, such as UML class diagrams, as first-class development artifacts.

Though smaller, a gap still exists between the domain concepts and the tools and languages that are produced to model them [8]. For a domain specialist, creating UML models from scratch is a time-consuming and error-prone process that requires various technical skills. To address that problem, various approaches target the generation of models from different structured information such as user stories [4]. However, little work has been done on extraction from natural language specifications. In the specific case of UML class diagrams, existing work relies either on techniques that use machine learning in a semi-automated process [12] or on rule-based techniques that are fully automated but require a restricted input [1, 7]. In this paper, we propose an approach that combines both machine learning and rules while accepting free-flowing text.

Our approach uses natural language patterns and machine learning to fully automate the generation process. We first decompose a specification into sentences. Then, using a trained classifier, we tag each sentence as describing either a class or a relationship. Next, using grammar patterns, we map each sentence into a UML fragment. Finally, we assemble the fragments into a complete UML diagram using a composition algorithm. In addition to our approach, we build a dataset thanks to the effort of the modeling community. This dataset is used to train the classifier and to evaluate the approach. We evaluate our approach on a dataset of 62 diagrams containing 624 fragments. Although our approach does not reach an accuracy level needed for practical use, our work explores the benefits of mixing machine learning with natural language patterns for a fully automated process. Our approach can serve as a baseline for future research on generating UML diagrams from English specifications, and the dataset created together with the defined quantitative metrics can serve as a benchmark for this problem.

The rest of the paper is structured as follows. Section 2 gives an overview of the proposed generation pipeline and the details of each step. The setup and the results of evaluating the approach are provided in Section 3. Section 4 lists some threats to validity. Section 5 discusses the related work and positions our contribution to it. Finally, we conclude this paper in Section 6.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
MODELS ’22 Companion, October 23–28, 2022, Montreal, QC, Canada
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9467-3/22/10 . . . $15.00
https://doi.org/10.1145/3550356.3561592

2 APPROACH

2.1 Overview

The goal is to design a method to translate English specifications to UML diagrams. To do this, we implement a tool pipeline that generates UML class diagrams from natural language specifications. First, we create a dataset. Second, we implement an NLP
pipeline that performs the extraction of UML class diagrams. Figure 1 summarizes the process.

Our approach combines machine learning with pattern-based diagram generation. To perform machine learning, we start by creating a dataset of UML class diagrams and their corresponding specifications in natural language (top part of Figure 1). We select pre-existing UML class diagrams from the AtlanMod Zoo repository.¹ The selected diagrams are decomposed into fragments and manually labeled by volunteer participants. After postprocessing, the labeled diagrams are stored in a repository.

¹ https://web.imt-atlantique.fr/x-info/atlanmod/index.php?title=Zoos

The bottom part of Figure 1 consists of the actual diagram generation process, which takes place right after a user submits a software specification. The submitted natural language specification is then preprocessed and decomposed into sentences. Using a classifier built from the above-mentioned dataset, the sentences are labeled according to the nature of the UML construct they refer to, i.e. a class or a relationship. According to this label, specific procedures of parsing and extraction are performed on the sentence to generate a UML fragment. In the end, all UML fragments are composed back together into one UML class diagram.

2.2 Dataset Creation

We create a new dataset for both the operation and the evaluation of our approach. In particular, we use this dataset to learn a classifier for the Classification step in Figure 1.

To build the dataset, we start from an existing set of UML class diagrams from the AtlanMod Zoo. The AtlanMod Zoo has a repository of 305 high-quality UML class diagrams that model various domains. The size of the diagrams varies from a few to hundreds of classes. We fragment each diagram into simple classes (Figure 2) and relationships (Figure 3). Table 1 shows the size of the initial set of diagrams and the fragments, as well as the portion that we labeled.

Dataset        UML models   UML fragments
AtlanMod Zoo   305          8706
Labeled        62           649

Table 1: UML datasets and their sizes

Since we are interested in the translation of specifications into diagrams, each UML class diagram needs to be paired with an English specification. To achieve that goal, we set up a website where we crowdsource the labeling of fragments. The website proposes the labeling of 305 diagrams containing 8706 fragments. We present the diagrams in ascending order of complexity. The website first shows a complete diagram, then iterates on its fragments for labeling while keeping the whole diagram in view. The volunteer participants write an English specification for each fragment. We give examples of labels to help the participants write at the right level of abstraction.

We send the labeling invitation to different MDE mailing lists and specific large research groups active in the MDE field. Volunteer participants are mostly university students and faculty members across the world. To ensure that the labeling is done in good faith, we do not offer monetary compensation for participation. However, since participation was low, we did not impose a contribution limit. After about two months of crowdsourcing, we receive labels for 649 fragments across 62 UML class diagrams. The produced dataset is available on a public repository.² To ensure quality, labels are reviewed and some are rejected. We replace the rejected labels by labeling them again ourselves. Figure 4 shows example labels.

² https://github.com/XsongyangX/uml-classes-and-specs

2.3 Preprocessing and Fragmentation

Preprocessing is the first step after receiving an input specification from the user, as shown in Figure 1. We substitute pronouns throughout the text, such as "it" and "him", by their reference nouns. This is done using coreferee [5], a tool written in Python that performs coreference resolution, including pronoun substitution.

    A course is taught by a teacher. A classroom is assigned to it.
    ⟹ A course is taught by a teacher. A classroom is assigned to a course.

Pronoun substitution allows sentences in the English specification to be less dependent on each other for semantic purposes. The accuracy of coreferee for general English text is 81%.

Sentence fragmentation is the second step in the runtime operations in Figure 1. We split the preprocessed text into individual sentences using spaCy [3]. spaCy is an NLP library in Python that can be used for various NLP tasks, such as sentence splitting. spaCy splits text into sentences by looking at punctuation and special cases like abbreviations. Its decisions are powered by pre-trained statistical models. We use the small English model, which has good speed and respectable performance. For instance, in the following example, the first two dots are not considered for splitting the sentences but the third dot is.

    An employee has a level of studies, i.e., a degree. An employee is affiliated to a department.
    ⟹
    s1: An employee has a level of studies, i.e., a degree.
    s2: An employee is affiliated to a department.

2.4 Sentence Classification

Sentence classification is the third step in the runtime operations of Figure 1. Classification provides additional information on the English specification that can be used later to better generate the related UML diagram fragment. Each sentence is classified as describing either a "class" or a "relationship".

The training data for the classifier comes from the dataset described in Section 2.2. Each data point is structured as a pair <English specification, UML fragment> and is assigned a label of "class" or "relationship" from the moment the dataset was processed from the AtlanMod Zoo. The pairing means that the English specification belongs to that specific UML fragment. Our classifier is trained to predict the "class/relationship" label from an English specification. To evaluate the accuracy of the classifier, we use 80% of the data for the training, and the remaining 20% for testing.
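Our training code is not reproduced here; a minimal sketch of this classification setup with scikit-learn could look as follows. The sentences are the pattern examples from Table 3 with their plausible labels, standing in for the crowdsourced dataset; the 80-20 split and the tf-idf plus Bernoulli Naive Bayes pipeline are as described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the crowdsourced <specification, label> pairs.
sentences = [
    "There is a place.",
    "A class named Actor.",
    "a Mesh has a name of type String",
    "News have titles and links",
    "A MSProject has at least one task.",
    "A news is published on a specific date",
    "A node is composed of a label",
    "Eclipse plugins may require other plugins",
]
labels = ["class"] * 4 + ["relationship"] * 4

# 80-20 split, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.2, random_state=0)

# tf-idf vectorization feeding a Bernoulli Naive Bayes classifier.
clf = make_pipeline(TfidfVectorizer(), BernoulliNB())
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```

Note that BernoulliNB binarizes its input features by default, so only word presence matters, which is consistent with the observation below that frequency-based vectorization may not be needed.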
Figure 1: Overview of the extraction process of UML class diagrams from natural language specification. Data collection: AtlanMod Zoo → UML model fragmentation → crowdsourced labeling → data postprocessing. Runtime operations: English specs → specification preprocessing → sentence fragmentation → sentence classification → UML fragment generation → UML composition → UML class diagram.

Figure 2: Class fragment (a single class with its attributes)

Figure 3: Relationship fragment (an association "related" between ClassA and ClassB with multiplicities 0..* and 1..1)

Figure 4: Example labels received by crowdsourcing: (a) "Key is a class in SimpleRDBMS package"; (b) "A Key is owned by one and only one Table in an RDBMS"

When training classifiers on natural language text, we have to select a method to map those sentences into numerical representations. To this end, we experiment with two vectorization methods, count and tf-idf, which are designed to turn words into vectors. tf-idf's key difference from count is the penalization of very common words in a document. This allows giving less importance to words like "the".

As for the classification algorithms, we experiment with various algorithms from the scikit-learn library [9]. We use the default hyperparameter settings for each algorithm. Table 2 shows the performance of the algorithms on the test data. It is worth noting that the training takes less than one minute.

Although some classifiers have better accuracy, we pick the Bernoulli Naive Bayes classifier with a tf-idf vectorizer. Bernoulli Naive Bayes is simple, has a good accuracy that is more stable across training experiments, and is generally faster to execute. Interestingly, Bernoulli Naive Bayes performs better with a tf-idf vectorizer than with the count vectorizer, when it should perform equally well on both in theory. We attribute the difference in performance to the randomized splitting of the dataset when training and testing. Moreover, if a Bernoulli distribution captures enough information to classify well, it seems frequency-based vectorization is not needed.

Should a given sentence describe both a UML class and a UML relationship, we let the classifier make the decision.

2.5 UML Fragment Generation

After classifying each sentence as describing either a "class" or a "relationship", we generate the corresponding UML fragment according to this classification, which is the fourth step in the runtime operations of Figure 1.

Using spaCy's small English model [3], we define several grammar patterns to match the English sentences. We design the patterns
based on the data we collected through crowdsourcing. We broadly group the patterns in Table 3. For sentences labeled as class descriptions, we define eight patterns CP1 to CP8, and for those describing relationships, we define six patterns RP1 to RP6. The patterns make use of part-of-speech tagging and dependency analysis.

Classifier            tf-idf (%)   Count (%)
Bernoulli Bayes       87           83
Multinomial Bayes     83           85
k neighbors           82           74
Linear SVC            88           84
SVC                   88           55
ADA                   85           85
Random Forest         81           70
Logistic Regression   86           85-95

Table 2: Accuracy of binary classification of English sentences

Pattern                  Example of matching sentence
Class fragments
CP1: Copula              Key is a class in SimpleRDBMS package
CP2: "there is"          There is a place.
CP3: Compound noun       Drawing Interchange Format
CP4: Compound explicit   Workflow State class
CP5: "to have"           a Mesh has a name of type String
CP6: "class named"       A class named "Actor".
CP7: "of package"        TextualPathExp is part of the package TextualPathExp
CP8: "and" clauses       News have titles and links
Relationship fragments
RP1: "to have"           A MSProject has at least one task.
RP2: Passive voice       A news is published on a specific date
RP3: "composed"          A node is composed of a label
RP4: Active voice        Eclipse plugins may require other plugins
RP5: Noun "with"         A table with a caption
RP6: Copula              In a Petri Net a Place may be the destination of a Transition

Table 3: Summary of patterns

Multiple patterns can overlap and, as such, the spaCy parser produces several parse trees for the same sentence. For example, in the category of class fragments, the patterns CP3: compound noun and CP4: compound explicit are likely to be both applied at the same time. In this case, we set the CP4: compound explicit pattern at a higher priority and discard the parse tree from CP3: compound noun. In general, the priority of patterns in the event of multiple parse trees is based on how specific the pattern is and how much information can be acquired in the parse tree. Hence, for relationship fragments, the patterns for passive voice and active voice are so general that they always yield priority to the other patterns.

After a pattern and its parse tree have been chosen, we generate a UML fragment using a specific template.

If the classification of the sentence resulted in "class", we generate a UML class fragment consisting of only one class with some potential attributes. For example, the CP8: "and" clauses pattern creates a class whose name is the subject noun and whose attributes are the objects among the "and" clauses, as shown in Figure 5.

    News have titles and links
    → class News with attributes title and link

Figure 5: Example of class generation

If the classification of the sentence resulted in a "relationship", we generate a UML fragment with two classes and one unidirectional association between them. In the case of a "relationship", we can also extract the multiplicity, if present in the sentence. Here is an example with the pattern RP1: "to have" in Figure 6.

    A MSProject has at least one task.
    → association "task" from MSProject to Task with multiplicity 1..*

Figure 6: Example of relationship generation

During the creation of UML class fragments, we perform additional processing on the parse tree results, such as lemmatization, noun phrase discovery, and variable naming. In Figure 5, the attributes are in the singular form of the original word. If Figure 5's input text was "News have bold titles and url links", the noun phrase discovery combines the attributes into "boldTitle" and "urlLink" to follow the variable naming convention.

2.6 Composition of UML Fragments

The last step in the runtime operations of Figure 1 is the composition of UML fragments into one UML class diagram. After each sentence is turned into a UML fragment, we produce the final UML diagram by combining the fragments. Since the merging of general UML class diagrams is NP-hard [11], we design an algorithm tailored to our use case. The time complexity of our algorithm is polynomial, more specifically O((c + r)a²c), where c is the number of classes, r is the number of relationships between classes, and a is the number of class attributes.

The composition algorithm takes a greedy approach. Algorithm 1 merges one UML fragment at a time into a larger, work-in-progress UML diagram. When all fragments are used, the work-in-progress diagram is the completed UML diagram.

During composition, fragments may present contradicting information to the model in progress. We identify two situations for this. The first is an Attribute-Class conflict (Figure 7) and the second is an Attribute-Relationship conflict (Figure 8). In both situations, the resolution involves removing attributes to create a new class
or a new relationship. We favor having many smaller classes and relationships, instead of a few very big classes.

Algorithm 1 Composition algorithm
 1: model ← previous composition result, or any fragment if the composition is just starting
 2: f ← incoming fragment
 3: if kind(f) = class then
 4:     if ∃c ∈ model.classes where c.name = f.name then
 5:         if ∃ Attribute-Class conflict then
 6:             resolve the Attribute-Class conflict according to Figure 7
 7:         else
 8:             merge attributes from f into c with Algorithm 2
 9:     else
10:         insert f into model
11: else if kind(f) = relationship then
12:     if ∃ Attribute-Relationship conflict then
13:         resolve the Attribute-Relationship conflict according to Figure 8
14:     left ← class from which f points
15:     right ← class to which f points
16:     if left ∉ model.classes then
17:         insert left into model
18:     if right ∉ model.classes then
19:         insert right into model
20:     if f's relationship ∈ model.relationships then
21:         do nothing
22:     else
23:         insert f into model
24: return model

Algorithm 2 Merge attributes
 1: class ← recipient UML class
 2: a ← incoming attribute
 3: if ∃c ∈ class.attributes where c.name = a.name then
 4:     if a has a type and c has no type then
 5:         replace c by a inside class
 6: else
 7:     insert a into class.attributes
 8: return class

An Attribute-Class conflict arises when the UML class diagram in progress contains an attribute with a name identical to the name of a class from a class fragment. We resolve this conflict by removing the attribute from the diagram in progress, inserting the class fragment into the larger UML diagram, and creating a new relationship from the class that previously contained the attribute to the inserted class. This relationship has the name of the attribute as its name and a multiplicity of zero-to-many. See an example in Figure 7.

Figure 7: Attribute-Class conflict and its resolution (the attribute "publisher" of class News clashes with the incoming class Publisher; the attribute is replaced by a 0..* association named "publisher" from News to Publisher)

An Attribute-Relationship conflict arises when the UML class diagram in progress contains an attribute a whose name is identical to the name of the relationship inside the incoming relationship fragment. Let A be the class of this attribute a in the diagram in progress. Let C be the class that is the source of the unidirectional association inside the relationship fragment. Let D be the class that is the destination of the unidirectional association inside the relationship fragment. If the names of A and C are not the same, this is not a conflict and we proceed with a standard insertion. Otherwise, the resolution starts by removing the attribute a from A. Then we merge the attributes of A and C. The relationship from C to D is now from the attribute merge result A ⊕ C to D.

Figure 8: Attribute-Relationship conflict and its resolution (the attribute "publisher" of class News clashes with the incoming 1..1 association "publisher" from News to Corporation; the attribute is removed and the association is kept)

Lines 16 and 18 in the composition algorithm (Algorithm 1) make use of "relationship" equality. We define two relationships to be equal if the classes they are related to have the same name and if the name of the relationship is the same after processing. This implies that multiplicity is ignored when assessing equality.

Finally, this entire pipeline produces one UML class diagram from the received input. We compile the result into an image using a compiler called plantuml [10].
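Our implementation is not shown here; as a rough illustration, Algorithm 2 might look like this in Python. The Attribute and UmlClass types are hypothetical stand-ins for whatever model representation is used, not our actual data structures.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Attribute:
    name: str
    type: Optional[str] = None  # e.g. "String", or None if unspecified

@dataclass
class UmlClass:
    name: str
    attributes: list = field(default_factory=list)

def merge_attribute(cls: UmlClass, a: Attribute) -> UmlClass:
    """Algorithm 2: merge one incoming attribute into a recipient class."""
    for i, c in enumerate(cls.attributes):
        if c.name == a.name:
            # A same-name attribute already exists: keep the typed version.
            if a.type is not None and c.type is None:
                cls.attributes[i] = a
            return cls
    # No name clash: simply insert the incoming attribute.
    cls.attributes.append(a)
    return cls
```

Merging a class fragment into an existing class of the same name (line 8 of Algorithm 1) then reduces to calling merge_attribute once per attribute of the fragment.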
3 EVALUATION

3.1 Setup

To test the performance of our approach, we use the dataset we created through crowdsourcing in Section 2.2. We first group all the English specifications for fragments by the UML model they originated from. This creates 62 testing samples. For example, the following grouped specification corresponds to the UML class diagram shown in Figure 9.

    Drawing Interchange Format. a Drawing Interchange model may have multiple meshes. a Mesh has a name of type String. a Mesh may have any number of points. a point maps to only one Mesh. a point has a name of type String and coordinates X and Z of type Double.

Figure 9: An original UML class diagram from the AtlanMod Zoo

3.2 Evaluation Metrics

To evaluate the accuracy of our approach, we use comparative metrics. We design three levels of strictness for comparing the diagrams generated by our tool and the ground truth of the dataset. We assume there is only one good UML class diagram per input specification.

First, we have exact matching, which is the most strict comparison. Under exact matching, we look at how many classes and relationships from the ground truth are present in the generated diagrams. A ground truth class is considered "present" in the prediction if there is a class in the generated diagram that has an identical name and identical attributes. A ground truth relationship is considered "present" in the prediction if both of the classes attached to it are present and if there is a relationship between these two classes with the same name and multiplicity in the predicted output. Each UML diagram under evaluation outputs precision, recall, and F-1 score.

Second, we have relaxed matching, which is a weaker form of exact matching. In relaxed matching, we still look at how many ground truth classes and relationships are present in the generated diagram. However, a ground truth class is considered "present" if there is a predicted class with the same name. We don't look at attributes anymore. Similarly, ground truth relationships are considered "present" in the prediction in the same way as exact matching, except that multiplicities are ignored.

Third, we have general matching. This is the most lenient matching criterion. In general matching, classes are still evaluated like in relaxed matching. Relationships, on the other hand, are evaluated collectively instead of individually. We look at a diagram's graph connectivity and compare this connectivity to the ground truth diagram's connectivity. This comparative metric ignores class names, the orientation of relationships, and the name of relationships. As such, general matching ignores semantics.

To compare two diagrams' connectivity, we use the technique of eigenvector similarity [6]. In short, this technique looks at the eigenvalues of the Laplacian matrices of the two undirected graphs. If the distance between the most prominent eigenvalues is small, the two graphs are similarly connected. Distances are in the range [0, ∞). We apply a mapping to normalize the distance into a score in the interval (0, 1], where a score near 0 means no similarity at all and 1 means the two diagrams are identically connected. The mapping is f(x) = 2(1 − σ(x)), where x ∈ [0, ∞) is the distance and σ(x) is the sigmoid function. A perfect connectivity score does not mean the two diagrams are identical.

To complement connectivity, we add a size difference score to the general matching metric. The size difference is evaluated by computing ||s1 − s2||, where si = normalize([number of nodes, number of edges]) of graph i. The norm is the Pythagorean distance, and the graphs are the two undirected graphs used in the eigenvector similarity calculation. The norm ranges from 0 to √2, with 0 being the best score. We apply the mapping g(x) = 1 − x/√2 to make the score fall in the interval [0, 1], with 0 being the worst score and 1 being the best score. This means 0 is attributed to graphs with vastly different sizes, and 1 is attributed to graphs with similar sizes. A score of 1 means the size vectors si are oriented closely, not that the graphs have the same sizes. To get a better grasp, we look at the precision and recall results for class generation and the connectivity similarity.

3.3 Results and Discussion

After generating 62 candidate UML class diagrams from English specifications, we use an automated script to compare the predictions with the ground truth. Each predicted diagram is compared with its ground truth counterpart in the dataset. We take the average of the 62 results for all metrics and present it in Tables 4 and 5.

The evaluation results are not great. The performances are below 50%, which makes our approach not suited for practical use. We explore several reasons for this and the meaning of low results in the challenge of extracting UML class diagrams from natural language specifications.

Since the precision for class generation is low at 17% for exact matching and 35% for relaxed matching, it might be caused by too
Towards Automatically Extracting UML Class Diagrams from Natural Language Specifications MODELS ’22 Companion, October 23–28, 2022, Montreal, QC, Canada

Metric Precision Recall F-1 score 4 THREATS TO VALIDITY


Exact matching 0.171 0.251 0.200 Although we rejected bad labels during the creation of the dataset,
Relaxed matching 0.355 0.506 0.409 some volunteers provided specifications with questionable seman-
General matching 0.355 0.506 0.409 tics and spelling errors. We kept those labels because we want our
Table 4: Performance of generating classes from English approach to operate under imperfect conditions. Our approach can-
specification not deal with spelling errors and confusing specifications. Each
spelling mistake creates an extra UML class or relationship that
should not exist. And if the specification is unclear, then the gener-
Metric Connectivity similarity Size difference ated diagram is also unclear.
General matching 0.639 0.673 Due to a lack of sufficient data, we did not set aside unseen
Table 5: Performance of generating relationships from Eng- data for the evaluation. The evaluation uses the entire dataset for
lish specification testing. Despite the classifier of Section 2.4 splitting the data into
80-20, 80% of the evaluation data has been seen during the training
of the classifier. We believe this bias to be minimal because the
classification is an intermediate step.
many classes in the prediction that can be traced to synonyms in the specification. English specifications can contain synonyms and different wordings for the same idea. If the user decides to use two different terms for the same concept in a specification, they might have wanted two different classes, or the user might have wanted a specification that is more interesting for humans to read. This ambiguity cannot be resolved without user feedback. However, given that our approach generates too many classes on average, a more aggressive merging during the composition step (Section 2.6) would be beneficial.

While our approach generates too many classes, recall for class generation is still too low, at 25% for exact matching and 50% for relaxed matching. This means there are elements in the ground truth UML class diagram that are not extracted from English specifications. This can be improved by adding more patterns to the rule set in Section 2.5. Moreover, a better noun phrase extraction mechanism can extract more class names and attributes from the text. Currently, our method works well with noun phrases that are one or two words long. Noun phrases longer than two words require a more sophisticated extraction process. We used spaCy's default noun phrase detector, but exploring the dependency parse tree ourselves directly might be a better idea.

This low performance signals that our approach has limits. Since we incorporate several statistical components, each with its own imperfect accuracy, into our pipeline, there is a performance ceiling we cannot exceed. Our performance cannot be better than the performances of these components. If we assume all components have an equal influence on the output, we have an expected upper limit of accuracy of 0.63, as seen in Table 6. One way of reducing the effect of compounding errors is to introduce a retroactive step. In our case, that step is the composition algorithm's attempt to resolve conflicts in Algorithm 1.

Statistical component           Accuracy
Pronoun resolution (coreferee)  0.81
Grammar analysis (spaCy)        0.90
Binary classification           0.87
Combined product                0.63

Table 6: Accuracies of each statistical component in our approach

In the composition algorithm (Algorithm 1), the merging of the UML class diagram with fragments is treated in a non-commutative fashion. In other words, the pseudo-code only addresses conflicts of the form Attribute-Class (Figure 7), not Class-Attribute. A similar situation occurs for the Attribute-Relationship conflict: if the conflicting relationship is already inside the model in progress, the algorithm will not flag it as an Attribute-Relationship conflict and will therefore not resolve it. An improved version of the algorithm should treat the merging of the UML class diagram and the fragment in both directions, i.e., in a commutative way. This would increase the performance of our approach.

5 RELATED WORK

5.1 Survey of Relevant Literature
In 2021, 24 published tools and methods for the extraction of UML class diagrams from natural language specifications were surveyed. In the survey, most tools and methods required consistent user intervention. Most tools also required the specification to be given in a specific format, such as a more restricted vocabulary of English or a more rigid structure rather than free-flowing text. The authors concluded that no fully automated tool to generate complex UML class diagrams exists [2].

Degree of automation  # of papers
Semi-automatic        9
Automatic             15

Input                 # of papers
Unrestricted English  12
Restricted English    8
Structured format     4

Table 7: Results of the 2021 survey on other English-to-UML approaches [2]

The survey uses qualitative methods to evaluate the outputs of UML extractors. Though valuable, qualitative evaluation is not enough to assess the correctness of the proposed approaches and to compare them. We provide quantitative metrics and a testing dataset that can be used for all future UML class diagram extractors.
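To illustrate what such a quantitative metric can look like, here is a minimal sketch of exact-matching precision and recall over extracted class names. The `class_metrics` helper, the set representation, and the case-insensitive comparison are illustrative assumptions for this sketch, not necessarily the paper's exact definitions.

```python
def class_metrics(predicted, ground_truth):
    """Exact-match precision/recall over sets of class names."""
    # Case-insensitive comparison is an assumption made for this sketch.
    pred = {name.lower() for name in predicted}
    truth = {name.lower() for name in ground_truth}
    hits = pred & truth  # predicted classes that exactly match the ground truth
    precision = len(hits) / len(pred) if pred else 0.0
    recall = len(hits) / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical example: 1 of 3 predicted classes is correct,
# and 1 of 2 ground-truth classes is recovered.
precision, recall = class_metrics({"Library", "BookCopy", "Person"},
                                  {"Library", "Member"})
```

A relaxed variant of the metric could additionally count partial or synonym matches as hits, which is why the relaxed figures reported in the paper are higher than the exact ones.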
MODELS ’22 Companion, October 23–28, 2022, Montreal, QC, Canada Yang et al.
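As a quick numerical check of the error-compounding argument above: if the statistical components are treated as independent and equally influential, the pipeline's expected accuracy ceiling is simply the product of the component accuracies reported in Table 6. A minimal sketch:

```python
import math

# Component accuracies as reported in Table 6.
accuracies = {
    "pronoun resolution (coreferee)": 0.81,
    "grammar analysis (spaCy)": 0.90,
    "binary classification": 0.87,
}

# Under the independence assumption, errors compound multiplicatively,
# which yields the expected upper bound on pipeline accuracy.
upper_bound = math.prod(accuracies.values())  # ≈ 0.634, i.e. the 0.63 in Table 6
```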
5.2 Automatic Approaches
Automatic approaches do not require extensive user intervention. Once the input is given, the user only needs to wait for the recommended result. The automatic approaches make use of more traditional NLP techniques, such as hand-written rules and grammar parsing. For example, the authors of [1, 7] use several heuristics to analyze the natural language specification.

The automated approaches presented in [1, 7] have normalization rules, which require users to write specifications in a restricted English sentence structure. Our approach accepts free-flowing text. We do not have any normalization rules that users must keep in mind.

5.3 Semi-automatic Approaches
Semi-automatic approaches make use of an AI assistant to guide the user in generating UML class diagrams. DoMoBOT is an example of such a tool. Overall, DoMoBOT makes use of machine learning via knowledge bases inside pre-trained models and word embeddings. The UML class diagram is generated progressively as the user provides feedback to DoMoBOT [12].

Although human intervention during the generation of UML models improves quality, the additional effort spent by users makes the tools difficult to use, especially by domain experts. Our approach is meant to improve the automation of the generation process as much as possible. An automated approach is also better for consistent testing.

6 CONCLUSION
In this paper, we propose an automated approach to extract UML class diagrams from English specifications. The approach uses machine learning and pattern-based techniques. Machine learning is used in the form of a binary classifier that labels sentences as either describing a class or a relationship. The pattern-based techniques are handwritten grammar rules that parse English sentences. In this approach, we fragment the English input into sentences, generate UML class diagram fragments from them, and combine all the fragments into a final result.

To develop our tool, we first create a dataset of UML diagrams paired with English specifications. The specifications are produced by a crowdsourcing initiative. The resulting dataset, although small, is enough to train the classifier and evaluate our approach.

We define three evaluation metrics of varying strictness to test our approach's accuracy in generating classes and relationships from an English specification. The results for classes are 17% precision and 25% recall for exact matching, the strictest metric. The results for relationships are a connectivity similarity of 63% and a size difference of 67%.

The correctness of the produced diagrams is limited. However, these results are in part explained by the imprecision of the NLP tools we used. Using more sophisticated NLP tools will help to improve these results. In addition, more grammar patterns can be added to the rule set in Section 2.5, and an improved version of the composition algorithm will reduce irrelevant classes.

From a broader perspective, our research lays the groundwork for a consistent quantitative evaluation framework, with our approach being the baseline and with the dataset and metrics being the testing framework. From the novelty perspective, we explore intermediate machine learning steps to simplify a mostly rule-based approach. Furthermore, our approach uses a divide-and-conquer strategy when fragmenting diagrams and text and when composing them back together.

In the future, a more complex pattern system can improve the performance of our approach. Currently, we only use a single rule to generate a UML fragment, but if several rules contribute together, the performance can increase. The composition algorithm can also be improved, for example by considering a confidence score for each fragment. Furthermore, inheritance can be generated as a new type of relationship by adding more grammar patterns. Finally, we can generalize our approach to handle other types of UML diagrams, in particular behavioral ones.

REFERENCES
[1] Esra A. Abdelnabi, Abdelsalam M. Maatuk, Tawfig M. Abdelaziz, and Salwa M. Elakeili. 2020. Generating UML class diagram using NLP techniques and heuristic rules. In 2020 20th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA). IEEE, 277–282.
[2] Esra A. Abdelnabi, Abdelsalam M. Maatuk, and Mohammed Hagal. 2021. Generating UML Class Diagram from Natural Language Requirements: A Survey of Approaches and Techniques. In 2021 IEEE 1st International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering MI-STA. 288–293. https://doi.org/10.1109/MI-STA52233.2021.9464433
[3] Explosion AI. 2016. spaCy: Industrial-Strength Natural Language Processing. Explosion AI, Berlin, Germany. Retrieved Jul 13, 2022 from https://spacy.io/
[4] Meryem Elallaoui, Khalid Nafil, and Raja Touahni. 2015. Automatic generation of UML sequence diagrams from user stories in Scrum process. In 2015 10th International Conference on Intelligent Systems: Theories and Applications (SITA). IEEE, 1–6.
[5] Richard Paul Hudson. 2021. Coreferee: Coreference resolution for multiple languages. Explosion AI, Berlin, Germany. Retrieved Jul 13, 2022 from https://spacy.io/universe/project/coreferee
[6] Danai Koutra, Ankur Parikh, Aaditya Ramdas, and Jing Xiang. 2011. Algorithms for Graph Similarity and Subgraph Matching. (2011), 15–16. Retrieved Jul 18, 2022 from https://www.cs.cmu.edu/~jingx/docs/DBreport.pdf
[7] Priyanka More and Rashmi Phalnikar. 2012. Generating UML diagrams from natural language specifications. International Journal of Applied Information Systems 1, 8 (2012), 19–23.
[8] Gunter Mussbacher et al. 2020. Opportunities in intelligent modeling assistance. Software and Systems Modeling 19, 5 (2020), 1045–1053.
[9] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[10] PlantUML. 2021. Open-source tool that uses simple textual descriptions to draw beautiful UML diagrams. PlantUML. Retrieved Jul 14, 2022 from https://plantuml.com
[11] Julia Rubin and Marsha Chechik. 2013. N-Way Model Merging. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (Saint Petersburg, Russia) (ESEC/FSE 2013). Association for Computing Machinery, New York, NY, USA, 301–311. https://doi.org/10.1145/2491411.2491446
[12] Rijul Saini, Gunter Mussbacher, Jin L.C. Guo, and Jörg Kienzle. 2022. Automated, interactive, and traceable domain modelling empowered by artificial intelligence. Software and Systems Modeling 21, 3 (2022), 1015–1045.