0% found this document useful (0 votes)

18 views16 pages

Dint A 00062

Uploaded by

Raphael Noriega

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

18 views16 pages

Dint A 00062

Uploaded by

Raphael Noriega

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

RESEARCH PAPER

Deep Learning, Feature Learning, and Clustering

Analysis for SEM Image Classification

Rossella Aversa1,2, Piero Coronica1,3, Cristiano De Nobili1,4 & Stefano Cozzini1,5†

National Research Council-Istituto Officina dei Materiali (CNR-IOM), 34136 Trieste, Italy
1

KIT-Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Research Software Engineering, University of Cambridge, Cambridge CB3 0FA, UK
3

4
Freelance at denocris.com
Area Science Park, Padriciano 99, 34149 Trieste, Italy
5

Keywords: Neural networks; Feature learning; Clustering analysis; Scanning Electron Microscope (SEM); Image
classification

Citation: R. Aversa, P. Coronica, C. De Nobili & S. Cozzini. Deep learning, feature learning, and clustering analysis for SEM image
classification. Data Intelligence 2(2020), 513–528. doi: 10.1162/dint_a_00062

ABSTRACT

In this paper, we report upon our recent work aimed at improving and adapting machine learning
algorithms to automatically classify nanoscience images acquired by the Scanning Electron Microscope
(SEM). This is done by coupling supervised and unsupervised learning approaches. We first investigate
supervised learning on a ten-category data set of images and compare the performance of the different
models in terms of training accuracy. Then, we reduce the dimensionality of the features through autoencoders
to perform unsupervised learning on a subset of images in a selected range of scales (from 1 μm to 2 μm).
Finally, we compare different clustering methods to uncover intrinsic structures in the images.

1. INTRODUCTION

Image classification, as well as image recognition, image retrieval and other algorithms based on
neural networks, are widely applied in many different research areas, including nanoscience and
nanotechnology [1, 2]. Scientists working with microscopy techniques are particularly interested in general
tools able to automatically identify and recognize specific features within images.

†
Corresponding author: Stefano Cozzini (Email: [email protected]; ORCID: 0000-0001-6049-5242).

© 2020 Chinese Academy of Sciences. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

Neural networks were employed in a number of recent studies for feature extraction from different types
of microscope images. For example, recognition of cellular organisms from scanning probe microscopy
images was shown using artificial neural networks [3]. Neural network classifiers were also used to estimate
the morphology of carbon nanotube structures, such as their curvature and alignment [4]. In the framework
of microscopy cells images, [5] presented a method for cell counting based on a fully convolutional neural
network (CNN) able to predict a spatial density map for target cells; a similar method was used by [6] for
candidate region selection and for further discrimination between target cells and background; in [7], a
supervised CNN was trained to identify spots in image patches.

In nanoscience, where large numbers of images are the typical outcome of experiments, image recognition

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

techniques can be an extremely powerful tool. In the framework of the NFFA-EUROPE project [8], the
information and data management repository platform (IDRP) was developed to suit the data sharing needs
of the nanoscience community. The first aim of this distributed research infrastructure is to provide a
common repository where NFFA-EUROPE scientists can easily collect and store scientific data produced
at the experimental and theoretical facilities available among the project partner institutes. The second and
even more important purpose is to allow the users to share and publish the collected data according to the
FAIR principles [9]. A central problem is thus to guarantee the search and retrieval of such heterogeneous
data, and to establish a proper way to organize the repository: this makes almost mandatory to provide
tools which automatically enrich data with appropriate metadata defining its content.

We focus on the data produced by a single instrument, the Scanning Electron Microscope (SEM). This is
an extremely versatile instrument, routinely used in nanoscience and nanotechnology to explore the
structure of materials with spatial resolution down to 1 nm. Almost 150,000 images were collected in the
last five years by the TASC laboratory at CNR-IOM in Trieste [10] and such number will increase in
the near future. We thus face the problem to classify and store them in a FAIR way.

As a fundamental step, a sample counting more than 18,000 images was extracted from the original
sample and manually labelled in 10 categories, forming the SEM data set [11], which we employed in [12]
and in [13] to investigate transfer learning [14], in particular feature extraction, for automatic image
categorization. The test accuracy resulting by the adopted technique settled around 90%.

In this work we aim at improving the accuracy in the classification task through an extensive comparison
among three well established CNNs and different machine learning techniques [15]. As a further scientific
development, we also face the challenge of improving the existing categories by means of a semi-supervised
approach to automatically find hidden structures in the data [16]. This is the first step towards the automatic
addition of new categories and the creation of a hierarchical tree of sub-categories, reducing the huge
human effort required to manually label the training set.

The paper is organized as follow: in Section 2 we show the supervised approach using CNN networks.
We also introduce a further improvement which allows classifying SEM images in terms of their scale.
Section 3 presents our unsupervised approach to the problem, discussing why a completely unsupervised
approach for feature learning is not possible in this case. Finally, in Section 4 some conclusions and future
perspectives are presented.

514 Data Intelligence

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

2. SUPERVISED LEARNING

The goal of this Section is to illustrate the techniques we adopted to increase the accuracy of the image
classifier presented in [13]. We first trained from scratch different state-of-the-art CNN architectures on the
SEM data set, and then we went further applying the following transfer learning methods:

• Feature Extraction: Start with a pre-trained checkpoint, reset and randomly initialize only the
parameters of the last layer. Then, retrain the network allowing back-propagation just on the last layer;
• (Complete) Fine Tuning: Start with a pre-trained checkpoint, reset and randomly initialize only the
parameters of the last layer. Then, retrain the network allowing back-propagation through all the

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

layers.

We adopted checkpoints pre-trained on two data sets: ImageNet [17], a large visual database designed
for object recognition, in the version of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
2012 [18], and our own SEM data set [11]. We note that the second case cannot be formally defined as a
transfer learning technique, since the fine tuning of the CNN is performed on the same data set of the
checkpoint; nevertheless, this is a commonly adopted way to efficiently refine the parameters of the network.

The architectures used are Inception-v3, Inception-v4, and Inception-Resnet-v2 [19, 20, 21]. The core
idea under this family of networks is the inception module: it consists of parallel convolutional layers of
different kernel sizes, which improve the ability of the network to efficiently detect features of objects having
different dimensions.

All the computations shown in this work were performed on the C3E Cloud Computing Environment of
COSILT [22], from now on called C3HPC, located in Tolmezzo (Italy) and managed by eXact Lab srl [23],
equipped with two Tesla K20 Nvidia Graphics Processing Units (GPUs) loading the Nvidia CUDA Deep
Neural Network library (cuDNN).

2.1 Training from Scratch on SEM Data Set: Comparison between Different Architectures

In the field of deep learning applied to computer vision, the first CNN that outperformed and won the
ImageNet ILSVRC 2012 was AlexNet [24]. We initially trained from scratch this simple model on our SEM
data set and reached 73% of accuracy. This result was not impressive; thus, as a next step, we trained the
version v3, v4, and Resnet-v2 (hereafter Resnet) of the more recent Inception architecture [19, 20] on the
SEM data set.

In Figure 1, the accuracy computed on the test set up to 240 training epochs is shown for the mentioned
architectures. As expected, the Inception family of networks reaches a remarkably higher value than the
simpler AlexNet. It seems that Inception-v4 performs only slightly worse than Inception-v3 and Inception-
Resnet on the SEM data set. However, 240 epochs were not enough for Inception-v4 to converge, as
the loss function had not reached a stable minimal value yet. Moreover, it required more than twice the
time needed by the other networks to reach the same accuracy in a stable way: ~ 160 hours, with respect
to ~ 70 hours. After these considerations, we decided to rule out the training from scratch of Inception-v4.

Data Intelligence 515

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Figure 1. Test accuracy as a function of the number of training epochs obtained by training from scratch Inception-
v3 (magenta), Inception-v4 (orange), Inception-Resnet (green), and AlexNet (black) on SEM data set. All the models
were trained with the best combination of hyperparameters, according to the memory capability of the available
hardware.

2.2 Feature Extraction and Fine Tuning from ImageNet to SEM Data Set

Transfer learning is becoming a very popular technique in deep learning. It is based on the idea of storing
the knowledge learned from one task and applying it to a different but related one [25]. This approach is
faster than training from scratch, but the results might be less accurate in some cases (e.g., feature extraction),
and the architectures which can be used are restricted to the pre-trained checkpoints available in the
literature.

In this work, transfer learning was tested on our target SEM data set, using Inception-v3, Inception-v4,
and Inception-ResNet checkpoints pre-trained either on ImageNet ILSVRC 2012 [17] or on SEM data set
(Section 2.1).

As shown in the inset of Figure 2, feature extraction accuracy does not exceed 90% for any CNN
architecture. This reproduces the results obtained by [13], and confirms the limits of this transfer learning
method.

Fine tuning reveals itself as a more successful technique, increasing the test accuracy to ~ 97% as can
be seen in the main panel of Figure 2 (magenta and orange lines). Due to its complexity (in terms of floating-
point operations), Inception-Resnet architecture gave rise to GPU memory issues when setting the batch

516 Data Intelligence

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

size bs > 16. This made the training slower with respect to the other architectures, because of the greater
number of back-propagations performed. Thus, we decided to address the complete investigation of
Inception-Resnet in the future on different hardware equipment, and to omit the results in order to avoid
confusion.

Having excluded the training from scratch of Inception-v4 and the fine tuning of Inception-Resnet, we
finally fine tuned Inception-v3 using the SEM checkpoint obtained in Section 2.1 by training from scratch
this architecture on the SEM data set. The test accuracy, shown in Figure 2 (blue line), is comparable with
the ones obtained from the ImageNet checkpoint. These results lead us to the conclusion that the SEM data
set is complete enough to allow the autonomous training of deep neural networks, without the need to

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

only rely on the models pre-trained on huge data sets such as ImageNet.

Thus, Inception-v3 fine tuned on the SEM data set revealed to be the most suited architecture for our
purposes; all the analyses presented in the rest of the paper have been done using this model.

Figure 2. Main: Test accuracy as a function of the number of training epochs obtained when fine tuning on
the SEM data set Inception-v3 (magenta) and Inception-v4 (orange) starting from the ImageNet checkpoint, and
Inception-v3 (blue) from the SEM checkpoint that, as expected, converges very rapidly. Inset: Test accuracy as a
function of the number of training epochs obtained when performing feature extraction of Inception-v3 (magenta),
Inception-v4 (orange), and Inception-Resnet (green) on the SEM data set starting from the ImageNet checkpoint.
All the models were trained with the best combination of hyperparameters, according to the memory capability of
the hardware available.

Data Intelligence 517

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

3. UNSUPERVISED LEARNING

In this section we report on our first attempt to use unsupervised techniques to automatically identify
new categories within our data set [16]. The data set can be seen as a cloud of points in a space of
dimension given by the number of total pixels (namely 1024 × 768 for the vast majority of the images).
Pattern detection within data (i.e., to detect common unspecified features in the images) in such a high
dimensional space is a challenging task due to the sparsity of data in the input space. This is generally
referred to as the curse of dimensionality.

Our approach to deal with this issue comprises three different steps. We first select a subset of SEM

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

images captured at the same resolution to facilitate detection of hidden structures at the same level of
magnification. We then apply a procedure to reduce the dimensionality of this subset. Finally, we adopt
several clustering methods and quantitatively evaluate the different algorithms. In the following subsections,
we describe in detail each step and discuss the results obtained so far.

3.1 The 1μ–2μ Data Set Selection

The SEM data set includes images of objects whose size can vary over several orders of magnitude,
ranging from 1 mm to 1 nm. Even the same objects, imaged at different levels of magnification, may exhibit
different high-level features. For this reason, the image resolution is a fundamental quantity to split the data
set by scale. This piece of information is not available as metadata for all the images; nevertheless, it can
be recovered as it is annotated on each image on a stripe containing different information. In order to read
and store the scale data, an algorithm was implemented using the OCR engine Tesseract v3.04.01 and the
library OpenCV v3.4.1 for image segmentation and contours detection. Using the above algorithm (as
outlined in [16]) we classify all the images in different bins of scale. In particular, the bins of 1 μm and 2
μm count a substantial amount of images and a non-zero population on all the original 10 categories. For
this reason, we select the images in the interval from 1 μm to 2 μm to form a data set, which we will refer
to as 1μ–2μ data set. All the analyses described in the rest of the paper have been performed on it. Within
the 52,682 images in this data set, 7,557 appear in the SEM data set and so have a hand-assigned label.
Table 1 shows the breakdown according to the 10 categories.

Table 1. Number of images for each label in the 1μ–2μ data set, adopting the same labelling used in [11, 12, 13],
reported here for completeness: 0 = Porous sponges, 1 = Patterned surfaces, 2 = Particles, 3 = Films and coated
surfaces, 4 = Powders, 5 = Tips, 6 = Nanowires, 7 = Biological, 8 = MEMS devices and electrodes, 9 = Fibres.

Label 0 1 2 3 4 5 6 7 8 9 TOT
Images 55 1,403 933 147 236 713 1,891 588 1,572 19 7,557

518 Data Intelligence

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

3.2 Feature Learning

To perform clusters analysis on the data set, a notion of distance between images is needed. In the
literature there are several definitions of similarity between images (see for example [26, 27]); however,
most of them do not perform content analysis, precluding the possibility of detecting similarities among
different objects. A different way to proceed is to pre-process the images by selecting the high-level features
that most characterize their content. The advantages of this approach are twofold. It highlights the most
meaningful features of the images, and it helps to bypass the curse of dimensionality.

We therefore design an intermediate procedure between supervised and unsupervised learning methods.

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

The key idea is to exploit the best performing architecture described in Section 2.2 (i.e., Inception-v3) to
extract from the images in the 1μ–2μ data set the features obtained in the last fully connected layer. For
sake of clarity, we do not discuss here the different choices of model-layer that can be performed and we
focus exclusively on the last layer, namely, the Logits layer, of the model fine-tuned on the SEM data set
(refer to [16] for an analysis of different cases). Because of the original design of the network, suited for
the prediction of the 1,001 classes of the ImageNet data set, the output of this layer has dimension 1,001.
This is why the data set obtained by this procedure is referred to as the 1μ–2μ_1001 data set.

3.3 Intrinsic Dimension and Dimensional Reduction

The 1,001 features considered in the 1μ–2μ_1001 data set are still too many to bypass the curse of
dimensionality. To further shrink the features space without losing essential information, we perform a
nonlinear dimensional reduction using autoencoders [28, 29]. Autoencoders are a class of neural networks
which attempt to compress the information of the input variables into a reduced dimensional space, the
so-called coding space, and then recreate the input data set.

It should be recognized that the features considered in 1μ–2μ_1001 were extracted by a network trained
to distinguish the 10 classes in the SEM data set. Thus, in the specific case we are analyzing, we could
already assume 10 to be a reasonable coding dimension. However, in a more general framework (models
trained on different data sets or features coming from lower layers), the coding dimension is a parameter
that has to be set carefully. A good way to proceed is to estimate the Intrinsic Dimension (ID) of the data
set via the so-called 2-Nearest Neighbors (2-NN) algorithm recently presented in [30].

The green lines in Figure 3 summarize the results obtained by 2-NN estimator on the 1μ–2μ_1001 data
set. The Figure confirms our assumption: the lines show an evident plateau around 9, i.e., quite close to
the number of the original categories. Moreover, applying an autoencoder with coding dimension 10 does
not have a great impact on the ID of the data, as shown by the red lines. To maintain consistency in the
naming system, we refer to the reduced representations of the data set obtained by the autoencoder as
1μ–2μ_1001_10.

Data Intelligence 519

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Figure 3. Intrinsic Dimension of the 1μ–2μ_1001 data set, varying the sample size, computed before autoencod-
ing (green lines) and after autoencoding (red lines). The three brightness levels for each color correspond to the
percentage of points used in the linear fit: 90%, 70%, and 50%.

We finally evaluate whether the dimensional reduction we performed on the 1μ–2μ_1001 data set still
provides meaningful information with respect to the original 10 categories. In order to do so, we sample
400 images from it and compare the Euclidean distances induced by their reduced representations (at 1,001
and 10 features respectively), against the following discrete distance:

ddisc(xi, xi) = dIi ,Ij (1)

where li is the label of the image xi.

Figure 4 shows the heatmap of the discrete distance on the 400 sampled images sorted by category. The
black squared blocks on the diagonal represent the zero distance pairs within the same category.

520 Data Intelligence

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Figure 4. ddisc heatmap for a manually labelled subset of images.

The heatmaps of the the Euclidean distances inherited by the reduced representations before (left panel)
and after (right panel) applying the autoencoder are displayed in Figure 5. In both cases, the block structure
along the diagonal emerges clearly, being more definite on the right panel. This is confirmed by a slightly
larger correlation index between ddisc and the distance after autoencoders.

Figure 5. Heatmaps of the distances obtained via Inception-v3. The image captions specify the methods used and
indicate the correlation index with ddisc.

Data Intelligence 521

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

3.4 Clustering Analysis

The clustering analysis on the 1μ–2μ_1001_10 data set is possible with a moderate computational effort.
Among the several different algorithms available in the literature, we focus on the hierarchical agglomerative
clustering methods defined by four classic linkages criteria: single, complete, centroid and Ward. Moreover,
we also completed the analysis including a hierarchical version of a recently introduced density-based
technique called density peaks [31].

To evaluate the quality of the clusters obtained at a given level of a hierarchy, we compare them to the
original 10 categories of the SEM data set [11] via the widely adopted Normalized Mutual Information

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

(NMI) [32]. As already specified in Section 3.1, not all images in 1μ–2μ were classified, and thus our
evaluation takes into account only the 7,557 images in the data set that comes with a hand-assigned label.

To better understand the scores of a hierarchical algorithm at different levels of the hierarchy we compare
them to the scores obtained by two artificial scenarios. The first one, called good scenario, is constructed
recursively in a divisive way: starting from the partition of the 7,557 images provided by the 10 categories,
at each step the biggest cluster is split evenly in two clusters. On the other hand, the uniform scenario is
obtained by a uniform assignment of k labels, for 10 < k ≤ 7557. We can then compute the NMI for both
scenarios and plot them as a function of the number of clusters. We remark that those are not bounds for
the NMI scores, but should be considered as a reference to help us evaluate the scores of the clustering
method adopted in this study.

Figure 6 shows the NMI scores (on the labelled data only) of the clustering obtained by the five hierarchical
algorithms applied on the whole 1μ–2μ_1001_10 data set. We also report the scores of the artificial
scenarios discussed above as dashed lines: the orange one refers to the good case, while the green one
represents the uniform case.

From Figure 6 we can spot some interesting patterns. The Single Linkage (brown line) provides the worst
results and this can be explained looking at the clusters’ cardinalities. This algorithm produces a big cluster
merging small clusters or singletons into it. On the other hand, the Complete and Ward linkages (cyan and
blue line, respectively) as well as Density Peaks (pink line) behave similarly and present good results.
Actually, the latter two produce almost identical scores, and a peak is observed around 10. This behavior
is due to the strong bias towards the 10 categories inherited by the reduced representation from the model
used to extract the high-level features. Nevertheless, the most interesting and impressive results are the ones
obtained by the Centroid Linkage (red line). Although poor scores are returned by this algorithm for a small
number of clusters, they rapidly grow after k ~ 70 and they outperform the results obtained by the artificial
refinements used as a reference of good scores.

522 Data Intelligence

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Figure 6. NMI scores of the clustering obtained by the five hierarchical algorithms (solid lines) considered as a
function of k, the number of clusters. The scores of the artificial scenarios are reported as orange (good case) and
green (uniform case) dashed lines.

The results summarized in Figure 6 are not only a mere evaluation of the adopted clustering algorithm,
but give interesting indications to further exploit the data set. The most remarkable hint is the impressive
results obtained by the Centroid Linkage (red line) around 200 clusters; this result encourages us to improve
the procedure in order to classify the SEM images in a tree structure of sub-categories with small or even
any manual labeling effort. The 200 clusters identified have to be scientifically validated, i.e., it must be
checked whether the features characterizing the clusters have a scientific, useful meaning for SEM users;
this evaluation is currently in progress at CNR-IOM in Trieste.

4. CONCLUSIONS

In this paper, we presented an overview of our recent work on the classification of images at the
nanoscale collected using the SEM.

Data Intelligence 523

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

In Section 2 we applied state-of-the-art deep learning techniques and CNNs on a recently published data
set composed by SEM images. We performed a comparison of different Inception architectures and learning
methods: training from scratch, feature extraction, and fine tuning. Feature extraction and fine tuning can
be even combined together. First, feature extraction can be applied on the target data set from a pre-trained
checkpoint, and then the entire model can be fine tuned. In this case, the fine tuning starts with the last
layer weights initialized to the values obtained from the previous training phase and it should take less time
to converge. On the other side, feature extraction may be applied after fine tuning to refine the training
weights and to improve (by a small fraction) the final accuracy. However, when the number of classes is
not huge, as in our case (where 10 classes are under consideration), we verified that there are no relevant

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

gains in performance. We showed the Inception-v3 architecture trained from scratch on the SEM data set
and then fine tuned to be the best choice in our case, targeting both accuracy and time to solution: it
outperformed the other architectures and even the previous work on the same data set [13].

In Section 3 we presented a possible strategy to detect intrinsic structures in the SEM images in a semi-
supervised way. The approach was defined semi-supervised because it was focused on the 10 features
corresponding to the categories of the SEM data set, previously learned. We performed a dimensional
reduction on a subset of images at similar scale by means of autoencoders, to be able to apply several
clustering algorithms. NMI coefficient was then used to score them, keeping the classification in 10
categories as ground truth. The scores reached by the Centroid Linkage showed a huge evidence of a
potential refinement of the current classification.

These encouraging preliminary results are the starting point for several actions, which are already
ongoing: we are evaluating more in detail the dimensional reduction algorithm, to devise a general strategy
to pass from the features extracted by supervised learning to a clustering method.

Moreover, we are performing the procedure described above (feature learning followed by clustering)
on single categories. In this case, the number of labelled images could be increased by including the
predictions realized by the CNN in [13] (even though this model achieves a worse overall accuracy than
the one used in this article, its confusion matrix reports a better accuracy on the less represented categories).
This will provide a tool for the automatic classification of subcategories, which we will be able to compare
with the results obtained manually.

ACKNOWLEDGEMENTS

This work has been done within the NFFA-EUROPE project and has received funding from the European
Union’s Horizon 2020 Research and Innovation Program under grant agreement No. 654360 NFFA-
EUROPE. The authors thank A. Cazzaniga for his contribution in preparing the final version of the plots.

524 Data Intelligence

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

AUTHOR CONTRIBUTIONS

S. Cozzini ([email protected]) and R. Aversa ([email protected]) planned the overall work.

C. De Nobili ([email protected]) and R. Aversa performed the supervised approach. P. Coronica
([email protected]) performed the unsupervised approach. R. Aversa wrote the manuscript. S. Cozzini
supervised the whole work. All the authors reviewed the manuscript.

REFERENCES

[1] G. Roughton, A.S. Varde, S. Robila, & J. Liang. A feature-based approach for processing nanoscale images.

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

In: Proceeding of Scanning Microscopy, 2010, pp. 1–9. doi: 10.1117/12.853412.
[2] A. Amani & D. Mohammadyani. Artificial neural networks: Applications in nanotechnology. In: C.L.P. Hui
(edited) Artificial Neural Networks—Application. Rijeka, Croatia: InTech, 2011, pp. 465–478.
[3] M.P. Nikiforov, V.V. Reukov, G.L. Thompson, A.A. Vertege, S. Guo, S.V. Kalinin, & S. Jesse. Functional recog-
nition imaging using artificial neural networks: Applications to rapid cellular identification via broadband
electromechanical response. Nanotechnology 20 (2009), No. 40.
[4] M.A. Al-Khedher, C. Pezeshki, J.L. McHale, & F.J. Knorr. Quality classification via Raman identification
and SEM analysis of carbon nanotube bundles using artificial neural networks. Nanotechnology 18(2007),
No. 35.
[5] W. Xie, J.A. Noble, & A. Zisserman. Microscopy cell counting with fully convolutional regression networks.
In: Deep Learning Workshop in MICCAI, 2015, pp. 1–8.
[6] H. Chen, Q. Dou, X. Wang, J. Qin, & P.A. Heng. Mitosis detection in breast cancer histology images via
deep cascaded networks. In: Proceedings of the 30th Conference on Artificial Intelligence (AAAI), 2016,
pp. 1160–1166.
[7] M. Mabaso, D. Withey, & B. Twala. Spot detection in microscopy images using convolutional neural network
with sliding-window approach. In: Proceedings of the 11th International Joint Conference on Biomedical
Engineering Systems and Technologies (BIOSTEC 2018) 2, 2018, pp. 67–74. doi: 10.5220/000672420067
0074.
[8] NFFA-EUROPE homepage. Available at: http://www.nffa.eu/about/.
[9] M.D. Wilkinson, M. Dumontier, I.J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, … & B.
Mons. The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3(2016),
Article No.160018. doi: 10.1038/sdata.2016.18.
[10] CNR-IOM homepage. Available at: www.iom.cnr.it.
[11] R. Aversa, M.H. Modarres, S. Cozzini, R. Ciancio, & A. Chiusole. The first annotated set of scanning electron
microscopy images for nanoscience. Scientific Data 5 (2018), Article No. 180172. doi: 10.1038/sdata.
2018.172.
[12] R. Aversa. Scientific image processing within the NFFA-EUROPE data repository. Master thesis, International
School for Advanced Studies (SISSA-ISAS), 2016. Available at: https://iris.sissa.it/handle/20.500.11767/
36168#.XOZO3y2B1Bw.
[13] M.H. Modarres, R. Aversa, S. Cozzini, R. Ciancio, A. Leto, & G.P. Brandino. Neural network for nanoscience
scanning electron microscope image recognition. Scientific Reports 7 (2017), Article No. 13282. doi:
10.1038/s41598-017-13565-z.
[14] J. Yosinski, J. Clune, Y. Bengio, & H. Lipson. How transferable are features in deep neural networks? In:
Advances in Neural Information Processing Systems (NIPS), 2014, pp. 1–9.

Data Intelligence 525

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

[15] C. De Nobili. Deep learning for nanoscience scanning electron microscope image recognition. Master
thesis, International School for Advanced Studies (SISSA-ISAS), 2017. Available at: https://iris.sissa.it/handle/
20.500.11767/68034#.XOZO1C2B1Bw.
[16] P. Coronica. Feature learning and clustering analysis for images classification. Master thesis, International
School for Advanced Studies (SISSA-ISAS), 2018. Available at: https://iris.sissa.it/handle/20.500.11767/
84226#.XOZLvi2B1Bw.
[17] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, & L. Fei-Fei. ImageNet: A large-scale hierarchical image database.
In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, 248–255. doi: 10.1109/
CVPR.2009.5206848.
[18] ImageNet Large Scale Visual Recognition Challenge (ILSVRC) homepage. Available at: http://www.image-

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

net.org/challenges/LSVRC.
[19] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, ... & A. Rabinovich. Going deeper
with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
doi: 10.1109/CVPR.2015.7298594.
[20] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, & Z. Wojna. Rethinking the inception architecture for computer
vision. arXiv preprint. arXiv:1512.00567, 2015.
[21] C. Szegedy, S. Ioffe, & V. Vanhoucke. Inception-v4, Inception-ResNet and the impact of residual connections
on learning. In: Proceedings of the 31th AAAI Conference on Artificial Intelligence (AAAI-17), 2017, pp. 1–3.
[22] C3HPC homepage. Available at: www.c3hpc.it.
[23] eXact Lab srl homepage. Available at: www.exact-lab.it.
[24] A. Krizhevsky, I. Sutskever, & G.E. Hinton. Imagenet classification with deep convolutional neural networks.
In: Advances in Neural Information Processing Systems, 2002, pp. 1097–1105.
[25] S.J. Pan & Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering
22(10)(2010), 1345–1359. doi: 10.1109/TKDE.2009.191.
[26] M.P. Sampat, Z. Wang, S. Gupta, A. C. Bovik, & M.K. Markey. Complex wavelet structural similarity: A new
image similarity index. IEEE Transactions on Image Processing 18(11)(2009), 2385–2401. doi: 10.1109/
TIP.2009.2025923.
[27] L. Wang, Y. Zhang, & J. Feng. On the Euclidean distance of images. IEEE Transactions on Pattern Analysis and
Machine Intelligence 27(8)(2005), pp. 1334–1339. doi: 10.1109/TPAMI.2005.165.
[28] L. van Der Maaten, E. Postma, & J. Van den Herik. Dimensionality reduction: A comparative review. Journal
of Machining Learning Research 10(2009), 66–71.
[29] A. Géron. Hands-on machine learning with Scikit-Learn and Tensor-Flow. Boston, MA: O’Reilly Media: ,
2017. isbn: 9781491962299
[30] E. Facco, M. d’Errico, A. Laio, & A. Rodriguez. Estimating the intrinsic dimension of datasets by a minimal
neighborhood information. Scientific Report 7(1)(2017), 1–8. doi:
[31] A. Rodriguez & A. Laio. Clustering by fast search and find of density peaks. Science 344(6191)(2014),
1492–1496.
[32] L. Danon, A. Diaz-Guilera, J. Duch, & A. Arenas. Comparing community structure identification. Journal of
Statistical Mechanics: Theory and Experiment 2005(9)(2005), P09008.

526 Data Intelligence

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

AUTHOR BIOGRAPHY

Rossella Aversa got a PhD in Astrophysics at the International School for

Advanced Studies (SISSA-ISAS) in Trieste, Italy. In the same town, she finalized
her studies with a Master in High Performance Computing (MHPC) and
worked as postdoc at CNR-IOM for three years, acquiring experience in
machine learning techniques and data management. She is currently employed
at Karlsruhe Institute of Technology (KIT) in Karlsruhe, Germany.
ORCID: 0000-0003-2534-0063

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Piero Coronica holds a PhD in Geometry from the International School for
Advanced Studies (SISSA-ISAS, Trieste) combined with further post graduate
studies in High Performance Computing and data science. He currently works
as part of the HPC team at the University of Cambridge’s Research Computing
Services assisting national and international research groups to carry out
academic work. The projects he has been involved in apply state-of-the-art
machine learning techniques to advance active research fields ranging from
astronomy to digital humanities and medical imaging.
ORCID: 0000-0001-8235-1899

Cristiano De Nobili is a theoretical particle physicist. After his PhD in

statistical physics at the International School for Advanced Studies (SISSA-
ISAS) (2016) and a further master in High-Performance Computing (MHPC),
he has been involved in deep learning. Starting from computer vision he is
now a senior deep learning scientist in the field of Natural Language Processing
working on Samsung's virtual assistant. Moreover, Cristiano is a machine/
deep learning instructor for several masters in both the private and academic
sectors. He is an active speaker and recently gave a TEDx talk on Artificial
Intelligence.
ORCID: 0000-0002-8429-1831

Data Intelligence 527

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

Stefano Cozzini is presently director of the Institute of Research and

Technologies at Area Science Park where he coordinates several scientific
infrastructures and projects at national and international level. He has more
than 20 years’ experience in the area of scientific computing and HPC/Data
e-infrastructures. His main scientific interests are scientific computing and
machine learning techniques applied to scientific data management. He is
presently actively involved in the master’s degree on Data Science and
Scientific Computing master at University of Trieste, Italy.
ORCID: 0000-0001-6049-5242

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

528 Data Intelligence

ZEN 3.11 - Software Manual
No ratings yet
ZEN 3.11 - Software Manual
1,350 pages
2301.07499v1
No ratings yet
2301.07499v1
177 pages
deloitte data engineer
No ratings yet
deloitte data engineer
7 pages
158257747
No ratings yet
158257747
98 pages
ChromLab User Guide
No ratings yet
ChromLab User Guide
438 pages
1-s2.0-S0169023X24000090-main
No ratings yet
1-s2.0-S0169023X24000090-main
17 pages
A 12 Garbage Classification Using Deep Learning Techniques
No ratings yet
A 12 Garbage Classification Using Deep Learning Techniques
7 pages
Efficacy of Deep Learning Algorithms in Detecting Lung Cancer
No ratings yet
Efficacy of Deep Learning Algorithms in Detecting Lung Cancer
6 pages
Cat and dog 1
No ratings yet
Cat and dog 1
9 pages
Fine-Tuning CNN Image Retrieval With No Human Annotation
No ratings yet
Fine-Tuning CNN Image Retrieval With No Human Annotation
14 pages
VGG (Simonyan and Zisserman)
No ratings yet
VGG (Simonyan and Zisserman)
14 pages
10 Codes
88% (180)
10 Codes
1 page
LostNet a Smart Way for Lost and Find
No ratings yet
LostNet a Smart Way for Lost and Find
17 pages
Amazon Case Study
No ratings yet
Amazon Case Study
2 pages
Lightweight
No ratings yet
Lightweight
23 pages
Contemporary Perspective in Elearning Research
No ratings yet
Contemporary Perspective in Elearning Research
285 pages
Price list Waratah H212 Parts
No ratings yet
Price list Waratah H212 Parts
2 pages
Reusability report: Deep learning-based analysis of images and spectroscopy data with AtomAI
No ratings yet
Reusability report: Deep learning-based analysis of images and spectroscopy data with AtomAI
7 pages
An Analysis On Object Recognition Using Convolutional Neural Networks
No ratings yet
An Analysis On Object Recognition Using Convolutional Neural Networks
8 pages
SHabareshTS REPORT 38
No ratings yet
SHabareshTS REPORT 38
34 pages
Automatic Classification of Mechanical Components of Engines Using Deep Learning Techniques
No ratings yet
Automatic Classification of Mechanical Components of Engines Using Deep Learning Techniques
10 pages
IOS 18 On Iphone 13 - How Does It Run
No ratings yet
IOS 18 On Iphone 13 - How Does It Run
3 pages
EBSD Calssification
No ratings yet
EBSD Calssification
12 pages
Analysis of SEM Images With Artificial Intelligenc
No ratings yet
Analysis of SEM Images With Artificial Intelligenc
4 pages
Icc330 0619 Commercial Card Portal Manual Cardh Uk v1
No ratings yet
Icc330 0619 Commercial Card Portal Manual Cardh Uk v1
8 pages
Review of Image Classification Algorithms Based On
No ratings yet
Review of Image Classification Algorithms Based On
10 pages
Implementation of Deep Neural Networks Learning On Unmanned Aerial Vehicle Based Remote-Sensing
No ratings yet
Implementation of Deep Neural Networks Learning On Unmanned Aerial Vehicle Based Remote-Sensing
7 pages
Time Organization For The Music Visualization Extension Project in p5.js (One-Month Timeline)
No ratings yet
Time Organization For The Music Visualization Extension Project in p5.js (One-Month Timeline)
2 pages
Output
No ratings yet
Output
3 pages
Euler Method Practical
No ratings yet
Euler Method Practical
2 pages
A Survey On Deep Learning-Based Fine-Grained Object Classification and Semantic Segmentation
No ratings yet
A Survey On Deep Learning-Based Fine-Grained Object Classification and Semantic Segmentation
17 pages
Ad - Java Lab Manual 2021-22
No ratings yet
Ad - Java Lab Manual 2021-22
79 pages
Evaluating The Visualization of What A Deep Neural Network Has Learned
No ratings yet
Evaluating The Visualization of What A Deep Neural Network Has Learned
13 pages
2311.00825v1
No ratings yet
2311.00825v1
13 pages
CNN_MSE
No ratings yet
CNN_MSE
7 pages
Paper Review of Five Machine Vision Topics
No ratings yet
Paper Review of Five Machine Vision Topics
3 pages
Transguard-Company-Brochure
No ratings yet
Transguard-Company-Brochure
9 pages
CNN Maryim 2020
No ratings yet
CNN Maryim 2020
12 pages
Kusuma 2019 IOP Conf. Ser. Earth Environ. Sci. 335 012005
No ratings yet
Kusuma 2019 IOP Conf. Ser. Earth Environ. Sci. 335 012005
11 pages
Samsung and Nokia
No ratings yet
Samsung and Nokia
2 pages
Deep Learning For Material Recognition: Most Recent Advances and Open Challenges
No ratings yet
Deep Learning For Material Recognition: Most Recent Advances and Open Challenges
20 pages
Class 8 Ai - Answer Keys
63% (8)
Class 8 Ai - Answer Keys
9 pages
3098 15835 1 PB 2011 PDF
No ratings yet
3098 15835 1 PB 2011 PDF
6 pages
A C N V U S: Utomated Lassification of Anoparticles With Arious Ltrastructures and Izes
No ratings yet
A C N V U S: Utomated Lassification of Anoparticles With Arious Ltrastructures and Izes
11 pages
Environmental Models in Spatial Science
No ratings yet
Environmental Models in Spatial Science
13 pages
Generalizability of Semantic Segmentation Techniques: Keshav Bhandari Texas State University, San Marcos, TX
No ratings yet
Generalizability of Semantic Segmentation Techniques: Keshav Bhandari Texas State University, San Marcos, TX
6 pages
Project File X 402 24-25
No ratings yet
Project File X 402 24-25
8 pages
2122 Entr6081037 Lqda TK2-W4-S5-R2 Team1
No ratings yet
2122 Entr6081037 Lqda TK2-W4-S5-R2 Team1
1 page
Pcanet: A Simple Deep Learning Baseline For Image Classification?
No ratings yet
Pcanet: A Simple Deep Learning Baseline For Image Classification?
15 pages
Research Paper
No ratings yet
Research Paper
7 pages
Convolutional Neural Networks
No ratings yet
Convolutional Neural Networks
8 pages
A Review On Multiscale-Deep-Learning Applications
No ratings yet
A Review On Multiscale-Deep-Learning Applications
28 pages
W11 Lecture ITS69204 Image Recognition (1)
No ratings yet
W11 Lecture ITS69204 Image Recognition (1)
44 pages
Remotesensing 13 04712 v2
No ratings yet
Remotesensing 13 04712 v2
51 pages
A_review_of_advances_in_image_recognition_models_F
No ratings yet
A_review_of_advances_in_image_recognition_models_F
5 pages
2802 8020 1 PB
No ratings yet
2802 8020 1 PB
3 pages
Memo
No ratings yet
Memo
12 pages
Object Detection Using CNN
No ratings yet
Object Detection Using CNN
5 pages
Visual Image Understanding
No ratings yet
Visual Image Understanding
7 pages
Thesis AlexanderJaus BIBTEX
No ratings yet
Thesis AlexanderJaus BIBTEX
9 pages
Trustworthy - Final Essay
No ratings yet
Trustworthy - Final Essay
21 pages
Deep Learning Approach For Object Detection Using CNN: Abstract
No ratings yet
Deep Learning Approach For Object Detection Using CNN: Abstract
7 pages
Admin,+4554 Article+Text 17736 2 10 20210928
No ratings yet
Admin,+4554 Article+Text 17736 2 10 20210928
13 pages
Object Detection Using Convolutional Neural Network Transfer Learning
No ratings yet
Object Detection Using Convolutional Neural Network Transfer Learning
11 pages
O&M Scope of Work-1
100% (3)
O&M Scope of Work-1
6 pages
Identify Web Cam Images Using Neural Networks
No ratings yet
Identify Web Cam Images Using Neural Networks
17 pages
1.convolutional Neural Networks For Image Classification
No ratings yet
1.convolutional Neural Networks For Image Classification
11 pages
Ijet 10892
No ratings yet
Ijet 10892
5 pages
Image Classification Using SVM and CNN: March 2020
No ratings yet
Image Classification Using SVM and CNN: March 2020
6 pages
Short Title
No ratings yet
Short Title
16 pages
A Brief Survey and An Application of Sem
No ratings yet
A Brief Survey and An Application of Sem
38 pages
Classify Webcam Images Using Deep Learning
No ratings yet
Classify Webcam Images Using Deep Learning
17 pages
Computer Vision Application
No ratings yet
Computer Vision Application
2 pages
Deep Convolutional Neural Network-Based Approaches
No ratings yet
Deep Convolutional Neural Network-Based Approaches
21 pages
Deep Convolutional Neural Networks For Image Classification: Many Slides From Rob Fergus (NYU and Facebook)
No ratings yet
Deep Convolutional Neural Networks For Image Classification: Many Slides From Rob Fergus (NYU and Facebook)
55 pages
Business Analysis Plan
No ratings yet
Business Analysis Plan
15 pages
Image Recognition in Self-Driving Cars Using CNN
No ratings yet
Image Recognition in Self-Driving Cars Using CNN
7 pages
Water Tank Stand
0% (1)
Water Tank Stand
1 page
Face Recognition Based On Convolutional Neural Network.: November 2017
No ratings yet
Face Recognition Based On Convolutional Neural Network.: November 2017
5 pages
ML Project Docs
No ratings yet
ML Project Docs
45 pages
Lecture 13 Image Segmentation Using Convolutional Neural Network
No ratings yet
Lecture 13 Image Segmentation Using Convolutional Neural Network
9 pages
Whirlpool FL 5064 (ET)
100% (1)
Whirlpool FL 5064 (ET)
8 pages
Image Segmentation in Deep Learning
No ratings yet
Image Segmentation in Deep Learning
12 pages
OLYMPUS Ultrasonic Thickness Gauge 26MG
No ratings yet
OLYMPUS Ultrasonic Thickness Gauge 26MG
2 pages
GSX-S150L8: Parts Catalogue
No ratings yet
GSX-S150L8: Parts Catalogue
92 pages
TC20 0809-xxxx 5780210563 BA en 2012 09
No ratings yet
TC20 0809-xxxx 5780210563 BA en 2012 09
98 pages
AI for Everyone: An Intermediate Guide to Artificial Intelligence
From Everand
AI for Everyone: An Intermediate Guide to Artificial Intelligence
Nova Clarke
No ratings yet
Visual Sensor Network: Exploring the Power of Visual Sensor Networks in Computer Vision
From Everand
Visual Sensor Network: Exploring the Power of Visual Sensor Networks in Computer Vision
Fouad Sabry
No ratings yet
Object Detection: Advances, Applications, and Algorithms
From Everand
Object Detection: Advances, Applications, and Algorithms
Fouad Sabry
No ratings yet
Machine Learning - Advanced Concepts
From Everand
Machine Learning - Advanced Concepts
Derrick Mwiti
No ratings yet

Dint A 00062

Uploaded by

Dint A 00062

Uploaded by

RESEARCH PAPER

Deep Learning, Feature Learning, and Clustering

Rossella Aversa1,2, Piero Coronica1,3, Cristiano De Nobili1,4 & Stefano Cozzini1,5†

KIT-Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

514 Data Intelligence

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Data Intelligence 515

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

516 Data Intelligence

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Data Intelligence 517

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

3.1 The 1μ–2μ Data Set Selection

518 Data Intelligence

3.2 Feature Learning

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

3.3 Intrinsic Dimension and Dimensional Reduction

Data Intelligence 519

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

ddisc(xi, xi) = dIi ,Ij (1)

where li is the label of the image xi.

520 Data Intelligence

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Data Intelligence 521

3.4 Clustering Analysis

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

522 Data Intelligence

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Data Intelligence 523

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

524 Data Intelligence

S. Cozzini ([email protected]) and R. Aversa ([email protected]) planned the overall work.

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Data Intelligence 525

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

526 Data Intelligence

Rossella Aversa got a PhD in Astrophysics at the International School for

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

Cristiano De Nobili is a theoretical particle physicist. After his PhD in

Data Intelligence 527

Stefano Cozzini is presently director of the Institute of Research and

Downloaded from http://direct.mit.edu/dint/article-pdf/2/4/513/1857507/dint_a_00062.pdf by guest on 03 June 2024

528 Data Intelligence

You might also like