
Received 8 February 2023, accepted 10 March 2023, date of publication 13 March 2023, date of current version 16 March 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3256723

Quantifying the Effects of Ground Truth Annotation Quality on Object Detection and Instance Segmentation Performance

CATHAOIR AGNEW, CIARÁN EISING (Member, IEEE), PATRICK DENNY (Member, IEEE), ANTHONY SCANLAN, PEPIJN VAN DE VEN, AND EOIN M. GRUA
Data-Driven Computer Engineering (D2 iCE) Group, Department of Electronic and Computer Engineering, University of Limerick, Limerick, V94 T9PX Ireland
CONFIRM Centre for Smart Manufacturing, University of Limerick, Limerick, V94 T9PX Ireland
Corresponding author: Cathaoir Agnew ([email protected])
This work was supported by the Science Foundation Ireland (SFI) under Grant 16/RC/3918 (CONFIRM Centre).

ABSTRACT Fully-supervised object detection and instance segmentation models have accomplished notable results on large-scale computer vision benchmark datasets. However, the performance of fully-supervised machine learning algorithms depends heavily on the quality of the training data. Preparing computer vision datasets for object detection and instance segmentation is a labor-intensive task requiring each instance in an image to be annotated. In practice, this often results in suboptimal bounding box and polygon mask annotations. This paper empirically quantifies the relationship between ground truth annotation quality and COCO mean average precision (mAP) performance by introducing two separate noise measures, uniform and radial, into the ground truth bounding box and polygon mask annotations for the COCO and Cityscapes datasets. Mask-RCNN models are trained at each level of both noise measures to investigate the resulting performance. The results showed degradation of mAP as the level of both noise measures increased. For object detection and instance segmentation respectively, the highest level of noise measure resulted in a mAP degradation of 0.185 & 0.208 for uniform noise, with reductions of 0.118 & 0.064 for radial noise on the COCO dataset. For the Cityscapes dataset, reductions of mAP performance of 0.147 & 0.142 for uniform noise and 0.101 & 0.033 for radial noise were recorded. Furthermore, a decrease in average precision is seen across all classes, with the exception of the class motorcycle. The reductions between classes vary, indicating the effects of annotation uncertainty are class-dependent.

INDEX TERMS Annotation uncertainty, computer vision, instance segmentation, object detection, supervised learning.

I. INTRODUCTION

Following AlexNet's success in the ImageNet Large Scale Visual Recognition Competition (ILSVRC) in 2012 [1], a great deal of work has gone into refining deep neural network architectures for computer vision tasks. This has led to various convolutional neural network-based architectures developed for computer vision tasks, such as SegNet [2], Mask RCNN [3] and YOLO [4]. The advancements are not limited to just convolutional neural network-based architectures, as progress has been made with Vision Transformer models [5] and Graph Convolutional Networks [6]. Sun et al. credited deep learning's recent success in computer vision tasks to three primary aspects [7]. To begin with, graphics processing units and parallel processing are becoming more widely available, allowing for the training of bigger models [7]. Following this, there have been technical improvements in network architecture design, parameter initialization, and training methodologies [7], [8]. Finally, the availability of vast and expanding datasets is increasing [7].


We believe another key factor for deep learning's recent success is the availability and facility of deep learning frameworks such as TensorFlow [9], PyTorch [10] and Apache MXNet [11], which enabled deep learning to become more accessible to the broader research community. The recent advancements in deep learning methodologies for computer vision tasks have yielded momentous technologies in many domains such as intelligent transportation systems [12], [13], sports analytics [14], [15] and medical imaging [16], [17]. These advancements are not restricted to RGB imagery, with advances in infrared [18] and hyperspectral imagery [19].

Neural networks' performance for supervised computer vision tasks relies on the data they are trained on. This includes the annotations that are utilized as ground truth for supervised learning algorithms. Sun et al. found that performance on vision tasks improves logarithmically as the training dataset size increases [7]. Due to the large quantity of data that is regularly needed, the process of annotating datasets for computer vision-supervised learning tasks is time intensive. For example, it required approximately 60,000 worker hours to annotate the Common Objects in Context (COCO) dataset [20]. For object detection, bounding boxes must be manually annotated over the classes of interest for the entire dataset. Employing a crowd-sourcing method [21] that is optimized for bounding box annotation, each annotation in the ImageNet Visual Recognition dataset [22] took around 35s to annotate. For instance segmentation, a polygon mask must be outlined around each class of interest for the dataset. Polygon annotations are more accurate than bounding boxes but are also more laborious. This is reflected by the annotation time estimated to be 79.2s per polygon mask for the popular COCO dataset [20].

The importance of ground truth annotation quality has been acknowledged for computer vision tasks in the literature, with methods being developed attempting to rectify and account for noisy labels in computer vision tasks such as object detection [23], [24] and image classification [25], [26], [27], [28]. To the authors' knowledge, there is limited literature attempting to quantify the effects the ground truth bounding box and polygon mask annotation quality have on object detection and instance segmentation performance.

The main contribution of this paper is to quantify empirically the annotation quality levels and their effects on mAP [29] by introducing noise into both bounding boxes and polygon masks on a subset of the COCO dataset. To the authors' knowledge, this work is the first to investigate annotation uncertainty for three different aspects. To begin with, this work is the first to investigate annotation uncertainty for instance segmentation. Secondly, for object detection, this work introduces noise to the scale of pixel distance, allowing a finer scale of annotation uncertainty which may be more representative of annotation uncertainty seen in practice. Finally, the effects of annotation uncertainty on each individual class' average precision (AP) performance are investigated to provide further insight. Quantifying the relationship between annotation quality and performance will yield helpful insight into the trade-off between annotation quality and the time and cost associated with such annotation quality. This information in turn will allow for informed decision-making and enables the tailoring of annotations to the use case of the application.

The paper is structured as follows. In Section II an overview of related work is discussed. Then, in Section III, an explanation for how annotation uncertainty is modeled for this work is given. This is followed by a description of the experiment in Section IV and a presentation of experimental results in Section V. In Section VI, these results are analyzed and discussed. Lastly, in Section VII, the conclusions of this work are summarised.

II. RELATED WORK

Taran et al. used Cityscapes [31] fine and coarse annotated images to investigate the effects ground truth annotation quality has on semantic image segmentation performance of traffic conditions [30]. The authors explored two situations, firstly using the fine ground truth annotations for both training and inferencing; secondly training with the fine ground truth annotations but inferencing on the coarse ground truth annotations. PSPNet [32] was used for the semantic segmentation model and a subset of the Cityscapes dataset was used for the analysis, which included data from 3 cities and the following classes; road, car, pedestrian, traffic lights, and signs. Using mean intersection over union (IoU) as the metric of interest, the authors found the IoU values for coarse ground truth annotated images in general, were higher than those for fine ground truth annotated images. In light of the results of comparing fine and coarse ground truth annotations, the authors suggest that deep neural networks could be utilized to generate coarse ground truth annotated datasets, that can be modified and used to fine-tune the pre-trained models for the specific application.

A study by Mullen Jr et al. [33] compared annotation types and their effects on object detection performance on the Overhead Imagery Research Dataset (OIRDS) [34]. Three annotation types were considered for the analysis to detect cars from the OIRDS; polygon masks, bounding boxes, and target centroids. A modified version of the Overfeat [35] network architecture was used for the analysis. A Receiver Operating Characteristic (ROC) curve assessed at all pixel locations along with the area under the curve (AUC) was calculated for the 3 annotation types. The results showed polygon mask annotations scored marginally better AUC than the other two annotation types. The authors concluded when putting together a dataset for deep learning, comparing annotation types is a key step, as the cost of annotations and the advantages and disadvantages of each annotation type should be considered.

Xu et al. investigated training object detectors with noisy labels [23], including incorrect class labels and imprecise bounding boxes on both PASCAL VOC 2012 [36] and COCO 2017 [20]. Xu et al. proposed Meta-Refine-Net, a meta-learning-based approach to train more robust detectors from noisy labels. In this study, the authors generated imprecise bounding boxes by shifting the original annotations by factors of the bounding boxes' width and height. Category noise was also included by randomly sampling a chosen proportion of objects and modifying the class label to be incorrect. The results showed degradation in mAP for both incorrect class labels and imprecise bounding boxes for all ranges of noise used on both datasets.

Acuna et al. noted there is a substantial amount of label noise in relevant datasets for semantic border prediction [37]. The goal of semantic border prediction is to determine which pixels correspond to object boundaries. The authors presented a simple yet effective thinning layer and loss that can be utilized with boundary detectors, at the time of publishing, to reduce the label noise effects during training. The authors' experiments revealed an improvement of 18.61% in average precision on the CASENet [38] backbone network using the new thinning layer and loss, along with significant improvements in thinning semantic border labels over existing methods on the Semantic Boundaries Dataset [39] and Cityscapes dataset [31].

Rolnick et al. investigated label noise for image classification using deep learning [40]. The following datasets were used in the study; ImageNet [41], MNIST [42] and CIFAR [43]. The authors concluded with 3 key takeaways. Firstly, instead of just memorizing noise, deep neural networks can generalize after training on noisy data. Secondly, given a large enough training set, neural networks can handle a wide range of label noise levels. Lastly, larger batch sizes and downscaling the learning rate can offset the influence of noisy labels on effective batch size.

Whilst the presented literature answers a number of questions related to the influence of annotation quality, the performance degradation for varying levels of annotation uncertainty remains unknown. The objective of this study is to quantify empirically the annotation quality levels for bounding boxes and polygon masks and the effects it has on mAP. Whereas Taran et al. investigated the effects of ground truth annotation quality for semantic segmentation using Cityscapes [31] fine and coarse datasets, the disparity between the fine and coarse datasets does not yield insight into the various levels of annotation uncertainty that may arise in object detection and instance segmentation datasets. A direct comparison between results would also not be feasible due to the difference in annotation methodologies. For semantic segmentation, each individual pixel in an image must be annotated, however, for object detection and instance segmentation, only the objects of interest are annotated in an image. Mullen Jr et al. highlighted the need for exploring different annotation types and the associated costs along with each type, but the effects of annotation noise were not considered in their study. Xu et al. investigated noisy labels, using factors of the bounding boxes' width and height to introduce imprecise bounding boxes. However, in this research, the induced annotation noise used will be constant across object sizes rather than using factors of the original bounding boxes' width and height. It is the authors' belief that keeping the degradation across object sizes constant would yield a more comparable experiment across classes. This research also delves into the effect annotation uncertainty has on each individual class for the ranges of induced noise measures used in these experiments, which Xu et al. did not investigate. Acuna et al. and Rolnick et al. investigated noise in semantic border prediction and image classification respectively, but these results do not fully extend to object detection and instance segmentation due to the difference in annotation methodologies. Our work furthers the investigation of annotation quality and its effect on object detection performance and extends it to instance segmentation.

FIGURE 1. Example of bounding box annotation from COCO dataset [20].

III. MODELING ANNOTATION UNCERTAINTY

For supervised learning computer vision tasks, each image requires an associated annotation to be able to learn from. For object detection, bounding boxes must be manually annotated over the classes of interest for each image in the dataset. An example of a bounding box annotation for the class dog in the COCO dataset [20] can be seen in Fig. 1. For instance segmentation, a polygon mask must be outlined around each class of interest for each image in the dataset. An example of a polygon mask annotation for the class dog in the COCO dataset [20] can be seen in Fig. 2. A class label must also be given with each annotated object for object detection and instance segmentation. For this work, the focus of annotation uncertainty is on the polygons and bounding boxes. Class labels have not been tampered with. As such any effect of class label noise inherent in the COCO dataset would be consistent between all experiments.


FIGURE 2. Example of bounding box & polygon mask annotation from COCO dataset [20].

Two methods for modelling annotation uncertainty were used for this research. Firstly, Shapely's polygon buffer method [44] was used to introduce an approximate uniform noise by expanding the ground truth annotation outwards by an approximate euclidean pixel distance, as seen in Fig. 5. A uniform noise was introduced as a means to set a baseline for annotation uncertainty for these experiments. The pixel distance ranged from the integer values of 1 to 10 inclusive. COCO defines the bounding box annotation with x, y relating to the upper-left coordinates of the bounding box, the width defines the distance the object spans on the x-axis and finally the height defines the distance the object spans on the y-axis. The bounding boxes were updated to include the relevant uniform pixel distance noise according to (1). In Equation (1) width and height are represented by w and h, φ is the pixel distance noise used, and finally, xu, yu, wu, hu represent the new datapoints with uniform noise for the bounding box.

    xu = x − φ
    yu = y − φ
    wu = w + 2φ
    hu = h + 2φ        (1)

Secondly, Gaussian radial noise was added to each vertex of the polygon mask to model annotation uncertainty. The Gaussian radial noise followed Algorithm 1 to introduce annotation uncertainty, with the standard deviation (σ) varying from the integer values of 1-5 inclusive to create 5 datasets of varying degrees of modelled annotation quality. The range for the allowable angles of θ was used to help push the polygon masks outwards. An example of Algorithm 1 performed on a single data point can be seen in Fig. 3. The bounding boxes were updated following (2), where σ is shared between the polygon masks and bounding boxes and xr, yr, wr, hr represent the new datapoints with radial noise.

    xr = x − |N(0, 1²)|
    yr = y − |N(0, 1²)|
    wr = w + |N(0, σ²)|
    hr = h + |N(0, σ²)|        (2)

FIGURE 3. Example of Algorithm 1 on a datapoint.

FIGURE 4. Ground truth annotation from COCO dataset [20].

In Fig. 4, the ground truth annotation for the chair is shown. In Fig. 5 the ground truth annotation using Shapely's buffer method with an approximate uniform buffer pixel distance of 5 is shown. And finally, in Fig. 6 the radial noise is introduced with σ = 5. Yellow circles are used to highlight some differences in the annotations for Fig. 5 and Fig. 6 relative to the ground truth annotation in Fig. 4.

When investigating annotation quality for a sample of the COCO dataset, it was observed that annotation uncertainty was generally on the outer side of the object, relating to expanding out the annotation. To the authors' knowledge, the true distribution of annotation uncertainty for object detection and instance segmentation datasets is unknown. Taking this into consideration, uniform noise and radial noise were used to model annotation uncertainty. Lastly, reducing the annotations inwards was susceptible to self-intersecting polygons, which in turn can create numerous multipolygons for the single polygon annotation. On account of this, reducing the annotation inwards was out of scope for this work.
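As an illustration of how these two noise measures could be realised in code, the following minimal Python sketch perturbs a single COCO-style annotation along the lines of (1) and (2). It is not the authors' implementation; the function names are hypothetical, and only the use of Shapely's buffer method [44] and half-normal Gaussian draws is taken from the description above.

    import numpy as np
    from shapely.geometry import Polygon

    def uniform_noise_bbox(bbox, phi):
        # Expand a COCO bbox [x, y, w, h] outwards by phi pixels on every side, as in (1).
        x, y, w, h = bbox
        return [x - phi, y - phi, w + 2 * phi, h + 2 * phi]

    def uniform_noise_mask(vertices, phi):
        # Buffer the polygon mask outwards by an approximate euclidean pixel distance phi [44].
        return list(Polygon(vertices).buffer(phi).exterior.coords)

    def radial_noise_bbox(bbox, sigma, rng=None):
        # Shift the top-left corner and grow width/height with half-normal draws, as in (2).
        rng = rng or np.random.default_rng()
        x, y, w, h = bbox
        return [x - abs(rng.normal(0, 1)), y - abs(rng.normal(0, 1)),
                w + abs(rng.normal(0, sigma)), h + abs(rng.normal(0, sigma))]

    # Toy example: a 40x30 box and its rectangular mask.
    bbox = [100.0, 50.0, 40.0, 30.0]
    mask = [(100, 50), (140, 50), (140, 80), (100, 80)]
    print(uniform_noise_bbox(bbox, phi=5))
    print(radial_noise_bbox(bbox, sigma=5))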


FIGURE 5. Shapely's buffer method (distance = 5) annotation from COCO dataset [20].

FIGURE 6. Radial noise method (σ = 5) annotation from COCO dataset [20].

Algorithm 1 Algorithm for Adding Radial Noise to Polygon Mask
Input: Vertices for polygon mask, image dimensions
Output: Vertices with added radial noise for polygon mask
 1: Calculate centroid of polygon mask: xcentroid, ycentroid
 2: for xi, yi in polygon mask vertices do
 3:   Calculate the relevant quadrant for the current point relative to the centroid:
      xdiffi := xi − xcentroid
      ydiffi := yi − ycentroid
 4:   if (xdiffi ≥ 0 and ydiffi ≥ 0) then
 5:     θ := |N(45, 15²)|
 6:   end if
 7:   if (xdiffi ≤ 0 and ydiffi ≥ 0) then
 8:     θ := |N(135, 15²)|
 9:   end if
10:   if (xdiffi ≤ 0 and ydiffi ≤ 0) then
11:     θ := |N(225, 15²)|
12:   end if
13:   if (xdiffi ≥ 0 and ydiffi ≤ 0) then
14:     θ := |N(315, 15²)|
15:   end if
16:   Calculate a random Gaussian distance to move by:
      distancer := |N(0, σ²)|
17:   Update the values of x and y to reflect the added radial noise:
      xi := xi + distancer · cos(θ)
      yi := yi + distancer · sin(θ)
18:   Ensure the new data points are within the image dimensions [0, maxwidth] and [0, maxheight]:
19:   if (xi > maxwidth) then
20:     xi := maxwidth
21:   end if
22:   if (yi > maxheight) then
23:     yi := maxheight
24:   end if
25:   if (xi ≤ 0) then
26:     xi := 0
27:   end if
28:   if (yi ≤ 0) then
29:     yi := 0
30:   end if
31: end for
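A compact Python rendering of Algorithm 1 is sketched below for reference. It is an approximation under stated assumptions (NumPy for the Gaussian draws, the mean of the vertices as the centroid, and per-vertex clamping to the image bounds) rather than the authors' released code.

    import numpy as np

    def add_radial_noise(vertices, max_width, max_height, sigma, rng=None):
        # Sketch of Algorithm 1: push each vertex outwards by a half-normal radial
        # distance, with the direction drawn around a quadrant-dependent base angle.
        rng = rng or np.random.default_rng()
        xs, ys = zip(*vertices)
        cx, cy = float(np.mean(xs)), float(np.mean(ys))  # vertex mean used as the centroid
        noisy = []
        for x, y in vertices:
            dx, dy = x - cx, y - cy
            if dx >= 0 and dy >= 0:
                base = 45.0
            elif dx <= 0 and dy >= 0:
                base = 135.0
            elif dx <= 0 and dy <= 0:
                base = 225.0
            else:
                base = 315.0
            theta = np.deg2rad(abs(rng.normal(base, 15)))  # theta := |N(base, 15^2)|
            dist = abs(rng.normal(0, sigma))               # distance_r := |N(0, sigma^2)|
            x_new = min(max(x + dist * np.cos(theta), 0.0), max_width)
            y_new = min(max(y + dist * np.sin(theta), 0.0), max_height)
            noisy.append((x_new, y_new))
        return noisy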
IV. EXPERIMENTAL DESIGN

A. DATASET
These experiments were conducted on a subset of the COCO 2017 dataset [20] and the Cityscapes dataset [31]. A subset of the COCO dataset and its classes were used as this allowed a more reasonable training time for the models. It was not the aim of the study to attain state-of-the-art performance. The objective of this study is to investigate the relationship of the bounding box and polygon mask annotation quality on the metrics of interest and to quantify the change of metrics to each level of noise, relative to a baseline model trained with the original ground truth annotations.

FiftyOne [45] was used to download 25,000 images from COCO's original training dataset that contained 11 classes, one class per super category, which was randomly selected. The class person was omitted to avoid severe class imbalance. This is reflected in Table 1, where the class counts are shown for 25,000 images when including the class person in comparison to not including the class in the selection criteria for downloading the dataset. The randomly selected classes for the experiments were; bicycle, traffic light, dog, umbrella, skateboard, bottle, pizza, chair, tv, oven, and vase.
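For reproducibility, a download of this kind could look roughly as follows with the FiftyOne zoo API [45]; the exact call is an assumption based on FiftyOne's documented COCO-2017 integration rather than the authors' script.

    import fiftyone.zoo as foz

    # Classes used in the experiments: one per COCO super-category, with person omitted.
    classes = ["bicycle", "traffic light", "dog", "umbrella", "skateboard",
               "bottle", "pizza", "chair", "tv", "oven", "vase"]

    # Download 25,000 COCO-2017 training images containing at least one of these classes,
    # with both bounding box and polygon mask labels.
    dataset = foz.load_zoo_dataset(
        "coco-2017",
        split="train",
        label_types=["detections", "segmentations"],
        classes=classes,
        max_samples=25000,
    )
    print(dataset.count())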


TABLE 1. Class counts for COCO dataset classes selection.

The 25,000 images were then split using an 80/20 train and validation split. The test set images were selected from COCO's original validation dataset that contained the selected classes. This resulted in a test dataset size of 1,775 images. The resulting dataset breakdown was a train/validation/test split of 75%/19%/6% with 19,946 training images, 5,054 validation images, and a test set of 1,775 images.

For the Cityscapes dataset [31], the 5,000 images that are finely annotated were used for the experiments. The dataset was converted to be utilized for the task of object detection and instance segmentation. The Cityscapes benchmark considers 8 classes for the instance-level semantic labelling task, the classes are as follows; person, rider, car, truck, bus, train, motorcycle and bicycle. The original training dataset was split into an approximate 80/20 train and validation split which resulted in 2,400 images for training and 575 images for validation. The original validation set of 500 images is used as the out-of-sample test dataset.

A breakdown of each of the datasets can be seen in Table 10 and Table 11. The distribution for each class' object size is also given as a percentage under the columns Small, Medium, and Large. The COCO definitions [29] for small, medium, and large object sizes are used. Only single object annotations were considered for this work to minimize the complexity of the problem. Run length encoding (RLE) annotations which are used to annotate a crowd of objects were omitted from both datasets. RLE annotations are identified with COCO's iscrowd = 1 parameter, whereas single object annotations are identified with iscrowd = 0.
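The iscrowd filtering can be done directly on the COCO-format annotation files; a brief sketch with the pycocotools API is given below (the annotation path is a placeholder, and this is illustrative rather than the authors' preprocessing code).

    from pycocotools.coco import COCO

    # Keep only single-object polygon annotations (iscrowd = 0); RLE crowd regions are dropped.
    coco = COCO("annotations/instances_train.json")  # placeholder path
    single_object_ids = coco.getAnnIds(iscrowd=False)
    single_object_anns = coco.loadAnns(single_object_ids)
    print(len(single_object_anns), "single-object annotations retained")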

B. TRAINING SETUP
The MMdetection framework [46] was used to train Mask-RCNN models with a ResNet-50 backbone for the experiments [3], [47]. One advantage of this model is its ability to output both object detection and instance segmentation results [3]. All experiments were conducted on a single workstation with an NVIDIA GeForce RTX 3060 GPU card with CUDA 11.6. For the experiments, the training and validation annotations contained the relevant induced noise. The test dataset remained with the original annotations and has not been tampered with. The Mask-RCNN models were trained from scratch for 73 epochs, which took approximately 120 hours per model to train for the COCO dataset and 12 hours to train for Cityscapes. A batch size of 2 images was utilized, due to hardware constraints, with a stochastic gradient descent (SGD) optimizer using a learning rate of 0.02, a momentum of 0.9, and a weight decay of 0.0001. A learning rate scheduler was utilized to drop the learning rate by a factor of 10 at training epoch numbers 65 and 71. Evidence of over-fitting was apparent after epoch 66 when training on the ground truth annotations, as seen in Fig. 7 for the COCO dataset. As for Cityscapes, evidence of over-fitting was apparent after epoch 65. The model weights from epoch 66 and 65 were used for inferencing on the respective test datasets.

FIGURE 7. Evidence of overfitting when comparing the train & validation scores per epoch for the COCO dataset.

FIGURE 8. Evidence of overfitting when comparing the train & validation scores per epoch for the Cityscapes dataset.
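For orientation, the training schedule described above corresponds roughly to the following MMDetection-style configuration overrides. The base config path and key names are assumptions based on MMDetection 2.x conventions and are not taken from the authors' released configuration.

    # Sketch of MMDetection 2.x config overrides matching the described schedule.
    _base_ = 'configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py'

    data = dict(samples_per_gpu=2)  # batch size of 2 images due to hardware constraints

    optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)

    # Drop the learning rate by a factor of 10 at epochs 65 and 71.
    lr_config = dict(policy='step', step=[65, 71])

    runner = dict(type='EpochBasedRunner', max_epochs=73)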


C. METRICS
COCO’s defined mean average precision [29] (mAP) is
the primary metric of interest. When considering individual
classes, average precision (AP) can be and is used in place
of mAP. A breakdown of mAP and the individual classes’
AP results will be reported to provide further insight into
annotation quality and performance. COCO’s definitions for
mean average precision for small, medium, and large objects
are denoted by mAPs , mAPm , and mAPl , whereas the mean
average precision requiring an IoU threshold of 0.5 and
0.75 are denoted by mAP0.5 and mAP0.75 . mAP0.50:0.05:0.95
denotes the COCO primary challenge metric [29].
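These components can be obtained with the official COCO evaluation API; a brief, self-contained sketch is shown below with placeholder file paths, illustrating the metric computation in general rather than the authors' evaluation script.

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO("annotations/instances_test.json")           # placeholder ground truth file
    coco_dt = coco_gt.loadRes("results/bbox_predictions.json")  # placeholder detections file

    coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")  # use iouType="segm" for masks
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()  # prints mAP at IoU 0.50:0.95, 0.5, 0.75 and for small/medium/large objects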
Whereas comparing the reduction in mAP scores provides
insight into how the individual components of mAP degraded,
this would not yield an appropriate comparison between
object sizes or classes. For example, if the initial score for
mAPs = 0.1, using the original annotations, the most mAPs
could degrade is its initial starting point. To put this into
perspective, if mAPl = 0.5 using the original annotations,
and was to degrade by 0.15 when using noise-induced anno-
tations, it would result in an mAPl = 0.35. However, looking
only at differences, mAPs has degraded less, yet no small
objects are being detected.
To provide further insight, linear regression models were
fitted to the individual components of mAP scores for each
level of noise, in an attempt to provide a standardized com-
parison between object classes and sizes, that relates directly
to mAP. The linear regression models were fitted on a single
variable, induced noise level, which in turn gives the interpre-
tation of the β coefficient; for a one-unit increase in induced
noise level, on average the mAP score will increase by β.
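A regression of this form can be fitted in a few lines; the sketch below uses statsmodels with placeholder mAP values (not results from the paper) purely to show how β, its 95% confidence interval, and the adjusted R2 are read off.

    import numpy as np
    import statsmodels.api as sm

    noise_level = np.arange(0, 11)                         # uniform pixel distance buffer sizes 0-10
    map_scores = np.array([0.40, 0.38, 0.36, 0.35, 0.33, 0.31,
                           0.30, 0.28, 0.26, 0.24, 0.22])  # placeholder mAP values

    X = sm.add_constant(noise_level)                       # single predictor: induced noise level
    model = sm.OLS(map_scores, X).fit()

    beta = model.params[1]                                 # average change in mAP per unit of noise
    ci_low, ci_high = model.conf_int(alpha=0.05)[1]
    print(f"beta = {beta:.4f} [{ci_low:.3f}, {ci_high:.3f}], adjusted R2 = {model.rsquared_adj:.3f}")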

FIGURE 9. COCO dataset object detection mAP results.

FIGURE 10. Cityscapes dataset object detection mAP results.

V. RESULTS
The results were obtained from the test set of 1,775 images of the COCO dataset and 500 test images for the Cityscapes dataset. The approximate uniform buffered pixel distances used in this experiment range from 1 to 10 inclusive, with the radial induced noise ranging from σ = 1 to 5. The ground truth annotations were also used to train a baseline model. In the figures to follow in this section, a noise level of 0 refers to the ground truth annotations. Mask R-CNN models were trained with both noise-induced annotations along with ground truth annotations. For both the COCO and Cityscapes datasets, 10 datasets, one for each level of approximate uniform pixel distance, were used to train the models. For radial noise, 5 datasets, one for each level of the standard deviation of radial noise, were used to train the models. For the fitted linear regression models in this section, β refers to the coefficient of the pixel distance buffering size variable for the uniform noise models, whereas for the radial noise models β refers to the coefficient of the σ variable. A 95% confidence interval is given in square brackets for the constant and β coefficients. Due to the saturation of mAPs results after noise level 5 for the uniform models, two linear regression models were used for mAPs: the first was fit from pixel distance buffering size values 0 to 5 inclusive and the second using pixel distance buffering size values 6 to 10.

A. OBJECT DETECTION
The results of the experiments for object detection are outlined in this section. In Fig. 9 and Fig. 10 a plot of the individual components of mAP against pixel distance buffering size for the uniform noise and σ for radial noise is given for the COCO and Cityscapes datasets respectively. Linear regression models were used to model the relationship between induced noise measures and mAP. The results of the models are presented in Table 2 for the COCO dataset and Table 3 for the Cityscapes dataset. In Fig. 11 and Fig. 12 a plot of the per-class AP0.50:0.05:0.95 against pixel distance buffering size for the uniform noise and σ for radial noise is given for the COCO and Cityscapes datasets respectively. The plots are separated by the majority object size for the class in the test dataset. This was utilized for ease of readability. Linear regression models were used to model the relationship between induced noise measures and per-class AP0.50:0.05:0.95. The results of these models are presented in Table 4 and Table 5.


TABLE 2. COCO dataset linear regression model results for object detection.

TABLE 3. Cityscapes dataset linear regression model results for object detection.

TABLE 4. COCO dataset linear regression per-class model results for object detection.

TABLE 5. Cityscapes dataset linear regression per-class model results for object detection.

B. INSTANCE SEGMENTATION
The results of the experiments for instance segmentation are outlined in this section. In Fig. 13 and Fig. 14 a plot of the individual components of mAP against pixel distance buffering size for the uniform noise and σ for radial noise is given for the COCO and Cityscapes datasets respectively. Linear regression models were used to model the relationship between induced noise measures and mAP. The results of the models are presented in Table 6 for the COCO dataset and Table 7 for the Cityscapes dataset. In Fig. 15 and Fig. 16 a plot of the per-class AP0.50:0.05:0.95 against pixel distance buffering size for the uniform noise and σ for radial noise is given for the COCO and Cityscapes datasets respectively. The plots are separated by the majority object size for the class in the test dataset. This was utilized for ease of readability. Linear regression models were used to model the relationship between induced noise measures and per-class AP0.50:0.05:0.95. The results of these models are presented in Table 8 and Table 9.

VI. DISCUSSION


FIGURE 11. COCO dataset object detection AP per-class results.

FIGURE 12. Cityscapes dataset object detection AP per-class results.

FIGURE 13. COCO dataset instance segmentation mAP results.

The results enable us to compare the mAP performance for object detection and instance segmentation for various ground truth annotation qualities. For object detection, as seen in Fig. 9 and Fig. 10, when introducing uniform noise into the datasets, there was a reduction across all components of mAP. For the radially-induced noise, the degradation across the components of mAP is lesser in comparison to the uniform noise, albeit there is still degradation as annotation uncertainty increases. These results indicate there is a degradation in mAP performance when introducing annotation uncertainty into the annotations for object detection, for both noise types; uniform and radial. This reflects the need for accurate bounding boxes to be utilized as ground truth annotations for object detection.

Looking into the per-class scores, as seen in Fig. 11 and Fig. 12 along with the negative β coefficients in Table 4 and Table 5, there was a reduction for all of the classes used in the experiments for both induced noise types. However, the reductions between classes vary.


TABLE 6. COCO dataset linear regression model results for instance segmentation.

TABLE 7. Cityscapes dataset linear regression model results for instance segmentation.

TABLE 8. COCO dataset linear regression per-class model results for instance segmentation.

FIGURE 14. Cityscapes dataset instance segmentation mAP results.

This suggests that annotation quality and AP0.50:0.05:0.95 performance are class-dependent for object detection. One potential factor for the observed class dependence for object detection is the size of the objects of interest. The classes traffic light and bottle had a majority of their instances in the size small category for the COCO dataset. Both these classes resulted in significant decreases in AP0.50:0.05:0.95 for both induced noise types for object detection.

For instance segmentation, as seen in Fig. 13 and Fig. 14, when introducing uniform noise into the datasets, there was a reduction across all components of mAP. For radially-induced noise, the degradation across the components of mAP is far less severe in comparison to the uniform noise, however, there is still some degradation as annotation uncertainty increases. These results indicate there is a degradation in mAP performance when introducing annotation uncertainty into the annotations for instance segmentation, for both noise types; uniform and radial. This reflects the need for accurate polygon masks to be utilized as ground truth annotations for instance segmentation.

Looking into the per-class scores, as seen in Fig. 15 and Fig. 16 along with the β coefficients in Table 8 and Table 9, there was a reduction for most of the classes used in the experiments for both induced noise types. However, the reductions between classes vary. This suggests that annotation quality and AP0.50:0.05:0.95 performance are class-dependent for instance segmentation. Again a potential factor for the observed class dependence is the size of the objects of interest. The classes traffic light and bottle had a majority of their instances in the size small category for the COCO dataset. Both these classes resulted in significant decreases in AP0.50:0.05:0.95 for both induced noise types for instance segmentation.


FIGURE 15. COCO dataset instance segmentation per-class mAP results.

FIGURE 16. Cityscapes dataset instance segmentation per-class mAP results.

For the Cityscapes dataset, when looking into the per-class AP results for both object detection as seen in Fig. 12 and instance segmentation, as seen in Fig. 16, the variance is quite significant for the classes truck, bus, train and motorcycle. An explanation for this variance is the small sample size of the classes in the dataset, as seen in Table 11. As these classes are less than 1% of the number of instances in each of the train, validation and test datasets, this in turn would result in higher variances in the models. Small sample sizes can also skew the results due to the impact one instance can have on the overall percentage. For example, on the Cityscapes test dataset, getting one extra truck predicted correctly would result in an increase of 1.1% in comparison to the impact of one extra person predicted correctly, which results in an increase of 0.03%. Due to this class imbalance, the fitted linear regression models would struggle to account for this variance, which has resulted in lower adjusted R2 values for the mAP components in comparison to the COCO counterparts. As the impact of each of the smaller class sizes would impact each of the mAP calculations, a reduction across the adjusted R2 values is expected. With all of these factors in mind, it is important to note the results for the small class sizes should not be given great consideration.

Whereas for the COCO dataset, a strong linear relationship between both noise types and mAP for object detection and instance segmentation was observed. An explanation for this strong linear relationship in comparison to the Cityscapes results may be down to the more evenly distributed classes as seen in Table 10. An adjusted R2 of 0.978 and 0.956 for object detection and instance segmentation were recorded respectively for the uniform noise, as seen in Table 4 and Table 8.


TABLE 9. Cityscapes dataset linear regression per-class model results for instance segmentation.

For radial noise, the adjusted R2 is 0.974 for object detection and 0.954 for instance segmentation. The β coefficient from the linear regression models yields insight into quantifying the performance degradation. For a one-unit increase in pixel distance buffer size, on average the mAP0.50:0.05:0.95 will reduce by -0.0195 [-0.022, -0.017] for object detection and -0.0221 [-0.025, -0.019] for instance segmentation. For a one-unit increase in σ for radial noise, on average the mAP0.50:0.05:0.95 will reduce by -0.0241 [-0.029, -0.019] for object detection and -0.0135 [-0.017, -0.010] for instance segmentation.

A reduction across mAP for object detection and instance segmentation when introducing annotation uncertainty is no surprise. As supervised-learning neural network performance relies on the quality of the annotations, a degradation in the annotation quality will be reflected in a reduction of the mAP. This work set out to investigate the relationship between annotation quality and mAP performance. For both types of induced noise used in this research, the noise from the training data has forced the model to include some noise around the objects of interest when inferencing with the model, thus reducing the IoU with the ground truth annotation on the test set, resulting in a reduction in mAP. This noise during inferencing was more apparent for the uniform noise models. However, the model predictions still allow for the identification and localization of the objects of interest.

A direct comparison between uniform and radially-induced noise may not be a fair comparison, due to the nature of the induced noise for both. The uniform noise significantly degrades each vertex for instance segmentation in comparison to the radial noise. The radial noise was normally distributed and centred around 0, meaning some vertices would only experience a marginal degradation. However, all things considered, the results show as the degradation of the annotation increases, a reduction in mAP performance is observed. This reflects the need for accurate annotations for supervised learning computer vision tasks.

Radial noise degradation for instance segmentation is lower than object detection. An explanation for this may be down to how the radial noise was implemented. As bounding boxes only require four normally distributed data points to introduce the noise; the first two update the x & y co-ordinates for the bounding box starting position, the third updates the width and the fourth and final point updates the height, there is a possibility, due to the nature of the normal distribution, for relatively high values being introduced. This in turn would significantly degrade the bounding box annotation.

The findings of this study have to be seen in light of some limitations. To the authors' knowledge, no prior work has modelled annotation uncertainty for object detection or instance segmentation datasets. In light of this information, the use of uniform noise and normally distributed radial noise was selected to model annotation uncertainty. This work allows us to quantify the degradation of mAP with respect to modelled annotation uncertainty to better understand the relationship between annotation quality and performance.

VII. CONCLUSION AND FUTURE WORK
In this paper, the relationship between object detection and instance segmentation annotation quality and mAP performance is studied. The observed results were attained by a Mask-RCNN model with a ResNet-50 backbone on a subset of the COCO 2017 challenge and Cityscapes datasets. The ground truth annotations for both bounding boxes and polygon masks had two separate types of noise introduced to the annotations; uniform and radial.

For object detection and instance segmentation, both types of induced noise negatively affected the mAP. When investigating the per-class AP0.50:0.05:0.95 performance, there was a reduction seen in all classes but motorcycle used in the experiments, with the reductions between classes varying. This suggests that annotation quality and AP0.50:0.05:0.95 performance is class-dependent. A strong linear relationship was observed between both noise types and mAP for the COCO dataset. An adjusted R2 of 0.978 for uniform noise and 0.974 for radial noise was recorded for object detection, with instance segmentation recording an adjusted R2 of 0.956 for uniform noise and 0.954 for radial noise when using mAP0.50:0.05:0.95. For radially-induced noise for instance segmentation, there is some robustness for σ = 1, as the degradation is less than 2% for all components of mAP. While the required accuracy of mask predictions for instance segmentation is application dependent, this work has quantified the degradation in mAP for varying annotation qualities to help inform any decisions on annotation labelling quality and the expected degradation.


This study has quantified empirically the relationship between annotation quality and mAP when introducing two different noises to the ground truth annotations for a subset of the COCO 2017 and Cityscapes datasets. The reduction in mAP across both noise measures for object detection and instance segmentation reflects the need for accurate polygon and bounding boxes for fully supervised object detection and instance segmentation tasks.

Future research should further develop and confirm these initial findings by conducting experiments on more diverse computer vision datasets, such as other benchmark datasets used for object detection and instance segmentation with different model architectures, to investigate if the results from these experiments generalize. Additionally, the use of transfer learning with noisy annotations should be investigated to determine if the results deviate from the current experiments, which were trained from scratch. Finally, combining the noise types used into a single dataset would be of interest, as this may better reflect the annotation uncertainty when multiple annotators are used to annotate a dataset.

APPENDIX A
COCO DATASET

TABLE 10. COCO dataset.

APPENDIX B
CITYSCAPES DATASET

TABLE 11. Cityscapes dataset.

REFERENCES
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification
with deep convolutional neural networks,’’ in Proc. Adv. Neural Inf. Pro-
cess. Syst. (NIPS), vol. 25, 2012, pp. 1106–1114.
[2] V. Badrinarayanan, A. Kendall, and R. Cipolla, ‘‘SegNet: A deep con-
volutional encoder–decoder architecture for image segmentation,’’ IEEE
Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495,
Dec. 2017.
[3] K. He, G. Gkioxari, P. Dollar, and R. Girshick, ‘‘Mask R-CNN,’’ in Proc.
ICCV, 2017, pp. 2961–2969.
[4] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, ‘‘You only look once:
Unified, real-time object detection,’’ in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit., Jun. 2016, pp. 779–788.
[5] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah,
‘‘Transformers in vision: A survey,’’ ACM Comput. Surv., vol. 54, no. 10,
pp. 1–41, 2022.
[6] P. Cao, Z. Zhu, Z. Wang, Y. Zhu, and Q. Niu, ‘‘Applications of graph con-
volutional networks in computer vision,’’ Neural Comput. Appl., vol. 34,
no. 16, pp. 13387–13405, 2022.
[7] C. Sun, A. Shrivastava, S. Singh, and A. Gupta, ‘‘Revisiting unreasonable
effectiveness of data in deep learning era,’’ in Proc. IEEE Int. Conf.
Comput. Vis. (ICCV), Oct. 2017, pp. 843–852.
[8] D. Karimi, H. Dou, S. K. Warfield, and A. Gholipour, ‘‘Deep learning
with noisy labels: Exploring techniques and remedies in medical image
analysis,’’ Med. Image Anal., vol. 65, Oct. 2020, Art. no. 101759.
[9] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro,
G. S. Corrado, A. Davis, J. Dean, M. Devin, and S. Ghemawat, ‘‘Ten-
sorFlow: Large-scale machine learning on heterogeneous distributed sys-
tems,’’ 2016, arXiv:1603.04467.


[10] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, and A. Desmaison, ''PyTorch: An imperative style, high-performance deep learning library,'' in Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 32, 2019, pp. 1–12.
[11] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang, ''MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems,'' 2015, arXiv:1512.01274.
[12] T. Liang, H. Bao, W. Pan, and F. Pan, ''Traffic sign detection via improved sparse R-CNN for autonomous vehicles,'' J. Adv. Transp., vol. 2022, pp. 1–16, Mar. 2022.
[13] C. Eising, J. Horgan, and S. Yogamani, ''Near-field perception for low-speed vehicle automation using surround-view fisheye cameras,'' IEEE Trans. Intell. Transp. Syst., vol. 23, no. 9, pp. 13976–13993, Sep. 2022.
[14] R. Zhang, L. Wu, Y. Yang, W. Wu, Y. Chen, and M. Xu, ''Multi-camera multi-player tracking with deep player identification in sports video,'' Pattern Recognit., vol. 102, Jun. 2020, Art. no. 107260.
[15] G. Thomas, R. Gade, T. B. Moeslund, P. Carr, and A. Hilton, ''Computer vision for sports: Current applications and research topics,'' Comput. Vis. Image Understand., vol. 159, pp. 3–18, Jun. 2017.
[16] A. Esteva, K. Chou, S. Yeung, N. Naik, A. Madani, A. Mottaghi, Y. Liu, E. Topol, J. Dean, and R. Socher, ''Deep learning-enabled medical computer vision,'' Npj Digit. Med., vol. 4, no. 1, pp. 1–9, Jan. 2021.
[17] J. Gao, Y. Yang, P. Lin, and D. S. Park, ''Computer vision in healthcare applications,'' J. Healthcare Eng., vol. 2018, Mar. 2018, Art. no. 5157020.
[18] R. Zhang, L. Xu, Z. Yu, Y. Shi, C. Mu, and M. Xu, ''Deep-IRTarget: An automatic target detector in infrared imagery using dual-domain feature extraction and allocation,'' IEEE Trans. Multimedia, vol. 24, pp. 1735–1749, 2022.
[19] X. Yang, Y. Ye, X. Li, R. Y. K. Lau, X. Zhang, and X. Huang, ''Hyperspectral image classification with deep learning models,'' IEEE Trans. Geosci. Remote Sens., vol. 56, no. 9, pp. 5408–5423, Sep. 2018.
[20] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick, ''Microsoft COCO: Common objects in context,'' in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2014, pp. 740–755.
[21] Y. Hu, Z. Ou, X. Xu, and M. Song, ''A crowdsourcing repeated annotations system for visual object detection,'' in Proc. 3rd Int. Conf. Vis., Image Signal Process., Aug. 2019.
[22] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, and A. C. Berg, ''ImageNet large scale visual recognition challenge,'' Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[23] Y. Xu, L. Zhu, Y. Yang, and F. Wu, ''Training robust object detectors from noisy category labels and imprecise bounding boxes,'' IEEE Trans. Image Process., vol. 30, pp. 5782–5792, 2021.
[24] H. Li, Z. Wu, C. Zhu, C. Xiong, R. Socher, and L. S. Davis, ''Learning from noisy anchors for one-stage object detection,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 10588–10597.
[25] Z. Zhang and M. Sabuncu, ''Generalized cross entropy loss for training deep neural networks with noisy labels,'' in Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 31, 2018, pp. 1–11.
[26] L. Jiang, Z. Zhou, T. Leung, L.-J. Li, and L. Fei-Fei, ''MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels,'' in Proc. Int. Conf. Mach. Learn., Jul. 2018, pp. 2304–2313.
[27] M. Ren, W. Zeng, B. Yang, and R. Urtasun, ''Learning to reweight examples for robust deep learning,'' in Proc. Int. Conf. Mach. Learn., Jul. 2018, pp. 4334–4343.
[28] J. Shu, Q. Xie, L. Yi, Q. Zhao, S. Zhou, Z. Xu, and D. Meng, ''Meta-weight-net: Learning an explicit mapping for sample weighting,'' in Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 32, 2019, pp. 1–12.
[29] Microsoft COCO. (2021). Detection Evaluation Metrics. Accessed: Oct. 18, 2022. [Online]. Available: https://cocodataset.org/#detection-eval
[30] V. Taran, Y. Gordienko, A. Rokovyi, O. Alienin, and S. Stirenko, ''Impact of ground truth annotation quality on performance of semantic image segmentation of traffic conditions,'' in Advances in Computer Science for Engineering and Education II, Z. Hu, S. Petoukhov, I. Dychka, and M. He, Eds. Cham, Switzerland: Springer, 2020, pp. 183–193.
[31] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, ''The cityscapes dataset for semantic urban scene understanding,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 3213–3223.
[32] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, ''Pyramid scene parsing network,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2881–2890.
[33] J. F. Mullen, F. R. Tanner, and P. A. Sallee, ''Comparing the effects of annotation type on machine learning detection performance,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2019.
[34] F. Tanner, B. Colder, C. Pullen, D. Heagy, M. Eppolito, V. Carlan, C. Oertel, and P. Sallee, ''Overhead imagery research data set—An annotated data library & tools to aid in the development of computer vision algorithms,'' in Proc. IEEE Appl. Imag. Pattern Recognit. Workshop (AIPR), Oct. 2009, pp. 1–8.
[35] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, ''OverFeat: Integrated recognition, localization and detection using convolutional networks,'' 2013, arXiv:1312.6229.
[36] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, ''The Pascal visual object classes (VOC) challenge,'' Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Sep. 2009.
[37] D. Acuna, A. Kar, and S. Fidler, ''Devil is in the edges: Learning semantic boundaries from noisy annotations,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2019, pp. 11075–11083.
[38] Z. Yu, C. Feng, M.-Y. Liu, and S. Ramalingam, ''CASENet: Deep category-aware semantic edge detection,'' in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Venice, Italy, Jul. 2017, pp. 5964–5973.
[39] B. Hariharan, P. Arbelaez, L. Bourdev, S. Maji, and J. Malik, ''Semantic contours from inverse detectors,'' in Proc. Int. Conf. Comput. Vis., Nov. 2011, pp. 991–998.
[40] D. Rolnick, A. Veit, S. Belongie, and N. Shavit, ''Deep learning is robust to massive label noise,'' 2017, arXiv:1705.10694.
[41] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ''ImageNet: A large-scale hierarchical image database,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[42] Y. LeCun. (1998). The MNIST Database of Handwritten Digits. [Online]. Available: http://yann.lecun.com/exdb/mnist/
[43] A. Krizhevsky, ''Learning multiple layers of features from tiny images,'' Univ. Toronto, Toronto, ON, Canada, Tech. Rep. TR-2009, 2009.
[44] S. Gillies. (2013). The Shapely User Manual. Accessed: Oct. 18, 2022. [Online]. Available: https://pypi.org/project/Shapely
[45] B. E. Moore and J. J. Corso. (2020). FiftyOne. [Online]. Available: https://github.com/voxel51/fiftyone
[46] K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, J. Xu, and Z. Zhang, ''MMDetection: Open MMLab detection toolbox and benchmark,'' 2019, arXiv:1906.07155.
[47] K. He, X. Zhang, S. Ren, and J. Sun, ''Deep residual learning for image recognition,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2016, pp. 770–778.

CATHAOIR AGNEW received the B.S. degree in financial mathematics and the M.S. degree in artificial intelligence and machine learning from the University of Limerick, Limerick, Ireland, in 2020 and 2021, respectively, where he is currently pursuing the Ph.D. degree in electronic and computer engineering. His research interests include artificial intelligence and computer vision.

CIARÁN EISING (Member, IEEE) received the B.E. degree in electronic and computer engineering and the Ph.D. degree from the National University of Ireland, Galway, in 2003 and 2010, respectively. From 2009 to 2020, he worked as a Computer Vision Team Lead and an Architect with Valeo Vision Systems, where he also held the title of Senior Expert. In 2016, he held the position of Adjunct Lecturer with the National University of Ireland, Galway. In 2020, he joined the University of Limerick, as a Lecturer of artificial intelligence and computer vision.


PATRICK DENNY (Member, IEEE) received the B.Sc. degree in experimental physics and mathematics from the National University of Ireland (NUI), Maynooth, Ireland, in 1993, and the M.Sc. degree in mathematics and the Ph.D. degree in physics from the University of Galway, Ireland, in 1994 and 2000, respectively. He was with GFZ Potsdam, Germany. From 1999 to 2001, he was an RF Engineer with AVM GmbH, Germany, developing the RF hardware for the first integrated GSM/ISDN/USB modem. After working in supercomputing with Compaq-HP, from 2001 to 2002, he joined Connaught Electronics Ltd. (later Valeo), Galway, Ireland, as a Team Leader of RF design. For more than 20 years, he worked as a Lead Engineer, developing novel RF and imaging systems and led the development of the first mass-production HDR automotive cameras for leading car companies, including Jaguar Land Rover, BMW, and Daimler. In 2010, he became an Adjunct Professor of engineering and informatics with the University of Galway and a Lecturer of artificial intelligence with the Department of Electronic and Computer Engineering, University of Limerick, Ireland, in 2022. He is a Co-Founder and a Committee Member of the IEEE P2020 Automotive Imaging Standards Group, the AutoSens Conference on Automotive Imaging, and the IS&T Electronic Imaging Autonomous Vehicles and Machines (AVM) Conference.

ANTHONY SCANLAN received the B.Sc. degree in experimental physics from the National University of Ireland, Galway, Galway, Ireland, in 1998, and the M.Eng. and Ph.D. degrees in electronic engineering from the University of Limerick, Limerick, Ireland, in 2001 and 2005, respectively. He is currently a Senior Research Fellow with the Department of Electronic and Computer Engineering, University of Limerick, and has been a principal investigator for several research projects in the areas of signal processing and data converter design. His current research interests include artificial intelligence, computer vision, and their industrial and environmental applications.

PEPIJN VAN DE VEN received the M.Sc. degree in electronic engineering from the Eindhoven University of Technology, The Netherlands, in 2000, and the Ph.D. degree in artificial intelligence for autonomous underwater vehicles from the University of Limerick (UL), in 2005. In 2018, he joined UL's teaching staff, as a Senior Lecturer in artificial intelligence. His research interests include artificial intelligence and machine learning, with a particular interest in medical applications.

EOIN M. GRUA was born in Cork, Ireland, in 1993. He received the B.S. degree in liberal arts and sciences from Amsterdam University College, Amsterdam, The Netherlands, in 2015, the M.S. degree in computer science from Swansea University, Swansea, Wales, in 2016, and the Ph.D. degree in computer science from Vrije Universiteit Amsterdam, Amsterdam, in 2021. In 2021, he was a Research Assistant with the University of Limerick, Limerick, Ireland, where he is currently a Postdoctoral Researcher with the Department of Electronic and Computer Engineering. His research interests include artificial intelligence, software engineering and architecture, and sustainability.
