
IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 41, NO. 3, MARCH 2022

SynthMorph: Learning Contrast-Invariant Registration Without Acquired Images

Malte Hoffmann, Benjamin Billot, Douglas N. Greve, Juan Eugenio Iglesias, Bruce Fischl, and Adrian V. Dalca

Abstract — We introduce a strategy for learning image registration without acquired imaging data, producing powerful networks agnostic to contrast introduced by magnetic resonance imaging (MRI). While classical registration methods accurately estimate the spatial correspondence between images, they solve an optimization problem for every new image pair. Learning-based techniques are fast at test time but limited to registering images with contrasts and geometric content similar to those seen during training. We propose to remove this dependency on training data by leveraging a generative strategy for diverse synthetic label maps and images that exposes networks to a wide range of variability, forcing them to learn more invariant features. This approach results in powerful networks that accurately generalize to a broad array of MRI contrasts. We present extensive experiments with a focus on 3D neuroimaging, showing that this strategy enables robust and accurate registration of arbitrary MRI contrasts even if the target contrast is not seen by the networks during training. We demonstrate registration accuracy surpassing the state of the art both within and across contrasts, using a single model. Critically, training on arbitrary shapes synthesized from noise distributions results in competitive performance, removing the dependency on acquired data of any kind. Additionally, since anatomical label maps are often available for the anatomy of interest, we show that synthesizing images from these dramatically boosts performance, while still avoiding the need for real intensity images. Our code is available at https://w3id.org/synthmorph.

Index Terms — Deformable image registration, data independence, deep learning, MRI-contrast invariance.
Manuscript received July 28, 2021; revised September 20, 2021; accepted September 28, 2021. Date of publication September 29, 2021; date of current version March 2, 2022. This work was supported in part by Alzheimer's Research UK under Grant ARUK-IRG2019A-003; in part by the European Research Council (ERC) under Starting Grant 677697; and in part by the National Institutes of Health (NIH) under Grant 1R01 AG070988-01, BRAIN Initiative Grant 1RF1MH123195-01, Grant K99 HD101553, Grant U01 AG052564, Grant R56 AG064027, Grant R01 AG064027, Grant R01 AG016495, Grant U01 MH117023, Grant P41 EB015896, Grant R01 EB023281, Grant R01 EB019956, Grant R01 NS0525851, Grant R21 NS072652, Grant R01 NS083534, Grant U01 NS086625, Grant U24 NS10059103, Grant R01 NS105820, Grant S10 RR023401, Grant S10 RR019307, and Grant S10 RR023043. (Corresponding author: Malte Hoffmann.)

This work involved human subjects or animals in its research. Approval of all ethical and experimental procedures and protocols was granted by IRBs at Washington University in St. Louis (201603117) and Mass General Brigham (2016p001689).

Malte Hoffmann and Douglas N. Greve are with the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA 02129 USA, and also with the Department of Radiology, Harvard Medical School, Boston, MA 02115 USA (e-mail: [email protected]; [email protected]).

Benjamin Billot is with the Centre for Medical Image Computing, University College London, London WC1E 6BT, U.K. (e-mail: [email protected]).

Juan Eugenio Iglesias is with the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA 02129 USA, also with the Department of Radiology, Harvard Medical School, Boston, MA 02115 USA, also with the Centre for Medical Image Computing, University College London, London WC1E 6BT, U.K., and also with the Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139 USA (e-mail: [email protected]).

Bruce Fischl and Adrian V. Dalca are with the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA 02129 USA, also with the Department of Radiology, Harvard Medical School, Boston, MA 02115 USA, and also with the Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139 USA (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TMI.2021.3116879

I. INTRODUCTION

IMAGE registration estimates spatial correspondences between image pairs and is a fundamental component of many neuroimaging pipelines involving data acquired across time, subjects, and modalities. Magnetic resonance imaging (MRI) uses pulse sequences to obtain images with contrasts between soft tissue types. Different sequences can produce dramatically different appearance even for the same anatomy. For neuroimaging, a range of contrasts is commonly acquired to provide complementary information, such as T1-weighted contrast (T1w) for inspecting anatomy or T2-weighted contrast (T2w) for detecting abnormal fluids [1]. Registration of such images is critical when combining information across acquisitions, for example to gauge the damage induced by a stroke or to plan a brain-tumor resection. While rigid registration can be sufficient for aligning within-subject images acquired with the same sequence [2], images acquired with different sequences can undergo differential distortion due to effects such as eddy currents and susceptibility artifacts, requiring deformable registration [3]. Deformable registration is also important for morphometric analyses [4]–[6], which hinge on aligning images with an existing standardized atlas that typically has a different contrast [7]–[9]. Given the central importance of registration tasks within and across contrasts, and within and across subjects, the goal of this work is a learning-based framework for registration agnostic to MRI contrast: we propose a strategy for training networks that excel both within contrasts (e.g. between two T1w scans) as well as across contrasts (e.g. T1w to T2w), even if the test contrasts are not observed during training.


Classical registration approaches estimate a deformation field between two images by optimizing an objective that balances image similarity with field regularity [10]–[16]. While these methods provide a strong theoretical background and can yield good results, the optimization needs to be repeated for every new image pair, and the objective and optimization strategy typically need to be adapted to the image type. In contrast, learning-based registration uses datasets of images to learn a function that maps an image pair to a deformation field aligning the images [17]–[24]. These approaches achieve sub-second runtimes on a GPU and have the potential to improve accuracy and robustness to local minima. Unfortunately, they are limited to the MRI contrast available during training and therefore do not generally perform well on unobserved (new) image types. For example, a model trained on pairs of T1w and T2w images will not accurately register T1w to proton-density weighted (PDw) images. With a focus on neuroimaging, we remove this constraint of learning methods and design an approach that generalizes to unseen MRI contrasts at test time.

A. Related Work

1) Classical Methods: Deformable registration has been widely studied [11], [12], [15], [16], [25]. Classical strategies implement an iterative procedure that estimates an optimal deformation field for each image pair. This involves maximizing an image-similarity metric, which compares the warped moving and fixed images, together with a regularization term that encourages desirable deformation properties such as preservation of topology [10], [13]–[15]. Cost function and optimization strategies are typically chosen to suit a particular task. Simple metrics like mean squared error (MSE) or normalized cross-correlation (NCC) [12] are widely used and provide excellent accuracy for images of the same contrast [26]. For registration across MRI contrasts, metrics such as mutual information (MI) [27] and the correlation ratio [28] are often employed, although the accuracy achieved with them is not on par with the within-contrast accuracy of NCC and MSE [29]. For some tasks, e.g. registering intra-operative ultrasound to MRI, estimating even approximate correspondences can be challenging [30], [31]. While they are not often used in neuroimaging, metrics based on patch similarity [32]–[36] and normalized gradient fields [37]–[39] outperform simpler metrics, e.g. on abdominal computed tomography (CT). Other methods convert images to a supervoxel representation, which is then spatially matched instead of the images [40], [41]. Our work also employs geometric shapes, but instead of generating supervoxels from input images, we synthesize arbitrary patterns (and images) from scratch during training to encourage learning contrast-invariant features for spatial correspondence.

2) Learning Approaches: Learning-based techniques mostly use convolutional neural networks (CNNs) to learn a function that directly outputs a deformation field given an image pair. After training, evaluating this function is efficient, enabling fast registration. Supervised models learn to reproduce simulated warps or deformation fields estimated by classical methods [21], [22], [24], [42]–[44]. In contrast, unsupervised models minimize a loss similar to classical cost functions [17], [45]–[47], such as normalized MI (NMI) [48] for cross-contrast registration. In another cross-contrast registration paradigm, networks synthesize one contrast from the other, so that within-contrast losses can be used for subsequent nonlinear registration [29], [49]–[53]. These methods all depend on having training data of the target contrast. If no such data are available during training, models generally predict inaccurate warps at test time: a model trained on T1w-T1w pairs would fail when applied within unseen contrasts (e.g. T2w-T2w) or across unseen contrast combinations (e.g. T1w-T2w).

Recent approaches also use losses driven by label maps or sparse annotations (e.g. fiducials) for registering different imaging modalities labeled during training, such as T2w MRI and 3D ultrasound within the same subject [54], [55], or aid existing formulations with auxiliary segmentation data [17], [56]–[58]. While these label-driven methods can boost registration accuracy compared to approaches using intensity-based loss functions, they depend on the limited annotated images available during training. Consequently, these approaches do not perform well on unobserved MRI contrasts.

Data-augmentation strategies expose a model to a wider range of variability than the training data encompasses, for example by randomly altering voxel intensities or applying deformations [59]–[62]. However, even these methods still need to sample training data acquired with the target contrast. Similarly, transfer learning can be used to extend a trained network to new contrasts but does not remove the need for training data with the target contrast [63]. Given the continuing development of new and improved MRI contrast types at ever higher field strengths, the reduction in accuracy evidenced by existing methods in the presence of novel image contrast becomes a limiting factor.

B. Contribution

In this work we present SynthMorph, a general strategy for learning contrast-agnostic registration (Fig. 1). At test time, it can accurately register a wide variety of acquired images with MRI contrasts unseen during training. SynthMorph enables registration of real images both within and across contrasts, learning only from synthetic data that far exceed the realistic range of medical images. During training we synthesize images from label maps, whereas registration requires no label maps at test time. First, we introduce a generative model for random label maps of variable geometric shapes. Second, conditioned on these maps, or optionally given other maps of interest, we build on recent methods to synthesize images with arbitrary contrasts, deformations, and artifacts [64]. Third, the strategy enables us to use a contrast-agnostic loss that measures label overlap, instead of an image-based loss. This leads to two SynthMorph network variants (sm) that yield substantial generalizability, both capable of registering any contrast combination tested without retraining: sm-shapes trains without acquired data of any kind, matches classical state-of-the-art registration of neuroanatomical MRI, and outperforms learning baselines at cross-contrast registration. Variant sm-brains trains on images synthesized from brain segmentations only and substantially outperforms all classical and learning-based baselines tested.

Fig. 1. Unsupervised learning strategy for contrast-agnostic registration. At every mini-batch, we synthesize a pair of 3D label maps {s_m, s_f} and the corresponding 3D images {m, f} from noise distributions. The label maps are incorporated into a loss that is independent of image contrast.

This work builds on and extends a preliminary conference paper [65] presented at the IEEE International Symposium on Biomedical Imaging (ISBI) 2021. The extension includes a series of new experiments, new analyses of the framework and regularization, and a substantially expanded discussion. We also show that networks trained within the SynthMorph strategy generalize to new image types with MRI contrasts unseen at training. Our contribution focuses on neuroimaging but provides a general learning framework that can be used to train models across imaging applications and machine-learning techniques. Our code is freely available as part of the VoxelMorph library [66] and at https://w3id.org/synthmorph.

II. METHOD
A. Background

Let m and f be a moving and a fixed 3D image, respectively. We build on unsupervised learning-based registration frameworks and focus on deformable (non-linear) registration. These frameworks use a CNN h_θ with parameters θ that outputs the deformation φ_θ = h_θ(m, f) for image pair {m, f}. At each training iteration, the network h_θ is given a pair of images {m, f}, and parameters are updated by optimizing a loss function L(θ; m, f, φ_θ) similar to classical cost functions, using stochastic gradient descent. Typically, the loss contains an image dissimilarity term L_dis(m ∘ φ_θ, f) that penalizes differences in appearance between the warped image and the fixed image, and a regularization term L_reg(φ) that encourages smooth deformations:

    L(θ; m, f, φ_θ) = L_dis(m ∘ φ_θ, f) + λ L_reg(φ_θ),    (1)

where φ_θ = h_θ(m, f) is the network output, and λ controls the weighting of the terms. Unfortunately, networks trained this way only predict reasonable deformations for images with contrasts and shapes similar to the data observed during training. Our framework alleviates this dependency.

B. Proposed Method Overview

We strive for contrast invariance and robustness to anatomical variability by requiring no acquired training data, but instead synthesizing arbitrary contrasts and shapes from scratch (Fig. 1). We generate two paired 3D label maps {s_m, s_f} using a function g_s(z) = {s_m, s_f} described below, given random seed z. However, if anatomical labels are available, we can use these instead of synthesizing segmentation maps. We then define another function g_I(s_m, s_f, z̃) = {m, f} (described below) that synthesizes two 3D intensity volumes {m, f} based on the maps {s_m, s_f} and seed z̃.

This generative process resolves the limitations of existing methods as follows. First, training a registration network h_θ(m, f) using the generated images exposes it to arbitrary contrasts and shapes at each iteration, removing the dependency on a specific MRI contrast. Second, because we first synthesize label maps, we can use a similarity loss that measures label overlap independent of image contrast, thereby obviating the need for a cost function that depends on the contrasts being registered at that iteration. In our experiments, we use the (soft) Dice metric [67]

    L_dis(φ, s_m, s_f) = −(2/J) Σ_{j=1}^{J} |(s_m^j ∘ φ) ⊙ s_f^j| / |(s_m^j ∘ φ) ⊕ s_f^j|,    (2)

where s^j represents the one-hot encoded label j ∈ {1, 2, ..., J} of label map s, and ⊙ and ⊕ denote voxel-wise multiplication and addition, respectively.
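For concreteness, a minimal TensorFlow sketch of the loss in Eq. (2) could look as follows; the channel-last tensor layout and the stabilizing epsilon are our assumptions rather than details taken from the released implementation.

```python
import tensorflow as tf

def soft_dice_loss(s_m_warped, s_f):
    """Soft Dice loss of Eq. (2). Inputs are one-hot (probabilistic) label
    maps of shape (batch, X, Y, Z, J), where s_m_warped = s_m o phi."""
    spatial_axes = (1, 2, 3)
    top = 2.0 * tf.reduce_sum(s_m_warped * s_f, spatial_axes)  # voxel-wise product
    bottom = tf.reduce_sum(s_m_warped + s_f, spatial_axes)     # voxel-wise sum
    dice_per_label = top / (bottom + 1e-6)  # epsilon for stability (our addition)
    return -tf.reduce_mean(dice_per_label)  # -(2/J) sum_j, via mean over labels
```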

While the framework can be used with any parameterization of the deformation field φ, in this work we use a stationary velocity field (SVF) v, which is integrated within the network to obtain a diffeomorphism [11], [45], [68] that is invertible by design. We regularize φ using L_reg(φ) = ½‖∇u‖², where u is the displacement of the deformation field φ = Id + u.

C. Generative Model Details

1) Label Maps: To generate input label maps with J labels of random geometric shapes, we first draw J smoothly varying noise images p_j (j ∈ {1, 2, ..., J}) by sampling voxels from a standard distribution at lower resolution r_p and upsampling to full size (Fig. 2). Second, each image p_j is warped with a random smooth deformation field φ_j (described below) to obtain images p̃_j = p_j ∘ φ_j. Third, we create an input label map s by assigning, for each voxel k of s, the label j corresponding to the image p̃_j that has the highest intensity, i.e. s_k = arg max_j ([p̃_j]_k).

Given a selected label map s, we generate two new label maps.
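The sketch below illustrates the label-map generation just described, using NumPy and SciPy on the CPU; the smoothed-noise displacement standing in for the integrated SVF warp φ_j, as well as all sizes and strengths, are simplifying assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates, zoom

def random_label_map(low_shape=(16, 16, 16), factor=4, num_labels=26,
                     warp_strength=3.0, seed=0):
    """Draw J smooth noise images, warp each, and take the voxel-wise
    arg-max to form a label map s (Section II-C.1, simplified)."""
    rng = np.random.default_rng(seed)
    full = tuple(d * factor for d in low_shape)
    grid = np.indices(full).astype(float)
    warped = []
    for _ in range(num_labels):
        # Smoothly varying noise p_j: low-resolution samples, upsampled.
        p = zoom(rng.standard_normal(low_shape), factor, order=1)
        # Random smooth warp (stand-in for the integrated SVF phi_j).
        coords = grid.copy()
        for axis in range(3):
            disp = gaussian_filter(rng.standard_normal(full), sigma=8)
            coords[axis] += disp / (disp.std() + 1e-8) * warp_strength
        warped.append(map_coordinates(p, coords, order=1, mode='nearest'))
    # Assign each voxel the label of the highest-intensity warped image.
    return np.argmax(np.stack(warped), axis=0)
```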

First, we deform s with a random smooth diffeomorphic transformation φ_m (described below) using nearest-neighbor interpolation to produce the moving segmentation map s_m = s ∘ φ_m. An analogous process yields the fixed map s_f. Alternatively, if segmentations are available for the anatomy of interest, such as the brain, we randomly select and deform input label maps instead of synthesizing them (Fig. 3).

To generate two different images, we could start by using a single segmentation twice, or two separate ones. In this work, we sample separate brain label maps, as this captures more realistic variability in the correspondences that the registration network has to find. In contrast, for sm-shapes training, we use a single label map s as input twice to ensure that topologically consistent correspondences exist.

Fig. 2. Generation of input label maps. Smooth 3D noise images p_j (j ∈ {1, 2, ..., J}) are sampled from a standard distribution, then warped by random deformations φ_j to cover a range of scales and shapes. We synthesize a label map s from the warped images p̃_j = p_j ∘ φ_j: for each voxel k of s, we assign the label j corresponding to the image p̃_j where k has the highest intensity, i.e. s_k = arg max_j ([p̃_j]_k). We use J = 26.

Fig. 3. Data synthesis. Top: from random shapes. Bottom: if available, from anatomical labels. We generate a pair of label maps {s_m, s_f} and from them images {m, f} with arbitrary contrast. The registration network then predicts the displacement u_{m→f}. If anatomical labels are used, we generate {s_m, s_f} from separate subjects.

2) Synthetic Images: From the pair of label maps {s_m, s_f}, we synthesize gray-scale images {m, f} building on generative models of MR images used for Bayesian segmentation [69]–[72] (Fig. 3). We extend a publicly available model [64] to make it suitable for registration, which, in contrast to segmentation, involves the efficient generation of pairs of images (Section II-D.3). Given a segmentation map s, we draw the intensities of all image voxels that are associated with label j as independent samples from the normal distribution N(μ_j, σ_j²). We sample the mean μ_j and standard deviation (SD) σ_j for each label from continuous distributions U(a_μ, b_μ) and U(a_σ, b_σ), respectively, where a_μ, b_μ, a_σ, and b_σ are hyperparameters. To simulate partial-volume effects [73], we convolve the image with an anisotropic Gaussian kernel K(σ_{i=1,2,3}), where σ_{i=1,2,3} ~ U(0, b_K). We further corrupt the image with a spatially varying intensity-bias field B [74], [75]. We independently sample the voxels of B from a normal distribution N(0, σ_B²) at lower resolution r_B relative to the full image size (described below), where σ_B ~ U(0, b_B). We upsample B to full size, and take the exponential of each voxel to yield non-negative values before we apply B using element-wise multiplication. We obtain the final images m and f through min-max normalization and contrast augmentation through global exponentiation, using a single normally distributed parameter γ ~ N(0, σ_γ²) for the entire image such that m = m̃^exp(γ), where m̃ is the normalized moving image, and similarly for the fixed image (Fig. 3).

3) Random Transforms: We obtain the transforms φ_j (j = 1, 2, ..., J) for noise image p_j by integrating random SVFs v_j [11], [45], [46], [68]. We draw each voxel of v_j as an independent sample of a normal distribution N(0, σ_j²) at lower resolution r_p, where σ_j ~ U(0, b_p) is sampled uniformly, and each SVF is integrated and upsampled to full size. Similarly, we obtain the transforms φ_m and φ_f based on hyperparameters r_v and b_v for sm-brains. For sm-shapes, we sample several SVFs v_m ~ N(0, σ_v²) at resolutions r_v ∈ {1:8, 1:16, 1:32}, drawing a different σ_v for each to synthesize a more complex deformation, since the fixed and moving images are based on the same input label map. The upsampled SVFs are then combined additively, and a similar procedure yields v_f.

TABLE I: Hyperparameters. Spatial measures are in voxels. Our images and label maps are 160 × 160 × 192 volumes. For fields sampled at resolution r, we obtain the volume size by multiplying each dimension by r and rounding up. For example, a resolution of r = 1:40 relative to the volume size 160 × 160 × 192 would be equivalent to working with volumes of size 4 × 4 × 5.

D. Implementation Details

1) Hyperparameters: The generative process requires a number of parameters. During training, we sample these based on the hyperparameters presented in Table I. Their values are not chosen to mimic realistic anatomy or a particular MRI contrast. Instead, we select hyperparameters visually to yield shapes and contrasts that far exceed the range of realistic medical images, to force our networks to learn generalizable features that are independent of the characteristics of a specific contrast [59]. We thoroughly analyze the impact of varying hyperparameters in our experiments.
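To illustrate the synthesis of Section II-C.2, the following sketch draws one gray-scale image from a label map using per-label Gaussian intensities, anisotropic blurring, a multiplicative bias field, and gamma augmentation; the hyperparameter names echo Table I, but the concrete sampling ranges are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def synthesize_image(s, b_mu=255.0, b_sigma=35.0, b_K=1.0, b_B=0.3,
                     sigma_gamma=0.25, r_B=1 / 40, seed=None):
    """Generate one image from integer label map s (Section II-C.2, simplified)."""
    rng = np.random.default_rng(seed)
    labels = np.unique(s)
    mu = rng.uniform(0, b_mu, labels.size)     # per-label means
    sd = rng.uniform(0, b_sigma, labels.size)  # per-label SDs
    m = np.zeros(s.shape)
    for i, j in enumerate(labels):
        mask = s == j
        m[mask] = rng.normal(mu[i], sd[i], mask.sum())
    # Anisotropic Gaussian blur simulates partial-volume effects.
    m = gaussian_filter(m, sigma=rng.uniform(0, b_K, size=3))
    # Bias field B: low-resolution noise, upsampled and exponentiated.
    low = tuple(max(1, int(np.ceil(d * r_B))) for d in s.shape)
    B = rng.normal(0, rng.uniform(0, b_B), low)
    B = zoom(B, np.array(s.shape) / np.array(low), order=1)
    m *= np.exp(B)
    # Min-max normalization, then global exponentiation (gamma augmentation).
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)
    return m ** np.exp(rng.normal(0, sigma_gamma))
```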

2) Architecture: The models implement the network architecture used in the VoxelMorph library [17], [45]: a convolutional U-Net [60] predicts an SVF v_θ from the input {m, f}. As shown in Fig. 4, the encoder has 4 blocks consisting of a stride-2 convolution and a LeakyReLU layer (parameter 0.2) that each halve the resolution relative to the inputs.

Fig. 4. U-Net architecture of φ_θ = h_θ(m, f). Each block of the encoder features a 3D convolution with n = 256 filters and a LeakyReLU layer (0.2). Stride-2 convolutions each halve the resolution relative to the input. In the decoder, each convolution is followed by an upsampling layer and a skip connection (long arrows). The SVF v_θ is obtained at half resolution, yielding the warp φ_θ after integration and upsampling. All kernels are of size 3 × 3 × 3. The final layer uses n = 3 filters.

The decoder features 3 blocks that each include a stride-1 convolution, an upsampling layer, and a skip connection to the corresponding encoder block. We obtain the SVF v_θ after 3 further convolutions at half resolution, and the warp φ_θ after integration and upsampling.

All convolutions use 3 × 3 × 3 kernels. We use a default network width of n = 256 unless stated otherwise. While the last layer of all networks employs n = 3 filters, we reduce the width to n = 64 for the parameter sweeps of Section III-G and the analysis of feature maps in Fig. 10 and Fig. 12, to lower the computational burden and memory requirements and thereby enable us to perform the analyses within our computational resources. We expect the results to be generally applicable, as we use the same synthesis and registration architecture, while higher network capacities typically improve accuracy as long as the training set is large enough.

3) Implementation: We implement our networks using TensorFlow/Keras [76]. We integrate SVFs using a GPU version [45], [46] of the scaling and squaring algorithm with 5 steps [11], [68]. Training uses the Adam optimizer [77] with a batch size of one registration pair and an initial learning rate of 10⁻⁴, which we decrease to 10⁻⁵ in case of divergence. We train each model until the Dice metric converges on the synthetic training set, typically for 4 × 10⁵ iterations.

To generate pairs of images with high variability for registration, we extend a model [64] implemented for a single-input segmentation task. First, we improve efficiency to meet the increased computational demand. For example, we replace smoothing operations based on 3D convolutions by 1D convolutions with separated Gaussian kernels. We also integrate spatial augmentation procedures such as random axis flipping into a single deformation field, enabling their application as part of one interpolation step. We also implement an interpolation routine with fill-value-based extrapolation on the GPU. The fill value enables extrapolating with zeros instead of repeating voxels where the anatomy extends to the edge of the image, making the spatial augmentation more realistic.

Second, we add to the data augmentation within the model by expanding random axis flipping to all three dimensions, and by drawing a separate smoothing kernel for each dimension of space, enabling randomized anisotropic blurring. We implement a more complex warp synthesis that generates and combines SVFs at multiple spatial resolutions. We also extend most augmentation steps to vary across batches, thereby increasing variability.

Third, we simplify the code to improve its maintainability and reusability. We use the external VoxelMorph and Neurite libraries to avoid code duplication. We update the model to support the latest TensorFlow version to benefit from the full set of features, including batch profiling and debugging in eager execution mode.
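As an illustration of the SVF integration step described above, the sketch below applies scaling and squaring to a dense velocity field in plain NumPy; it mirrors the 5-step default but is a minimal stand-in for, not a copy of, the GPU implementation in the VoxelMorph and Neurite libraries.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def integrate_svf(v, steps=5):
    """Integrate a stationary velocity field v of shape (X, Y, Z, 3) into a
    displacement field u with phi = Id + u, via scaling and squaring."""
    u = v / 2.0 ** steps                        # scaling: u ~ v / 2^N
    grid = np.indices(v.shape[:3]).astype(float)
    for _ in range(steps):                      # squaring: phi <- phi o phi
        coords = grid + np.moveaxis(u, -1, 0)   # sample locations x + u(x)
        u = u + np.stack(
            [map_coordinates(u[..., a], coords, order=1, mode='nearest')
             for a in range(3)], axis=-1)       # u(x) + u(x + u(x))
    return u
```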

Fig. 5. Synthetic training data. Top: random geometric shapes synthesized from noise distributions. Center: arbitrary contrasts synthesized from brain segmentations. Bottom: hybrid synthesis requiring acquired MRI for contrast augmentation using smooth random lookup tables.

III. EXPERIMENTS

We evaluate network variants trained with the proposed strategy and compare their performance to several baselines. The test sets include a variety of image contrasts and levels of processing to assess method robustness. Our goal is for SynthMorph to achieve unprecedented generalizability to new contrasts among neural networks, matching or exceeding the accuracy of all classical and learning methods tested.

A. Data

While SynthMorph training involves data synthesized from label maps that vary widely beyond realistic ranges, all tests and method comparisons use only acquired MRI data.

1) Datasets:

a) OASIS, HCP-A, BIRN: We compile 3D brain-MRI datasets from the Open Access Series of Imaging Studies (OASIS) [78] and the Human Connectome Aging Project (HCP-A) [79], [80]. OASIS includes T1w MPRAGE acquired at 1.5 T with ∼(1 mm)³ resolution. HCP-A includes both T1w MEMPRAGE [81] and T2w T2SPACE [82] scans acquired at 3 T with 0.8 mm isotropic resolution. We also use PDw 1.5-T BIRN [83] scans from 8 subjects, which include manual brain segmentations. Fig. 6 shows typical image examples.

TABLE II: Test registration sets compiled from OASIS, HCP-A, and BIRN for Experiments 1 and 3. The superscripts b and x indicate skull-stripping and registration across datasets (e.g. between OASIS and HCP-A), respectively.

Fig. 6. Typical results for sm-brains and classical methods. Each row shows an image pair from the datasets indicated on the left. The letters b and x mark skull-stripping and registration across datasets (e.g. OASIS and HCP-A), respectively. We show the best classical baseline: NiftyReg on the 1st, ANTs on the 2nd, and deedsBCV on all other rows.

b) UKBB, GSP: We obtain 7000 skull-stripped T1w scans acquired at 3 T field strength. Of these, we source 5000 MPRAGE images with 1 mm isotropic resolution from the UK Biobank (UKBB) [84] and 2000 MEMPRAGE [81] scans with 1.2 mm isotropic resolution from the Brain Genomics Superstruct Project (GSP) [85].

c) Multi-FA, multi-TI: We compile a series of spoiled gradient-echo (FLASH) [86] images for flip angles (FA) varied between 2° and 40° in 2° steps. For each of 20 subjects, we obtain contrasts ranging from PDw to T1w using the steady-state signal equation with acquired parametric maps (T1, T2*, PD) and sequence parameters: repetition time (TR) 20 ms, echo time (TE) 2 ms. Equivalently, we compile a series of MPRAGE images for inversion times (TI) varied between 300 ms and 1000 ms in steps of 20 ms. For each of 20 subjects, we fit MPRAGE contrasts based on MP2RAGE [87] echoes acquired with parameters: TR/TE 5000/2.98 ms, TI1/TI2 700/2500 ms, FA 4°. Fig. 9 shows typical examples of these data.

d) Buckner40: We derive 40 distinct-subject segmentations with brain and non-brain labels from T1w MPRAGE scans of the Buckner40 dataset [88], a subset of the fMRIDC structural data [89].

e) Cardiac MRI: We gather cine-cardiac MRI datasets from 33 subjects [90]. Each frame is a stack of thick 6-13 mm slices with ∼(1.5 mm)² in-plane resolution. The data include manually drawn contours outlining the endocardial and epicardial walls of the left ventricle. Fig. 15 shows representative frames.

2) Processing: As we focus on deformable registration, we map all brain images into a common 160 × 160 × 192 affine space [4], [91] at 1 mm isotropic resolution. Unless manual segmentations are available, we derive brain and non-brain labels for skull-stripping and evaluation using the contrast-adaptive SAMSEG [6] method.

For each subject of the multi-FA and multi-TI datasets, we derive brain labels from a single acquired T1w image using FreeSurfer [4], ensuring identical labels across all MRI contrasts obtained for the subject.

We resample all cardiac frames to 256 × 256 × 112 volumes with isotropic 1-mm voxels and transfer the manual contours into the same space.

3) Dataset Use:

a) Training: We use the Buckner40 label maps for data synthesis (Fig. 5) and SynthMorph training. For the learning baselines, we use T1w and T2w images from 100 HCP-A subjects, and all T1w images from GSP and UKBB.

b) Validation: For hyperparameter tuning and monitoring model training, we use 10 registration pairs for each of the OASIS, HCP-A, and BIRN contrast pairings described below. These subjects do not overlap with the training set.

c) Test: Table II provides an overview of the contrast combinations compiled from OASIS, HCP-A, and BIRN. Except for the 8 PDw BIRN images, the subjects do not overlap with the training or validation sets. We also use the multi-FA, multi-TI, and cardiac images for testing; none of these data are used in training or validation.

B. Baselines

We test classical registration with ANTs (SyN) [12] using recommended parameters [92] for the NCC similarity metric within contrast and MI across contrasts. We test NiftyReg [13] with the default cost function (NMI) and recommended parameters, and we enable its diffeomorphic model with SVF integration as in our approach. Both ANTs and NiftyReg are optimized for neuroimaging applications, leading to appropriate parameters for our tasks. We also run the deedsBCV [93] patch-similarity method, which we tune for neuroimaging. To match the spatial scales of brain structures, we reduce the default grid spacing, search radius, and quantization step to 6 × 5 × 4 × 3 × 2, 6 × 5 × 4 × 3 × 2, and 5 × 4 × 3 × 2 × 1, respectively, improving registration in our experiments.

As a learning baseline, we train VoxelMorph (vm), using an image-based NCC loss and the same architecture as SynthMorph, on 100 skull-stripped T1w images from HCP-A that do not overlap with the validation set. Similarly, we train another model with NMI as a loss on random combinations of 100 T1w and 100 T2w images. This exposes each model to 9900 different cross-subject registration pairs, and vm-nmi to T1w-T1w, T1w-T2w, and T2w-T2w contrast pairings (both contrasts were acquired from the same 100 subjects). Following the original VoxelMorph implementation [17], we train these baseline networks without data augmentation, with the exception of randomized axis flipping.

While we compare to learning baselines following their original implementation [17], we also investigate whether the performance of these methods can be further improved. First, we retrain the baseline model adding a further 7000 T1w images from UKBB and GSP to the training set, to evaluate whether the original finding that 100 images are sufficient [17] holds true in our implementation, or whether the greater anatomical variability would promote generalizability across contrasts or datasets (vm-ncc-7k).

Second, we explore to what extent augmentation can improve accuracy, by retraining vm-ncc with 100 T1w images while augmenting the input images with random deformations as for sm-brains training (vm-ncc-aug).

Third, we train a new hybrid method using extreme contrast augmentation to explore whether more variability in the training contrasts would help the network generalize (Fig. 5). At every iteration, we sample a registration pair from 100 T1w images and pass it to the similarity loss, while the network inputs each undergo an arbitrary gray-scale transformation: we uniformly sample a random lookup table (LUT) from U(0, 255), remapping the intensities {0, ..., 255} to new values of the same set. We smooth this LUT using a Gaussian kernel L(σ_L = 64).

Fourth, the synthesis enables supervised training if the moving and fixed label maps {s_m, s_f} are generated from the same input label map, so that the net warp is known. We analyze whether knowledge of the synthetic net warp can improve accuracy, by training models with the same architecture using an MSE loss between the synthesized and predicted SVFs v (sup-svf) or deformation fields φ (sup-def), respectively. As for sm-shapes, we draw the SVFs {v_m, v_f} at several resolutions r_v ∈ {1:8, 1:16, 1:32} to synthesize a more complex deformation, since we use a single brain segmentation map as input to ensure that a topologically consistent spatial correspondence exists.

C. SynthMorph Variants

For image-data and shape-agnostic training (sm-shapes), we sample {s_m, s_f} by selecting one of 100 random-shape segmentations s at each iteration, and synthesizing two separate image-label pairs from it. Each s contains J = 26 labels that we include in the loss L_dis. Since brain segmentations are often available, even if not for the target MRI contrast, we train another network on the Buckner40 anatomical labels instead of shapes (sm-brains). In this case, we sample {s_m, s_f} from two distinct label maps at each iteration and further deform them using synthetic warps. We optimize the J = 26 largest brain labels in L_dis, similar to what VoxelMorph does for validation [17] (see below).

D. Validation Metrics

To measure registration accuracy, we propagate the moving labels using the predicted warps and compute the Dice metric D [94] across a representative set of brain structures: amygdala, brainstem, caudate, ventral DC, cerebellar white matter and cortex, pallidum, cerebral white matter (WM) and cortex, hippocampus, lateral ventricle, putamen, thalamus, 3rd and 4th ventricle, and choroid plexus. We average scores of bilateral structures. In addition to volumetric Dice overlap, we evaluate the mean symmetric surface distance S (MSD) between contours of the same moved and fixed labels. We also compute the proportion of voxels where the warp φ folds, i.e. det(J_φ) ≤ 0 for voxel Jacobian J_φ.
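A minimal sketch of two of these metrics, assuming integer label maps and a dense displacement field; this is illustrative code, not the evaluation scripts used for the paper.

```python
import numpy as np

def mean_dice(moved, fixed, labels):
    """Volume overlap D, averaged over the evaluated structures."""
    scores = [2 * np.logical_and(moved == j, fixed == j).sum()
              / ((moved == j).sum() + (fixed == j).sum()) for j in labels]
    return np.mean(scores)

def folding_fraction(u):
    """Fraction of voxels where the warp phi = Id + u folds, det(J_phi) <= 0.
    The displacement u has shape (X, Y, Z, 3)."""
    grad = np.stack(np.gradient(u, axis=(0, 1, 2)), axis=-1)  # du_i/dx_k
    jac = grad + np.eye(3)                                    # J_phi = I + du/dx
    return (np.linalg.det(jac) <= 0).mean()
```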

Fig. 7. Registration accuracy compared to baselines as (a) volume overlap D using the Dice metric, and (b) mean symmetric surface distance S between label contours. Each box shows mean accuracy over anatomical structures for 50 test-image pairs across distinct subjects (8 for PD). The letters b and x indicate skull-stripping and registration across datasets (e.g. OASIS-HCP), respectively. Arrows indicate values off the chart.

E. Experiment 1: Baseline Comparison

1) Setup: For each contrast pairing, we run experiments on 50 held-out image pairs, where each image is from a different subject, except for T1w-PDw pairs, of which we have eight. To assess robustness to non-brain structures, we evaluate registration within and across datasets, with and without skull-stripping, using datasets of the same size. We determine whether the mean differences between methods are significant using paired two-sided t-tests.

2) Results: Fig. 5 shows examples of SynthMorph training data, and Fig. 6 shows typical registration results. Fig. 7 compares registration accuracy across structures to all baselines, in terms of Dice overlap (Fig. 7a) and MSD (Fig. 7b). By exploiting the anatomical information in a set of brain labels, sm-brains achieves the best accuracy across all datasets, even though no real MR images are used during training. First, sm-brains outperforms classical methods on all tasks by at least 2.4 Dice points, and often much more (p < 0.0003 for T1w-PDw, p < 4 × 10⁻¹⁵ for all other tasks). Second, it exceeds the state-of-the-art accuracy of vm-ncc for T1w-T1w registration, which is trained on T1w images, by at least 0.6 Dice points (p < 6 × 10⁻⁶). Importantly, across contrasts sm-brains outperforms all other methods, demonstrating its ability to generalize to new contrasts. Compared especially to baseline learning methods, which cannot generalize to contrasts unseen during training, sm-brains leads by up to 45.1 points (p < 6 × 10⁻⁷ for all cross-contrast tasks). Compared to classical methods, the proposed method outperforms by 2.9 or more points (p < 0.0003 for T1w-PDw, p < 2 × 10⁻¹⁷ for other cross-contrast tasks).

The shape- and contrast-agnostic network sm-shapes matches the performance of the best classical method for each dataset except T1w-T1w registration, where it slightly underperforms (p < 8 × 10⁻¹¹), despite never having been exposed to either imaging data or even neuroanatomy. Like sm-brains, sm-shapes generalizes well to multi-contrast registration, matching or exceeding the accuracy of all baselines, and by significant margins compared to learning baselines (p < 8 × 10⁻⁷ for T1w-PDw, p < 2 × 10⁻¹⁷ otherwise).

The baseline learning methods vm-ncc and vm-nmi perform well and clearly match or outperform classical methods at contrasts similar to those used in training. However, as expected, these approaches break down when tested on a pair of new contrasts that were not sampled during training, such as T1w-PDw. Similarly, vm-ncc and vm-nmi achieve slightly lower accuracy on image pairs that are not skull-stripped.

While MSD can be more sensitive than Dice overlap at structure boundaries, our analysis of surface distances yields a similar overall ranking between methods (Fig. 7b). Importantly, sm-brains achieves the lowest MSD for all contrasts, typically 0.7 mm or less, which is below the voxel size. Within contrasts, sm-brains outperforms classical methods by at least 0.06 mm (p < 2 × 10⁻⁹), surpassing all baselines tested across contrasts (p < 0.04 for T1w-PDw, p < 10⁻¹⁰ for the other tasks).

Exposing the baseline models to a much larger space of deformations at training does not result in a statistically significant increase of accuracy for T1w-to-T1w registration within OASIS (Fig. 8a). For vm-ncc-aug, accuracy across T1w datasets (OASIS-HCP, p < 0.007) and T2w-to-T2w accuracy (p < 0.03) decrease by 0.1 Dice point relative to vm-ncc. For vm-ncc-7k, accuracy across T1w datasets increases by 0.1 point (p < 0.04), with no significant change for T2w-to-T2w registration, but overall these 0.13% changes are negligible. Similar to vm-ncc, these models do not generalize to unseen pairings across contrasts, under-performing sm-brains by 42.9 or more points (Fig. 8a, p < 10⁻⁸).

Augmenting T1w image contrast using random LUTs (Fig. 5) substantially enhances performance across contrasts for hybrid compared to vm-ncc (p < 2 × 10⁻⁷), exceeding the supervised models by up to 6.1 Dice points (p < 0.009 for T1w-PDw, p < 3 × 10⁻¹⁵ for all other tasks). However, the increased contrast robustness comes at the expense of a drop of 0.5-1.9 Dice points within contrasts relative to vm-ncc (p < 0.0002), while sm-brains outperforms hybrid by at least 2.4 points within (p < 6 × 10⁻¹⁷) and 4.5 points across contrasts (p < 10⁻⁵ for T1w-PDw, otherwise p < 10⁻²³). We also investigate lower kernel widths σ_L < 64 but find these to negatively impact accuracy and therefore do not include them in the graph: reducing σ_L introduces noise in the image, indicating the importance of LUT smoothness.

Finally, the supervised networks sup-def and sup-svf achieve the lowest accuracy for within-contrast registration (p < 0.02) and consistently under-perform their unsupervised counterpart sm-brains by 6.8-10.7 points across all contrast combinations (p < 0.0001 for T1w-PDw, p < 3 × 10⁻²⁶ for all other tasks). As for the main baseline comparison, measurements of the mean surface distance in Fig. 8b result in a similar ranking between method variations, at comparable significance levels.

In our experiments, learning-based models require less than 1 second per 3D registration on an Nvidia Tesla V100 GPU. Using the recommended settings, NiftyReg and ANTs typically take ∼0.5 h and ∼1.2 h on a 3.3-GHz Intel Xeon CPU, respectively, whereas deedsBCV requires ∼3 min.

Fig. 8. Registration accuracy of method variations as (a) volume overlap D using the Dice metric, and (b) mean symmetric surface distance S between label contours. Each box shows mean accuracy over anatomical structures for 50 test-image pairs across distinct subjects (8 for PD). The letters b and x indicate skull-stripping and registration across datasets (e.g. OASIS-HCP), respectively. Arrows indicate values off the chart.

Fig. 9. Real MRI-contrast pairs used to assess network invariance. Top: we obtain FLASH images progressing from PDw (top left) to T1w for the same brain by varying FA, using the steady-state signal equation with acquired parametric maps (T1, T2*, PD). Bottom: we obtain MPRAGE contrasts with varying TI by fitting intensities based on a dual-echo MP2RAGE scan (TI1/TI2 700/2500 ms). For each of 10 subject pairs, we register a range of moving contrasts to a fixed T1w image.

Fig. 10. Representative features of the last network layer before the stationary velocity field is formed, in response to evolving MRI contrasts from the same subject. Left: VoxelMorph using normalized mutual information (NMI) exhibits high variability of the same feature response across different input contrasts for the same brain, e.g. in the red box. Right: contrast-invariant SynthMorph (sm-brains). For this analysis, both networks use the same architecture with n = 64 filters per layer.

F. Experiment 2: Contrast Invariance

In this experiment we evaluate registration accuracy as a function of gradually varying MRI contrast and measure robustness to new image types by analyzing the variability of network features across these contrasts.

1) Setup: To assess network feature invariance to MRI contrast, we perform the following procedure for 10 pairs of separate subjects, where each subject is only considered once and, thus, registered to a different fixed image. Given each such pair, we run a separate registration between each of the multi-FA contrasts for the moving subject and the most T1w-like contrast (FA 40°) of the fixed subject. For each pair of subjects, we measure accuracy with all tested methods as well as the variability of the features of the last network layer, before the SVF is formed, across input pairs. Specifically, we compute the root-mean-square difference d (RMSD) between the layer outputs of the first and all other contrast pairs over space, averaged over contrasts, features, and subjects. For efficiency, we restrict the moving images for this analysis to the subsets of FAs and TIs that undergo the largest changes in contrast, i.e. FAs from 2° to 30° (4° steps) and TIs from 300 to 600 ms (40-ms steps).
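The feature-variability measure can be sketched as follows, assuming the last-layer feature maps have already been extracted for each contrast of a subject pair; the per-feature normalization shown here is our assumption about how the RMSD d is normalized.

```python
import numpy as np

def feature_rmsd(feats):
    """Normalized RMSD d across contrasts. feats: (C, X, Y, Z, F) array of
    last-layer features for C contrasts of one subject pair."""
    ref = feats[0]                                        # most T1w-like contrast
    diff = ((feats[1:] - ref) ** 2).mean(axis=(1, 2, 3))  # (C-1, F), over space
    rmsd = np.sqrt(diff)
    scale = np.ptp(ref, axis=(0, 1, 2)) + 1e-8            # per-feature range
    return (rmsd / scale).mean()                          # mean over contrasts, features
```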

Fig. 11. Accuracy as a function of moving-image contrast across 10 realistic (a) FLASH and (b) MPRAGE image pairs. In each registration, the fixed image has the same T1w contrast. The moving image becomes decreasingly T1w-like towards the right. Being comparable across methods, error bars are shown for ANTs only and indicate the standard error of the mean over subjects.

Fig. 12. Feature variability for registration across (a) FLASH and (b) MPRAGE contrasts from 10 distinct subject pairs. We use the normalized RMSD d between each contrast and the most T1w-like, averaged over contrasts, features, and subjects. All models use the same architecture with n = 64 filters per layer. SynthMorph variants exhibit the least variability in the deeper layers (red boxes). Error bars show the standard error of the mean over features.

2) Results: Fig. 11 compares registration accuracy as a function of the moving-image MRI contrast for baseline methods and SynthMorph. In both the multi-FA and the multi-TI data, we obtain broadly comparable results for all methods when the moving and fixed image have T1w-like contrast. However, the performance of ANTs, NiftyReg, and the learning baselines decreases with increasing contrast differences, whereas SynthMorph remains largely unaffected.

Fig. 12 shows the variability of the response of each network layer to varying MRI contrast of the same anatomy (shown in Fig. 9). Compared to VoxelMorph, the feature variability within the deeper layers is significantly lower for the SynthMorph models. Fig. 10 illustrates this result, containing example feature maps extracted from the last network layer before the SVF is formed.

Overall, SynthMorph models exhibit substantially less variability in response to contrast changes than all other methods tested, indicating that the proposed strategy does indeed encourage contrast invariance.

G. Experiment 3: Hyperparameter Analyses

1) Setup: We explore the effect of various hyperparameters on registration performance using 50 skull-stripped HCP-A T1w pairs that do not overlap with the test set. First, we train with regularization weights λ ∈ [0, 10] and evaluate accuracy across: (1) all brain labels and (2) only the largest 26 (bilateral) structures optimized in L_dis. Second, we train variants of our model with varied deformation range b_v, image smoothness b_K, number of features n per layer (network width), bias-field range b_B, gamma-augmentation strength σ_γ, and relative resolutions r. Third, for the case that brain segmentations are available (sm-brains), we analyze the effect of training with full-head labels, brain labels only, or a mixture of both. Unless indicated, we test all hyperparameters using n = 64 convolutional filters per layer. For comparability, both SynthMorph variants use SVFs {v_m, v_f} sampled at a single resolution r_v.

2) Results: Fig. 13 shows registration performance for various training settings. Variant sm-brains performs best at low deformation strength b_v, when label maps s from two different subjects are used at each iteration (Fig. 13a), likely because the differences between distinct subjects already provide significant variation. For sm-shapes, a larger value of b_v = 3 is optimal due to the lacking inter-subject deformation, since we generate {s_m, s_f} from a single segmentation s. Random blurring of the images {m, f} improves robustness to data with different smoothing levels, with optimal accuracy at b_K ≈ 1 (Fig. 13b). Higher numbers of filters n per convolutional layer boost the accuracy at the cost of increasing training times (Fig. 13c), indicating that richer networks better capture and generalize from synthesized data. We identify the optimum bias-field cap and gamma-augmentation SD as b_B = 0.3 and σ_γ = 0.25, respectively. We obtain the highest accuracy when we sample the SVF and bias field at relative resolutions r_v = 1:16 and r_B = 1:40, respectively (Fig. 13f). Finally, training on full-head as compared to skull-stripped images has little impact on accuracy (not shown).

TABLE III: Effect of 3D registration on mean symmetric surface distance (MSD) between manually drawn contours of end-systolic and end-diastolic cardiac MRI. The table compares the SynthMorph (sm-shapes) and VoxelMorph (vm-ncc) models performing best at this task despite optimization for brain registration, without retraining. A reduction in MSD translates to better alignment of the left-ventricular structures. SD abbreviates standard deviation, and we highlight the best result for each set of contours in bold.

Fig. 13. Effect of training settings on median registration accuracy: (a) Maximum velocity-field SD b_v. (b) Maximum image-smoothing SD b_K. (c) Number of filters n per convolutional layer. (d) Maximum bias-field SD b_B. (e) Gamma-augmentation SD σ_γ. (f) Resolution r. Error bars are comparable across methods and indicate the SD over subjects.

Fig. 14. Regularization analysis. (a) Median accuracy. Error bars are comparable across label sets and indicate the SD over subjects. (b) Proportion of voxels where the warp φ folds, i.e. det(J_φ) ≤ 0 for voxel Jacobian J_φ (0 for λ > 1; of 4.9 × 10⁶ voxels). (c) Average Jacobian determinant. For λ ≥ 1, the deviation from the ideal value 1 is less than 2 × 10⁻³.

Fig. 14a shows that with decreasing regularization, accuracy increases for the large structures used in L_dis. When we include smaller structures, the mean overlap D reduces for λ < 1, as the network then focuses on optimizing the training structures. This does not apply to sm-shapes, which is agnostic to anatomy, since we train it on all synthetic labels present in the random maps. Fig. 14b shows a small proportion of locations where the warp field folds, decreasing with increasing λ. For test results, we use λ = 1, where the proportion of folding voxels is below 10⁻⁶ at our numerical precision. At fixed λ = 1, increasing the number of integration steps reduces voxel folding, about 6-fold for 10 instead of 5 steps, after which further increases have no effect.

H. Experiment 4: Cine-Cardiac Application

In this experiment we test SynthMorph and VoxelMorph on cine-cardiac MRI to assess how these models transfer to a domain with substantially different image content. The goal is to analyze whether already-trained models extend beyond neuroimaging, rather than to claim that they outperform methods specifically developed for the task. We choose this dataset because the trained networks assume affine registration of the input images, which can be challenging in non-brain applications, whereas cardiac frames from the same subject are largely aligned. This provides an opportunity for testing registration of images with structured background within contrast; we test cross-contrast registration in Section III-E and Section III-F.

1) Setup: Non-rigid registration of cardiac images from the same subject is an important tool that can help assess cardiovascular health. Some approaches choose an end-diastolic frame as the fixed image, as it is easily identified [95], [96]. Thus, we pair an end-systolic with an end-diastolic frame for each of 33 subjects, corresponding to maximum cardiac contraction and expansion. For 3D registration of these pairs, we use already-trained SynthMorph and VoxelMorph models without optimizing for the new task.

2) Results: Table III compares the effect on mean symmetric surface distance for the best-performing SynthMorph (sm-shapes) and VoxelMorph (vm-ncc) models. Registration with sm-shapes reduces MSD between the epicardial contours by ΔS/S = (11.6 ± 1.5)% on average, improving MSD for 85% of pairs (lower MSD is better). The mean reduction for vm-ncc is only ΔS/S = (4.3 ± 1.4)%. While the pairs that do not improve appear visually unchanged, MSD increases slightly: for example, the most substantial decrease for sm-shapes is 35.4%, whereas the least accurate registration only results in a 3.9% increase. While the performance gap between the models is smaller for endocardial MSD, sm-shapes still outperforms vm-ncc. The models sm-brains and vm-nmi underperform sm-shapes and vm-ncc in terms of MSD, respectively. Fig. 15 shows exemplary cardiac frames before and after registration with sm-shapes along with the displacement fields, illustrating how SynthMorph leaves most anatomy intact while focusing on dilation of the heart to match its late-diastolic shape.
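For reference, a simple implementation of the mean symmetric surface distance reported in Table III, assuming binary masks of the contoured structures; extracting surfaces by binary erosion is one common convention, not necessarily the exact definition used by the authors.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def mean_symmetric_surface_distance(a, b, spacing=(1.0, 1.0, 1.0)):
    """MSD S between the surfaces of two binary structures a and b."""
    surf_a = a ^ binary_erosion(a)  # boundary voxels of a
    surf_b = b ^ binary_erosion(b)  # boundary voxels of b
    # EDT of the complement gives, at each voxel, the distance to the surface.
    d_to_b = distance_transform_edt(~surf_b, sampling=spacing)
    d_to_a = distance_transform_edt(~surf_a, sampling=spacing)
    return 0.5 * (d_to_b[surf_a].mean() + d_to_a[surf_b].mean())
```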

Fig. 15. Cine-cardiac registration results. Each row shows an image pair from a different subject: we register frames corresponding to maximum cardiac contraction and expansion, respectively. Despite the thick slices and more diverse image content than typical of neuroimaging data, sm-shapes clearly dilates the contracted anatomy as indicated by the displacement fields in the rightmost column.

IV. DISCUSSION

We propose SynthMorph, a general strategy for learning contrast-invariant registration that does not require any imaging data during training. We remove the need for acquired data by synthesizing images randomly from noise distributions.

A. Generalizability

A significant challenge in the deployment of neural networks is their generalizability to image types unseen during training. Existing learning methods like VoxelMorph achieve good registration performance but consistently fail for new MRI contrasts at test time. For example, vm-ncc is trained on T1w pairs and breaks down both across contrasts (e.g. T1w-T2w) and within new contrasts (e.g. T2w-T2w). The SynthMorph strategy addresses this weakness and makes networks resilient to contrast changes by exposing them to a wide range of synthetic images, far beyond the shapes and contrasts typical of MRI. This approach obviates the need for retraining to register images acquired with a new sequence.

Training conventional VoxelMorph with a loss evaluated on T1w images while augmenting the input contrasts enables the transfer of domain-specific knowledge to cross-contrast registration tasks. However, the associated decrease in within-contrast performance indicates the benefit of SynthMorph: learning to match anatomical features independent of their appearance in the gray-scale images.

The choice of optimum hyperparameters is also an important problem for many deep learning applications. While the grid search of Fig. 13 illustrates the dependency of accuracy on hyperparameter values, SynthMorph performance is robust over the ranges typical of medical imaging modalities, e.g. smoothing kernels with SD σ_K ∈ [0, 2]. We select SynthMorph hyperparameters for all experiments based on the analysis of Fig. 13, using validation data that do not overlap with the test sets. The chosen parameters (Table I) enable robust registration across six different test sets in Section III-E and over a landscape of continually evolving MRI contrasts in Section III-F, demonstrating their generalizability across datasets.
B. Baseline Comparison

Networks trained within the SynthMorph framework have access neither to the MRI contrasts of the test set nor indeed to any MRI data at all. Yet sm-shapes matches state-of-the-art classical performance within contrasts and provides substantial improvements in cross-contrast performance over ANTs and NiftyReg, all while being substantially faster.

Registration accuracy varies with the particular contrast pairings, likely because anatomical structures appear different on images acquired with different MRI sequences. There is no guarantee that a structure will have contrast with neighboring structures and can be registered well to a scan of a particular MRI contrast (e.g. PDw). Nevertheless, SynthMorph outperforms both classical and learning-based methods across contrasts, demonstrating that it can indeed register new image types, to the extent permitted by the intrinsic contrast.

If brain segmentations are available, including these in the image synthesis enables the sm-brains network to outperform all methods tested by a substantial margin, at any contrast combination tested, although this model still does not require any acquired MR images during training.

Visual inspection of typical deformation fields in Fig. 6 provides an interesting insight: the sm-brains network appears to learn to identify the structures of interest optimized in the loss. Thus, it focuses on registering these brain regions and their close neighbors, while leaving the background and structures such as the skull unaffected. This anatomical knowledge enables registration of skull-stripped images to data including the full head. While the resulting deformations may appear less regular than those estimated by classical methods, our analysis of the Jacobian determinant demonstrates comparable field regularity across methods.
C. Dice-Loss Sensitivity

When training on synthesized structures with arbitrary geometry, the network learns to generally match shapes based on contrast. The sm-shapes model does not learn to register specific human anatomical structures or sub-structures, since we never expose it to specific neuroanatomy and instead sample random shapes of all sizes during training. In the experiment trained on brain anatomy, the model matches substructures within labels if they manifest contrast. If substructures are not discernible, the smooth regularization yields reasonable predictions. This can be observed with sm-brains for smaller structures that are not included in the dissimilarity loss L_dis but for which we obtain competitive validation Dice scores, e.g. the 3rd and 4th ventricle.
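For readers unfamiliar with label-overlap losses, the following is a minimal soft-Dice sketch of the kind of dissimilarity term discussed here (our own NumPy illustration; the actual L_dis operates on one-hot label maps inside the training graph):

    import numpy as np

    def soft_dice_loss(labels_moved, labels_fixed, eps=1e-6):
        """Soft Dice dissimilarity between one-hot (or soft) label maps.

        labels_moved, labels_fixed: shape (X, Y, Z, K), per-voxel
        probabilities of K structures. Returns 1 - mean Dice overlap.
        """
        axes = (0, 1, 2)  # reduce over the spatial dimensions
        intersection = 2.0 * np.sum(labels_moved * labels_fixed, axis=axes)
        totals = np.sum(labels_moved, axis=axes) + np.sum(labels_fixed, axis=axes)
        dice = (intersection + eps) / (totals + eps)  # one value per label
        return 1.0 - dice.mean()

Structures left out of the K channels contribute no gradient, which is why the smooth regularization alone must keep their deformation reasonable, as observed above.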
D. Supervised or Unsupervised?

Since ground-truth deformation fields are available for sm-shapes, we also train baseline models in a supervised manner. This approach consistently underperforms its unsupervised counterpart, for which we propose three possible explanations. First, several different deformations can result in the same warped brain, which has the potential to introduce a level of ambiguity into the registration problem that makes it challenging to train a reliable predictor. Second, related to this point, image areas with little intensity variation, such as the background or central parts of the white matter, offer no guidance for the supervised network to match the arbitrary ground-truth deformation, compared to unsupervised models, which are driven by the regularization term in those areas. Third, the synthesized transforms may not represent an exact identifiable mapping between the source and target image because of errors introduced by nearest-neighbor interpolation of the input label maps and further augmentation steps including image blurring and additive noise.
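To clarify the distinction, here is a schematic contrast of the two objectives (a hedged sketch of our own, not the training code of this work; it reuses soft_dice_loss from the sketch above, and λ mirrors the regularization weight discussed earlier):

    import numpy as np

    def supervised_loss(disp_pred, disp_true):
        """Regress the synthesized ground-truth displacement field."""
        return float(np.mean((disp_pred - disp_true) ** 2))

    def unsupervised_loss(labels_moved, labels_fixed, disp_pred, lam=1.0):
        """Label-overlap dissimilarity plus smoothness regularization."""
        dissimilarity = soft_dice_loss(labels_moved, labels_fixed)
        # Penalize displacement gradients everywhere, including flat regions
        # such as the background, where the supervised loss offers no guidance.
        grads = np.gradient(disp_pred, axis=(0, 1, 2))
        smoothness = sum(float(np.mean(g ** 2)) for g in grads)
        return dissimilarity + lam * smoothness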
E. Further Work

While SynthMorph addresses important drawbacks of within- and between-contrast registration methods, it can be expanded in several ways.

First, we plan to extend our framework to incorporate affine registration [47], [54], [55], [97]. We will explore whether the simultaneous estimation of affine and deformable transforms can improve accuracy and thoroughly investigate the appropriateness of architectures for doing this in heterogeneous data. In the current work, the input images {m, f} need prior affine alignment for optimal results. Although this preprocessing step is beyond the focus of our current contribution, the code we make available includes an optimization-based affine solution, thus providing full registration capabilities independent of third-party tools. The optimization estimates 12 affine parameters for each new pair of 3D images in ∼10 seconds, with accuracy comparable to ANTs and NiftyReg; a sketch of such a 12-parameter decomposition follows after this list of extensions.

Second, our approach promises to be extensible to unprocessed images acquired with any MRI sequence, of any body part, possibly even beyond medical imaging. While this is an exciting area of research, the present work focuses on neuroimaging applications since the breadth of the analyses required is beyond the scope of a single solid contribution.

Third, an obvious extension is to combine the simulation strategy with existing image data that might already be available. We plan to investigate whether including real MRI scans would aid, or instead bias the network and reduce its ability to generalize to unseen contrast variations.
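As referenced under the first point above, a 3D affine transform is commonly decomposed into 3 translations, 3 rotations, 3 scales, and 3 shears. The sketch below illustrates this standard 12-parameter composition (our own example; the released code may parameterize the matrix differently):

    import numpy as np

    def affine_from_params(translation, rotation, scale, shear):
        """Compose a 4 x 4 homogeneous 3D affine from 12 parameters."""
        def rot(axis, t):
            # Elementary rotation by angle t (radians) about one axis.
            c, s = np.cos(t), np.sin(t)
            m = np.eye(3)
            i, j = [(1, 2), (0, 2), (0, 1)][axis]
            m[i, i], m[i, j], m[j, i], m[j, j] = c, -s, s, c
            return m
        rx, ry, rz = rotation
        r = rot(0, rx) @ rot(1, ry) @ rot(2, rz)
        sh = np.eye(3)
        sh[0, 1], sh[0, 2], sh[1, 2] = shear  # upper-triangular shears
        a = np.eye(4)
        a[:3, :3] = r @ np.diag(scale) @ sh
        a[:3, 3] = translation
        return a

    # Example: identity plus a 10-voxel shift along x.
    print(affine_from_params([10, 0, 0], [0, 0, 0], [1, 1, 1], [0, 0, 0]))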
F. Invariant Representations

We investigate why the SynthMorph strategy enables substantial improvements in registration performance. In particular, we evaluate how accuracy responds to gradual changes in MRI contrast and show that the deep layers of SynthMorph models exhibit a greater degree of invariance to contrast changes than networks trained in a conventional fashion. We present qualitative and quantitative analyses demonstrating that the enhanced contrast invariance leads to highly robust registration across wide spectra of MR images simulated for two commonly used pulse sequences, FLASH and MPRAGE.

G. Cardiac Registration

The cine-cardiac experiment demonstrates the viability and potential of SynthMorph applied to a domain with substantially different image content than neuroimaging. While we do not claim to outperform dedicated cardiac registration methods, sm-shapes reduces the MSD metric between the fixed and moving frames in the majority of subjects, to a greater extent than any of the sm-brains, vm-ncc, and vm-nmi models. The network achieves this result without any optimization for the anatomy or image type considered, using weights obtained with generation hyperparameters tuned for isotropic 3D brain registration. In contrast, the cardiac data are volumes resampled from stacks of slices with thicknesses exceeding the voxel dimension of our neuroimaging test sets by 9-fold on average. Although sm-shapes is not an optimized registration tool for cardiac MRI, its weights provide a great choice for initializing networks when training application-specific registration, since the model produces reasonable results and is unbiased towards any particular anatomy.

H. Domain-Specific Knowledge

The comparisons between sm-brains and sm-shapes in neuroimaging datasets indicate that SynthMorph performs substantially better when exploiting domain-specific knowledge. For the cardiac application, this could be achieved in the following ways. First, if the amplitude of cardiac motion exceeds the deformations sampled during sm-shapes training, increasing hyperparameter b_v will be beneficial. Second, a lower regularization weight λ may be favorable for cardiac motion, which is characterized by considerable displacements within a small portion of space. Third, anatomical segmentations in fields other than neuroimaging often include fewer different labels. To overcome this challenge and synthesize images complex enough for networks to learn anatomy-specific registration, these label maps could be augmented by including arbitrary geometric shapes as diverse backgrounds.

Qualitatively, our experience is that generation hyperparameters represent a trade-off between (1) sampling from a distribution large enough to include the features of a target dataset while promoting network robustness by exposure to broad variability, and (2) ensuring that the network capacity is adequate for capturing the sampled variation. As an alternative to making domain-specific informed changes to the generation hyperparameters and retraining networks, recent work suggests optimizing hyperparameter values efficiently at test time using hypernetworks [98]. In addition to a registration pair, such hypernetworks take as input a set of hyperparameters and output the weights of a registration network, thus modeling a
continuum of registration networks, each trained with different hyperparameter values.
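As a toy illustration of this idea (our own sketch, not the HyperMorph implementation of [98]), a small network can map a hyperparameter such as λ to the weights of a downstream layer, so that a single trained model spans a family of regularization settings:

    import numpy as np

    rng = np.random.default_rng(0)
    N_FEAT = 8                                   # width of the toy layer
    W1 = rng.normal(size=(1, 16))                # hypernetwork, layer 1
    W2 = rng.normal(size=(16, N_FEAT * N_FEAT))  # hypernetwork, layer 2

    def layer_weights(lam):
        """Map the hyperparameter lambda to weights of a toy layer."""
        hidden = np.tanh(np.array([[lam]]) @ W1)
        return (hidden @ W2).reshape(N_FEAT, N_FEAT)

    def conditioned_layer(features, lam):
        """Apply the lambda-conditioned layer to input features."""
        return np.tanh(features @ layer_weights(lam))

    # Sweep lambda at test time without retraining anything:
    feats = rng.normal(size=(1, N_FEAT))
    for lam in (0.5, 1.0, 2.0):
        print(lam, conditioned_layer(feats, lam).round(2))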
I. Data Requirements for Registration

The baseline comparison reveals that neither augmenting nor adding data in VoxelMorph training boosts performance. While counter to intuitions about deep learning in classification tasks, this result is consistent with recent findings confirming that large datasets are not necessary for tasks like deformable registration and segmentation, which have sizable input and output spaces [58], [59], [99]: in effect, every image voxel can be thought of as a data sample, although these are, of course, not independent. For example, reasonable segmentation performance can be achieved with only a handful of annotated images [58]. For registration, our analysis shows that SynthMorph training with label maps from only 40 subjects enables outperformance of all other methods tested.

We train the VoxelMorph baseline using images from 100 subjects, randomly flipping the axes of each input pair, which already gives rise to 79,200 different cross-subject image combinations (100 × 99 ordered pairs, each under 2³ = 8 joint axis-flip configurations; see the sketch below). An analysis in the VoxelMorph paper [17] comparing training sets of size 100 and 3231 without randomly flipping axes provides further evidence that larger datasets do not necessarily lead to significant performance gains.
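The pair count quoted above can be verified directly (a trivial sketch; the grouping into ordered pairs and joint flips is our reading of the augmentation described):

    from itertools import permutations, product

    subjects = range(100)
    pairs = list(permutations(subjects, 2))         # 100 * 99 ordered pairs
    flips = list(product((False, True), repeat=3))  # 2**3 joint axis flips

    print(len(pairs) * len(flips))  # 9900 * 8 = 79200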
V. CONCLUSION

Our study establishes the utility of training on synthetic data only and indicates a novel way of thinking about feature invariance in the context of registration. SynthMorph enables users to build on the strengths of deep learning, including rapid execution, increased robustness to local minima and outliers, and flexibility in the choice of loss functions, by now having the previously missing ability to generalize to any MRI contrast at test time. This leads us to believe the strategy can be broadly applied to networks to limit the need for training data while vastly improving applicability.

ACKNOWLEDGMENT

The authors thank Danielle F. Pace for help with surface distances. Data are provided in part by OASIS Cross-Sectional (PIs D. Marcus, R. Buckner, J. Csernansky, J. Morris; NIH grants P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 R021382). HCP-A: Research reported in this publication is supported by Grant U01 AG052564 and Grant AG052564-S1 and by the 14 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research, by the McDonnell Center for Systems Neuroscience at Washington University, by the Office of the Provost at Washington University, and by the University of Minnesota Medical School.

REFERENCES

[1] D. W. McRobbie, E. A. Moore, M. J. Graves, and M. R. Prince, MRI From Picture to Proton. Cambridge, U.K.: Cambridge Univ. Press, 2017.
[2] M. Hoffmann, T. A. Carpenter, G. B. Williams, and S. J. Sawiak, “A survey of patient motion in disorders of consciousness and optimization of its retrospective correction,” Magn. Reson. Imag., vol. 33, no. 3, pp. 346–350, Apr. 2015.
[3] J. Hajnal and D. Hill, Medical Image Registration (Biomedical Engineering). Boca Raton, FL, USA: CRC Press, 2001.
[4] B. Fischl, “FreeSurfer,” NeuroImage, vol. 62, no. 2, pp. 774–781, 2012.
[5] R. S. Frackowiak, Human Brain Function. Amsterdam, The Netherlands: Elsevier, 2004.
[6] O. Puonti, J. E. Iglesias, and K. V. Leemput, “Fast and sequence-adaptive whole-brain segmentation using parametric Bayesian modeling,” NeuroImage, vol. 143, pp. 235–249, Dec. 2016.
[7] R. Sridharan et al., “Quantification and analysis of large multimodal clinical image studies: Application to stroke,” in Multimodal Brain Image Analysis. Cham, Switzerland: Springer, 2013, pp. 18–30.
[8] M. Goubran et al., “Multimodal image registration and connectivity analysis for integration of connectomic data from microscopy to MRI,” Nature Commun., vol. 10, no. 1, pp. 1–17, Dec. 2019.
[9] B. C. Lee, M. K. Lin, Y. Fu, J. Hata, M. I. Miller, and P. P. Mitra, “Multimodal cross-registration and quantification of metric distortions in marmoset whole brain histology using diffeomorphic mappings,” J. Comput. Neurol., vol. 529, no. 2, pp. 281–295, 2020.
[10] M. Lorenzi, N. Ayache, G. Frisoni, and X. Pennec, “LCC-Demons: A robust and accurate symmetric diffeomorphic registration algorithm,” NeuroImage, vol. 81, pp. 470–483, Nov. 2013.
[11] J. Ashburner, “A fast diffeomorphic image registration algorithm,” NeuroImage, vol. 38, no. 1, pp. 95–113, 2007.
[12] B. B. Avants, C. L. Epstein, M. Grossman, and J. C. Gee, “Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain,” Med. Image Anal., vol. 12, no. 1, pp. 26–41, 2008.
[13] M. Modat et al., “Fast free-form deformation using graphics processing units,” Comput. Meth. Prog. Biomed., vol. 98, no. 3, pp. 278–284, 2010.
[14] K. Rohr, H. S. Stiehl, R. Sprengel, T. M. Buzug, J. Weese, and M. Kuhn, “Landmark-based elastic registration using approximating thin-plate splines,” IEEE Trans. Med. Imag., vol. 20, no. 6, pp. 526–534, Jun. 2001.
[15] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. Hill, M. O. Leach, and D. J. Hawkes, “Nonrigid registration using free-form deformations: Application to breast MR images,” IEEE Trans. Med. Imag., vol. 18, no. 8, pp. 712–721, Aug. 1999.
[16] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache, “Diffeomorphic demons: Efficient non-parametric image registration,” NeuroImage, vol. 45, no. 1, pp. S61–S72, Mar. 2009.
[17] G. Balakrishnan, A. Zhao, M. Sabuncu, J. Guttag, and A. V. Dalca, “VoxelMorph: A learning framework for deformable medical image registration,” IEEE Trans. Med. Imag., vol. 38, no. 8, pp. 1788–1800, Feb. 2019.
[18] B. D. de Vos, F. F. Berendsen, M. A. Viergever, M. Staring, and I. Išgum, “End-to-end unsupervised deformable image registration with a convolutional neural network,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Cham, Switzerland: Springer, 2017, pp. 204–212.
[19] C. Guetter, C. Xu, F. Sauer, and J. Hornegger, “Learning based non-rigid multi-modal image registration using Kullback-Leibler divergence,” in Medical Image Computing and Computer-Assisted Intervention. Berlin, Germany: Springer, 2005, pp. 255–262.
[20] H. Li and Y. Fan, “Non-rigid image registration using fully convolutional networks with deep self-supervision,” 2017, arXiv:1709.00799. [Online]. Available: http://arxiv.org/abs/1709.00799
[21] M. Rohé, M. Datar, T. Heimann, M. Sermesant, and X. Pennec, “SVF-Net: Learning deformable image registration using shape matching,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2017, pp. 266–274.
[22] H. Sokooti, B. de Vos, F. Berendsen, B. P. Lelieveldt, I. Išgum, and M. Staring, “Nonrigid image registration using multi-scale 3D convolutional neural networks,” in Medical Image Computing and Computer Assisted Intervention. Cham, Switzerland: Springer, 2017, pp. 232–239.
[23] G. Wu, M. Kim, Q. Wang, B. C. Munsell, and D. Shen, “Scalable high-performance image registration framework by unsupervised deep feature representations learning,” IEEE Trans. Biomed. Eng., vol. 63, no. 7, pp. 1505–1516, Jul. 2016.
[24] X. Yang, R. Kwitt, M. Styner, and M. Niethammer, “Quicksilver: Fast predictive image registration—A deep learning approach,” NeuroImage, vol. 158, pp. 378–396, Jun. 2017.
[25] M. F. Beg, M. I. Miller, A. Trouvé, and L. Younes, “Computing large deformation metric mappings via geodesic flows of diffeomorphisms,” Int. J. Comput. Vis., vol. 61, no. 2, pp. 139–157, 2005.
[26] A. Klein et al., “Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration,” NeuroImage, vol. 46, no. 3, pp. 786–802, 2009.
[27] P. Viola and W. M. Wells III, “Alignment by maximization of mutual information,” Int. J. Comput. Vis., vol. 24, no. 2, pp. 137–154, 1997.
[28] A. Roche, G. Malandain, X. Pennec, and N. Ayache, “The correlation ratio as a new similarity measure for multimodal image registration,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 1998, pp. 1115–1124.
[29] J. E. Iglesias, E. Konukoglu, D. Zikic, B. Glocker, K. Van Leemput, and B. Fischl, “Is synthesizing MRI contrast useful for inter-modality analysis,” in Medical Image Computing and Computer-Assisted Intervention. Berlin, Germany: Springer, 2013, pp. 631–638.
[30] Y. Xiao et al., “Evaluation of MRI to ultrasound registration methods for brain shift correction: The CuRIOUS2018 challenge,” IEEE Trans. Med. Imag., vol. 39, no. 3, pp. 777–786, Mar. 2020.
[31] B. van Ginneken, S. Kerkstra, and J. Meakin. (2019). Medical Image Computing and Computer-Assisted Intervention Curious. [Online]. Available: https://curious2019.grand-challenge.org
[32] E. Haber and J. Modersitzki, “Intensity gradient based registration and fusion of multi-modal images,” in Medical Image Computing and Computer-Assisted Intervention. Berlin, Germany: Springer, 2006, pp. 726–733.
[33] M. P. Heinrich et al., “MIND: Modality independent neighbourhood descriptor for multi-modal deformable registration,” Med. Image Anal., vol. 16, no. 7, pp. 1423–1435, 2012.
[34] M. Mellor and M. Brady, “Phase mutual information as a similarity measure for registration,” Med. Image Anal., vol. 9, no. 4, pp. 330–343, 2005.
[35] C. Wachinger and N. Navab, “Entropy and Laplacian images: Structural representations for multi-modal registration,” Med. Image Anal., vol. 16, no. 1, pp. 1–17, Jan. 2012.
[36] Z. Xu et al., “Evaluation of six registration methods for the human abdomen on clinically acquired CT,” IEEE Trans. Biomed. Eng., vol. 63, no. 8, pp. 1563–1572, Aug. 2016.
[37] L. König and J. Rühaak, “A fast and accurate parallel algorithm for non-linear image registration using normalized gradient fields,” in Proc. ISBI, 2014, pp. 580–583.
[38] L. König, A. Derksen, M. Hallmann, and N. Papenberg, “Parallel and memory efficient multimodal image registration for radiotherapy using normalized gradient fields,” in Proc. IEEE 12th Int. Symp. Biomed. Imag. (ISBI), Apr. 2015, pp. 734–738.
[39] J. Rühaak et al., “A fully parallel algorithm for multimodal image registration using normalized gradient fields,” in Proc. ISBI, 2013, pp. 572–575.
[40] F. Kanavati et al., “Supervoxel classification forests for estimating pairwise image correspondences,” Pattern Recognit., vol. 63, pp. 561–569, Mar. 2017.
[41] M. P. Heinrich, I. J. A. Simpson, B. W. Papież, M. Brady, and J. A. Schnabel, “Deformable image registration by combining uncertainty estimates from supervoxel belief propagation,” Med. Image Anal., vol. 27, pp. 57–71, Jun. 2016.
[42] K. A. J. Eppenhof and J. P. W. Pluim, “Pulmonary CT registration through supervised learning with convolutional neural networks,” IEEE Trans. Med. Imag., vol. 38, no. 5, pp. 1097–1105, May 2019.
[43] J. Krebs et al., “Robust non-rigid registration through agent-based action learning,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2017, pp. 344–352.
[44] X. Yang, R. Kwitt, M. Styner, and M. Niethammer, “Fast predictive multimodal image registration,” in Proc. IEEE 14th Int. Symp. Biomed. Imag. (ISBI), Apr. 2017, pp. 48–57.
[45] A. V. Dalca, G. Balakrishnan, J. Guttag, and M. Sabuncu, “Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces,” Med. Image Anal., vol. 57, pp. 226–236, Oct. 2019.
[46] J. Krebs, H. Delingette, B. Mailhé, N. Ayache, and T. Mansi, “Learning a probabilistic model for diffeomorphic registration,” IEEE Trans. Med. Imag., vol. 38, no. 9, pp. 2165–2176, Sep. 2019.
[47] B. D. de Vos, F. F. Berendsen, M. A. Viergever, H. Sokooti, M. Staring, and I. Išgum, “A deep learning framework for unsupervised affine and deformable image registration,” Med. Image Anal., vol. 52, pp. 128–143, Feb. 2019.
[48] C. K. Guo, “Multi-modal image registration with unsupervised deep learning,” Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., Massachusetts Inst. Technol., Cambridge, MA, USA, 2019.
[49] M. Chen, A. Carass, A. Jog, J. Lee, S. Roy, and J. L. Prince, “Cross contrast multi-channel image registration using image synthesis for MR brain images,” Med. Image Anal., vol. 36, pp. 2–14, Feb. 2017.
[50] S. Roy, A. Carass, A. Jog, J. L. Prince, and J. Lee, “MR to CT registration of brains using image synthesis,” Proc. SPIE Med. Imag. Process., vol. 9034, Mar. 2014, Art. no. 903419.
[51] C. Bhushan, J. P. Haldar, S. Choi, A. A. Joshi, D. W. Shattuck, and R. M. Leahy, “Co-registration and distortion correction of diffusion and anatomical images based on inverse contrast normalization,” NeuroImage, vol. 115, pp. 269–280, Jul. 2015.
[52] C. Tanner, F. Ozdemir, R. Profanter, V. Vishnevsky, E. Konukoglu, and O. Goksel, “Generative adversarial networks for MR-CT deformable image registration,” 2018, arXiv:1807.07349. [Online]. Available: https://arxiv.org/abs/1807.07349
[53] C. Qin, B. Shi, R. Liao, T. Mansi, D. Rueckert, and A. Kamen, “Unsupervised deformable registration for multi-modal images via disentangled representations,” in Proc. IPMI. Cham, Switzerland: Springer, 2019, pp. 249–261.
[54] Y. Hu et al., “Weakly-supervised convolutional neural networks for multimodal image registration,” Med. Image Anal., vol. 49, pp. 1–13, Oct. 2018.
[55] Y. Hu et al., “Label-driven weakly-supervised learning for multimodal deformable image registration,” in Proc. IEEE 15th Int. Symp. Biomed. Imag. (ISBI), Apr. 2018, pp. 1070–1074.
[56] A. Hering, S. Kuckertz, S. Heldmann, and M. P. Heinrich, “Enhancing label-driven deep deformable image registration with local distance metrics for state-of-the-art cardiac motion tracking,” in Bildverarbeitung Medizin. Wiesbaden, Germany: Springer Vieweg, 2019, pp. 309–314.
[57] L. Mansilla, D. H. Milone, and E. Ferrante, “Learning deformable registration of medical images with anatomical constraints,” Neural Netw., vol. 124, pp. 269–279, Oct. 2020.
[58] H. W. Lee, M. R. Sabuncu, and A. V. Dalca, “Few labeled atlases are necessary for deep-learning-based segmentation,” in Proc. Mach. Learn. Health, 2019, pp. 1–9.
[59] K. Chaitanya, N. Karani, C. F. Baumgartner, A. Becker, O. Donati, and E. Konukoglu, “Semi-supervised and task-driven data augmentation,” in Proc. Int. Conf. Inf. Process. Med. Imag. Cham, Switzerland: Springer, 2019, pp. 29–41.
[60] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” CoRR, vol. abs/1505.04597, pp. 1–8, May 2015.
[61] J. Xu et al., “Fetal pose estimation in volumetric MRI using a 3D convolution neural network,” in Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer, 2019, pp. 403–410.
[62] A. Zhao, G. Balakrishnan, F. Durand, J. V. Guttag, and A. V. Dalca, “Data augmentation using learned transforms for one-shot medical image segmentation,” CoRR, vol. abs/1902.09383, pp. 1–4, Feb. 2019.
[63] K. Kamnitsas et al., “Unsupervised domain adaptation in brain lesion segmentation with adversarial networks,” in Proc. IPMI, 2017, pp. 597–609.
[64] B. Billot, D. Greve, K. Van Leemput, B. Fischl, J. E. Iglesias, and A. V. Dalca, “A learning strategy for contrast-agnostic MRI segmentation,” in Proc. PMLR, vol. 121, Montreal, QC, Canada, Jul. 2020, pp. 75–93.
[65] M. Hoffmann, B. Billot, J. E. Iglesias, B. Fischl, and A. V. Dalca, “Learning MRI contrast-agnostic registration,” in Proc. ISBI, 2021, pp. 899–903.
[66] A. V. Dalca. (2018). VoxelMorph: Learning-Based Image Registration. [Online]. Available: https://voxelmorph.net
[67] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” in Proc. 3DV, 2016, pp. 565–571.
[68] V. Arsigny, O. Commowick, X. Pennec, and N. Ayache, “A log-Euclidean framework for statistics on diffeomorphisms,” in Medical Image Computing and Computer-Assisted Intervention. Berlin, Germany: Springer, 2006, pp. 924–931.
[69] J. Ashburner and K. J. Friston, “Unified segmentation,” NeuroImage, vol. 26, pp. 839–851, Oct. 2005.
[70] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, “Automated model-based tissue classification of MR images of the brain,” IEEE Trans. Med. Imag., vol. 18, no. 10, pp. 897–908, Oct. 1999.
[71] W. M. Wells, W. E. L. Grimson, R. Kikinis, and F. A. Jolesz, “Adaptive segmentation of MRI data,” IEEE Trans. Med. Imag., vol. 15, no. 4, pp. 429–442, Aug. 1996.
[72] Y. Zhang, M. Brady, and S. Smith, “Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm,” IEEE Trans. Med. Imag., vol. 20, no. 1, pp. 45–57, Jan. 2001.
[73] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, “A unifying framework for partial volume segmentation of brain MR images,” IEEE Trans. Med. Imag., vol. 22, no. 1, pp. 105–119, Apr. 2003.
[74] B. Belaroussi, J. Milles, S. Carme, Y. M. Zhu, and H. Benoit-Cattin, “Intensity non-uniformity correction in MRI: Existing methods and their validation,” Med. Image Anal., vol. 10, no. 2, pp. 234–246, 2006.
[75] Z. Hou, “A review on MR image intensity inhomogeneity correction,” Int. J. Biomed. Imag., vol. 2006, pp. 1–11, Oct. 2006.
[76] M. Abadi et al., “TensorFlow: A system for large-scale machine learning,” in Proc. USENIX Symp. Oper. Syst. Design Implement., 2016, pp. 265–283.
[77] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[78] D. S. Marcus, T. H. Wang, J. Parker, J. G. Csernansky, J. C. Morris, and R. L. Buckner, “Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults,” J. Cogn. Neurosci., vol. 19, no. 9, pp. 1498–1507, 2007.
[79] S. Y. Bookheimer et al., “The lifespan human connectome project in aging: An overview,” NeuroImage, vol. 185, pp. 335–348, Oct. 2019.
[80] M. P. Harms et al., “Extending the human connectome project across ages: Imaging protocols for the lifespan development and aging projects,” NeuroImage, vol. 183, pp. 972–984, Dec. 2018.
[81] A. J. W. van der Kouwe, T. Benner, D. H. Salat, and B. Fischl, “Brain morphometry with multiecho MPRAGE,” NeuroImage, vol. 40, no. 2, pp. 559–569, Apr. 2008.
[82] J. P. Mugler, “Optimized three-dimensional fast-spin-echo MRI,” J. Magn. Reson. Imag., vol. 39, no. 4, pp. 745–767, Apr. 2014.
[83] D. B. Keator et al., “A national human neuroimaging collaboratory enabled by the biomedical informatics research network (BIRN),” IEEE Trans. Inf. Technol. Biomed., vol. 12, no. 2, pp. 162–172, Mar. 2008.
[84] C. Sudlow et al., “UK Biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age,” PLOS Med., vol. 12, no. 3, Mar. 2015, Art. no. e1001779.
[85] A. J. Holmes et al., “Brain genomics superstruct project initial data release with structural, functional, and behavioral measures,” Sci. Data, vol. 2, no. 1, Dec. 2015, Art. no. 150031.
[86] R. B. Buxton, R. R. Edelman, B. R. Rosen, G. L. Wismer, and T. J. Brady, “Contrast in rapid MR imaging: T1- and T2-weighted imaging,” J. Comput. Assist. Tomogr., vol. 11, no. 1, pp. 7–16, 1987.
[87] J. P. Marques, T. Kober, G. Krueger, W. van der Zwaag, P.-F. Van de Moortele, and R. Gruetter, “MP2RAGE, a self bias-field corrected sequence for improved segmentation and T1-mapping at high field,” NeuroImage, vol. 49, no. 2, pp. 1271–1281, 2010.
[88] B. Fischl et al., “Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain,” Neuron, vol. 33, no. 3, pp. 341–355, 2002.
[89] J. D. Van Horn et al., “The functional magnetic resonance imaging data center (fMRIDC): The challenges and rewards of large-scale databasing of neuroimaging studies,” Phil. Trans. Roy. Soc. London, Ser. B, Biol. Sci., vol. 356, no. 1412, pp. 1323–1339, Aug. 2001.
[90] A. Andreopoulos and J. K. Tsotsos, “Efficient and generalizable statistical models of shape and appearance for analysis of cardiac MRI,” Med. Image Anal., vol. 12, no. 3, pp. 335–357, 2008.
[91] M. Reuter, H. D. Rosas, and B. Fischl, “Highly accurate inverse consistent registration: A robust approach,” NeuroImage, vol. 53, no. 4, pp. 1181–1196, 2010.
[92] D. Pustina and P. Cook. (2017). Anatomy of an antsRegistration Call. [Online]. Available: https://github.com/ANTsX/ANTs/wiki/Anatomy-of-an-antsRegistration-call
[93] M. P. Heinrich, M. Jenkinson, M. Brady, and J. A. Schnabel, “MRF-based deformable registration and ventilation estimation of lung CT,” IEEE Trans. Med. Imag., vol. 32, no. 7, pp. 1239–1248, Jul. 2013.
[94] L. R. Dice, “Measures of the amount of ecologic association between species,” Ecology, vol. 26, no. 3, pp. 297–302, 1945.
[95] W. Shi et al., “A comprehensive cardiac motion estimation framework using both untagged and 3-D tagged MR images based on nonrigid registration,” IEEE Trans. Med. Imag., vol. 31, no. 6, pp. 1263–1275, Jun. 2012.
[96] M. J. Ledesma-Carbayo et al., “Spatio-temporal nonrigid registration for ultrasound cardiac motion estimation,” IEEE Trans. Med. Imag., vol. 24, no. 9, pp. 1113–1126, Sep. 2005.
[97] E. Chee and Z. Wu, “AIRNet: Self-supervised affine registration for 3D medical images using neural networks,” 2018, arXiv:1810.02583. [Online]. Available: http://arxiv.org/abs/1810.02583
[98] A. Hoopes, M. Hoffmann, B. Fischl, J. Guttag, and A. V. Dalca, “HyperMorph: Amortized hyperparameter learning for image registration,” in Proc. Int. Conf. Inf. Process. Med. Imag. Cham, Switzerland: Springer, 2021, pp. 3–17.
[99] A. Zhao, G. Balakrishnan, F. Durand, J. V. Guttag, and A. V. Dalca, “Data augmentation using learned transformations for one-shot medical image segmentation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Oct. 2019, pp. 8543–8553.