Sulcal Pattern Matching with the Wasserstein Distance
Abstract
We present the unified computational framework for modeling the sulcal patterns of human brain obtained from the magnetic resonance images. The Wasserstein distance is used to align the sulcal patterns nonlinearly. These patterns are topologically different across subjects making the pattern matching a challenge. We work out the mathematical details and develop the gradient descent algorithms for estimating the deformation field. We further quantify the image registration performance. This method is applied in identifying the differences between male and female sulcal patterns.
1 Introduction
The concave regions in the highly convoluted cerebral cortex of the human brain are referred to as the sulci (Fig. 1). These complex tree-shaped sulcal curves are highly variable in length, area, depth, curvature and topology across different subjects [2]. There have been extensive studies that connect the variabilities of such biomarkers with the differences in cognitive or pathological characteristics between populations [10]. However, since each subject have different topological patterns, it is difficult to match the sulcal patterns across subjects [9]. One approach of reducing the difficulty of matching is to smooth the sulcal patterns and then match the smoothed patterns. Such smoothed representations enable pattern matching in a continuous probabilistic fashion. Thus, we propose to use the Wasserstein distance, which minimizes the optimal transport cost of moving between probability distributions.
The Wasserstein distances have been previously applied in various imaging applications. [21] computed the Wasserstein distance using the hyperbolic metric for cortical brain morphometry. [15] proposed the sliced Wasserstein distance in speeding up the computation in pattern recognition. [19] derived a cycle-consistent generative adversarial network based on the optimal transport for MRI reconstruction. Despite the advance of various algorithms for Wasserstein distance based pattern matching [3, 15], the rational of involving smoothing in such algorithms is unclear. In this paper, we work out the mathematical and implementation details on the Wasserstein distance on heat kernel smoothing, the kernel version of diffusion. In the experiment, we demonstrated the Wasserstein distance based registration reduces the variability of sulcal patterns across subjects.
2 Methods
2.1 Sulcal pattern data
We used the processed T1-weighted MRI of 456 subjects (age-matched 274 females and 182 males) in the Human Connectome Project (HCP) [23]. The MRI were obtained using a Siemens 3T Connectome Skyra scanner with a 32-channel head coil [8, 22]. The MRI were registered to the MNI space with a FLIRT affine and FNIRT nonlinear registration [12]. FreeSurfer’s recon-all pipeline was used on the distortion- and bias-corrected MRI [6] that includes the segmentation of white matter and pial surfaces as well as FreeSurfer’s folding-based surface registration to the template. TRACE algorithm was used for automatic sulcal curve extraction from surface meshes [9, 17] (Fig. 1) and then projected on the unit sphere, which is further parameterized by spherical angles . Since the data is spherical, it is periodic over . Fig. 1 displays the sulcal curves of two different subjects on , where sulcal curves are assigned value (colored red) while all other parts of the brain are assigned value 0.
2.2 Heat kernel smoothing of sulcal patterns
The sulcal pattern is smoothed with heat kernel to reduce high frequency noise. The smoothing increases the signal-to-noise ratio (SNR) and increases the statistical power in the population study [9]. We model the sulcal pattern as
where is the underlying true signal and is the noise. For observed data, we assign value 1 to the sulcal curves and 0 otherwise (Fig. 1). We estimate by smoothing with isotropic heat kernel. The smoothed estimate is given by the solution of the isotropic diffusion
with the initial condition and the boundary conditions
The amount of smoothing is determined empirically. The solution is given as the weighted Fourier series [4]:
| (1) |
The eigenvalues and the eigenfunctions corresponding to Laplacian are
where if and 0 otherwise. The coefficients and are estimated in the least square fashion using the initial data [4]. The smoothing results are shown in Fig. 1 with diffusion time .
The smooth representation enables alignment of different sulcal patterns on a common grid. For each subject, we first obtain the Fourier coefficients and via smoothing. The value on the grid point is then given by in the expansion (1).
2.3 Wasserstein distance on heat kernel smoothing
Let and be the empirical distribution on the two sets of scatter points that define the vertices of sulcal curves and :
| (2) |
After algebraic derivations involving Choquest’s and Birkhoff’s theorems [20], the 2-Wasserstein distance between the empirical distributions is given as the Monge formulation
| (3) |
where is the permutation group of order .
The diffused sulcal pattern can be written as the kernel convolution on initial condition as . with heat kernel [4]. In , the heat kernel is simply Gaussian kernel
The heat kernel smoothing on sulcal patterns and is given by (Fig. 1 Bottom)
| (4) |
Let be the set of all possible joint density functions corresponding to and . The Wasserstein distance between diffused sulcal pattern can be written as [7]
| (5) |
If we restrict the joint distribution to be a linear combination of multivariate normals, we can compute the expression (5) exactly. We will denote the Wasserstein distance with such a restriction as . Then we have the following equivalence.
Theorem 1
For heat kernel smoothing and , we have
| (6) |
Proof. We provide the sketch of proof. The joint density in (5) can be expressed as the mixture of multivariate normals as where the mixing proportions are doubly stochastic and satisfies . The multivariate normals follow
where , , is the identity matrix and is the positive semi-definite covariance matrix. Then after lengthy derivations, we can obtain
Subsequently, it suffices to minimize each term in separately. Minimizing the first term leads to the same result as (3) while the second term results in zero [7].
Since , we have
indicating that heat kernel smoothing reduces the Wasserstein distance between the sulcal patterns and their variabilities.
2.4 Gradient descent on the dual formulation
Given two smooth patterns and , the computation of the Wasserstein distance involves mapping a point from the first pattern to the corresponding point in the second pattern (Fig. 2). The deformation is required to have the smallest possible total displacement and satisfies
This leads to solving the optimization problem
| (7) |
which can be further reduced to minimizing
| (8) |
with constraint [3].
Theorem 2
The solution is given implicitly. However, the derivative of (8) can be explicitly found.
Theorem 3
[3] Suppose is the convex conjugate of for some and is Lipschitz. Then it follows is convex, Lipschitz and
where is he Hessian matrix of .
Subsequently, we solve for gradient descent on as [3]:
| (9) |
The overall computational complexity for the algorithm is [14], which is more scalable than the Hungarian algorithm with [18] and fluid mechanics with [1]. In this study, rectangular grid with vertices were used. The grid values serve as the input for the gradient descent. We display the result of the gradient descent on estimating the displacement field as arrows in Fig. 2. Due to the convexity of , the algorithm is expected to recover the global optimum (Remark 3.3 in [3]).
3 Experiments
3.1 Validation against the Hungarian Algorithm
We matched the two sets of random generated scatter points in . Fig.3 displays the result of one realization with . The Wasserstein distance can be computed exactly through (3) using the Hungarian Algorithm [18]. We then applied heat kernel kernel smoothing with bandwidth on the scatter points and compute the Wasserstein distance using the gradient descent (9). The average percentage reduction of distance for 100 independent simulations for are , and respectively. This is consistent with our theoretical result in section 2.3. The computer code for performing Wasserstein distance based matching is provided in https://github.com/laplcebeltrami/sulcaltree.
3.2 Reduction of image registration variability
The sulcal pattern has been already nonlinearly aligned using FreeSurfer’s standard folding-based surface registration to the surface atlas [6]. Wasserstein distance based matching significantly reduces the sulcal pattern variability further from the FreeSurfer alignment. We first compute the average of smoothed sulcal pattern maps on smoothed estimates (1). This serves as the reference template pattern for subsequent registration. We then align each subject to the average pattern using gradient descent (9). We initialize with so that initial mapping becomes the identity. Such initialization is data-free and has demonstrated efficacy in various applications including facial [14] and brain images [3]. We set as the maximum number of iterations at 200. The average runtime for one subject is approximately 8 seconds on a desktop computer.
For deformation for subject , the deformed sulcal patterns are obtained as . The statistical variability before registration is computed as the sample variance of sulcal patterns at each point across subjects (Fig. 4). The variability is significantly reduced after deformation. The mean variability over all the points in the brain is 0.0021 for the smoothed original pattern while it is , the significant reduction of variability by .
3.3 Sexual dimorphism
The method is subsequently used to determine the sulcal pattern differences between 240 females and 172 males. The two sample -statistics were constructed on the deformed smoothed sulcal patterns (Fig. 5). The threshold -statistic value greater than or smaller than correspond to the -value of 0.05 after multiple comparisons correction through the permutation test (half millions) for the deformed pattern [5]. We obtained numerous significant differences all over the brain regions including the temporal lobe, which is responsible for sensory processing and known for sex difference [9]. The negative -statistics in most of the brain regions indicates the presence of less sulci for females. This seems to be related to findings in [16], where females have larger gray matter volumes in a number of areas including the left temporal gyrus. In [11], cortical thickening is found extensively in the left superior parietal gyrus and postcentral gyrus, which is consistent with less sulci in our study.
4 Discussion
In this paper, we presented the new framework of matching sulcal patterns of human brain across subjects. We provided the theoretical justification for performing heat kernel smoothing before computing the Wasserstein distance. Smoothing reduces the Wasserstein distance between the sulcal patterns and spatial pattern variabilities. It is also possible to further refine the registration performance via the multi-resolution on heat kernel smoothing [3] or iteratively using the average of the deformed sulcal patterns as the template in the next iteration. These are left as future studies.
The FreeSurfer uses the irregular triangle meshes while our smoothing and the gradient descent uses the regular pixel grid. This may introduce potential interpolation artifacts. A better approach would be to perform smoothing and the gradient descent on triangle meshes. This is left as a future study.
5 Acknowledgement
We would like to thank Soheil Kolouri of Vanderbilt University and Tahmineh Azizi of University of Wisconsin-Madison for discussion on the Wasserstein distance, and Ilwoo Lyu of Ulsan National Institute of Science and Technology for assistant with the TRACE algorithm.
References
- [1] (2000) A computational fluid mechanics solution to the monge-kantorovich mass transfer problem. Numerische Mathematik 84 (3), pp. 375–393. External Links: Document, ISBN 0945-3245, Link Cited by: §2.4.
- [2] (2003) A primal sketch of the cortex mean curvature: a morphogenesis based approach to study the variability of the folding patterns. IEEE Trans. Med. Imag. 22 (6), pp. 754–765. Cited by: §1.
- [3] (2009-01) A gradient descent solution to the monge-kantorovich problem. Applied Mathematical Sciences 3, pp. 1071–1080. Cited by: §1, §2.4, §2.4, §2.4, §3.2, §4, Theorem 3.
- [4] (2007) Weighted fourier series representation and its application to quantifying the amount of gray matter. IEEE Transactions on Medical Imaging 26 (4), pp. 566–581. External Links: Document Cited by: §2.2, §2.2, §2.3.
- [5] (2019) Rapid acceleration of the permutation test via transpositions. International Workshop on Connectomics in Neuroimaging 11848, pp. 42–53. Cited by: §3.3.
- [6] (2007) Cortical folding patterns and predicting cytoarchitecture. Cerebral Cortex 18 (8), pp. 1973–1980. Cited by: §2.1, §3.2.
- [7] (1984) A class of wasserstein metrics for probability distributions.. Michigan Mathematical Journal 31 (2), pp. 231–240. Cited by: §2.3, §2.3.
- [8] (2013) The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage 80, pp. 105–124. Cited by: §2.1.
- [9] (2020) Fast polynomial approximation of heat kernel convolution on manifolds and its application to brain sulcal and gyral graph pattern analysis. IEEE Transactions on Medical Imaging 39, pp. 2201–2212. Cited by: §1, §2.1, §2.2, §3.3.
- [10] (2019) Sulcal pits and patterns in developing human brains. NeuroImage 185, pp. 881–890. Cited by: §1.
- [11] (2006) Gender difference analysis of cortical thickness in healthy young adults with surface-based methods. NeuroImage 31, pp. 31–38. Cited by: §3.3.
- [12] (2002) Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage 17 (2), pp. 825–841. Cited by: §2.1.
- [13] (1984) On the optimal mapping of distributions. Journal of Optimization Theory and Applications 43 (1), pp. 39–49. External Links: ISBN 1573-2878 Cited by: Theorem 2.
- [14] (2017) Optimal mass transport: signal processing and machine-learning applications. IEEE Signal Processing Magazine 34 (4), pp. 43–59. External Links: Document Cited by: §2.4, §3.2.
- [15] (2018) Sliced wasserstein distance for learning gaussian mixture models. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vol. , pp. 3427–3436. External Links: Document Cited by: §1.
- [16] (2009) Why sex matters: brain size independent differences in gray matter distributions between men and women. Journal of Neuroscience 29, pp. 14265–14270. Cited by: §3.3.
- [17] (2018) TRACE: a topological graph representation for automatic sulcal curve extraction. IEEE Trans. Med. Imag. 37 (7), pp. 1653–1663. Cited by: §2.1.
- [18] (1957) Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics 5 (1), pp. 32–38. External Links: ISSN 03684245, Link Cited by: §2.4, §3.1.
- [19] (2020) Unpaired deep learning for accelerated mri using optimal transport driven cyclegan. IEEE Transactions on Computational Imaging 6 (), pp. 1285–1296. External Links: Document Cited by: §1.
- [20] (2019) Computational optimal transport. Foundations and Trends in Machine Learning 11 (5-6), pp. 355–607. Cited by: §2.3.
- [21] (2016) Shape analysis with hyperbolic wasserstein distance. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 5051–5061. External Links: Document Cited by: §1.
- [22] (2013) Resting-state fMRI in the Human Connectome Project. NeuroImage 80, pp. 144–168. Cited by: §2.1.
- [23] (2012) The human connectome project: a data acquisition perspective. NeuroImage 62, pp. 2222–2231. Cited by: §2.1.