License: arXiv.org perpetual non-exclusive license
arXiv:2604.23755v1 [stat.AP] 26 Apr 2026

Sparse Reduced-rank Regression Methods for Spatially Misaligned Data with Application to Spatial Transcriptomics

Zitian Wu
Department of Biostatistics, University of Florida
   Susmita Datta
Department of Biostatistics, University of Florida
   Arkaprava Roy
Department of Biostatistics, University of Florida
Abstract

Understanding the spatiotemporal dynamics of disease progression in relation to transcriptomic profiles provides key insights into complex conditions such as Alzheimer’s disease. To enable such investigations, STARmap PLUS technology offers joint profiling of high-resolution spatial transcriptomics and protein detection within the same tissue section. Motivated by data from Zeng et al. (2023), we develop a novel kernel-weighted regression framework that models plaque size as a collective effect of the spatial transcriptomics of neighboring cells, automatically integrating across cell types and tissue samples from different disease states. To further strengthen interpretability and efficiency, we incorporate a sparse low-rank factorization that enables gene selection while borrowing strength across genes, cell types, and time points. The proposed approach is implemented in a fully automated manner with data-driven specification of key model components. Through simulation studies, we demonstrate the robustness of the proposed method and its superiority across a range of specification scenarios. Applied to Alzheimer’s data, the proposed framework uncovers biologically meaningful associations, highlighting its potential for advancing the understanding of disease mechanisms.

footnotetext: Equal Contribution

Keywords: Cellular microenvironment; Kernel-weighted regression; Sparse Low-rank factorization; STARmap PLUS

1 Introduction

Alzheimer’s disease (AD) comprises interacting pathologies that vary across space, cell types, and time (De Strooper and Karran, 2016; Jack Jr et al., 2018, 2024). Amyloid-β\beta plaques and hyperphosphorylated tau accumulate within a dynamic microenvironment shaped by microglia, astrocytes, oligodendrocyte-lineage cells, and neurons (Long and Holtzman, 2019; Hampel et al., 2021). Spatially resolved studies in mouse and human reveal reproducible plaque-adjacent niches and layer-specific molecular patterns, linking microglial/astrocytic remodeling and tau-aligned neuronal changes to local pathology (Chen et al., 2020, 2022; Zeng et al., 2023; Mallach et al., 2024; Miyoshi et al., 2024). Motivated by these spatially organized niches, we adopt a lesion-centric formulation: we treat plaque centers as outcome locations and quantify how nearby cellular gene expression relates to the plaque size (mean radius).

Traditional transcriptomic designs struggle to capture this heterogeneity. Bulk RNA-seq averages across mixed cell types and loses cellular heterogeneity and the spatial context of the cells, while single-cell RNA-seq restores cellular resolution but still discards in-tissue organization. Spatially resolved transcriptomics (SRT) maps gene expression to spatial coordinates via two broad classes: (i) barcoded capture arrays (Spatial Transcriptomics/Visium) that provide transcriptome‐wide profiles at multi-cell spot resolution (Ståhl et al., 2016; Vickovic et al., 2019; Maynard et al., 2021); and (ii) imaging-based assays (e.g., MERFISH(Chen et al., 2015), seqFISH(Shah et al., 2016), STARmap(Wang et al., 2018)) that achieve single-cell or subcellular resolution with targeted gene panels.

Among imaging-based SRT, STARmap PLUS (Zeng et al., 2023) co-registers targeted RNAs with amyloid-β\beta and p-tau immunostains within the same tissue sample, yielding single-cell expression aligned to plaque segmentations and centers, and enabling direct linkage between molecular readouts and neuropathology in situ. Each tissue section contains many plaques and thousands of cells with known coordinates; plaque size varies markedly within a section, and this variability is embedded in distinct cellular neighborhoods defined by which cells lie nearby and what genes they express. This provides exactly the data structure needed for a plaque-centric analysis.

Beyond Alzheimer’s disease, co-registering histopathology with SRT enables tumor-region delineation, quantification of tissue architecture and burden, and pathology-informed disease modeling (Ni et al., 2022; Arora et al., 2023; Bassiouni et al., 2023; Xun et al., 2023; Jin et al., 2024). Related statistical questions also arise in environmental applications (Alexeeff et al., 2016). This breadth underscores the generality and translational relevance of our framework for modeling disease-related spatial outcomes from spatial molecular measurements.

The primary statistical challenges are: (1) spatial misalignment between plaque locations and SRT measurement locations, (2) the high dimensionality of SRT data, (3) effect heterogeneity across collection times and cell types, i.e., regression coefficients linking plaque size to neighborhood features vary by time point and cell type, with gene-level coefficients exhibiting shared latent structure, (4) the need for a spatially structured characterization of the collective influence of different neighboring cells on plaque size, and (5) between-sample heterogeneity in plaque-adjacent microenvironments (e.g., shifts in local cell-type composition and activation states).

For regressing spatially varying outcomes on predictors from nearby locations, spatial measurement error models have been proposed, such as two-stage correction approaches (Szpiro and Paciorek, 2013), and spatial simulation extrapolation methods (Alexeeff et al., 2016). Local/kernel regressions and geographically weighted regression (GWR) model local heterogeneity by weighting nearby observations more heavily; multiscale GWR further allows each predictor have its own spatial scale parameter. However, they typically treat constructed predictors as error-free and do not scale or pool information effectively in high-dimensional, structured SRT settings (Nadaraya, 1964; Fotheringham et al., 2017; Fan, 2018; Oshan et al., 2019). Change-of-support and downscaler models propagate aggregation/registration uncertainty, yet classical formulations remain low-dimensional and do not accommodate high-dimensional features or latent commonalities across groups (Gotway and Young, 2002; Berrocal et al., 2010). Together, these approaches address parts of the problem but do not provide a unified, high-dimensional regression that learns predictor-specific spatial scales while quantifying the collective impact of neighboring cells. This gap motivates our framework.

In an asynchronous longitudinal regression setting, Cao et al. (2015); Li et al. (2022, 2023) proposed a kernel-weighted approach to handle mismatched observation times between the response and covariate within a given subject. We follow a similar direction by introducing a kernel function to borrow information from all possible pairs of outcomes and the predictors in the misaligned spatial setting. This particularly helps to characterize the collective effect of different cells from varying cell types on the plaque size. An isotropic proximity kernel with sample-specific bandwidths aggregates local signals, accommodating section-to-section enrichment and sampling differences without overfitting directionality. Gene effects are parameterized by a low-rank coefficient tensor over gene ×\times cell type ×\times time, together with gene-mode regularization to capture shared structure and heterogeneity across samples and time points under high dimensionality. To achieve interpretable gene-level discovery, we apply a LASSO penalty only on the gene mode of this factorization: genes with weak signal are shrunk toward zero, while the associated cell-type and time patterns for the retained genes remain flexible. All tuning parameters (e.g. kernel bandwidths, rank, and penalty levels) are selected in data-driven manner; details are provided in Section 3.1. The resulting framework yields interpretable component-level profiles and signed effects, where the sign of the estimated regression coefficient indicates direction (positive = larger plaque size; negative = smaller) across cell types and time points.

Applied to STARmap PLUS AD sections, the model identifies 13-month multigene expression patterns dominated by a Microglia–Oligodendrocyte axis: a microglial Apoe/Tyrobp module and an oligodendrocyte-lineage (OPC-like) pattern are positively associated with plaque size, whereas an astrocytic transport-related signature is negatively associated with plaque size (see Section 5).

We first describe the STARmap PLUS data and exploratory analyses motivating a plaque-centric estimand and isotropic neighborhood kernel. We then develop the kernel-weighted regression with sample-specific bandwidths and low-rank, gene-mode selection), assess performance in spatially realistic simulations, analyze AD sections, and close with limitations and extensions.

2 Data and Exploratory Analysis

2.1 Data

In this study, we use the single-cell spatial transcriptomics data integrated with amyloid-β\beta plaque histopathology from a mouse model of Alzheimer’s disease (AD). Specifically, data were obtained from the brains of TauPS2APP transgenic mice at two critical stages of disease progression: 8 months (8 mo) and 13 months (13 mo) of age, providing a temporal snapshot of AD pathology and cellular heterogeneity. Each of the four samples analyzed includes comprehensive gene expression data at single-cell resolution from all identified cells, alongside spatial localization of extracellular amyloid-β\beta plaques.

The SRT data employed the STARmap PLUS method (Zeng et al., 2023), which simultaneously captures spatially resolved single-cell expression and protein-level histopathological markers within the same tissue section at subcellular resolution. In this study, STARmap PLUS profiled 2,766 genes while simultaneously labeling extracellular amyloid-β\beta plaques (X-34 dye) and hyperphosphorylated tau (AT8 immunostaining), enabling precise co-registration of cell bodies, mRNA, and pathology. Although both amyloid-β\beta and tau signals are available, the present study focuses exclusively on amyloid-β\beta plaques. Voxel size was 95×95×35095\times 95\times 350 nm with 566456-64 fields of view per sample. The method preserves morphology and allows joint RNA-protein mapping at subcellular scales, unlike chip-based whole-transcriptome approaches with coarser pixels (100 µm100\text{\,}\mathrm{\SIUnitSymbolMicro m} pixel size) (Chen et al., 2020).

In the original dataset, STARmap PLUS yielded a high-resolution spatial atlas across cortex and hippocampus and identified 1313 major cell types over 7272k cells. Our analysis uses the diseased subset to focus specifically on plaque-proximal biology. Two features make this dataset uniquely informative for plaque-centric analyses: (i) exact spatial location of cells, gene expression, and plaque pathology within the same section, and (ii) clear distance-dependent patterns in cellular composition around plaques.

Refer to caption
Figure 1: Co-registered spatial maps from two distinct tissue sections collected at two time points: (a) 8 months and (b) 13 months. Cells are shown as gray points, and amyloid plaques are overlaid as red filled circles with radii encoding plaque sizes. Gene expression is measured for each cell at its observed spatial coordinate.

From Zeng et al. (2023), we obtain the plaque size data. In that, they first identified the plaques’ centers and then computed their mean radii. Fig 1 shows co-registered spatial maps of two mouse brain tissue sections collected at (a) 8 months and (b) 13 months. Gray points denote the spatial locations of all cells, and amyloid plaques are shown as red filled circles with circle radius equal to plaque sizes in µm\mathrm{\SIUnitSymbolMicro m}.

2.2 Exploring the Spatial Variability

To characterize plaque-adjacent microenvironments, we re-examined plaque-centric organization using concentric rings around plaques (0–30, 30–70, 70–150 µm) in Figs. 2 and 3. Motivated by the concentric-ring analyses of Zeng et al. (2023), who defined distances with respect to plaque boundaries, we use the Euclidean distances from the cell centroids to the nearest plaque center 𝐔{\mathbf{U}} in our analysis to define the plaque microenvironment.

We focus on three cell-types: Microglia, Astrocytes, and Oligodendrocyte-lineage cells, because both our exploratory analyses (Figs. 2 and 3) and prior work by Zeng et al. (2023) revealed robust, plaque-centric organization for these types with sufficient counts for stable stratification. Microglia are enriched right at plaques center with “core-shell” (as defined by Zeng et al. (2023) concerning the nearest concentric-ring) architecture closely contacting amyloid-β\beta plaques (\qtyrange[range-phrase = –, range-units = single]030\micro from the plaque center), while Astrocytes and Oligodendrocytes are enriched in surrounding shells (\qtyrange[range-phrase = –, range-units = single]30150\micro from the plaque center). Across sections and between 8 and 13 months, enrichment magnitudes generally increase, while beyond 150 µm no systematic enrichment is detected.

Refer to caption
Figure 2: Plaque-centric distance maps. Shaded annuli show Euclidean distance bands from the nearest plaque center: 0–30, 30–70, 70–150, and \geq 150 µm. Points are cell locations colored by three cell types (legend). Each cell is placed in the ring at which it falls; these maps define the radial stratification used in downstream summaries. The two tissue sections are slightly different in dimension.
Refer to caption
Figure 3: Cell density by distance to the nearest plaque. Bars show mean cell density (cells/mm2) within each radial band (0–30, 30–70, 70–150, \geq150 µm150\text{\,}\mathrm{\SIUnitSymbolMicro m}) and overall, for Microglia, Astrocyte, and Oligodendrocyte-lineage cells. Densities are computed per sample and averaged across samples. Enrichment near plaques is most pronounced for Microglia in \qtyrange[range-phrase = –, range-units = single]030\micro and for Astrocyte/Oligodendrocyte-lineage cells in \qtyrange[range-phrase = –, range-units = single]30150\micro.

Guided by these spatial cell enrichments, we next construct bulk-like plaque-level summaries of near-plaque gene expression stratified by plaque size for descriptive visualization, to assess whether near-plaque expression varies with plaque size in Fig 4. The nearest plaque for each cell is identified by sorting Euclidean distances from the cell centroid to different plaque centers. A cell is considered near-plaque if its distance to the nearest plaque center is \leq 150 µm. For visualization only, plaque size is dichotomized based on the global median radius, namely Small: plaques with radii \leq median; Large: plaques with radii >> median. For each gene, heatmap values in Fig 4 are computed via two-stage averaging: first, we compute Plaque-level (bulk-like) mean for each plaque as the average gene expression over all near-plaque cells, yielding one value per (plaque, gene); and subsequently, we compute the average of plaque-level means for each time (8 mo vs 13 mo), plaque-size (Small vs Large). This finally gets us one value per time, size, and gene, which forms the heatmap entries.

Fig 4 tells us: (i) the difference-in-differences for each gene expression, showing how the 8\rightarrow13 mo change near Large plaques compares with the change near Small plaques (positive bars = larger increase or smaller decrease near Large plaques), (ii) a heatmap of the per-size change from 8 mo to 13 mo shown separately for Small and Large, and (iii-iv) heatmaps of expression at 8 mo and 13 mo (columns = Small vs Large). Row centering (within time) is applied only to the expression heatmaps (iii–iv); the bar plot and the change heatmap show raw contrasts. Across the 30 most varying genes over time, these summaries reveal size-associated differences in near-plaque expression and motivate our regression problem below, where plaque radius is regressed on the local gene expression levels.

Refer to caption
Figure 4: Near-plaque gene expression by plaque size. Left: bar plot of ΔLargeΔSmall\Delta_{\text{Large}}-\Delta_{\text{Small}} where Δ\Delta denotes the 13 mo minus 8 mo change in near-plaque expression within each plaque-size group (positive indicates a larger temporal increase near Large plaques). Middle-left: heatmap of Δ\Delta shown separately for Small and Large plaques. Middle-right / right: heatmaps of near-plaque expression at 8 months and 13 months, respectively (columns: Small vs Large). Rows are the 30 genes with the largest temporal variation. Row centering (within time) is applied only to the expression heatmaps; the bar plot and change heatmap show raw contrasts.

3 Model

We propose a novel kernel-weighted regression framework to analyze the effect of spatial transcriptomic profiles on amyloid-β\beta plaque deposition. For clarity, we first introduce some notations that are used throughout the article.

Let nn stands for the total number of tissue samples. For each sample i=1,,ni=1,\dots,n, let MiM_{i} denotes the number of outcome locations (e.g., plaques) and {𝐔i,j}j=1Mi\{{\mathbf{U}}_{i,j}\}_{j=1}^{M_{i}} be their spatial coordinates. At each of these locations, we observe spatially indexed scalar outcomes, such as plaque sizes, denoted by {yi,jy(𝐔i,j),j=1,,Mi}\{y_{i,j}\coloneqq y({\mathbf{U}}_{i,j}),\,j=1,\dots,M_{i}\}. Let NiN_{i} stands for the number of cells in ii-thth sample and {𝐕i,k}k=1Ni\{{\mathbf{V}}_{i,k}\}_{k=1}^{N_{i}} be their spatial coordinates. The spatial transcriptomics data, recording gene expression levels of pp genes, are available at individual cell locations, denoted by {𝒙i,k𝒙(𝐕i,k)p,k=1,,Ni}\{{\bm{x}}_{i,k}\coloneqq{\bm{x}}({\mathbf{V}}_{i,k})\in\mathbb{R}^{p},\,k=1,\dots,N_{i}\}. Furthermore, the samples are obtained at different stages of the disease progression, where tit(i)t_{i}\coloneqq t(i) represents the corresponding collection time of the ii-thth sample. Note that all tissue samples are independent, as they come from different mice. Each cell further belongs to one of the CC cell-types. For clarity, let ci,kc(𝐕i,k)c_{i,k}\coloneqq c({\mathbf{V}}_{i,k}) represent the cell type associated with the cell at location 𝐕i,k{\mathbf{V}}_{i,k} in ii-thth sample.

Our goal is to estimate the effect of gene expression levels on plaque sizes, measured at different sample collection times. Specifically, we aim to estimate 𝜷c,tp\mbox{$\beta$}_{c,t}\in\mathbb{R}^{p}, whose jj-thth value quantifies the effect of the jj-thth gene’s expression (among the pp genes) in the cc-thth cell type on plaque sizes collected at the tt-thth time point. However, to run this analysis, we must take into account the following three characteristics in our proposed model:

1) The spatial locations of the cells and plaques do not match in general.

2) Each plaque is surrounded by cells with different cell types.

3) The gene-specific regression effects for different cell types may share some commonalities.

4) Samples collected at different time points are independent, but gene-specific effects may share commonalities through underlying latent factors that capture shared biological or regulatory dependencies.

In order to resolve these issues, we utilize the inherent spatial dependence in the data. Specifically, aligning with the realistic scenarios arising from integrating single-cell spatial transcriptomics data with amyloid-beta plaque data in Alzheimer’s disease, spatial proximity informs similarities in gene expression levels and biological response patterns, using sample-specific, density-adaptive neighborhoods.

Thus, to characterize the relationship between spatially mismatched 𝒙i,k\bm{x}_{i,k} and yi,jy_{i,j}, we first propose the following kernel-weighted objective function,

i=1nk=1Nij=1MiKi,j,k(yi,j𝒙i,k𝜷ci,k,ti)2,\displaystyle\sum_{i=1}^{n}\sum_{k=1}^{N_{i}}\sum_{j=1}^{M_{i}}K_{i,j,k}\cdot\left(y_{i,j}-{\bm{x}}_{i,k}^{\top}\mbox{$\beta$}_{c_{i,k},t_{i}}\right)^{2}, (1)

where ci,k{1,,C}c_{i,k}\in\{1,\dots,C\} standing for the cell-type of the cell present at location 𝐕i,k{\mathbf{V}}_{i,k}, ti{1,,T}t_{i}\in\{1,\dots,T\} is the collection time for sample ii, and Ki,j,kKhi(𝐔i,j𝐕i,k2)K_{i,j,k}\coloneqq K_{h_{i}}\left(\|{\mathbf{U}}_{i,j}-{\mathbf{V}}_{i,k}\|_{2}\right) are nonnegative kernel weights with possibly sample-specific bandwidth hih_{i}. For the kernel Kh()K_{h}(\cdot), we consider the Epanechnikov kernel to assign weights based on the Euclidean distance between 𝐔i,j{\mathbf{U}}_{i,j} and 𝐕i,k{\mathbf{V}}_{i,k}. This compactly supported kernel determines the contribution of each observation to the local fit. The Epanechnikov kernel is a widely used choice due to its optimality in minimizing mean squared error. The unscaled kernel is defined as K(d)=0.75(1d2)+K(d)=0.75(1-d^{2})_{+}, where ()+(\cdot)_{+} denotes the positive part. Given a bandwidth parameter h>0h>0, the kernel is scaled as Kh(d)=K(d/h)/hK_{h}(d)=K(d/h)/h. This formulation ensures that observations farther than hh units apart receive zero weight, while closer observations have smoothly decreasing influence as their distance approaches hh. Now, there are pGTpGT unknown parameters to be estimated in {𝜷c,tp}c=1Ct=1T\{\mbox{$\beta$}_{c,t}\in\mathbb{R}^{p}\}_{c=1}^{C}{}_{t=1}^{T}. Direct estimation of these parameters is computationally infeasible and fails to exploit the inherent dependencies among genes, cell types, and time. Moreover, sparsity-inducing structures are required to identify important genes. Thus, we propose a sparse reduced-rank factorization of the regression coefficient to identify important genes while borrowing information across different cell types and time.

Low-rank factorization of the regression coefficient:

In order to judiciously estimate the regression coefficient 𝜷c,t\bm{\beta}_{c,t}, we utilize a reduced-rank factorization that leverages shared structure across cell types, time, and genes. In that, we first rewrite it as a tensor p×C×T\mathcal{B}\in\mathbb{R}^{p\times C\times T} with three dimensions: Genes ×\times Cell-types ×\times Times, where the (l,c,t)(l,c,t)-thth entry is β(l,c,t)\beta(l,c,t), for l=1,,pl=1,\dots,p, c=1,,Cc=1,\dots,C, and t=1,Tt=1\dots,T. One may then apply the tensor-factorization methods, such as CANDECOMP/PARAFAC (CP) Carroll and Chang (1970); Harshman (1970); Kiers (2000), or more generally the Tucker decomposition Tucker (1966). In this work, we employ a CP decomposition on \mathcal{B} as:

β(l,c,t)\displaystyle\beta(l,c,t) =r=1Rwrql,r(1)qc,r(2)qt,r(3)\displaystyle=\sum_{r=1}^{R}w_{r}q_{l,r}^{(1)}q_{c,r}^{(2)}q_{t,r}^{(3)}
\displaystyle\mathcal{B} =r=1Rwr𝒒r(1)𝒒r(2)𝒒r(3)\displaystyle=\sum_{r=1}^{R}w_{r}{\bm{q}}^{(1)}_{r}\circ{\bm{q}}^{(2)}_{r}\circ{\bm{q}}^{(3)}_{r}
𝒒r(1)\displaystyle{\bm{q}}^{(1)}_{r} (ql,r(1))l=1p,𝒒r(2)(qc,r(2))c=1C,𝒒r(3)(qt,r(3))t=1T,\displaystyle\coloneqq\left(q_{l,r}^{(1)}\right)_{l=1}^{p},{\bm{q}}^{(2)}_{r}\coloneqq\left(q_{c,r}^{(2)}\right)_{c=1}^{C},{\bm{q}}^{(3)}_{r}\coloneqq\left(q_{t,r}^{(3)}\right)_{t=1}^{T},
such that wr>0,𝒒r(m)2=1,for all r=1,,R;m=1,2,3,\displaystyle\text{such that }w_{r}>0,\|{\bm{q}}^{(m)}_{r}\|_{2}=1,\text{for all }r=1,\ldots,R;m=1,2,3,

where 2\|\cdot\|_{2} stands for the 2\ell_{2} norm which is defined as 𝐚=iai2\|\mathbf{a}\|=\sqrt{\sum_{i}a^{2}_{i}}. By restricting 𝒒r(1){\bm{q}}^{(1)}_{r}, 𝒒r(2){\bm{q}}^{(2)}_{r}, and 𝒒r(3){\bm{q}}^{(3)}_{r}, to be normalized individually, it ensures that wrw_{r} captures the scale for each rank. This further helps in rank selection and introducing sparsity, as discussed later. The CP decomposition reduces parameter dimensionality while enabling information sharing across genes, cell types, and time. Specifically, we represent {𝜷c,t}\{\mbox{$\beta$}_{c,t}\} as a sum of RR components; r=1,,Rr=1,\ldots,R indexes the component (“rank”). Note that under the CP decomposition, the same rank RR appears across all modes by construction, since RR denotes the number of rank-one components. For greater flexibility, Tucker factorization with mode-specific ranks may be considered. However, for sparse tensors, CP decompositions are usually adequate.

In the gene direction, we place a LASSO penalty on 𝒒r(1)={ql,r(1)}l=1p\bm{q}^{(1)}_{r}=\{q^{(1)}_{l,r}\}_{l=1}^{p} for each rank rr, inducing sparsity in \mathcal{B}. The resulting objective function is:

F()=12i=1nk=1Nij=1MiKi,j,k(yi,j𝒙i,k𝜷ci,k,ti)2+λRpr=1R𝒒r(1)1,\displaystyle F(\mathcal{B})=\frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{N_{i}}\sum_{j=1}^{M_{i}}K_{i,j,k}\cdot\left(y_{i,j}-{\bm{x}}_{i,k}^{\top}\mbox{$\beta$}_{c_{i,k},t_{i}}\right)^{2}+\frac{\lambda}{Rp}\sum_{r=1}^{R}\|{\bm{q}}_{r}^{(1)}\|_{1}, (2)

where 1\|\cdot\|_{1} stands for the 1\ell_{1} norm which is defined as 𝒂1=i|ai|\|{\bm{a}}\|_{1}=\sum_{i}|a_{i}|. It allows us to obtain a sparse solution and select the important genes.

3.1 Estimation

The overall implementation is fully automated with data-driven efficient tuning. We update all the parameters using a blocked coordinate descent algorithm. To do that, we first partition the full parameter set into RR blocks with rr-thth block: {𝒒r(1),𝒒r(2),𝒒r(3),wr}\{{\bm{q}}^{(1)}_{r},{\bm{q}}^{(2)}_{r},{\bm{q}}^{(3)}_{r},w_{r}\}. In each iteration, we cycle through r=1,,Rr=1,\dots,R, and update the components sequentially in each block. The blocked coordinate descent updates are given below. After each update, we normalize 𝒒r(k){\bm{q}}^{(k)}_{r} by its corresponding 2\ell_{2} norm and adjust wrw_{r} accordingly.

We minimize Eq.(2), where 𝒙i,k=(x1,ik,,xp,ik){\bm{x}}_{i,k}=(x_{1,ik},\dots,x_{p,ik})^{\top}. Write the CP form

𝜷c,t=s=1Rws𝒒s(1)qc,s(2)qt,s(3),𝜷c,t(r)=𝜷c,twr𝒒r(1)qc,r(2)qt,r(3).\mbox{$\beta$}_{c,t}=\sum_{\begin{subarray}{c}s=1\end{subarray}}^{R}w_{s}\,{\bm{q}}^{(1)}_{\cdot s}\,q^{(2)}_{c,s}\,q^{(3)}_{t,s},\qquad\mbox{$\beta$}_{c,t}^{(-r)}=\mbox{$\beta$}_{c,t}-w_{r}\,{\bm{q}}^{(1)}_{\cdot r}\,q^{(2)}_{c,r}\,q^{(3)}_{t,r}.

For a fixed component rr define

zik,r(1)wrqci,k,r(2)qti,r(3),sik,r𝒙i,k𝒒r(1),Ri,j,k(r)yi,j𝒙i,k𝜷ci,k,ti(r).z_{ik,r}^{(1)}\coloneqq w_{r}q^{(2)}_{c_{i,k},r}q^{(3)}_{t_{i},r},\qquad s_{ik,r}\coloneqq{\bm{x}}_{i,k}^{\top}{\bm{q}}_{\cdot r}^{(1)},\qquad R_{i,j,k}^{(-r)}\coloneqq y_{i,j}-{\bm{x}}_{i,k}^{\top}\mbox{$\beta$}_{c_{i,k},t_{i}}^{(-r)}.

We update ql,r(1)q^{(1)}_{l,r} by holding all others fixed, and use the leave-one-coordinate-out residual

R~i,j,k(l,r)Ri,j,k(r)zik,r(1)(sik,rxl,ikql,r(1)).\tilde{R}_{i,j,k}^{(l,r)}\coloneqq R_{i,j,k}^{(-r)}-z_{ik,r}^{(1)}\left(s_{ik,r}-x_{l,ik}\cdot q^{(1)}_{l,r}\right).

Define the sufficient statistics

al,ri,k,jKi,j,kzik,r(1)xl,ikR~i,j,k(l,r),bl,ri,k,jKi,j,k(zik,r(1)xl,ik)2.a_{l,r}\coloneqq\sum_{i,k,j}K_{i,j,k}\cdot z_{ik,r}^{(1)}\cdot x_{l,ik}\cdot\tilde{R}_{i,j,k}^{(l,r)},\qquad b_{l,r}\coloneqq\sum_{i,k,j}K_{i,j,k}\left(z_{ik,r}^{(1)}\cdot x_{l,ik}\right)^{2}.

Let Sτ(u)=sign(u)(|u|τ)+S_{\tau}(u)=\mathrm{sign}(u)(|u|-\tau)_{+} be the soft-ththresholding operator where τ=λ/(Rp)\tau=\lambda/(Rp). Then the closed-from update is

ql,r(1)Sλ/(Rp)(al,r)bl,r,\displaystyle q_{l,r}^{(1)}\leftarrow\frac{S_{\lambda/(Rp)}(a_{l,r})}{b_{l,r}}, (3)

with the convention ql,r(1)0q_{l,r}^{(1)}\leftarrow 0 if bl,r=0b_{l,r}=0. For fixed (l,r)(l,r), the objective function reduces to 12bl,rq2al,rq+λRp|q|\frac{1}{2}b_{l,r}q^{2}-a_{l,r}q+\frac{\lambda}{Rp}|q|, whose minimizer is the soft-ththresholding operator above.

For updating the entry qc,r(2)q_{c,r}^{(2)} in the cell-type loadings, we set zik,r(2)wrqti,r(3)sik,rz_{ik,r}^{(2)}\coloneqq w_{r}q_{t_{i},r}^{(3)}s_{ik,r}. This block is unpenalized weighted least squares:

qc,r(2)i,k:ci,k=cj=1MiKi,j,kzik,r(2)Ri,j,k(r)i,k:ci,k=cj=1MiKi,j,k(zik,r(2))2.\displaystyle q_{c,r}^{(2)}\leftarrow\dfrac{\displaystyle\sum_{i,k:\,c_{i,k}=c}\sum_{j=1}^{M_{i}}K_{i,j,k}\cdot z_{ik,r}^{(2)}\cdot R_{i,j,k}^{(-r)}}{\displaystyle\sum_{i,k:\,c_{i,k}=c}\sum_{j=1}^{M_{i}}K_{i,j,k}\cdot\left(z_{ik,r}^{(2)}\right)^{2}}. (4)

Similarly for the time loadings, we set zik,r(3)wrqci,k,r(2)sik,rz_{ik,r}^{(3)}\coloneqq w_{r}q_{c_{i,k},r}^{(2)}s_{ik,r}. Then

qt,r(3)i:ti=tk=1Nij=1MiKi,j,kzik,r(3)Ri,j,k(r)i:ti=tk=1Nij=1MiKi,j,k(zik,r(3))2.\displaystyle q_{t,r}^{(3)}\leftarrow\dfrac{\displaystyle\sum_{i:\,t_{i}=t}\sum_{k=1}^{N_{i}}\sum_{j=1}^{M_{i}}K_{i,j,k}\cdot z_{ik,r}^{(3)}\cdot R_{i,j,k}^{(-r)}}{\displaystyle\sum_{i:\,t_{i}=t}\sum_{k=1}^{N_{i}}\sum_{j=1}^{M_{i}}K_{i,j,k}\cdot\left(z_{ik,r}^{(3)}\right)^{2}}. (5)

To update entry wr0w_{r}\geq 0, we set zik,r(2)qci,k,r(2)qti,r(3)sik,rz_{ik,r}^{(2)}\coloneqq q_{c_{i,k},r}^{(2)}q_{t_{i},r}^{(3)}s_{ik,r}. Then

wrmax{0,i,k,jKi,j,kzik,r(w)Ri,j,k(r)i,k,jKi,j,k(zik,r(w))2}.\displaystyle w_{r}\leftarrow\max\left\{0,\cdot\dfrac{\displaystyle\sum_{i,k,j}K_{i,j,k}\cdot z_{ik,r}^{(w)}\cdot R_{i,j,k}^{(-r)}}{\displaystyle\sum_{i,k,j}K_{i,j,k}\cdot\left(z_{ik,r}^{(w)}\right)^{2}}\right\}. (6)

Computational efficiency and stability:

In our implementation, we keep track of the residuals Ri,j,k=yi,j𝒙i,k𝜷ci,k,tiR_{i,j,k}=y_{i,j}-{\bm{x}}_{i,k}^{\top}\mbox{$\beta$}_{c_{i,k},t_{i}} and update them incrementally. When a coordinate ql,r(1)q_{l,r}^{(1)} changes by Δ\Delta, set Ri,j,kRi,j,kzik,r(1)xl,ikΔR_{i,j,k}\leftarrow R_{i,j,k}-z_{ik,r}^{(1)}x_{l,ik}\Delta; analogous updates are used for qc,r(2)q_{c,r}^{(2)}, qt,r(3)q_{t,r}^{(3)} and wrw_{r}. This reduces the per-coordinate cost from O(piMiNi)O(p\sum_{i}M_{i}N_{i}) for recomputation to O(iMiNi)O(\sum_{i}M_{i}N_{i}), yielding substantial speed-ups without any approximation.

After updating the factors for each rank, we rescale them to unit 2\ell_{2} norm and absorb the entire scale into the scaler wrw_{r}. Let αm=𝒒r(m)2\alpha_{m}=\big\|{\bm{q}}^{(m)}_{\cdot r}\big\|_{2} for m{1,2,3}m\in\{1,2,3\}. We set

𝒒r(m)𝒒r(m)αmfor m=1,2,3,wrwrα1α2α3,{\bm{q}}^{(m)}_{\cdot r}\leftarrow\frac{{\bm{q}}^{(m)}_{\cdot r}}{\alpha_{m}}\quad\text{for }m=1,2,3,\qquad w_{r}\leftarrow w_{r}\,\alpha_{1}\alpha_{2}\alpha_{3},

so that the contribution from the rr-thth rank, wr𝒒r(1)qc,r(2)qt,r(3)w_{r}\,{\bm{q}}^{(1)}_{\cdot r}\,q^{(2)}_{c,r}\,q^{(3)}_{t,r}, (and thus fitted values) is unchanged. This prevents factor columns from arbitrarily exploding/vanishing with compensating shrink/expansion in wrw_{r}, keeps the coordinate-descent curvature well conditioned, and makes the 1\ell_{1} penalty on 𝒒(1){\bm{q}}^{(1)} act consistently rather than depending on arbitrary scaling of 𝒒(1){\bm{q}}^{(1)}.

Stopping rules:

One may continue the above iterative process up to a fixed number of iterations. Instead, we continue until the following predefined convergence criterion is met:

1)\displaystyle 1) maxc,t𝜷c,t(new)𝜷c,t(old)2𝜷c,t(old)2<106.\displaystyle\max\limits_{c,t}\dfrac{\|\mbox{$\beta$}^{(new)}_{c,t}-\mbox{$\beta$}^{(old)}_{c,t}\|_{\infty}^{2}}{\|\mbox{$\beta$}_{c,t}^{(old)}\|_{\infty}^{2}}<10^{-6}.
2)\displaystyle 2) maxr,m{𝒒r,new(m)𝒒r,old(m)}104.\displaystyle\max_{r,m}\left\{\|{\bm{q}}^{(m)}_{r,new}-{\bm{q}}^{(m)}_{r,old}\|\right\}\leq 10^{-4}. (7)

The second criterion in Eq.(7) was also used in Anandkumar et al. (2015) and Sun et al. (2017) in the context of tensor-valued response regression.

Automatic CP-rank update:

In our approach, the procedure of selecting CP rank is seamlessly integrated into the blocked Coordinate Descent algorithm. We do not need to specify or tune the rank during model fitting manually. Instead, the method begins by initializing the model with a sufficiently large CP rank, which we recommend setting to the product of the number of cell type (CC) and time points (TT), i.e., C×TC\times T. The general upper bound on the maximum CP rank for any three-way tensor 𝓧I×J×K\bm{\mathcal{X}}\in\mathbb{R}^{I\times J\times K} satisfies (Kruskal, 1989; Kolda and Bader, 2009)

rank(𝓧)min{IJ,IK,JK}.\displaystyle\mathrm{rank}(\bm{\mathcal{X}})\leq\min\{IJ,IK,JK\}. (8)

A rank-dropping strategy is then applied during the first 500 iterations, and the algorithm actively prunes unnecessary components in these steps. This strategy ensures that the subsequent updates focus on stabilizing the estimation across iterations based on the selected rank after the rank-dropping. Specifically, the CP rank of \mathcal{B} is reduced from RR to R1R-1 by discarding rr-thth component if any of the following conditions are met:

1)wriwi<105,2)wr=0,3)𝒒r(1)>0.99.\displaystyle 1)\frac{w_{r}}{\sum_{i}w_{i}}<10^{-5},\qquad 2)w_{r}=0,\qquad 3)\|{\bm{q}}_{r}^{(1)}\|_{\infty}>0.99. (9)

By automatically discarding redundant or negligibly small components, the rank-dropping strategy allows the model to achieve both good estimation accuracy and computational efficiency by selecting a final rank that balances model complexity and goodness-of-fit automatically.

Lambda selection:

We first fit the method for different choices of λ\lambda and then select the optimal λ\lambda based on a BIC-like criteria. To ensure computational efficiency, we begin with the penalty parameter λ\lambda that yields the all-zero solution and progressively decrease it, as done in the path solution algorithms (Efron et al., 2004; Friedman et al., 2010). The successive penalty parameters are then set recursively as λk+1=0.9λk\lambda_{k+1}=0.9\lambda_{k}. Finally, for each penalty, we evaluate the following BIC-like criterion to do the final selection:

Nlog[1Ni=1nk=1Nij=1MiKi,j,k(yi,j𝒙i,k𝜷ci,k,ti)2]+νlog(N),\displaystyle N^{*}\log\left[\frac{1}{N^{*}}\sum_{i=1}^{n}\sum_{k=1}^{N_{i}}\sum_{j=1}^{M_{i}}K_{i,j,k}\cdot\left(y_{i,j}-{\bm{x}}_{i,k}^{\top}\mbox{$\beta$}_{c_{i,k},t_{i}}\right)^{2}\right]+\nu\log(N^{*}), (10)

where N=i=1nk=1Nij=1Mi𝟏{Ki,j,k>0}N^{*}=\sum_{i=1}^{n}\sum_{k=1}^{N_{i}}\sum_{j=1}^{M_{i}}\mathbf{1}\{K_{i,j,k}>0\} and ν\nu is the number of nonzero entries in 𝒒r(1){\bm{q}}^{(1)}_{r}, 𝒒r(2){\bm{q}}^{(2)}_{r}, and 𝒒r(3){\bm{q}}^{(3)}_{r}. Although one can choose a smaller decay factor than 0.9 for a finer exploration, in our experiments, this choice works reasonably well.

Bandwidth selection:

The kernel bandwidth hih_{i} is set based on an elbow rule, as discussed below. First, we set the candidate bandwidths as the medians of the distances from each plaque to its LL-thth closest neighboring cell for different choices of LL’s. Let the median distance based on LL-thth closest neighboring cell in ii-thth sample be Hi(L)=Median({ei,j(L)}j=1Ni)H_{i}(L)=\text{Median}(\{e^{(L)}_{i,j}\}_{j=1}^{N_{i}}), where ei,j(L)=D(L)(𝐔i,j)e^{(L)}_{i,j}=D_{(L)}({\mathbf{U}}_{i,j}) where D(1)(𝐔i,j)D(2)(𝐔i,j)D(Mi)(𝐔i,j)D_{(1)}({\mathbf{U}}_{i,j})\leq D_{(2)}({\mathbf{U}}_{i,j})\leq\cdots\leq D_{(M_{i})}({\mathbf{U}}_{i,j}) stands for the sorted sequence of the distances of different cells from jj-thth plaque i.e. {𝐔i,j𝐕i,k2}k=1Mi\{\|{\mathbf{U}}_{i,j}-{\mathbf{V}}_{i,k}\|_{2}\}_{k=1}^{M_{i}}.

Finally, we plot the normalized loss

1Ni=1nk=1Nij=1MiKHi(L)(𝐔i,j𝐕i,k2)(yi,j𝒙i,k𝜷ci,k,ti)2\dfrac{1}{N^{*}}\sum_{i=1}^{n}\sum_{k=1}^{N_{i}}\sum_{j=1}^{M_{i}}K_{H_{i}(L)}(\|{\mathbf{U}}_{i,j}-{\mathbf{V}}_{i,k}\|_{2})\left(y_{i,j}-{\bm{x}}_{i,k}^{\top}\mbox{$\beta$}_{c_{i,k},t_{i}}\right)^{2}

as a function of the neighborhood size LL and select LL using the elbow (point-of-diminishing-returns) criterion, defined as the value of LL maximizing the perpendicular distance to the line connecting the endpoints of the loss curve (Satopaa et al., 2011). Even with the same LL for all samples, the sample-specific bandwidths (Hi(L))(H_{i}(L)) may be different.

3.2 Initialization

To initialize the model parameters wrw_{r}, 𝒒r(1){\bm{q}}^{(1)}_{r}, 𝒒r(2){\bm{q}}^{(2)}_{r}, and 𝒒r(3){\bm{q}}^{(3)}_{r}, we apply a weighted Ridge regression without the CP decomposition structure. For each pair (c,t)(c,t), define the objective

c,t(𝜷)=i:ti=tk:ci,k=cj=1MiKi,j,k(yi,j𝒙i,k𝜷)2+λc,t𝜷22,\mathcal{L}_{c,t}(\mbox{$\beta$})=\sum_{i:t_{i}=t}\sum_{k:c_{i,k}=c}\sum_{j=1}^{M_{i}}K_{i,j,k}\cdot\left(y_{i,j}-{\bm{x}}_{i,k}^{\top}\mbox{$\beta$}\right)^{2}+\lambda_{c,t}\|\mbox{$\beta$}\|_{2}^{2},

and the estimator

^𝜷c,t(Ridge)argmin𝜷pc,t(𝜷).\hat{}\mbox{$\beta$}_{c,t}^{(\text{Ridge})}\in\arg\min_{\mbox{$\beta$}\in\mathbb{R}^{p}}\mathcal{L}_{c,t}(\mbox{$\beta$}).

Here, we put the l2l_{2} penalty on the coefficient vector. A CP decomposition with the initial rank described above is then applied to the resulting ^(Ridge)\hat{\mathcal{B}}^{(\text{Ridge})} to obtain the initial model parameters, where (^(Ridge))l,c,t=(^𝜷c,t(Ridge))l\left(\hat{\mathcal{B}}^{(\text{Ridge})}\right)_{l,c,t}=\left(\hat{}\mbox{$\beta$}_{c,t}^{(\text{Ridge})}\right)_{l}, for l=1,,pl=1,\dots,p.

Algorithm 1 Comprehensive Implementation Algorithm for the Proposed Method

Step 1: Select a sequence of λ\lambda and LL. For each choice of (λ,L)(\lambda,L) run the following:

1:{y(𝐔i,j)}j=1,,Ni;i=1,,n\{y({\mathbf{U}}_{i,j})\}_{j=1,\ldots,N_{i};i=1,\ldots,n},{𝒙(𝐕i,k)}k=1,,Mi;i=1,,n\{{\bm{x}}({\mathbf{V}}_{i,k})\}_{k=1,\ldots,M_{i};i=1,\ldots,n}, RmaxR_{\max}, λ\lambda, hi(L)h_{i}(L)
2:(Initialization) Run a weighted Ridge regression applying glmnet and get ^(Ridge)\hat{\mathcal{B}}^{(\text{Ridge})} as described in Sec 3.2, where λRidge\lambda_{\text{Ridge}} is selected via cross-validation (Friedman et al., 2010).
3:while Rmax1R_{\max}\geq 1 do
4:  RRmaxR\leftarrow R_{\max}
5:  Initialize wrw_{r}, 𝒒r(1){\bm{q}}^{(1)}_{r}, 𝒒r(2){\bm{q}}^{(2)}_{r}, and 𝒒r(3){\bm{q}}^{(3)}_{r} by applying CP on ^(Ridge)\hat{\mathcal{B}}^{(\text{Ridge})} with initial rank RR.
6:  repeat
7:   Update parameters as equations from Eq.(3) to Eq.(6).
8:   wrwr𝒒r(1)𝒒r(2)𝒒r(3)w_{r}\leftarrow w_{r}\|{\bm{q}}^{(1)}_{r}\|\|{\bm{q}}^{(2)}_{r}\|\|{\bm{q}}^{(3)}_{r}\|; 𝒒r(m)𝒒r(m)/𝒒r(m){\bm{q}}^{(m)}_{r}\leftarrow{\bm{q}}^{(m)}_{r}/\|{\bm{q}}^{(m)}_{r}\| for m=1,2,3m=1,2,3.
9:   Update CP-rank RR automatically by Eq.(9).
10:  until Stopping rule Eq.(7) satisfied, or R=0R=0.
11:  if R=0R=0 then
12:   RmaxRmax1R_{\max}\leftarrow R_{\max}-1; Repeat Steps 2-9
13:  else
14:   RfinalRR_{\mathrm{final}}\leftarrow R; break \triangleright converged at current rank RR   
15:Obtain final estimate as λ,L(final)=r=1Rfinalwr𝒒r(1)𝒒r(2)𝒒r(3)\mathcal{B}_{\lambda,L}^{(\mathrm{final})}=\sum_{r=1}^{R_{\mathrm{final}}}w_{r}{\bm{q}}^{(1)}_{r}\circ{\bm{q}}^{(2)}_{r}\circ{\bm{q}}^{(3)}_{r} for a given λ\lambda and LL.

Step 2: Select the optimal λ^\hat{\lambda} from the pre-specified sequence for each choice of LL using the criteria in Eq.(10).

Step 3: Select the optimal L^\hat{L} applying the elbow rule, and get the final estimate λ^,L^(final)\mathcal{B}^{(\mathrm{final})}_{\hat{\lambda},\hat{L}}.

4 Simulations

To evaluate our method’s ability to recover the important genes and quantify gene-plaque relationships, we design a simulation setting relying on simulated spatial transcriptomic data using the STRsim package (Zhu et al., 2023). Following the real data, the plaque sites are set well-separated in the simulation too.

We compare our estimates with those from a LASSO-regularized linear regression model. Since fitting LASSO requires the data in a paired form as (outcome, predictors), we first organize the data as (yi,𝒙i)(y_{i},{\bm{x}}_{i}), where 𝒙i{\bm{x}}_{i} is the gene expression of the cell nearest to ii-thth outcome yiy_{i}. Then, we fit the LASSO regression model.

4.1 Data Generation

While generating the simulated data, we mimic our real data and consider two time points. Thus, for each replicated dataset, we generate two square-shaped SRT samples using SRTsim package, one for each time point. For each sample i{1,2}i\in\{1,2\}, let 𝒮i\mathcal{S}_{i} denote the set of spatial spots. For each sample ii, we simulate a square section with an expected |𝒮i|5000|\mathcal{S}_{i}|\approx 5000 spots and C=3C=3 cell-type groups (SRTsim “number of groups”=3). SRTsim returns a gene-by-spot expression matrix 𝒙ip×|𝒮i|{\bm{x}}_{i}\in\mathbb{R}^{p\times|\mathcal{S}_{i}|}, 2D coordinates {𝐬i,j}j𝒮i\{{\mathbf{s}}_{i,j}\}_{j\in\mathcal{S}_{i}}, and group labels cij{1,2,3}c_{ij}\in\{1,2,3\}. Each spot has a unique location within the section.

For each sample ii, we designate a plaque set 𝒫i𝒮i\mathcal{P}_{i}\subset\mathcal{S}_{i} of size Mi{50,100,200}M_{i}\in\{50,100,200\}, chosen to be well separated in space and balanced across cell types (i.e., |𝒫i,c|is the same for c=1,,C|\mathcal{P}_{i,c}|\;\;\text{is the same for }c=1,\dots,C, where 𝒫i,c={j𝒫i:cij=c}\mathcal{P}_{i,c}=\{j\in\mathcal{P}_{i}:c_{ij}=c\}). The remaining spots 𝒞i=𝒮i𝒫i\mathcal{C}_{i}=\mathcal{S}_{i}\setminus\mathcal{P}_{i} serve as the pool of neighboring predictor cells, reflecting the spatial misalignment in real data where outcomes are observed at plaques while predictors are measured at nearby SRT locations.

For each plaque j𝒫ij\in\mathcal{P}_{i}, we generate the outcom using the linear model:

yi,j\displaystyle y_{i,j} =𝒙(si,j)𝜷c,t+ei,j,\displaystyle={\bm{x}}(s_{i,j})^{\top}\mbox{$\beta$}_{c,t}+e_{i,j},

where 𝜷c,tp\mbox{$\beta$}_{c,t}\in\mathbb{R}^{p} is the regression coefficient for cc-thth cell-type at time tt. Let 𝐞i=(ei,j)j𝒫i\mathbf{e}_{i}=(e_{i,j})_{j\in\mathcal{P}_{i}}. Rather than independent noise, we add spatially correlated errors,

𝐞i𝒩(0,σe2𝐊i),σe2{1,5,10,100,200},\displaystyle\mathbf{e}_{i}\sim\mathcal{N}(0,\sigma^{2}_{e}\mathbf{K}_{i}),\qquad\sigma^{2}_{e}\in\{1,5,10,100,200\},

with Ki(j,j)=exp{di(j,j)/ϕ^}K_{i}(j,j^{\prime})=\exp\{-d_{i}(j,j^{\prime})/\hat{\phi}\}. We estimate ϕ^\hat{\phi} once from the real plaque by fitting an exponential variogram using geoR (variog/variofit; Ribeiro Jr and Diggle (2025)) and then hold ϕ^\hat{\phi} fixed across samples and replicates. Although yi,jy_{i,j} is generated from the expression at the same plaque location 𝒙(si,j){\bm{x}}(s_{i,j}), in the downstream analysis we treat plaque-site expression as unobserved and remove {𝒙(si,j):j𝒫i}\{{\bm{x}}(s_{i,j}):j\in\mathcal{P}_{i}\} from the predictor set, so {yi,j:j𝒫i}\{y_{i,j}:j\in\mathcal{P}_{i}\} are modeled using only neighboring non-plaque spots {𝒙(si,j):j𝒞i}\{{\bm{x}}(s_{i,j}):j\in\mathcal{C}_{i}\}.

We set the total number of predictors at 50, with 5 out of 50 having non-zero effects, and assume a CP-based low-rank form with rank 4. In order to achieve this, we set a randomly selected 5 rows in 𝒒1{\bm{q}}_{1} to be non-zero and the rest all zero, with non-zero entries generated from 𝒩(0,22)\mathcal{N}(0,2^{2}) The entries in 𝒒2{\bm{q}}_{2} and 𝒒3{\bm{q}}_{3} are generated from 𝒩(5,22)\mathcal{N}(5,2^{2}) and 𝒩(0,0.52)\mathcal{N}(0,0.5^{2}) respectively.

4.2 Estimation

We set the RmaxR_{\text{max}} at 66 following maximum rank characterization in Eq.(8), and vary bandwidth over choosing L{5,10,12,15,20,25,30,35,40,45,50,70}L\in\{5,10,12,15,20,25,30,35,40,45,50,70\}. We specify λmax=1010×0.930\lambda_{\text{max}}=10^{10}\times 0.9^{-30}. We then apply the procedure described in Algorithm 1, which performs data-driven selection of these parameters and yields the final estimate.

To assess variable selection performance, we compute the true positive rate (TPR) and false positive rate (FPR) at threshold τ0\tau\geq 0 in Table 1 as

TPR(τ)\displaystyle\text{TPR}(\tau) =𝕀{|β^lgt|τ,βlgt0}𝕀{βlgt0};FPR(τ)=𝕀{|β^lgt|τ,βlgt=0}𝕀{βlgt=0}.\displaystyle=\frac{\sum\mathbb{I}\left\{|\hat{\beta}_{lgt}|\geq\tau,\ \beta_{lgt}\neq 0\right\}}{\sum\mathbb{I}\left\{\beta_{lgt}\neq 0\right\}};\quad\text{FPR}(\tau)=\frac{\sum\mathbb{I}\left\{|\hat{\beta}_{lgt}|\geq\tau,\ \beta_{lgt}=0\right\}}{\sum\mathbb{I}\left\{\beta_{lgt}=0\right\}}.

By thresholding the estimated coefficients β^(i,c,t)\hat{\beta}(i,c,t) at a sequence of cutoff values, we trace out an ROC curve in the TPR–FPR plane. We augment it with the extreme points (0,0) and (1,1) to ensure a full range of rate values, and compute the area under the curve (AUC) via the trapezoidal rule.

To quantify overall estimation accuracy, we calculate the mean squared error (MSE) of the coefficient estimates:

MSE=meanl,c,t[β^(l,c,t)β(l,c,t)]2.\displaystyle MSE=\underset{l,c,t}{\mathrm{mean}}\left[\widehat{\beta}(l,c,t)-\beta(l,c,t)\right]^{2}.

Across 30 replicates over plaque numbers Mi{50,100,200}M_{i}\in\{50,100,200\} and error variances σe2{1,5,10,100,200}\sigma_{e}^{2}\in\{1,5,10,100,200\}, the proposed method consistently delivers better discrimination than the paired LASSO: AUC = 0.54–0.68 (mean 0.606) versus 0.50–0.54 (mean 0.521), and typically lower MSE. The gains persist under high noise and grow with larger MiM_{i}, indicating accurate recovery of the underlying cell-type-specific gene effects. As shown in Table 1, the estimates from our proposed method are better both in estimation accuracy and in selecting the genes with non-zero effects, whereas the paired LASSO often collapses to all-zero solutions.

MiM_{i} σe2\sigma_{e}^{2} Proposed MSE Paired MSE Proposed AUC Paired AUC
50 1 6.333 7.086 0.574 0.485
50 5 6.367 7.014 0.563 0.485
50 10 6.392 6.985 0.544 0.484
50 100 6.891 8.029 0.495 0.501
50 200 7.359 9.169 0.510 0.507
100 1 6.256 7.455 0.647 0.514
100 5 6.253 7.422 0.629 0.524
100 10 6.275 7.380 0.650 0.525
100 100 6.403 8.330 0.565 0.539
100 200 6.586 9.256 0.568 0.546
200 1 6.278 6.706 0.626 0.484
200 5 6.235 6.717 0.607 0.489
200 10 6.273 6.705 0.606 0.494
200 100 6.319 6.815 0.565 0.501
200 200 6.486 8.356 0.568 0.498
Table 1: Performance over 3030 replicates across choices of plaque number MiM_{i} and error variance σe2\sigma_{e}^{2}, when the data is generated following Section 4.1. For each condition, penalty parameter λ\lambda and bandwidth hh were selected as described in Section 3. Note that paired LASSO often yields all-zero estimates.

5 A𝜷\beta Plaque Size Analysis

In this section, we perform the integrative spatial transcriptomic analysis to examine the relationship between gene expression profiles and Amyloid-β\beta (Aβ\beta) plaque pathology in Alzheimer’s disease (AD), focusing on two different stages of disease progression and three specific cell types. We analyze four SRT tissue sections (i=1,2,3,4i=1,2,3,4) from STARmap PLUS, with two sections at each time point (8 and 13 months). Specifically, sections i=1,2i=1,2 correspond to the two 8-month replicates, and sections i=3,4i=3,4 correspond to the two 13-month replicates. The cell counts are (N1,N2,N3,N4)=(8,186, 8,202, 10,372, 9,634)(N_{1},N_{2},N_{3},N_{4})=(8{,}186,\ 8{,}202,\ 10{,}372,\ 9{,}634), and plaque counts are (M1,M2,M3,M4)=(88, 99, 192, 136)(M_{1},M_{2},M_{3},M_{4})=(88,\ 99,\ 192,\ 136). We use the normalized expression provided by Zeng et al. (2023) to place all p=2,766p=2{,}766 genes on a common scale across sections and time points so that estimated coefficients are comparable across genes and over time.

We set the RmaxR_{\text{max}} at 66, and λmax=105\lambda_{\text{max}}=10^{5}. The final selected values are always found to be smaller than these pre-specified upper bounds. We consider the discrete grid 1,,20,25,30{1,\dots,20,25,30} for the number of neighbors LL, while setting the bandwidth Hi(L)H_{i}(L) for the ii-thth sample; the bandwidth selection procedure is described in Algorithm 1.

Prior to fitting the plaque-size regressions, we apply an expression filter independent of plaque morphology to exclude transcripts with insufficient detection for stable estimation. Specifically, within each of the three target cell types and for each tissue sample collected at 8 and 13 months, we retain genes detected (nonzero) in >20%>20\% of near-plaque cells (within 150μ150\mum of the plaque center). We then augment the retained set with (i) 64 marker genes from Zeng et al. (2023) and (ii) plaque-induced genes (PIGs) from Chen et al. (2020) (32 of the 57 PIGs are present in our dataset). This procedure yields a final gene panel of 182 transcripts for modeling plaque morphology.

From our proposed model, we estimate a 3-dimensional coefficient tensor p×C×T\mathcal{B}\in\mathbb{R}^{p\times C\times T} (gene ×\times cell type ×\times time point), with a CP structure to relate plaque size to local, cell-type-resolved expression across time. The final rank is R=4R=4 with penalty λ=105(0.9)211.094×104\lambda=10^{5}(0.9)^{21}\approx 1.094\times 10^{4}. For the bandwidth selection, elbow rule give us the choice of L=5L=5 for all four samples. Accordingly, we set the sample-specific bandwidth for the kernel function as hi=Hi(5),i=1,2,3,4h_{i}=H_{i}(5),i=1,2,3,4, where Hi(L)H_{i}(L) denotes the median distance to the LL-thth nearest neighboring cell in sample ii (defined in Sec 3.1). This yields h=(41.77,45.42,36.89,37.76)μh=(41.77,45.42,36.89,37.76)\mum. Here, each coefficient β(l,c,t)\beta(l,c,t) is associated with the normalized expression of gene ll, while setting other variables in the model fixed for cell type cc and time tt.

For each cell type cc and time tt, we summarize the overall strength of association between gene expression in cell type cc and the plaque size at time tt by Ac,t=βct2,A_{c,t}=\|\beta_{\cdot ct}\|_{2}, the 2\ell_{2} norm of all pp gene-effects. We summarize the averaged direction of association by β¯c,t=p1l=1pβ(l,c,t),\bar{\beta}_{c,t}=p^{-1}\sum_{l=1}^{p}\beta(l,c,t), i.e., the mean gene effect across all pp genes for cell type cc at time tt. At 8 months, Ac,tA_{c,t} is modest (Astrocyte: 2.30; Oligodendrocyte: 6.20; Microglia: 3.04), but it increases sharply by 13 months (Astrocyte: 19.40; Oligodendrocyte: 39.74; Microglia: 21.81), indicating substantially stronger plaque–expression associations over time. The signed means β¯c,t\bar{\beta}_{c,t} are near zero at 8 months (Astrocyte: 0.0049-0.0049, Oligodendrocyte 0.0018-0.0018), while Microglia shows a small positive mean (0.02850.0285), meaning that higher microglial gene expression tends to be associated with larger plaques on average at 8 months. By 13 months, the average effects are predominantly positive for Oligodendrocyte (0.0379) and especially Microglia (0.1536), while Astrocyte remains near zero (0.0017). Overall, these results point to strong, increasing plaque-proximal gene-expression effects in oligodendrocyte-lineage cells and microglia over time, with microglia showing the clearest positive directional signal at 13 months.

Following the CP structure, we pursue a component-level interpretation of the results here. CP Decomposes the coefficient tensor into four rank-1 latent factors that summarize the gene–cell-type–time patterns in the association with plaque size (Table 2). The first three components have substantially larger weights than the fourth. For component rr with factors 𝒒r(1){\bm{q}}_{r}^{(1)} (genes), 𝒒r(2){\bm{q}}_{r}^{(2)} (cell types), 𝒒r(3){\bm{q}}_{r}^{(3)} (times) and weight wrw_{r}, we orient it so that its top-loading genes have positive entries in the gene factor, and then summarize its overall direction using the signed-average product

NetDirr=(lql,r(1)l|ql,r(1)|)(cqc,r(2)c|qc,r(2)|)(tqt,r(3)t|qt,r(3)|)[1,1].\text{NetDir}_{r}=\left(\frac{\sum_{l}q_{l,r}^{(1)}}{\sum_{l}|q_{l,r}^{(1)}|}\right)\left(\frac{\sum_{c}q_{c,r}^{(2)}}{\sum_{c}|q_{c,r}^{(2)}|}\right)\left(\frac{\sum_{t}q_{t,r}^{(3)}}{\sum_{t}|q_{t,r}^{(3)}|}\right)\in[-1,1].

We refer to 13 months as the late disease stage. As shown in Table 2, each component is dominated by the late time point, with the 13-month time loading having magnitude close to 1, indicating that the estimated associations concentrate their mass at 13 months (with direction determined by the joint sign across modes). Among them, Component 3 shows a clearly positive direction (Net direction +0.47\approx+0.47) with largest loadings on Oligodendrocytes and Microglia at 13 months; top genes include Trem2, Tmsb4x, Tyrobp, and Plp1, indicating a late microglia–oligodendrocyte component-derived signature in which higher expression is associated with larger plaques. In contrast, Components 1 and 2 carry negative temporal loadings at 13 months and split cell-type emphasis–one tilting toward Oligodendrocytes (Oligodendrocyte ↑, Microglia ↓) and the other toward Microglia (Microglia ↑, Astrocyte ↓). Despite substantial overlap in top genes (e.g., Trem2, C1qa, Aplp1), their opposite cell-type signs imply opposing directions of association across components at the same time point, consistent with heterogeneity between microglia-enriched expression patterns and oligodendrocyte/myelin related pathways. Finally, Component 4 is uniformly negative (Net direction 1\approx-1), dominated by Oligodendrocyte and Microglia loadings at 13 months with genes such as Tmsb4x, Plp1, C1qa, and Cst3, suggesting a late-stage component pattern in which higher expression accompanies smaller plaques. Thus, in summary, the estimated associations concentrate at 13 months and along a Microglia–Oligodendrocyte axis, but with mixed directions across components: Component 3 aligns with larger plaques, whereas Components 1, 2, and 4 align with smaller plaques, suggesting distinct plaque-proximal expression patterns.

These multigene expression patterns arising from the CP-component-based results align well with the known biological knowledge: microglial activation modules (e.g., Trem2, Tyrobp, C1q complex) rise near plaques (Hong et al., 2016; Keren-Shaul et al., 2017; Krasemann et al., 2017), while oligodendrocyte/myelin genes (e.g., Plp1, Mbp) mreflect white-matter processes implicated in AD progression (McKenzie et al., 2017; Nasrabady et al., 2018). CP combines these signals into a small number of interpretable, signed components that are localized by cell type and disease stage, enabling component-level summaries rather than gene-by-gene effects. For instance, a component dominated by microglia at 13 months shows a positive association with plaque size, whereas a component dominated by oligodendrocytes at 13 months shows a negative association.

Component,rr Weight, wrw_{r} NetDir Top cells Top times Top genes
11 98.2798.27 0.1319-0.1319 Oligodendrocyte (+0.75+0.75), Microglia (0.51-0.51) 13 mo (0.99-0.99), 8 mo (0.15-0.15) Trem2(+0.91+0.91), Tyrobp(0.24-0.24), C1qa(0.19-0.19), Aplp1(+0.13+0.13), Mbp(+0.08+0.08)
22 86.3486.34 0.0287-0.0287 Microglia (+0.83+0.83), Astrocyte (0.55-0.55) 13 mo (0.99-0.99), 8 mo (0.13-0.13) Trem2(+0.92+0.92), C1qa(0.20-0.20), Aspa(0.15-0.15), Aplp1(+0.13+0.13), Tyrobp(0.11-0.11)
33 66.1366.13 +0.4662+0.4662 Oligodendrocyte (+0.93+0.93), Microglia (+0.35+0.35) 13 mo (+0.99+0.99), 8 mo (+0.12+0.12) Trem2(+0.85+0.85), Tmsb4x(+0.31+0.31), Tyrobp(+0.22+0.22), Plp1(+0.17+0.17), Aplp1(+0.15+0.15)
44 19.4919.49 1.0000-1.0000 Oligodendrocyte (0.85-0.85), Microglia (0.52-0.52) 13 mo (+1.00+1.00), 8 mo (+0.09+0.09) Tmsb4x(+0.85+0.85), Plp1(+0.40+0.40), C1qa(+0.29+0.29), Cst3(+0.19+0.19)
Table 2: CP component summary for the estimated coefficient tensor. For each component rr, “Top cells/times/genes” are those with the largest-magnitude loadings in the corresponding mode. Component signs are oriented per component by top genes. The direction of association is determined by the reconstructed coefficient contribution (positive = larger plaques with higher expression; negative = smaller plaques with higher expression).

Gene-level examples consistent with prior literature (associational). To summarize temporal changes in estimated regression coefficients (associational gene effects), we write aba\!\to\!b for gene ll in cell type cc, where β^l,c,8=a\hat{\beta}_{l,c,8}=a (8 months) and β^l,c,13=b\hat{\beta}_{l,c,13}=b (13 months). Astrocytes: transport-related markers show increasingly negative associations with plaque size (Aqp4 0.271.85-0.27\!\to\!-1.85, Slc13a3 0.261.67-0.26\!\to\!-1.67), while alarmin/stress signals become more positively associated (Il33 +0.27+1.99+0.27\!\to\!+1.99, S100a6 +0.28+1.82+0.28\!\to\!+1.82), consistent with reactive astrocyte frameworks and IL-33–microglia crosstalk (Escartin et al., 2021; Vainchtein et al., 2018). Oligodendrocyte lineage: larger plaques are associated with stronger precursor-like signal (Gpr17 +0.29+2.39+0.29\!\to\!+2.39) and reduced mature oligodendrocyte identity (Sox10 0.160.50-0.16\!\to\!-0.50, Aspa 0.20\approx\!-0.20). These patterns are consistent with reported links between oligodendrocyte function and iron/transferrin biology (Todorich et al., 2009), and OPC–vasculature coupling provides spatial context (Tsai et al., 2016). Microglia: an early Trem2-associated effect attenuates and reverses (Trem2 +0.640.37+0.64\!\to\!-0.37), while a later Apoe/Tyrobp-centered signature becomes strongly positively associated with plaque size (Apoe +0.21+1.78+0.21\!\to\!+1.78; Tyrobp 0.11+1.20-0.11\!\to\!+1.20), mirroring reported DAM-like shifts and the TREM2–APOE pathway (Keren-Shaul et al., 2017; Krasemann et al., 2017; Deczkowska et al., 2018).

Novel or less-established associations requiring validation (hypothesis-generating). Oligodendrocyte “logistics”: strong, late positive coefficients for resource intake and protein production (Trf +0.53+3.53+0.53\!\to\!+3.53, Pabpc1 +0.50+3.55+0.50\!\to\!+3.55, Caskin1 +0.52+3.95+0.52\!\to\!+3.95) suggest a resource-mobilizing oligodendrocyte niche as plaques enlarge; in contrast, Flt1 is strongly negative (0.574.02-0.57\!\to\!-4.02), pointing to a vascular/adhesion signature aligned with smaller plaques. Microglial cytoskeleton and lysosome: Rhoc shows a robust negative association (0.523.95-0.52\!\to\!-3.95), consistent with more contractile morphology near smaller plaques, while cathepsins diverge (Ctsl +0.20+1.04+0.20\!\to\!+1.04 vs Ctss 0.110.93-0.11\!\to\!-0.93), indicating targeted lysosomal remodeling rather than uniform activation. Astrocytes: persistence and amplification of transport↓/alarmin↑ patterns across time underscore a simple, testable hypothesis: supporting basic astrocytic transport may align with smaller plaques.

6 Conclusion

In this paper, we introduce a novel kernel-weighted method to model how plaque-proximal single-cell spatial transcriptomic expression patterns relate to plaque size. The proposed framework is designed to capture both cell-type-specific and time-varying effects, thereby accommodating heterogeneity across genes, cell populations, and sample collection times. In addition, the method is fully automated through data-driven specification of the tuning parameters, such as rank, penalty, and bandwidth. We further adopt a low-rank model for the regression coefficients, which naturally captures the shared structure and inherent dependencies among genes, cell types, and time points.

Our analysis reveals several plaque-size-associated expression patterns that are strongest at the 13-month data, mainly in microglia (brain immune cells) and oligodendrocytes (myelin-related cells), indicating that plaque-size associations reflect multiple transcriptional signatures acting in parallel rather than a single dominant signature. At a late disease stage, the microglial pattern is characterized by lipid-handling and complement-related genes, consistent with a plaque-proximal activation signature (Hong et al., 2016; Keren-Shaul et al., 2017; Krasemann et al., 2017); oligodendrocyte-lineage cells show a shift toward less mature (precursor-like) signatures, together with increased iron-homeostasis and protein-synthesis transcripts; and astrocyte signals are comparatively weak in net plaque-size tracking, with transport-related genes showing attenuated associations. These findings suggest testable hypotheses for future validation: reduce the late-stage microglial-dominant lipid/complement activity, support astrocyte water/solute transport, and steer oligodendrocytes toward a mature maintenance state to evaluate whether plaque growth can be limited.

For future work, we will analyze how gene expression changes in comparison to the plaque size. This will let us map which genes and multigene expression patterns rise or fall as plaques enlarge, compare patterns across different disease stages and brain regions. These insights will have important translational implications for understanding disease mechanisms and guiding hypothesis-driven experimental studies. Methodologically, there are opportunities to consider more flexible Tucker factorization for 𝜷\beta, and also other types of effect characterizations beyond linearity.

Data availability

We use publicly available data from Zeng et al. (2023) for our analysis.

Acknowledgement

The authors would like to thank Dr. Dongyuan Wu for helpful discussions and suggestions regarding the data. During the preparation of this work, the authors used ChatGPT to assist with writing. All content was subsequently reviewed and edited by the authors, who take full responsibility for the publication.

References

  • S. E. Alexeeff, R. J. Carroll, and B. Coull (2016) Spatial measurement error and correction by spatial simex in linear regression models when using predicted air pollution exposures. Biostatistics 17 (2), pp. 377–389. Cited by: §1, §1.
  • A. Anandkumar, R. Ge, and M. Janzamin (2015) Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-11 Updates. arXiv. Note: arXiv:1402.5180 Cited by: §3.1.
  • R. Arora, C. Cao, M. Kumar, S. Sinha, A. Chanda, R. McNeil, D. Samuel, R. K. Arora, T. W. Matthews, S. Chandarana, et al. (2023) Spatial transcriptomics reveals distinct and conserved tumor core and edge architectures that predict survival and targeted therapy response. Nature communications 14 (1), pp. 5029. Cited by: §1.
  • R. Bassiouni, M. O. Idowu, L. D. Gibbs, V. Robila, P. J. Grizzard, M. G. Webb, J. Song, A. Noriega, D. W. Craig, and J. D. Carpten (2023) Spatial transcriptomic analysis of a diverse patient cohort reveals a conserved architecture in triple-negative breast cancer. Cancer research 83 (1), pp. 34–48. Cited by: §1.
  • V. J. Berrocal, A. E. Gelfand, and D. M. Holland (2010) A spatio-temporal downscaler for output from numerical models. Journal of agricultural, biological, and environmental statistics 15 (2), pp. 176–197. Cited by: §1.
  • H. Cao, D. Zeng, and J. P. Fine (2015) Regression Analysis of Sparse Asynchronous Longitudinal Data. Journal of the Royal Statistical Society Series B: Statistical Methodology 77 (4), pp. 755–776 (en). External Links: ISSN 1369-7412, 1467-9868 Cited by: §1.
  • J. D. Carroll and J. Chang (1970) Analysis of Individual Differences in Multidimensional Scaling Via an N-way Generalization of “Eckart-Young” Decomposition. Psychometrika 35 (3), pp. 283–319 (en). External Links: ISSN 0033-3123, 1860-0980 Cited by: §3.
  • K. H. Chen, A. N. Boettiger, J. R. Moffitt, S. Wang, and X. Zhuang (2015) Spatially resolved, highly multiplexed rna profiling in single cells. Science 348 (6233), pp. aaa6090. Cited by: §1.
  • S. Chen, Y. Chang, L. Li, D. Acosta, Y. Li, Q. Guo, C. Wang, E. Turkes, C. Morrison, D. Julian, et al. (2022) Spatially resolved transcriptomics reveals genes associated with the vulnerability of middle temporal gyrus in alzheimer’s disease. Acta neuropathologica communications 10 (1), pp. 188. Cited by: §1.
  • W. Chen, A. Lu, K. Craessaerts, B. Pavie, C. Sala Frigerio, N. Corthout, X. Qian, J. Laláková, M. Kühnemund, I. Voytyuk, L. Wolfs, R. Mancuso, E. Salta, S. Balusu, A. Snellinx, S. Munck, A. Jurek, J. Fernandez Navarro, T. C. Saido, I. Huitinga, J. Lundeberg, M. Fiers, and B. De Strooper (2020) Spatial Transcriptomics and In Situ Sequencing to Study Alzheimer’s Disease. Cell 182 (4), pp. 976–991.e19 (en). External Links: ISSN 00928674, Link, Document Cited by: §1, §2.1, §5.
  • B. De Strooper and E. Karran (2016) The cellular phase of alzheimer’s disease. Cell 164 (4), pp. 603–615. Cited by: §1.
  • A. Deczkowska, H. Keren-Shaul, A. Weiner, M. Colonna, M. Schwartz, and I. Amit (2018) Disease-associated microglia: a universal immune sensor of neurodegeneration. Cell 173 (5), pp. 1073–1081. External Links: Document Cited by: §5.
  • B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani (2004) Least angle regression. Cited by: §3.1.
  • C. Escartin, E. Galea, A. Lakatos, J. P. O’Callaghan, A. Petzold, A. Serrano-Pozo, C. Steinhäuser, A. Volterra, G. Carmignoto, A. Agarwal, et al. (2021) Reactive astrocyte nomenclature, definitions, and future directions. Nature Neuroscience 24, pp. 312–325. External Links: Document Cited by: §5.
  • J. Fan (2018) Local polynomial modelling and its applications: monographs on statistics and applied probability 66. Routledge. Cited by: §1.
  • A. S. Fotheringham, W. Yang, and W. Kang (2017) Multiscale geographically weighted regression (mgwr). Annals of the American Association of Geographers 107 (6), pp. 1247–1265. Cited by: §1.
  • J. Friedman, T. Hastie, and R. Tibshirani (2010) Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 (1), pp. 1–22. External Links: Document Cited by: §3.1, 2.
  • C. A. Gotway and L. J. Young (2002) Combining incompatible spatial data. Journal of the American Statistical Association 97 (458), pp. 632–648. Cited by: §1.
  • H. Hampel, J. Hardy, K. Blennow, C. Chen, G. Perry, S. H. Kim, V. L. Villemagne, P. Aisen, M. Vendruscolo, T. Iwatsubo, et al. (2021) The amyloid-β\beta pathway in alzheimer’s disease. Molecular psychiatry 26 (10), pp. 5481–5503. Cited by: §1.
  • R. A. Harshman (1970) Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-model factor analysis. External Links: Link Cited by: §3.
  • S. Hong, V. F. Beja-Glasser, B. M. Nfonoyim, A. Frouin, S. Li, S. Ramakrishnan, K. M. Merry, Q. Shi, A. Rosenthal, B. A. Barres, et al. (2016) Complement and microglia mediate early synapse loss in alzheimer mouse models. Science 352 (6286), pp. 712–716. Cited by: §5, §6.
  • C. R. Jack Jr, J. S. Andrews, T. G. Beach, T. Buracchio, B. Dunn, A. Graf, O. Hansson, C. Ho, W. Jagust, E. McDade, et al. (2024) Revised criteria for diagnosis and staging of alzheimer’s disease: alzheimer’s association workgroup. Alzheimer’s & Dementia 20 (8), pp. 5143–5169. Cited by: §1.
  • C. R. Jack Jr, D. A. Bennett, K. Blennow, M. C. Carrillo, B. Dunn, S. B. Haeberlein, D. M. Holtzman, W. Jagust, F. Jessen, J. Karlawish, et al. (2018) NIA-aa research framework: toward a biological definition of alzheimer’s disease. Alzheimer’s & dementia 14 (4), pp. 535–562. Cited by: §1.
  • Y. Jin, Y. Zuo, G. Li, W. Liu, Y. Pan, T. Fan, X. Fu, X. Yao, and Y. Peng (2024) Advances in spatial transcriptomics and its applications in cancer research. Molecular Cancer 23 (1), pp. 129. Cited by: §1.
  • H. Keren-Shaul, A. Spinrad, A. Weiner, O. Matcovitch-Natan, R. Dvir-Szternfeld, T. K. Ulland, E. David, K. Baruch, D. Lara-Astaiso, B. Toth, et al. (2017) A unique microglia type associated with restricting development of alzheimer’s disease. Cell 169 (7), pp. 1276–1290.e17. External Links: Document Cited by: §5, §5, §6.
  • H. A. L. Kiers (2000) Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics 14 (3), pp. 105–122 (en). External Links: ISSN 0886-9383, 1099-128X, Link, Document Cited by: §3.
  • T. G. Kolda and B. W. Bader (2009) Tensor Decompositions and Applications. SIAM Review 51 (3), pp. 455–500 (en). External Links: ISSN 0036-1445, 1095-7200, Link, Document Cited by: §3.1.
  • S. Krasemann, C. Madore, R. Cialic, C. Baufeld, N. Calcagno, R. El Fatimy, L. Beckers, E. O’Loughlin, Y. Xu, Z. Fanek, et al. (2017) The trem2-apoe pathway drives the transcriptional phenotype of dysfunctional microglia in neurodegenerative diseases. Immunity 47 (4), pp. 566–581. External Links: Document Cited by: §5, §5, §6.
  • J. B. Kruskal (1989) Rank, decomposition, and uniqueness for 3-way and n-way arrays. In Multiway data analysis, pp. 7–18. Cited by: §3.1.
  • T. Li, T. Li, Z. Zhu, and H. Zhu (2022) Regression Analysis of Asynchronous Longitudinal Functional and Scalar Data. Journal of the American Statistical Association 117 (539), pp. 1228–1242 (en). External Links: ISSN 0162-1459, 1537-274X Cited by: §1.
  • T. Li, H. Zhu, T. Li, and H. Zhu (2023) Asynchronous Functional Linear Regression Models for Longitudinal Data in Reproducing Kernel Hilbert Space. Biometrics 79 (3), pp. 1880–1895 (en). External Links: ISSN 0006-341X, 1541-0420 Cited by: §1.
  • J. M. Long and D. M. Holtzman (2019) Alzheimer disease: an update on pathobiology and treatment strategies. Cell 179 (2), pp. 312–339. Cited by: §1.
  • A. Mallach, M. Zielonka, V. van Lieshout, Y. An, J. H. Khoo, M. Vanheusden, W. Chen, D. Moechars, I. L. Arancibia-Carcamo, M. Fiers, et al. (2024) Microglia-astrocyte crosstalk in the amyloid plaque niche of an alzheimer’s disease mouse model, as revealed by spatial transcriptomics. Cell Reports 43 (6). Cited by: §1.
  • K. R. Maynard, L. Collado-Torres, L. M. Weber, C. Uytingco, B. K. Barry, S. R. Williams, J. L. Catallini, M. N. Tran, Z. Besich, M. Tippani, et al. (2021) Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nature neuroscience 24 (3), pp. 425–436. Cited by: §1.
  • A. T. McKenzie, S. Moyon, M. Wang, I. Katsyv, W. Song, X. Zhou, E. B. Dammer, D. M. Duong, J. Aaker, Y. Zhao, et al. (2017) Multiscale network modeling of oligodendrocytes reveals molecular components of myelin dysregulation in alzheimer’s disease. Molecular neurodegeneration 12 (1), pp. 82. Cited by: §5.
  • E. Miyoshi, S. Morabito, C. M. Henningfield, S. Das, N. Rahimzadeh, S. K. Shabestari, N. Michael, N. Emerson, F. Reese, Z. Shi, et al. (2024) Spatial and single-nucleus transcriptomic analysis of genetic and sporadic forms of alzheimer’s disease. Nature Genetics 56 (12), pp. 2704–2717. Cited by: §1.
  • E. A. Nadaraya (1964) On estimating regression. Theory of Probability & Its Applications 9 (1), pp. 141–142. Cited by: §1.
  • S. E. Nasrabady, B. Rizvi, J. E. Goldman, and A. M. Brickman (2018) White matter changes in alzheimer’s disease: a focus on myelin and oligodendrocytes. Acta neuropathologica communications 6 (1), pp. 22. Cited by: §5.
  • Z. Ni, A. Prasad, S. Chen, R. B. Halberg, L. M. Arkin, B. A. Drolet, M. A. Newton, and C. Kendziorski (2022) SpotClean adjusts for spot swapping in spatial transcriptomics data. Nature Communications 13 (1), pp. 2971. Cited by: §1.
  • T. M. Oshan, Z. Li, W. Kang, L. J. Wolf, and A. S. Fotheringham (2019) Mgwr: a python implementation of multiscale geographically weighted regression for investigating process spatial heterogeneity and scale. ISPRS International Journal of Geo-Information 8 (6), pp. 269. Cited by: §1.
  • P. J. Ribeiro Jr and P. J. Diggle (2025) GeoR: analysis of geostatistical data. Note: R package version 1.9-6 External Links: Document, Link Cited by: §4.1.
  • V. Satopaa, J. Albrecht, D. Irwin, and B. Raghavan (2011) Finding a" kneedle" in a haystack: detecting knee points in system behavior. In 2011 31st international conference on distributed computing systems workshops, pp. 166–171. Cited by: §3.1.
  • S. Shah, E. Lubeck, W. Zhou, and L. Cai (2016) In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92 (2), pp. 342–357. Cited by: §1.
  • P. L. Ståhl, F. Salmén, S. Vickovic, A. Lundmark, J. F. Navarro, J. Magnusson, S. Giacomello, M. Asp, J. O. Westholm, M. Huss, et al. (2016) Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353 (6294), pp. 78–82. Cited by: §1.
  • W. W. Sun, J. Lu, H. Liu, and G. Cheng (2017) Provable Sparse Tensor Decomposition. Journal of the Royal Statistical Society Series B: Statistical Methodology 79 (3), pp. 899–916 (en). External Links: ISSN 1369-7412, 1467-9868 Cited by: §3.1.
  • A. A. Szpiro and C. J. Paciorek (2013) Measurement error in two-stage analyses, with application to air pollution epidemiology. Environmetrics 24 (8), pp. 501–517. Cited by: §1.
  • B. Todorich, J. M. Pasquini, C. I. Garcia, P. M. Paez, and J. R. Connor (2009) Oligodendrocytes and myelination: the role of iron. Glia 57 (5), pp. 467–478. External Links: Document Cited by: §5.
  • H. Tsai, J. Niu, R. Munji, D. Davalos, J. Chang, H. Zhang, A. C. Tien, C. J. Kuo, J. R. Chan, R. Daneman, and S. P. J. Fancy (2016) Oligodendrocyte precursors migrate along vasculature in the developing nervous system. Science 351 (6271), pp. 379–384. External Links: Document Cited by: §5.
  • L. R. Tucker (1966) Some Mathematical Notes on Three-Mode Factor Analysis. Psychometrika 31 (3), pp. 279–311 (en). External Links: ISSN 0033-3123, 1860-0980 Cited by: §3.
  • I. D. Vainchtein, G. Chin, F. S. Cho, K. W. Kelley, J. G. Miller, E. C. Chien, S. A. Liddelow, P. T. Nguyen, H. Nakao-Inoue, L. C. Dorman, et al. (2018) Astrocyte-derived interleukin-33 promotes microglial synapse engulfment and neural circuit development. Science 359 (6381), pp. 1269–1273. External Links: Document Cited by: §5.
  • S. Vickovic, G. Eraslan, F. Salmén, J. Klughammer, L. Stenbeck, D. Schapiro, T. Äijö, R. Bonneau, L. Bergenstråhle, J. F. Navarro, et al. (2019) High-definition spatial transcriptomics for in situ tissue profiling. Nature methods 16 (10), pp. 987–990. Cited by: §1.
  • X. Wang, W. E. Allen, M. A. Wright, E. L. Sylwestrak, N. Samusik, S. Vesuna, K. Evans, C. Liu, C. Ramakrishnan, J. Liu, et al. (2018) Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361 (6400), pp. eaat5691. Cited by: §1.
  • Z. Xun, X. Ding, Y. Zhang, B. Zhang, S. Lai, D. Zou, J. Zheng, G. Chen, B. Su, L. Han, et al. (2023) Reconstruction of the tumor spatial microenvironment along the malignant-boundary-nonmalignant axis. Nature Communications 14 (1), pp. 933. Cited by: §1.
  • H. Zeng, J. Huang, H. Zhou, W. J. Meilandt, B. Dejanovic, Y. Zhou, C. J. Bohlen, S. Lee, J. Ren, A. Liu, Z. Tang, H. Sheng, J. Liu, M. Sheng, and X. Wang (2023) Integrative in situ mapping of single-cell transcriptional states and tissue histopathology in a mouse model of Alzheimer’s disease. Nature Neuroscience (en). External Links: ISSN 1097-6256, 1546-1726 Cited by: §1, §1, §2.1, §2.1, §2.2, §2.2, §5, §5, §6.
  • J. Zhu, L. Shang, and X. Zhou (2023) SRTsim: spatial pattern preserving simulations for spatially resolved transcriptomics. Genome biology 24 (1), pp. 39. Cited by: §4.