LiDAR Point Cloud Registration: CoFiNet / GeoTransformer / MAC
GeoTransformer combined with MAC is among the current state-of-the-art point cloud matching methods. I previously summarized the related visual feature matching algorithms in
Visual SLAM Summary: SuperPoint / SuperGlue
This post gives a brief summary of the CoFiNet, GeoTransformer, and MAC papers.
1. CoFiNet
CoFiNet was published at ICCV 2021; the original paper is "CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration". It is worth summarizing because it can be regarded as the predecessor of GeoTransformer: it was the first to propose the coarse-to-fine point cloud matching framework.
The CoFiNet pipeline is shown in the figure below:
The algorithm consists of two main parts: the Correspondence Proposal Block and the Correspondence Refinement Block.
1.1 Correspondence Proposal Block
Point Encoding: Given the input point clouds $P_X \in \mathbb{R}^{n \times 3}$ and $P_Y \in \mathbb{R}^{m \times 3}$, KPConv (details introduced later in this post) extracts features while downsampling, outputting superpoints $P'_X \in \mathbb{R}^{n' \times 3}$, $P'_Y \in \mathbb{R}^{m' \times 3}$ and their features $F'_X \in \mathbb{R}^{n' \times b}$, $F'_Y \in \mathbb{R}^{m' \times b}$, where $b = 256$:
$$P_X \rightarrow P'_X, F'_X \qquad P_Y \rightarrow P'_Y, F'_Y$$
Each downsampled superpoint represents all the information of a small patch of the original input cloud.
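KPConv-style encoders coarsen the cloud with grid subsampling at their strided layers. As a rough illustration of how $P_X$ shrinks to the superpoints $P'_X$ (a plain NumPy sketch, not the actual KPConv encoder; the function name and voxel size are invented for this example):

```python
import numpy as np

def grid_subsample(points, voxel_size):
    """Replace all points falling in the same voxel by their centroid,
    mimicking how strided KPConv layers coarsen a cloud into superpoints."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    centroids = np.zeros((len(uniq), 3))
    counts = np.zeros(len(uniq))
    np.add.at(centroids, inverse, points)   # sum of points per voxel
    np.add.at(counts, inverse, 1.0)
    return centroids / counts[:, None]      # centroid per voxel

rng = np.random.default_rng(0)
P_X = rng.uniform(0, 1, size=(1000, 3))              # dense input cloud
P_X_prime = grid_subsample(P_X, voxel_size=0.25)     # coarse "superpoints"
```

The real encoder also produces a 256-d feature per superpoint; here only the geometric downsampling is shown.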
Attentional Feature Aggregation: Self-attention and cross-attention are applied to the superpoint features $F'_X \in \mathbb{R}^{n' \times b}$, $F'_Y \in \mathbb{R}^{m' \times b}$; self-attention enlarges the receptive field, while cross-attention exchanges information between the two clouds:
$$F'_X \rightarrow \tilde{F}'_X \qquad F'_Y \rightarrow \tilde{F}'_Y$$
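A minimal single-head version of the two attention passes (plain NumPy; the actual network uses multi-head attention with learned query/key/value projections, which are omitted here):

```python
import numpy as np

def attend(queries, context):
    """Scaled dot-product attention: each query feature becomes a
    softmax-weighted mixture of the context features."""
    d = queries.shape[1]
    scores = queries @ context.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)             # row-wise softmax
    return w @ context

rng = np.random.default_rng(0)
F_X = rng.normal(size=(5, 8))        # superpoint features of cloud X
F_Y = rng.normal(size=(7, 8))        # superpoint features of cloud Y

F_X_self = attend(F_X, F_X)          # self-attention: context within X
F_X_tilde = attend(F_X_self, F_Y)    # cross-attention: X queries Y
```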
Correspondence Proposal: The Sinkhorn algorithm turns $\tilde{F}'_X, \tilde{F}'_Y$ into a confidence matrix. During training, 128 ground-truth correspondence pairs are selected to build a GT confidence matrix that supervises the Sinkhorn output, so the number is fixed. At test time, matches with confidence above 0.2 are kept as the coarse-level result; if fewer than 200 survive, the threshold is lowered to 0.01, so the number of outputs is not fixed. The final output is the superpoint correspondence set $C' = \{(P'_X(i'), P'_Y(j'))\}$.
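The Sinkhorn step can be sketched as alternating row/column normalization in log space. The sketch below uses a square toy matrix and omits the slack (dustbin) row and column that handle unmatched superpoints:

```python
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))

def sinkhorn(scores, n_iters=100):
    """Alternately normalize rows and columns in log space so that
    exp(log_P) approaches a doubly-stochastic confidence matrix."""
    log_P = scores.astype(float).copy()
    for _ in range(n_iters):
        log_P -= logsumexp(log_P, axis=1)   # rows sum to 1
        log_P -= logsumexp(log_P, axis=0)   # columns sum to 1
    return np.exp(log_P)

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))       # similarity of 4 vs 4 superpoints
conf = sinkhorn(scores)
matches = np.argwhere(conf > 0.2)      # coarse matches above the threshold
```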
The attention and optimal transport parts are essentially the same as in SuperGlue and are not repeated here; interested readers can refer to Visual SLAM Summary: SuperPoint / SuperGlue.
1.2 Correspondence Refinement Block
Node Decoding: The decoder takes $\tilde{F}'_X, \tilde{F}'_Y$ as input and likewise uses KPConv to restore the resolution, finally outputting point-level features $F_X \in \mathbb{R}^{n \times c}$, $F_Y \in \mathbb{R}^{m \times c}$, where $c = 32$.
Point-to-Node Grouping: The goal of this step is to expand superpoint correspondences into point-level correspondences, from which the pose is then solved. Nearest-neighbor search associates each point with a superpoint; after this step every superpoint $P'_X(i')$ is assigned a number of points forming a patch $G_{i'}$, truncated to at most 64 points:
$$G_{i'}^P = \left\{p \in P_X \mid \left\|p - P'_X(i')\right\| \le \left\|p - P'_X(j')\right\|, \forall j' \ne i'\right\}$$
$$G_{i'}^F = \left\{f \in F_X \mid f \leftrightarrow p \text{ with } p \in G_{i'}^P\right\}$$
After this operation, patch pairs form correspondence sets in Euclidean space and in feature space, respectively:
$$C_P = \left\{\left(G_{i'}^P, G_{j'}^P\right)\right\} \qquad C_F = \left\{\left(G_{i'}^F, G_{j'}^F\right)\right\}$$
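The grouping rule above (each point goes to its nearest superpoint, patches truncated to 64 points) can be sketched as follows; the function name and `max_patch` parameter are invented for this example:

```python
import numpy as np

def point_to_node_grouping(points, nodes, max_patch=64):
    """Assign every point to its nearest node (superpoint); the points
    assigned to node j form the patch G_j, truncated to the max_patch
    closest points."""
    # pairwise squared distances, shape (n_points, n_nodes)
    d2 = ((points[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
    owner = d2.argmin(axis=1)                    # nearest node per point
    patches = []
    for j in range(len(nodes)):
        idx = np.flatnonzero(owner == j)
        if len(idx) > max_patch:                 # truncate oversized patches
            idx = idx[np.argsort(d2[idx, j])[:max_patch]]
        patches.append(idx)
    return patches

rng = np.random.default_rng(0)
points = rng.uniform(size=(200, 3))
nodes = rng.uniform(size=(8, 3))
patches = point_to_node_grouping(points, nodes)
```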
Density-Adaptive Matching: Point-level correspondences are then extracted within each patch pair. Sinkhorn cannot be applied directly at the point level because the number of points per patch varies; when building the similarity matrix for two patches of unequal size, the missing slots are filled with $-\infty$, after which running Sinkhorn removes the influence of the point-count mismatch on the model.
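The $-\infty$ padding can be illustrated directly: pad the similarity matrix of two unequal patches to a fixed size, so that $\exp(-\infty) = 0$ gives the padded slots zero probability mass under the subsequent normalization (a sketch only; the masked Sinkhorn bookkeeping is omitted):

```python
import numpy as np

def padded_similarity(F_i, F_j, size):
    """Fixed-size similarity matrix for two patches with unequal point
    counts; unused slots are filled with -inf so exp() maps them to 0."""
    S = np.full((size, size), -np.inf)
    S[:len(F_i), :len(F_j)] = F_i @ F_j.T   # valid block: feature similarities
    return S

rng = np.random.default_rng(0)
F_i = rng.normal(size=(5, 32))   # patch with 5 points, 32-d features
F_j = rng.normal(size=(3, 32))   # patch with 3 points
S = padded_similarity(F_i, F_j, size=8)
```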
After the point-level correspondences are obtained, RANSAC is still used to solve for the rotation and translation.
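The pose solve can be sketched as a classic RANSAC loop around the closed-form Kabsch solution (a generic sketch, not CoFiNet's exact estimator; the threshold and iteration count are placeholders):

```python
import numpy as np

def kabsch(src, dst):
    """Closed-form least-squares rigid transform: R, t minimizing
    sum ||R @ src_i + t - dst_i||^2 (Kabsch algorithm)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, c_dst - R @ c_src

def ransac_pose(src, dst, n_iters=200, thresh=0.05, seed=0):
    """Sample minimal 3-point subsets, fit a rigid transform, keep the
    hypothesis with the most inliers, then refit on all of its inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return kabsch(src[best_inliers], dst[best_inliers])

# synthetic check: known rotation + translation, 10 corrupted matches
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                                # ensure a proper rotation
t_true = np.array([0.5, -1.0, 2.0])
src = rng.normal(size=(50, 3))
dst = src @ Q.T + t_true
dst[:10] += rng.normal(size=(10, 3)) * 2.0       # outlier correspondences
R_est, t_est = ransac_pose(src, dst)
```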
1.3 Loss
The coarse-scale loss is:
$$\mathcal{L}_c = \frac{-\sum_{i', j'} \mathbf{W}'(i', j') \log\left(\mathbf{S}'(i', j')\right)}{\sum_{i', j'} \mathbf{W}'(i', j')}$$
where $\log\left(\mathbf{S}'(i', j')\right)$ is the cross-entropy between the Sinkhorn confidence matrix and the ground-truth confidence matrix, and $\mathbf{W}'(i', j')$ is a weighting coefficient defined as:
$$\mathbf{W}'(i', j') = \begin{cases} \min\left(r(i', j'), r(j', i')\right), & i' \le n' \wedge j' \le m', \\ 1 - r(i'), & i' \le n' \wedge j' = m' + 1, \\ 1 - r(j'), & i' = n' + 1 \wedge j' \le m', \\ 0, & \text{otherwise.} \end{cases}$$
Here $r(i')$ is the fraction of overlapping points within a single patch:
$$r(i') = \frac{\left|\left\{\mathbf{p} \in \mathbf{G}_{i'}^{\mathbf{P}} \mid \exists \mathbf{q} \in \mathbf{P}_{\mathbf{Y}} \text{ s.t. } \left\|\overline{\mathbf{T}}_{\mathbf{Y}}^{\mathbf{X}}(\mathbf{p}) - \mathbf{q}\right\| < \tau_p\right\}\right|}{\left|\mathbf{G}_{i'}^{\mathbf{P}}\right|},$$
and $r(i', j')$ is the fraction of points of one patch overlapped by the other:
$$r(i', j') = \frac{\left|\left\{\mathbf{p} \in \mathbf{G}_{i'}^{\mathbf{P}} \mid \exists \mathbf{q} \in \mathbf{G}_{j'}^{\mathbf{P}} \text{ s.t. } \left\|\overline{\mathbf{T}}_{\mathbf{Y}}^{\mathbf{X}}(\mathbf{p}) - \mathbf{q}\right\| < \tau_p\right\}\right|}{\left|\mathbf{G}_{i'}^{\mathbf{P}}\right|}$$
The intuition is straightforward: the higher the fraction of a patch that is overlapped, the more likely that patch is to be matched, and so the higher its weight should be.
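Numerically the coarse loss is just a weighted negative log-likelihood over the confidence matrix; a tiny sketch (the matrices here are arbitrary toy values, not real network output):

```python
import numpy as np

def coarse_loss(S, W):
    """Weighted negative log-likelihood: entries with zero weight
    contribute nothing; the rest are averaged by their weights."""
    mask = W > 0
    return -(W[mask] * np.log(S[mask])).sum() / W[mask].sum()

# toy 2x2 confidence matrix (e.g. after Sinkhorn) and overlap weights
S = np.array([[0.8, 0.2],
              [0.3, 0.7]])
W = np.array([[1.0, 0.0],
              [0.0, 0.5]])
loss = coarse_loss(S, W)   # = (1.0*(-log 0.8) + 0.5*(-log 0.7)) / 1.5
```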
The finer-scale loss is:
$$\mathcal{L}_f = \frac{-\sum_{l, i, j} \widetilde{\mathbf{B}}^{(l)}(i, j) \log\left(\widetilde{\mathbf{S}}^{(l)}(i, j)\right)}{\sum_{l, i, j} \widetilde{\mathbf{B}}^{(l)}(i, j)}$$
with the sum taken over the patch-pair index $l$.