On determining itemtype-subtype() for recursive types

There were questions as to whether type checking using a lenient recursive type alias like I proposed in my previous post would terminate. I’ve been doing some research - so if like me you struggle to know the correct academic term to use to search for information, here is a handy guide:

Equirecursive types: The kind of type recursion we’re talking about for XSLT 3.0. More formally, a type system where A is considered the same type as B, given:

<xsl:type-alias name="A" as="map(xs:string, A)"/>
<xsl:type-alias name="B" as="map(xs:string, map(xs:string, B))"/>

In other words, a recursive type is considered equal to it’s unrolling by one level.

Isorecursive types: Recursive types that provide new type annotations. An unrolled isorecursive type is not considered equal to the original type. These are apparently considered simpler to reason about than equirecursive types.

Unification: An algorithm that allows you to check equality of two types (well, directed graphs actually). If implemented correctly, it handles graphs with cycles (recursive types), as well as graphs with variables (parametric types). Incidently the same algorithm an be used for both pattern matching and type inference.

My source material for the algorithm comes from the “Dragon Book”, section 6.5.5. The unification algorithm for ItemType equality would look something like this:

1) Express the types as directed graphs with type aliases expanded (I’ve numbered the nodes so I can refer to them later):

<!-- Slightly different defintion of B this time -->
<xsl:type-alias name="A" as="map(xs:string, A)"/>
<xsl:type-alias name="B" as="map(xs:string, map(xs:NMTOKEN, B))"/>

A:         map[1]-+
            /\    |
           /  ----+
xs:string[2]

B:         map[3]-----+
            /\        |
           /  \       |
xs:string[4]   map[5] |
               /\     |
              /  -----+
  xs:NMTOKEN[6]

2) The basic idea is to start with two nodes from the graphs, M and N, that you want to assert are equal. The algorithm keeps track of sets of nodes that must be equal for this assertion to be true, called equivalence classes. Equivalence classes are represented by a single node picked from that set.

3) find(n) returns the node representing the equivalence class for the node n. Each node starts off in it’s own equivalence class, and these classes get merged successively by the union(m,n) procedure.

4) The algorithm proceeds:

boolean unify(M, N) {
   S = find(M), T = find(N);
   if(S = T) return true;
   if(S and T have the same node label and the same number of children) {
     union(S, T);
     return every child s in S, child t in T satisfies unify(s, t)
   }
   return false;
}

5) The algorithm terminates because the number of equivalence classes is reduced by 1 for each recursion. Finally, we can extend the notion of unification to use equivalence classes based on itemtype-subtype() rules. We only need to make sure that every node in the equivalence class is a subtype of the representative node. By doing so, our unification algorithm verifies a subtype relationship rather than an equality relationship.

A working of the itemtype-subtype() algorithm for the types I named A and B above:

i) Equivalence classes: (map[1]), (xs:string[2]), (map[3]), (xs:string[4]), (map[5]), (xs:NMTOKEN[6])

unify(map[1], map[3])
   - S = map[1], T = map[3]
   - S and T are matching non-leaf nodes
   - union(S, T)
   - recursively check the children

ii) Equivalence classes: (map[1], map[3]), (xs:string[2]), (xs:string[4]), (map[5]), (xs:NMTOKEN[6])

unify(xs:string[2], xs:string[4])
   - S = xs:string[2], T = xs:string[4]
   - S is a subtype of T
   - union(S, T)
   - return true (no children to check)

iii) Equivalence classes: (map[1], map[3]), (xs:string[2], xs:string[4]), (map[5]), (xs:NMTOKEN[6])

unify(map[1], map[5])
   - S = map[1], T = map[5]
   - S and T are matching non-leaf nodes
   - union(S, T)
   - recursively check the children

iv) Equivalence classes: (map[1], map[3], map[5]), (xs:string[2], xs:string[4]), (xs:NMTOKEN[6])

unify(xs:string[2], xs:NMTOKEN[6])
   - S = xs:string[2], T = xs:NMTOKEN[6]
   - S is a subtype of T
   - union(S, T)
   - return true (no children to check)

v) Equivalence classes: (map[1], map[3], map[5]), (xs:string[2], xs:string[4], xs:NMTOKEN[6])

unify(map[1], map[3])
   - S = map[1], T = map[1]
   - S = T
   - return true

And we successfully determine that the recursive ItemType A is a subtype of the recursive ItemType B.