A Type-Theoretic Reconstruction of the Visitor Pattern
Peter Buchlovsky 1
Hayo Thielecke 2
School of Computer Science
University of Birmingham
Birmingham B15 2TT, United Kingdom
Abstract
In object-oriented languages, the Visitor pattern can be used to traverse tree-like
data structures: a visitor object contains some operations, and the data structure
objects allow themselves to be traversed by accepting visitors. In the polymorphic
lambda calculus (System F), tree-like data structures can be encoded as polymorphic
higher-order functions. In this paper, we reconstruct the Visitor pattern from the
polymorphic encoding by way of generics in Java. We sketch how the quantified
types in the polymorphic encoding can guide reasoning about visitors in general.
Key words: Visitor pattern, polymorphic types, object-oriented
programming, Generic Java
Introduction
Tree-like data structures, such as abstract syntax trees or binary trees, and
their traversal (often called tree walking) are ubiquitous in programming.
Modern functional languages, such as ML and Haskell, provide constructs
in the form of datatype definitions and pattern matching to deal with trees
and more general recursive datatypes. However, in object-oriented languages
such as Java, the situation is more complicated. Testing class membership
and branching on it is widely considered a violation of object-oriented style.
1 Email: [email protected]
2 Email: [email protected]
This paper is electronically published in
Electronic Notes in Theoretical Computer Science
URL: www.elsevier.nl/locate/entcs
Instead, the Visitor pattern [6] has been proposed to define operations on
tree-like inductive datatypes.
The simplest example using visitors is that of a sum type of the form A+B.
Since Java lacks a sum type, the Visitor pattern implements a class doing the
job of a sum type by a sort of double-negation transform. Concretely, the class
has a method for accepting a visitor; the visitor itself has two methods, one
accepting arguments of type A, the other of type B. If we take a very idealized
view by taking methods as functions and objects as tuples of such functions,
we can regard the above as an instance of a well-known isomorphism in the
polymorphic $\lambda$-calculus [8]:
\[ A + B \;\cong\; \forall\alpha.\,((A \to \alpha) \times (B \to \alpha)) \to \alpha \]
These isomorphisms are firmly grounded in programming language theory.
Via the Curry-Howard correspondence, they also form part of a bigger picture
in terms of the definability of logical connectives such as disjunction in higher-order logics.
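To make the isomorphism concrete, the following is a minimal sketch in Java with generics of a sum type A+B encoded through a visitor. The names Either, EitherVisitor, Left, Right, Describe, caseLeft and caseRight are assumptions introduced for illustration and do not come from the paper.

```java
// A visitor over A + B: a pair of functions (A -> R) and (B -> R).
interface EitherVisitor<A, B, R> {
    R caseLeft(A a);
    R caseRight(B b);
}

// The sum type: anything that accepts a visitor for every result type R.
// This mirrors the quantified type  forall R. ((A -> R) x (B -> R)) -> R.
interface Either<A, B> {
    <R> R accept(EitherVisitor<A, B, R> v);
}

// Left injection: selects the first visit method.
final class Left<A, B> implements Either<A, B> {
    private final A value;
    Left(A value) { this.value = value; }
    public <R> R accept(EitherVisitor<A, B, R> v) { return v.caseLeft(value); }
}

// Right injection: selects the second visit method.
final class Right<A, B> implements Either<A, B> {
    private final B value;
    Right(B value) { this.value = value; }
    public <R> R accept(EitherVisitor<A, B, R> v) { return v.caseRight(value); }
}

// A concrete visitor (hypothetical): describes an Integer or a String.
final class Describe implements EitherVisitor<Integer, String, String> {
    public String caseLeft(Integer n) { return "int " + n; }
    public String caseRight(String s) { return "string " + s; }
}
```

Case analysis on a value of the sum type is then a call such as `e.accept(new Describe())`, with no class-membership test in the client code.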
The present paper aims to flesh out this type-theoretic view of visitors.
Specifically, a Design Pattern (such as the aforementioned Visitor pattern),
by its very nature, is not so much a single unambiguous definition, but a
variety of related instances. Hence we aim to clarify how variants of the Visitor
pattern relate to the more idealized type-theoretic picture. We classify visitors
as internal or external, depending on whether the visitor itself or the tree
specifies the traversal. Moreover these can be either functional (by returning
a value) or imperative (having a void return type and side-effects instead).
The resemblance of variants of the Visitor pattern to the encoding of algebraic types into System F may be part of the type-theoretic folklore. However,
formulating this precisely seems to be very useful, given the possibility of
some technology transfer from the highly developed theory of polymorphic
lambda calculi to less rigorous, but widely used design patterns. Our point is
not to translate object-oriented languages into functional ones or conversely.
Rather, we can see that the same notion has a manifestation in both of these
very different scenarios.
The contributions of this paper include the following:
We present a stylized polymorphic $\lambda$-calculus to make the relation of polymorphic encodings to visitors perspicuous.
We show the type soundness of the resulting visitors in Featherweight Generic Java [9].
We sketch how this abstract view of visitors could be useful for reasoning
about visitors.
For completeness, an appendix (Section A) gives more details on Featherweight Generic Java. These details are not essential for understanding the
paper.
Background
We briefly recall the two areas of relevant background between which we aim
to bridge: visitors as presented in the Design Patterns literature [6], and
polymorphic lambda calculi [7].
We find it useful to give some object-oriented terminology:
Interface A fully abstract class. Defines a set of methods but does not give
their bodies. This corresponds to an ML signature.
Class A class may implement an interface by providing a body for each
method header in the interface. This corresponds to an ML structure.
We will consider Generic Java [3] (Java extended with type parameters on
classes, interfaces and methods) throughout. Briefly, the syntax is as follows.
An interface or class definition of the form class C<α>{...} defines a class
C parameterized by the type variable α. The type C<Int> instantiates the
type parameter of class C to Int. A method definition of the form <α> T
m(...){...} defines a method m in which the type variable α is universally
quantified. A call to method m of the form o.m<Int>(...) instantiates α to
Int.
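The notation above can be illustrated by a small sketch in present-day Java. The class Box and its method fold are hypothetical names chosen for this example. Note one syntactic difference: where the paper writes o.m<Int>(...), Java places the explicit type argument before the method name, as in o.<Integer>m(...), and uses Integer rather than Int.

```java
import java.util.function.Function;

// A class parameterized by the type variable A (the paper's class C<α>).
class Box<A> {
    private final A contents;

    Box(A contents) { this.contents = contents; }

    // A generic method: the type variable B is universally quantified
    // for this method only (the paper's <α> T m(...)).
    <B> B fold(Function<A, B> f) {
        return f.apply(contents);
    }
}

class GenericsDemo {
    public static void main(String[] args) {
        Box<Integer> b = new Box<>(41);              // instantiates A to Integer
        Integer n = b.<Integer>fold(x -> x + 1);     // instantiates B to Integer
        String s = b.<String>fold(x -> "n=" + x);    // instantiates B to String
        System.out.println(n + " " + s);
    }
}
```

In most calls the explicit type argument can be left out and is inferred by the compiler.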
The purpose of the Visitor pattern is to organize a program around the operations on a datatype as opposed to the constructors. The canonical example
of the Visitor pattern consists of abstract syntax trees and their traversal by
various phases of a compiler; this approach is used in SableCC [5]. We will
consider a simpler example based on binary trees of integer leaves and the
operation of summing up the leaves.
The standard object-oriented implementation of binary trees is based around
the Composite pattern [6]. The datatype signature (sometimes called element) is represented as an interface and the constructor for each variant of
the datatype becomes a class (sometimes called concrete element) which
implements the interface. The interface includes method headers that specify
the signatures of all operations on the data. This forces each constructor class
to provide a method to handle the appropriate case of the operation.
This approach permits new datatype variants to be added without modifying existing code, something that is not possible in ML. The disadvantage
is that adding new operations is difficult as every existing class has to be
amended. The Visitor pattern turns the situation around. It becomes easy to
add new operations but difficult to add new variants.
The Visitor pattern is used as follows. Every operation on the datatype
is packaged into a concrete visitor class. The case for handling each variant
is contained in a method typically named visitCons where Cons is the name
of the constructor class for that variant. Every concrete visitor implements a
visitor interface. This specifies the types of visit methods that must be present
in a concrete visitor. It can be seen as a signature for concrete visitors.
The constructor classes of the datatype are modified to include an accept
method. Its role is to accept a visitor and call the visit method for the variant
which the class implements. This is essentially a form of double dispatch on
the datatype variant and the visit method. Any components of the variant
stored in fields inside the class must be passed to the visit method. It is also
necessary to parameterize the accept methods and visitor interface since we
cannot know in advance the type yielded by any concrete visitor class.
To summarize, the visitor pattern consists of the following classes:
Visitor This is an interface for visitors. It declares visit methods named
visitCons for each Cons class.
ConcreteVisitor One class for each operation on the data. It implements
the Visitor interface and has to provide implementations for each of the
visit methods declared there.
Data/Element This is an interface naming a datatype (e.g. BinTree). It
declares an accept method which takes a Visitor object as an argument.
ConcreteData/ConcreteElement A Cons class for each variant of the
datatype (e.g. Leaf). This corresponds to an ML datatype constructor
named Cons. It implements the accept method that calls visitCons in the
Visitor object.
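The four roles listed above can be sketched for binary trees of integer leaves as follows. This is an illustrative sketch rather than a figure from the paper: it assumes that results are returned through accept and that the traversal of subtrees is placed in the data type; the class and method names follow the naming scheme described above.

```java
// Visitor: declares one visit method per constructor class,
// parameterized by the result type R.
interface Visitor<R> {
    R visitLeaf(int n);
    R visitNode(R left, R right);   // subtrees arrive already collapsed to R
}

// Data/Element: names the datatype and declares accept.
interface BinTree {
    <R> R accept(Visitor<R> v);
}

// ConcreteData: one class per variant of the datatype.
final class Leaf implements BinTree {
    private final int n;
    Leaf(int n) { this.n = n; }
    public <R> R accept(Visitor<R> v) { return v.visitLeaf(n); }
}

final class Node implements BinTree {
    private final BinTree left;
    private final BinTree right;
    Node(BinTree left, BinTree right) { this.left = left; this.right = right; }
    public <R> R accept(Visitor<R> v) {
        // The data type drives the traversal of its components.
        return v.visitNode(left.accept(v), right.accept(v));
    }
}

// ConcreteVisitor: one class per operation, here summing the leaves.
final class SumVisitor implements Visitor<Integer> {
    public Integer visitLeaf(int n) { return n; }
    public Integer visitNode(Integer left, Integer right) { return left + right; }
}
```

A new operation, for instance counting leaves, would be one more class implementing Visitor, with no change to Leaf or Node.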
The Visitor pattern does not prescribe where a visitor should store intermediate results or how it should return the result to the caller. We will distinguish
between functional visitors which return intermediate results through the result of the call to accept and imperative visitors which accumulate results in
some field inside the visitor.
Another aspect of the Visitor pattern is the choice of traversal strategies
for composite objects. We could put the traversal code in the datatype. To do
this we ensure that the accept method is called recursively on any component
objects and passes the results to the visitor in the call to visit. Alternatively,
we could put the traversal code in the visitor itself. We will refer to these as
internal and external visitors respectively, by analogy with internal and
external iterators [6].
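For contrast, a minimal sketch of the external style is given below, again for binary trees of integer leaves. Here accept hands the untraversed subtrees to the visitor, and the visitor decides whether and how to recurse. LeftmostLeafVisitor is a hypothetical example added to show a traversal that skips part of the tree.

```java
// External visitor: visitNode receives the subtrees themselves.
interface Visitor<R> {
    R visitLeaf(int n);
    R visitNode(BinTree left, BinTree right);
}

interface BinTree {
    <R> R accept(Visitor<R> v);
}

final class Leaf implements BinTree {
    private final int n;
    Leaf(int n) { this.n = n; }
    public <R> R accept(Visitor<R> v) { return v.visitLeaf(n); }
}

final class Node implements BinTree {
    private final BinTree left;
    private final BinTree right;
    Node(BinTree left, BinTree right) { this.left = left; this.right = right; }
    public <R> R accept(Visitor<R> v) {
        // No recursion here: the visitor is responsible for the traversal.
        return v.visitNode(left, right);
    }
}

// The visitor recurs by passing itself to the subtrees.
final class SumVisitor implements Visitor<Integer> {
    public Integer visitLeaf(int n) { return n; }
    public Integer visitNode(BinTree left, BinTree right) {
        return left.accept(this) + right.accept(this);
    }
}

// A traversal that visits only the left spine of the tree.
final class LeftmostLeafVisitor implements Visitor<Integer> {
    public Integer visitLeaf(int n) { return n; }
    public Integer visitNode(BinTree left, BinTree right) {
        return left.accept(this);
    }
}
```

The second visitor never touches the right subtrees, which an internal visitor could not avoid since there the data type always traverses every component.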
We will use the polymorphic lambda calculus (System F) to encode data
types. In fact, we need a more powerful extension of System F with polymorphic type constructors, called $F_\omega$, since it allows us to approximate generics and interfaces better than we could with System F alone. Most features
of this system will not be new to anyone familiar with advanced languages
like Haskell or ML: intuitively, $F_\omega$ contains polymorphic functions and also
polymorphic type constructors.
See Figure 1 for a fairly standard presentation of $F_\omega$ extended with finite
products. We write $\alpha \notin \Gamma$ to mean that $\alpha$ is not among the free variables
in $\Gamma$. We abbreviate $\forall\alpha{::}{*}.T$ as $\forall\alpha.T$ and similarly for $\lambda\alpha{::}{*}.T$. Empty
products are written as 1. We will also write $t_i$ for the projection $\pi_i\,t$ and
$\lambda\langle a, b\rangle{:}A \times B.\,t$ for $\lambda x{:}A \times B.\,[a \mapsto \pi_1 x,\ b \mapsto \pi_2 x]\,t$.
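Terms
\[ t, s ::= x \mid \lambda x{:}T.\,t \mid t\;t \mid \Lambda\alpha{::}K.\,t \mid t\,[T] \mid \langle t_i \rangle^{i \in 1..n} \mid \pi_i\,t \]
Kinds
\[ K ::= * \mid K \to K \]
Types
\[ T ::= \alpha \mid T \to T \mid \forall\alpha{::}K.\,T \mid \lambda\alpha{::}K.\,T \mid T\,[T] \mid \textstyle\prod_{i \in 1..n} T_i \]
Contexts
\[ \Gamma ::= \emptyset \mid \Gamma, x : T \mid \Gamma, \alpha :: K \]
Typing
\[ \dfrac{x : T \in \Gamma}{\Gamma \vdash x : T}\ (\text{Var}) \]
\[ \dfrac{\Gamma \vdash T_1 :: * \qquad \Gamma, x : T_1 \vdash t : T_2}{\Gamma \vdash \lambda x{:}T_1.\,t : T_1 \to T_2}\ (\text{Abs}) \qquad \dfrac{\Gamma \vdash t : T_1 \to T_2 \qquad \Gamma \vdash s : T_1}{\Gamma \vdash t\;s : T_2}\ (\text{App}) \]
\[ \dfrac{\Gamma, \alpha :: K \vdash t : T \qquad \alpha \notin \Gamma}{\Gamma \vdash \Lambda\alpha{::}K.\,t : \forall\alpha{::}K.\,T}\ (\text{TAbs}) \qquad \dfrac{\Gamma \vdash t : \forall\alpha{::}K.\,T_1 \qquad \Gamma \vdash T_2 :: K}{\Gamma \vdash t\,[T_2] : [\alpha \mapsto T_2]\,T_1}\ (\text{TApp}) \]
\[ \dfrac{\forall i \in 1..n \quad \Gamma \vdash t_i : T_i}{\Gamma \vdash \langle t_i \rangle^{i \in 1..n} : \prod_{i \in 1..n} T_i}\ (\text{Tuple}) \qquad \dfrac{\Gamma \vdash t : \prod_{i \in 1..n} T_i}{\Gamma \vdash \pi_j\,t : T_j}\ (\text{Proj}) \]
Kinding
\[ \dfrac{\alpha :: K \in \Gamma}{\Gamma \vdash \alpha :: K}\ (\text{TVar}) \]
\[ \dfrac{\Gamma, \alpha :: K_1 \vdash T :: K_2 \qquad \alpha \notin \Gamma}{\Gamma \vdash \lambda\alpha{::}K_1.\,T :: K_1 \to K_2}\ (\text{KAbs}) \qquad \dfrac{\Gamma \vdash T_1 :: K_1 \to K_2 \qquad \Gamma \vdash T_2 :: K_1}{\Gamma \vdash T_1\,[T_2] :: K_2}\ (\text{KApp}) \]
\[ \dfrac{\Gamma \vdash T_1 :: * \qquad \Gamma \vdash T_2 :: *}{\Gamma \vdash T_1 \to T_2 :: *}\ (\text{KArrow}) \qquad \dfrac{\Gamma, \alpha :: K \vdash T :: * \qquad \alpha \notin \Gamma}{\Gamma \vdash \forall\alpha{::}K.\,T :: *}\ (\text{KAll}) \]
\[ \dfrac{\forall i \in 1..n \quad \Gamma \vdash T_i :: *}{\Gamma \vdash \prod_{i \in 1..n} T_i :: *}\ (\text{KTuple}) \]
Reductions
\[ (\lambda x{:}T.\,t)\;s \leadsto [x \mapsto s]\,t \]
\[ (\Lambda\alpha{::}K.\,t)\,[T] \leadsto [\alpha \mapsto T]\,t \]
\[ \pi_j\,\langle t_i \rangle^{i \in 1..n} \leadsto t_j \]
\[ (\lambda\alpha{::}K.\,T_1)\,[T_2] \leadsto [\alpha \mapsto T_2]\,T_1 \]
Fig. 1. $F_\omega$ extended with finite products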
Internal visitors
and we further restrict attention to $F_i$ that are products of the recursive type
variable and type constants. Thus the types defined this way are various
forms of trees, and we will see how visitors are tree walkers.
Definition 3.1 We define internal visitors to be pairs of the form $\langle A, \langle a_i \rangle^{i \in 1..n} \rangle$,
where $A$ is a type (called the result type), and $\langle a_i \rangle^{i \in 1..n}$ is a tuple of functions
$a_i : F_i[A] \to A$ (called the visit methods).
In the presence of sums, we could define $F[X] = \sum_{i \in 1..n} F_i[X]$. Visitors
are essentially $F$-algebras, and their visit methods give the structure map.
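As a worked instance, assumed here for illustration and chosen to match the running example of binary trees with integer leaves, take $n = 2$ with one functor for leaves and one for inner nodes. The indices 1 and 2 and the order of the two cases are choices made for this sketch.

```latex
\[ F_1[X] = \mathbb{Z}, \qquad F_2[X] = X \times X \]
% An internal visitor with result type A is then a pair of visit methods:
\[ a_1 : \mathbb{Z} \to A, \qquad a_2 : A \times A \to A \]
% The visitor that sums the leaves has result type Z:
\[ \langle \mathbb{Z},\ \langle \mathrm{id}_{\mathbb{Z}},\ + \rangle \rangle \]
```

Here $a_1$ plays the role of visitLeaf and $a_2$ that of visitNode; the second receives the already computed results for the two subtrees.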
Next, we consider the encoding of algebraic types (that is to say initial
$F$-algebras), but rephrased in terms of weakly initial visitors.
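We define an object $T$ as follows:
\[ T = \forall\alpha.\Bigl(\underbrace{\prod_{i \in 1..n} \underbrace{(F_i[\alpha] \to \alpha)}_{\mathrm{visit}_i[\alpha]}}_{\mathrm{Visitor}[\alpha]}\Bigr) \to \alpha \]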
More intuitively, the elements of T can be thought of as trees; and they use
the visit methods to collapse themselves recursively into a single element of
A. This is achieved by letting the visitor visit any subtrees, thereby collapsing
them into elements of the result type, and then calling the appropriate visit
methods for the topmost node.
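\[ \mathrm{cons}_i = \lambda x{:}F_i[T].\ \overbrace{\Lambda\alpha.\,\lambda v{:}\textstyle\prod_{j \in 1..n}(F_j[\alpha] \to \alpha).}^{\text{accept visitor}}\ \underbrace{v_i}_{\text{call visit method}}\ \bigl(\overbrace{F_i[\lambda t{:}T.\,t[\alpha]\,v]}^{\text{walk over subtrees}}\ \underbrace{x}_{\text{constructor arguments}}\bigr) \]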
Visitor morphisms that witness the initiality of T can be seen as calling
accept on a T object and passing it a concrete visitor:
\[ \hat{a} = \lambda t{:}T.\ \underbrace{t[A]}_{\text{call accept}}\ \overbrace{\langle a_i \rangle^{i \in 1..n}}^{\text{concrete visitor}} \]
3.2 External visitors
An external visitor consists of a pair $\langle A, \langle v_i \rangle^{i \in 1..n} \rangle$, where $A$ is a type and
$\langle v_i \rangle^{i \in 1..n}$ is a tuple of functions $v_i : F_i[T] \to A$. Note the $T$ in the position
where an internal visitor would have another occurrence of A. Intuitively, an
external visitor has visit methods just like an internal one; the difference is
that these methods may accept trees of type T as arguments, rather than
automatically collapsing the trees into elements of result type A.
We define a structure S that accepts external visitors in the same way that
T accepts internal ones:
\[ S = \forall\alpha.\Bigl(\prod_{i \in 1..n} (F_i[T] \to \alpha)\Bigr) \to \alpha \]
The visitor structure on $S$ induces a function from the weakly initial visitor $T$:
\[ \hat{p} : T \to S \]
Intuitively, this map takes a tree and pattern matches it, so that it can be
visited by an external visitor.
External visitors can themselves use this map for further pattern matching
of subtrees and thus traversal. However, without adding recursion, an external
visitor cannot traverse trees of arbitrary depth. Since the traversal of the
whole tree is no longer built-in the way it was in internal visitors, an external
visitor would have to recur under its own steam, as it were, which requires a
fixpoint combinator in the visitor.
The overall view on visitors in this section is as follows. Our aim is to bring
out the connection between the Visitor pattern and type encodings in System
F by factoring various translations. We present an encoding $O[\![-]\!]$ of algebraic
types into a polymorphic $\lambda$-calculus that is stylistically close to object-oriented
languages. The calculus is a restricted subset of $F_\omega$ (with products), and
the encoding corresponds to the classical encoding $F[\![-]\!]$ of algebraic types
in System F (up to some reductions and type isomorphisms). On the other
hand, because our calculus resembles object-oriented languages, there is a
straightforward embedding $[\![-]\!]$ into Featherweight Generic Java (which is itself
a subset of Generic Java); and this lets us recover the internal variant of the
Visitor pattern (by composition).
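[Diagram: algebraic types are sent by $O[\![-]\!]$ to $F_\omega^{oo}$ and by $F[\![-]\!]$ to System F; both $F_\omega^{oo}$ and System F are included in $F_\omega$; $[\![-]\!]$ maps $F_\omega^{oo}$ to FGJ, which is included in GJ; the composite from algebraic types to FGJ is labeled Visitor.]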
We need a calculus for expressing visitors in such a way that we can then
transform them into both System F and FGJ. To do so, we will approximate objects as tuples of methods (of a restricted function type). Another
ingredient is a form of parameterized let on types that we will use for approximating interfaces with generics. We define:
                   interface instantiation
    | int          integer type
where $\alpha, \beta \in \mathit{TyVar}$ and int is a constant type. Note that this grammar
restricts all type variables to the kinds $*$ or $* \to *$. We also insist that all type
variables are bound and that all bound variables are distinct.
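\[ T = \mathbf{let}\ \beta\langle\alpha\rangle\ \mathbf{be}\ \prod_{i \in 1..n} \underbrace{(F_i[\alpha] \to \alpha)}_{\mathrm{visit}_i[\alpha]}\ \mathbf{in}\ \mathbf{let}\ \gamma\ \mathbf{be}\ \overbrace{\forall\alpha.\,\beta[\alpha] \to \alpha}^{\text{accept}}\ \mathbf{in}\ \gamma \]
\[ = \bigl(\lambda\beta{::}{*}{\to}{*}.\,(\lambda\gamma.\,\gamma)[\forall\alpha.\,\beta[\alpha] \to \alpha]\bigr)\bigl[\lambda\alpha.\,\textstyle\prod_{i \in 1..n}(F_i[\alpha] \to \alpha)\bigr] \]
\[ \leadsto \bigl(\lambda\beta{::}{*}{\to}{*}.\,\forall\alpha.\,\beta[\alpha] \to \alpha\bigr)\bigl[\lambda\alpha.\,\textstyle\prod_{i \in 1..n}(F_i[\alpha] \to \alpha)\bigr] \]
\[ \leadsto \forall\alpha.\,\bigl(\lambda\alpha.\,\textstyle\prod_{i \in 1..n}(F_i[\alpha] \to \alpha)\bigr)[\alpha] \to \alpha \]
\[ \leadsto \forall\alpha.\,\bigl(\textstyle\prod_{i \in 1..n}(F_i[\alpha] \to \alpha)\bigr) \to \alpha \]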
But we can also recover visitors in Generic Java. To do so, we define a translation.
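Definition 4.3 The translation $[\![-]\!]$ from types of $F_\omega^{oo}$ to a sequence of FGJ
interface definitions is defined as follows. (To render this more compactly we slightly
abuse EBNF notation. Repeated occurrences on the LHS correspond to those
on the RHS.)
\[ [\![(\mathbf{let}\ \beta\langle\vec{\alpha}\rangle\ \mathbf{be}\ O\ \mathbf{in})^{+}\ \beta[\vec{S}]]\!] = (\texttt{interface}\ \beta\texttt{<}\vec{\alpha}\texttt{>}\ \{[\![O]\!]\})^{+} \]
\[ [\![\textstyle\prod_{i \in 1..n} M_i]\!] = ([\![M_i]\!])^{i \in 1..n} \]
\[ [\![\forall\vec{\alpha}.\,(\textstyle\prod_{j \in 1..m} S_j) \to S]\!] = \texttt{<}\vec{\alpha}\texttt{>}\ [\![S]\!]\ m(([\![S_j]\!]\ x_j)^{j \in 1..m}); \]
\[ [\![\beta[\vec{S}]]\!] = \beta\texttt{<}[\![\vec{S}]\!]\texttt{>} \]
\[ [\![\mathrm{int}]\!] = \texttt{Int} \]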
interface β<α> {
    (α visit_i(Int ~x, α ~y);)^{i∈1..n}
}
interface γ {
    <α> α accept(β<α> v);
}
leaf : $\mathbb{Z} \to B$
node : $(B \times B) \to B$
The visitors in Figure 2 were purely functional and relied on generics. A more
typical rendition of an internal visitor using internal state instead is given in
Figure 3 (it is debatable whether visitNode() should be omitted, since there
interface Visitor {
    void visitLeaf(int n);
    void visitNode();
}

interface BinTree {
    void accept(Visitor v);
}

class Leaf implements BinTree {
    int n;
    Leaf(int n) { this.n = n; }
    public void accept(Visitor v) {
        v.visitLeaf(n);
    }
}

class Node implements BinTree {
    BinTree left; BinTree right;
    Node(BinTree left, BinTree right) {
        this.left = left;
        this.right = right;
    }
    public void accept(Visitor v) {
        left.accept(v);
        right.accept(v);
    }
}

class SumVisitor implements Visitor {
    int s;
    SumVisitor(int s) { this.s = s; }
    public void visitLeaf(int n) { s = s + n; }
    public void visitNode() { }
}
Fig. 3. An imperative internal visitor in Java (without generics)
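The following sketch shows how a caller might use the imperative visitor of Figure 3. The classes are repeated from the figure so that the example is self-contained; the class ImperativeDemo and its helper sum are hypothetical additions. Since accept returns void, the result has to be read from the field s after the traversal.

```java
interface Visitor {
    void visitLeaf(int n);
    void visitNode();
}

interface BinTree {
    void accept(Visitor v);
}

class Leaf implements BinTree {
    int n;
    Leaf(int n) { this.n = n; }
    public void accept(Visitor v) { v.visitLeaf(n); }
}

class Node implements BinTree {
    BinTree left; BinTree right;
    Node(BinTree left, BinTree right) { this.left = left; this.right = right; }
    public void accept(Visitor v) {
        left.accept(v);
        right.accept(v);
    }
}

class SumVisitor implements Visitor {
    int s;                                   // accumulator for the running sum
    SumVisitor(int s) { this.s = s; }
    public void visitLeaf(int n) { s = s + n; }
    public void visitNode() { }
}

class ImperativeDemo {
    // Hypothetical helper: run a fresh visitor and read back its state.
    static int sum(BinTree t) {
        SumVisitor v = new SumVisitor(0);    // 0 is the initial state
        t.accept(v);                         // side effects update v.s
        return v.s;
    }

    public static void main(String[] args) {
        BinTree t = new Node(new Node(new Leaf(1), new Leaf(2)), new Leaf(3));
        System.out.println(sum(t));          // prints 6
    }
}
```

A visitor object carries its state between calls, so reusing one SumVisitor for two trees would add the second sum on top of the first.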
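\[ \mathrm{id}_{\mathbb{Z}} : \mathbb{Z} \to \mathbb{Z} \]
\[ \mathrm{add}_{\mathbb{Z}} = \lambda n{:}\mathbb{Z}.\,\lambda s{:}\mathbb{Z}.\,s + n \;:\; \mathbb{Z} \to \mathbb{Z} \to \mathbb{Z} \]
Assuming parametricity [16], we have that for all relations $R$,
\[ \langle t, t \rangle \in ((\mathbb{Z} \to R) \times ((R \times R) \to R)) \to R \]
We define a relation $R : \mathbb{Z} \leftrightarrow (\mathbb{Z} \to \mathbb{Z})$ by $\langle n, f \rangle \in R$ iff $f(x) = n + x$ for all $x$.
Then we have:
\[ \langle \mathrm{id}_{\mathbb{Z}}, \mathrm{add}_{\mathbb{Z}} \rangle \in \mathbb{Z} \to R \]
\[ \langle +, \circ_{\mathbb{Z}} \rangle \in (R \times R) \to R \]
The latter of these holds because if $\langle \langle x, y \rangle, \langle f, g \rangle \rangle \in R \times R$, then $\langle x + y, f \circ g \rangle \in R$.
Since $t$ maps related arguments to related results, we have
\[ \langle t[\mathbb{Z}]\langle \mathrm{id}_{\mathbb{Z}}, + \rangle,\ t[\mathbb{Z} \to \mathbb{Z}]\langle \mathrm{add}_{\mathbb{Z}}, \circ_{\mathbb{Z}} \rangle \rangle \in R \]
Hence, by the definition of $R$, $t[\mathbb{Z} \to \mathbb{Z}]\langle \mathrm{add}_{\mathbb{Z}}, \circ_{\mathbb{Z}} \rangle\ 0 = t[\mathbb{Z}]\langle \mathrm{id}_{\mathbb{Z}}, + \rangle$, as required.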
Note that the proof made use of 0 being the neutral element of addition,
and of the associativity of addition in establishing the relation $R$ between $f \circ g$
and $x + y$. With $f(w) = x + w$ and $g(z) = y + z$, as given by the definition of $R$:
\[ (f \circ g)(z) = f(g(z)) = f(y + z) = x + (y + z) = (x + y) + z \]
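The equation established above can be checked on sample trees with a small Java sketch. It assumes a generic internal visitor interface for binary trees; the classes FunctionalSum, StateSum and ParametricityDemo are hypothetical names. FunctionalSum corresponds to instantiating the result type with the integers and using identity and addition, while StateSum instantiates it with functions on integers, using an "add n to the state" function at leaves and function composition at nodes. Such a test only samples particular trees and is no substitute for the parametricity argument.

```java
import java.util.function.Function;

interface Visitor<R> {
    R visitLeaf(int n);
    R visitNode(R left, R right);
}

interface BinTree {
    <R> R accept(Visitor<R> v);
}

final class Leaf implements BinTree {
    private final int n;
    Leaf(int n) { this.n = n; }
    public <R> R accept(Visitor<R> v) { return v.visitLeaf(n); }
}

final class Node implements BinTree {
    private final BinTree left;
    private final BinTree right;
    Node(BinTree left, BinTree right) { this.left = left; this.right = right; }
    public <R> R accept(Visitor<R> v) {
        return v.visitNode(left.accept(v), right.accept(v));
    }
}

// Result type Integer: leaves are returned as is, nodes add their subresults.
final class FunctionalSum implements Visitor<Integer> {
    public Integer visitLeaf(int n) { return n; }
    public Integer visitNode(Integer x, Integer y) { return x + y; }
}

// Result type Integer -> Integer: each subtree becomes a state transformer.
final class StateSum implements Visitor<Function<Integer, Integer>> {
    public Function<Integer, Integer> visitLeaf(int n) {
        return s -> s + n;                   // add the leaf to the state
    }
    public Function<Integer, Integer> visitNode(Function<Integer, Integer> f,
                                                Function<Integer, Integer> g) {
        return f.compose(g);                 // f after g
    }
}

class ParametricityDemo {
    // Compares the two instantiations, starting the state at 0.
    static boolean agree(BinTree t) {
        int functional = t.accept(new FunctionalSum());
        int stateful = t.accept(new StateSum()).apply(0);
        return functional == stateful;
    }
}
```

Starting the state transformer at a value other than 0 would shift the result by that value, which is where the neutral element enters the argument.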
The above argument of relating a functional and an imperative visitor
by associativity is applicable to more substantial cases as well. Consider the
standard example of abstract syntax trees, and suppose we need to traverse the
tree to add information into a symbol table. The most evident specification in
terms of a synthesized attribute would be essentially functional, merging the
symbol table of the subtrees at the inner nodes. An imperative visitor could
instead start off with an empty symbol table and add entries by updating
a mutable symbol table during traversal. Showing the equivalence of the
functional and the imperative version should be analogous to the parametricity
argument above, with the empty symbol table being the neutral element, and
merging of symbol tables as the associative operation.
Conclusions
Idealized view
Visitor interface
Concrete visitor
Visit method
Witnesses initiality
Concrete data
Given by $\mathrm{cons}_i$
Acknowledgement
We thank Alan Mycroft and the anonymous referees for their comments.
References
[1] Gavin Bierman, Matthew Parkinson, and Andrew Pitts. MJ: an imperative
core calculus for Java and Java with effects. Technical Report 563, University
of Cambridge Computer Laboratory, 2003.
[2] C. Böhm and A. Berarducci. Automatic synthesis of typed lambda-programs
on term algebras. Theoretical Computer Science, 39(2/3):135-154, 1985.
[3] Gilad Bracha, Martin Odersky, David Stoutamire, and Philip Wadler.
Making the future safe for the past: Adding genericity to the Java
programming language. In Proceedings of the 13th ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '98),
Vancouver, British Columbia, 18-22 October 1998, pages 183-200. ACM Press,
New York, 1998.
[4] Matthias Felleisen and Daniel P. Friedman. A Little Java, A Few Patterns.
MIT Press, Cambridge, Massachusetts, 1998.
[5] Etienne M. Gagnon and Laurie J. Hendren. SableCC, an object-oriented
compiler framework. In Proceedings of the Conference on Technology of Object-Oriented Languages and Systems, Santa Barbara, California, 3-7 August 1998,
pages 140-154. IEEE Computer Society, Washington DC, 1998.
[6] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design
Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley,
Boston, Massachusetts, 1995.
FGJ program into a Generic Java program it is necessary to add the keyword
public before method implementations and to elide type instantiations on
method calls.)
Our description of FGJ includes interfaces which were not present in the
original definition. The extension is limited since it only permits single inheritance in interface hierarchies and a class is permitted to implement at most
one interface. There are also some restrictions on type parameter bounds.
Since we are not interested in casting we will omit that from our account.
The syntax and typing of FGJ with interfaces are shown in Figures A.1
and A.2. The computation rules are shown in Figure A.3. We omit some
auxiliary rules due to lack of space. We abbreviate the keywords extends as
/, implements as and return as . The metavariables , , range over
type variables; T, U, V range over types; N and O range over class types; and Q
ranges over interface types.
Some abbreviations are also necessary for various sequences:
~f = f0 , . . . ,fn
~M = M0 . . . Mn
~C ~f = C1 f1 , . . . , Cn fn
~C ~f; = C1 f1 ; . . . ; Cn fn ;
this.~f=~f; = this.f1 =f1 ; . . . ; this.fn =fn ;
The empty sequence is written as and concatenation is denoted with a
comma.
Unparameterized classes C<> and methods m<> can be abbreviated to C and
m. As in Java, unbounded parameters are assumed to have a bound of Object.
We also abbreviate C / Object to C. A class does not have to implement an
interface, in which case we can omit Q from its definition. Unlike the class
hierarchy, the interface hierarchy has no root so we also allow / Q to be omitted
from interface definitions. We will omit empty calls to super() and empty
constructors.
The class table CT is a mapping from class names to class declarations.
The extended calculus also has an interface table IT which plays a similar role
with respect to interfaces. The authors of FGJ define some sanity conditions
on class tables and we assume that both CT and IT obey them. We assume
the existence of the special variable this, which may not be used as the name
of a field or method parameter. A program in FGJ with interfaces is a triple
(CT, IT, e) of a class table, an interface table and an expression.
Syntax
CL ::= class C<~
/ ~N> / N Q {~T ~f; K ~M}
T ::= | | N | Q
N ::= C<~T>
H ::= <~
/ ~N> T m (~T ~x);
Q ::= I<~T>
M ::= <~
/ ~N> T m (~T ~x) { e;}
Subtyping
` T <: T
` S <: T
` T <: U
` S <: U
S-Refl
S-Poly
` <: ()
S-Trans
S-Sub
S-Extend
S-Impl
Well-formed types
WF-Obj
` Object ok
` Int ok
WF-Int
` ok
dom()
` ok
WF-Empty
WF-Poly
` ~T <: [~
7 ~T] ~N
WF-Interface
` I<~T> ok
CT (C) = class C<~
/ ~N> / N Q {...}
` ~T ok
` ~T <: [~
7 ~T] ~N
WF-Class
` C<~T> ok
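Expression typing
\[ \Delta; \Gamma \vdash x \in \Gamma(x)\ \ (\text{T-Var}) \]
\[ \dfrac{\Delta; \Gamma \vdash e_0 \in T_0 \qquad \mathit{fields}(\mathit{bound}_\Delta(T_0)) = \vec{T}\ \vec{f}}{\Delta; \Gamma \vdash e_0.f_i \in T_i}\ (\text{T-Field}) \]
\[ \dfrac{\begin{array}{c} \Delta; \Gamma \vdash e_0 \in T_0 \qquad \mathit{mtype}(m, \mathit{bound}_\Delta(T_0)) = \langle\vec{\alpha} \triangleleft \vec{O}\rangle\,\vec{U} \to U \\ \Delta \vdash \vec{V}\ \text{ok} \qquad \Delta \vdash \vec{V} <: [\vec{\alpha} \mapsto \vec{V}]\,\vec{O} \\ \Delta; \Gamma \vdash \vec{e} \in \vec{S} \qquad \Delta \vdash \vec{S} <: [\vec{\alpha} \mapsto \vec{V}]\,\vec{U} \end{array}}{\Delta; \Gamma \vdash e_0.m\langle\vec{V}\rangle(\vec{e}) \in [\vec{\alpha} \mapsto \vec{V}]\,U}\ (\text{T-Invk}) \]
\[ \dfrac{\Delta \vdash N\ \text{ok} \qquad \mathit{fields}(N) = \vec{T}\ \vec{f} \qquad \Delta; \Gamma \vdash \vec{e} \in \vec{S} \qquad \Delta \vdash \vec{S} <: \vec{T}}{\Delta; \Gamma \vdash \mathtt{new}\ N(\vec{e}) \in N}\ (\text{T-New}) \]
Fig. A.1. FGJ sans casting extended with interfaces: Main definitions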
Method typing
~ <: ~O
=
~ <: ~N,
` ~T ok
` T ok
` ~O ok
` S <: T
` T ok
` ~O ok
T-Mhead
Class typing
~ <: ~N ` ~N ok
~ <: ~N ` N ok
~ <: ~N ` ~T ok
methods(N) = ~M1
fields(N) = ~U ~g
~ <: ~N ` Q ok
T-Class
Interface typing
~ <: ~N ` ~N ok
~ <: ~N ` Q ok
~H OK IN I<~
/ ~N>
interface I<~
/ ~N> / Q {~H} OK
T-Interface
Fig. A.2. FGJ sans casting extended with interfaces: Main definitions continued
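Computation
\[ \dfrac{\mathit{fields}(N) = \vec{T}\ \vec{f}}{(\mathtt{new}\ N(\vec{e})).f_i \leadsto e_i}\ (\text{Comp-Field}) \]
\[ \dfrac{\mathit{mbody}(m\langle\vec{V}\rangle, N) = (\vec{x}, e_0)}{(\mathtt{new}\ N(\vec{e})).m\langle\vec{V}\rangle(\vec{d}) \leadsto [\vec{x} \mapsto \vec{d},\ \mathtt{this} \mapsto \mathtt{new}\ N(\vec{e})]\,e_0}\ (\text{Comp-Invk}) \]
Fig. A.3. FGJ sans casting extended with interfaces: Computation rules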