Shrew - A Prototype For Subversion Analysis
Abstract
With the growth of the World Wide Web, version control systems have become an essential component of collaborative software development. One such version control system that has found wide adoption in recent years is Subversion, a centralized system that was designed explicitly to match the requirements of the open-source community. Equally, specialized web-based tools have emerged to browse and inspect version control systems such as Subversion, and they have proven to be valuable instruments for the developers of software projects. As projects become larger and more complex, however, these tools often reach the limits of the introspection they can provide. To solve this problem we present Shrew, an approach to analyzing Subversion repositories that builds upon a specialized meta-model, uses the Moose object-oriented reengineering environment to facilitate information extraction, and presents its results through a convenient web interface.
1. Introduction
The history of web-based interfaces to version control systems can be traced back to 1996, when Bill Fenner released a script for browsing CVS repositories called CVSweb [CVSweb, 2007]. Since then, CVSweb has come a long way and inspired a port to Python, which resulted in the well-known ViewCVS project [ViewVC, 2007]. Today, ViewCVS has been extended to support the Subversion version control system as well and has been renamed ViewVC. Further products such as Chora [Chora, 2007], FishEye [FisheEye, 2007] and Trac [Trac, 2007] have been developed to support version control systems that ViewVC does not support or to provide functionality that ViewVC does not cover.
Despite this abundance of web-based tools, few of them facilitate the more complex historical analysis needed for reverse engineering and reengineering while retaining a straightforward web-based interface. Shrew is intended as a suggestion of how this gap can be filled. For this purpose, Shrew uses the Smalltalk programming language, the reengineering environment Moose [Ducasse et al., 2000] and the Seaside web application framework [Seaside, 2007] to analyze and browse Subversion repositories.
Subversion was chosen as the version control system for Shrew because it has wide and growing acceptance in the development community, particularly in open-source circles. It has a more modern design than CVS and does not suffer from the inherent flaws of CVS. Finally, Subversion has a less complex architecture than distributed version control systems such as darcs [darcs, 2007] or Git [git, 2007].
2. Repository Access
Before any analysis can take place, we must be able to access the data stored in a Subversion repository. Subversion provides multiple mechanisms to access remote repositories, the most popular of which uses WebDAV/DeltaV, an extension of HTTP. Two other possibilities are svnserve, which uses a custom protocol, and direct file access if the Subversion server and client are on the same host and permissions are set accordingly. To make matters more complicated, WebDAV traffic can be tunneled via SSL and svnserve traffic via SSH, and both may provide authentication mechanisms to restrict access.

[Figure: Shrew's layered architecture: Presentation, Data Analysis and Repository Access, on top of SVN]
With these considerations in mind, five approaches to implementing a client in VisualWorks Smalltalk were considered: (1) using Subversion's SWIG bindings, (2) importing the Subversion API using DLLCC, (3) writing a C wrapper around the API, (4) implementing the protocols natively and (5) calling the Subversion console client as a separate process.
I present these solutions in order of their “elegance”: after some deliberation, I sorted them by an estimate of their maintainability and efficiency.
SWIG. Subversion already provides SWIG interface definition files, so SWIG [SWIG, 2007] would be the optimal basis for implementing a client. Unfortunately, SWIG does not natively provide a language definition for any Smalltalk dialect, and a related project [Upright, 2007] attempting to provide such a definition is still quite experimental and had not shown any progress in close to a year at the time of this writing. Writing such a language definition, although not difficult, would have required more time than was available during the project.
DLLCC. VisualWorks Smalltalk comes with a suite of tools named DLL and C Connect (DLLCC) that can be used to generate and use interfaces between Smalltalk and C. DLLCC can automatically parse the header files of a C application and construct the necessary binding code in Smalltalk, or the developer can write the binding code manually. Unfortunately, DLLCC was unable to parse the Subversion header files correctly, and a manual implementation would have been tedious. Additionally, the primitive C data types would have had to be converted into higher-level Smalltalk objects, adding further complexity to this solution.
C wrapper. A third solution is to write a C library that wraps Subversion and exposes only the interface needed by Shrew. We could then use DLLCC to construct the binding code automatically, or even consider a manual implementation, as the header files would be much smaller than those of the entire Subversion code base. However, the data type conversion issue persists, rendering this solution infeasible as well.
Console client. The last option is to call the Subversion console client, passing any options on the command line and parsing the text output that is returned. Obviously, this comes with an expensive overhead: every time information is retrieved from a remote location, we start a new process, perform a TCP handshake and possibly an SSL handshake, exchange Subversion data including authentication, tear down the connection and finally return the result as a string on standard output. If the result is in XML format, we additionally have to run an XML parser to interpret the data.
Needless to say, this last solution is far from elegant, efficient or maintainable, but it permitted the quick prototype implementation that was needed to continue the development of Shrew.
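To make the console approach concrete, here is a minimal sketch in Python (used for brevity in these sketches; Shrew itself is written in VisualWorks Smalltalk). It assumes the `svn` executable is on the PATH and uses the standard `svn log` options `--xml`, `--verbose` and `--non-interactive`; the names `LogEntry`, `svn_log_command`, `parse_svn_log` and `svn_log` are hypothetical and do not correspond to Shrew's classes.

```python
import subprocess
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LogEntry:
    revision: int
    author: Optional[str]
    date: Optional[str]
    message: str
    # (action, path) pairs such as ("M", "/trunk/README"); requires --verbose
    changed_paths: List[Tuple[str, str]] = field(default_factory=list)


def svn_log_command(url: str, start: Optional[int] = None,
                    end: Optional[int] = None) -> List[str]:
    """Build the argument vector for `svn log` with XML output."""
    cmd = ["svn", "log", "--xml", "--verbose", "--non-interactive"]
    if start is not None:
        upper = str(end) if end is not None else "HEAD"
        cmd += ["-r", f"{start}:{upper}"]
    cmd.append(url)
    return cmd


def parse_svn_log(xml_text: str) -> List[LogEntry]:
    """Turn the XML printed by `svn log --xml --verbose` into LogEntry objects."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.findall("logentry"):
        paths = [(p.get("action"), p.text) for p in node.findall("paths/path")]
        entries.append(LogEntry(
            revision=int(node.get("revision")),
            author=node.findtext("author"),
            date=node.findtext("date"),
            message=node.findtext("msg", default=""),
            changed_paths=paths,
        ))
    return entries


def svn_log(url: str, start: Optional[int] = None,
            end: Optional[int] = None) -> List[LogEntry]:
    """Spawn one `svn` process per call: this is where the overhead lies."""
    result = subprocess.run(svn_log_command(url, start, end),
                            capture_output=True, text=True, check=True)
    return parse_svn_log(result.stdout)
```

Keeping command construction and parsing separate from the process call makes the parser testable without a server; only `svn_log` pays the per-call process and connection cost.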
Subversion client commands and not all exhibit the same behavior as their
console counterpart.
Listing 1 shows an example of how the client is used to check out a project. The revision: part of the message is optional; if it is omitted, the newest revision is checked out.
root := SVN
    checkout: 'https://example.org/svn/repos'
    revision: (Revision number: 12)
The SVN class used above is not the client implementation itself but acts as a wrapper: it overrides #doesNotUnderstand: on the class side to delegate messages to the concrete implementation of the Subversion client. The concrete client is specified in SVN class>>#defaultClient.
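The same delegation idea can be mirrored in Python, where a metaclass `__getattr__` plays the role of a class-side #doesNotUnderstand:. This is an illustrative sketch only: `ConsoleClient`, `_DelegatingMeta` and `default_client` are hypothetical names, and the stand-in client merely echoes its arguments instead of running `svn`.

```python
class ConsoleClient:
    """Hypothetical stand-in for a concrete client implementation."""

    def checkout(self, url, revision=None):
        # A real client would run `svn checkout`; here we only echo the request.
        return ("checkout", url, revision)


class _DelegatingMeta(type):
    def __getattr__(cls, name):
        # Called only when normal attribute lookup on the class fails,
        # which is the Python analogue of #doesNotUnderstand:.
        return getattr(cls.default_client(), name)


class SVN(metaclass=_DelegatingMeta):
    """Facade that forwards unknown class-side messages to the default client."""

    _client = None

    @classmethod
    def default_client(cls):
        # Counterpart of SVN class>>#defaultClient: lazily create one client.
        if cls._client is None:
            cls._client = ConsoleClient()
        return cls._client
```

With this in place, `SVN.checkout("https://example.org/svn/repos", revision=12)` is forwarded to the default client, much like the message send in Listing 1, and swapping the client only requires changing `default_client`.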
Since the client is incomplete, most users will not be interested in it as such but will only use it in the context of the data analysis presented in the next section. The data analysis layer relies solely on a handful of client functions such as info, list and log. Furthermore, it uses a caching wrapper around the default (console-based) client to avoid retrieving the same information from the server more than once. This wrapper is called CachedConsoleClient, and its implementation is rather straightforward.
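A caching wrapper of this kind might look as follows. This is a Python sketch in the spirit of CachedConsoleClient, not its actual code: `CachingClient` and `CACHED_QUERIES` are hypothetical, results are keyed by method name and positional arguments, and only the read-only queries named above are memoized.

```python
class CachingClient:
    """Wraps any client and memoizes selected read-only queries."""

    CACHED_QUERIES = frozenset({"info", "list", "log"})

    def __init__(self, client):
        self._client = client
        self._cache = {}

    def __getattr__(self, name):
        # Only reached for names not found on CachingClient itself.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion on private attributes
        target = getattr(self._client, name)
        if name not in self.CACHED_QUERIES:
            return target  # commands with side effects pass straight through

        def cached(*args):
            key = (name, args)  # arguments must be hashable
            if key not in self._cache:
                self._cache[key] = target(*args)
            return self._cache[key]

        return cached
```

Such a cache is only safe for queries that pin an explicit revision: a query against HEAD would keep returning its first answer even after new commits.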
3. Data Analysis
Once the data from a Subversion repository can be accessed within Smalltalk, the next step is to analyze it. This is the responsibility of the model, as discussed below.
3.1. Model import
Behind the scenes, Moose delegates the responsibility of creating the meta-model to ProjectBuilder, which defines four public methods: buildProject:from:to:, updateProject:to:, buildProject: and updateProject:. The latter two are convenience methods that build all available revisions or update to the newest revision, respectively. The ability to specify a range of revisions allows the user to analyze only a subset of an entire Subversion repository.
The build methods expect a string or a URL pointing to the repository location and return an instance of ProjectHistory, which is the root of the meta-model (see Figure 2). To later update the project to a newer revision, the instance is simply passed to updateProject:.
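How the four build methods relate to each other can be sketched as follows. The Python names are hypothetical analogues of buildProject:from:to: and updateProject:to:, and the log source is injected as a function (for instance one wrapping `svn log`) so that the sketch runs without a repository; the real ProjectBuilder creates far richer entities than a list of revisions and authors.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ProjectVersion:
    revision: int
    author: str


@dataclass
class ProjectHistory:
    url: str
    versions: List[ProjectVersion] = field(default_factory=list)

    @property
    def last_revision(self) -> int:
        return self.versions[-1].revision if self.versions else 0


# Hypothetical log source: fetch_log(url, first, last) returns
# (revision, author) pairs; last=None stands for the newest revision (HEAD).
LogFetcher = Callable[[str, int, Optional[int]], List[Tuple[int, str]]]


class ProjectBuilder:
    def __init__(self, fetch_log: LogFetcher):
        self.fetch_log = fetch_log

    def build_project(self, url: str, first: int = 1,
                      last: Optional[int] = None) -> ProjectHistory:
        """Like buildProject:from:to:; the defaults mirror buildProject:."""
        history = ProjectHistory(url)
        self._append(history, first, last)
        return history

    def update_project(self, history: ProjectHistory,
                       last: Optional[int] = None) -> ProjectHistory:
        """Like updateProject:to:; with last=None it mirrors updateProject:."""
        self._append(history, history.last_revision + 1, last)
        return history

    def _append(self, history: ProjectHistory, first: int,
                last: Optional[int]) -> None:
        for revision, author in self.fetch_log(history.url, first, last):
            history.versions.append(ProjectVersion(revision, author))
```

Because an update only asks for revisions after the last one already analyzed, the same ProjectHistory instance can be brought up to date incrementally.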
The design of the meta-model builds on top of Hismo [Gîrba, 2005], an approach that models history as a first-class entity. A history is an ordered set of versions, where history and version are generic concepts that can be applied, for example, to a file or a directory. This scheme is shown in Figure 2.
Every file and every directory will have a multitude of versions throughout the
development of a software project. Collectively, the versions constitute the
history of the respective file or directory.
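A minimal sketch of the Hismo idea, assuming a version only needs to know its history, its revision and a snapshot of its data. `History`, `Version` and their methods are illustrative Python names, not the Moose classes; navigation to the previous and next version is derived from the order within the history.

```python
from typing import List, Optional


class Version:
    """One version of a file or directory; knows its history and snapshot."""

    def __init__(self, history: "History", revision: int, snapshot: object):
        self.history = history
        self.revision = revision
        self.snapshot = snapshot  # e.g. file contents at this revision

    def previous_version(self) -> Optional["Version"]:
        versions = self.history.versions
        i = versions.index(self)
        return versions[i - 1] if i > 0 else None

    def next_version(self) -> Optional["Version"]:
        versions = self.history.versions
        i = versions.index(self)
        return versions[i + 1] if i + 1 < len(versions) else None


class History:
    """An ordered sequence of versions of one file or directory."""

    def __init__(self, name: str):
        self.name = name
        self.versions: List[Version] = []

    def add_version(self, revision: int, snapshot: object) -> Version:
        if self.versions and revision <= self.versions[-1].revision:
            raise ValueError("versions must be added in increasing revision order")
        version = Version(self, revision, snapshot)
        self.versions.append(version)
        return version

    def age(self) -> int:
        # Number of versions the history contains.
        return len(self.versions)
```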
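[Figure 2: The Shrew meta-model. A FileHistory consists of 1..* FileVersions, a DirectoryHistory of 1..* DirectoryVersions and the ProjectHistory of 1..* ProjectVersions; each FileVersion and DirectoryVersion has exactly one FileSnapshot or DirectorySnapshot; ProjectVersions are linked to Authors.]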
Now that we have entities for concepts such as FileHistory and ProjectVersion, we can map the relationships between them. As mentioned above, histories consist of their respective versions, but there is also navigation between versions and between histories. Whenever users commit to a repository, they commit changes to files and directories that are part of this repository. This means that a number of file and directory versions are correlated with a specific project version. In parallel, the history of a repository consists of a number of file and directory histories.
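The correlation between commits and file histories can be sketched as below. For brevity, the sketch covers only files (directories would be handled the same way), takes the changed paths of each commit as given (for example from `svn log --verbose`), and uses hypothetical Python classes whose names merely echo the entities of the meta-model.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class FileVersion:
    def __init__(self, path: str, revision: int):
        self.path = path
        self.revision = revision
        self.project_version: Optional["ProjectVersion"] = None  # the commit


class ProjectVersion:
    def __init__(self, revision: int, author: str):
        self.revision = revision
        self.author = author
        self.file_versions: List[FileVersion] = []  # files changed in this commit


class ProjectHistory:
    def __init__(self):
        self.versions: List[ProjectVersion] = []
        # path -> ordered file versions, i.e. one file history per path
        self.file_histories: Dict[str, List[FileVersion]] = defaultdict(list)

    def record_commit(self, revision: int, author: str,
                      paths: List[str]) -> ProjectVersion:
        """Create one project version and one file version per changed path."""
        project_version = ProjectVersion(revision, author)
        for path in paths:
            file_version = FileVersion(path, revision)
            file_version.project_version = project_version
            project_version.file_versions.append(file_version)
            self.file_histories[path].append(file_version)
        self.versions.append(project_version)
        return project_version
```

Each file version thus points to the commit it belongs to, while each file history collects the versions of one path across commits, giving navigation in both directions.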
bility from the file and directory versions. A snapshot is associated with exactly
one version.
Examples of measurements are the age of a history, that is, the number of versions it contains, or the deviation of a property P over time. Such a property P could be the number of lines of a file or the number of modified files in a commit, i.e. in a project version.
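Both measurements are simple to compute once the values of P are available per version. The Python sketch below rests on two assumptions: it reads “deviation” as the population standard deviation, and it takes the values of P as a plain sequence ordered by version; the function names are hypothetical.

```python
from statistics import pstdev
from typing import Sequence


def age(versions: Sequence) -> int:
    """Age of a history: the number of versions it contains."""
    return len(versions)


def deviation(values: Sequence[float]) -> float:
    """Spread of a property P over the versions of a history.

    Interpreted here as the population standard deviation (an assumption).
    """
    return pstdev(values) if values else 0.0
```

For a file whose line counts over three versions are 100, 120 and 110, the age is 3 and the deviation is about 8.16 lines; the same functions apply unchanged to the number of modified files per project version.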
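$$
\mathit{own}^{\alpha}_{f_0} :=
\begin{cases}
1, & \alpha = \alpha_{f_0} \\
0, & \text{else}
\end{cases}
$$

$$
\mathit{own}^{\alpha}_{f_n} := \mathit{own}^{\alpha}_{f_{n-1}} \cdot \frac{s_{f_n} - l_{f_n}}{s_{f_n}} +
\begin{cases}
\frac{l_{f_n}}{s_{f_n}}, & \alpha = \alpha_{f_n} \\
0, & \text{else}
\end{cases}
$$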
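The recurrence above can be evaluated version by version. The Python sketch assumes each version f_n is given as a triple (α_{f_n}, s_{f_n}, l_{f_n}), that is, its author, its size in lines and its number of changed lines with 0 ≤ l ≤ s; the function name `ownership` is hypothetical.

```python
from typing import Dict, List, Tuple


def ownership(versions: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """Share of a file owned by each author after its last version.

    versions[n] = (author of f_n, size s_{f_n}, changed lines l_{f_n});
    the changed-line count of the first version is ignored.
    """
    if not versions:
        return {}
    own = {versions[0][0]: 1.0}  # own_{f_0}: the creator owns the whole file
    for author, size, changed in versions[1:]:
        if size <= 0 or not 0 <= changed <= size:
            raise ValueError("expected size > 0 and 0 <= changed <= size")
        keep = (size - changed) / size
        # Every author keeps the unchanged fraction of their previous share ...
        own = {a: share * keep for a, share in own.items()}
        # ... and the author of f_n additionally owns the changed lines.
        own[author] = own.get(author, 0.0) + changed / size
    return own
```

For example, if alice creates a file, bob then changes 20 of its 100 lines and alice finally changes 30 of 120 lines, alice owns 0.85 of the file and bob 0.15; the shares always sum to 1.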
of this method fails in the current version of Shrew because Subversion does not provide information about the number of changed lines in its log. If, as discussed previously, the client called the Subversion API directly instead of relying on the console client, this information could easily be obtained and such a measurement would be feasible.
The meta-model as implemented and described in section 3.2 does not currently allow copies and moves to be modeled properly. Instead, if a file is copied or renamed, a new history is created and no information about its relationship to the old history is stored.
This calls for an extension of the meta-model that does not break the current one. Rather, the current meta-model should be able to coexist as a degraded interpretation of the extended meta-model.
[Figure: histories of the files A, B and A′ across revisions 1 to 11, with history relationships drawn as edges]
The diagram displays three files A, B and A′. File A′ was created in revision 7 and is a copy of file A as it was in revision 4. As file A′ remains unchanged, we can interpret it as a tag of file A at revision 4. File B was deleted in revision 6 and later resurrected in revision 8.
The edges displayed in the diagram are the proposed extension to the meta-model. They represent what we call a history relationship, which encapsulates a reference to the copied version and to the first version of the new history. Histories are extended so that they have an optional reference to such a history relationship.
These history relationships make it possible to model moves and copies as discussed above. Optionally, all histories connected by history relationships could be collected in a co-history entity for analysis.
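The proposed extension can be sketched as follows. In Subversion's verbose XML log, a copied path carries `copyfrom-path` and `copyfrom-rev` attributes, which is the information needed to create such a relationship. The Python classes `History`, `Version` and `HistoryRelationship` and the helpers `copy_history` and `copy_chain` are hypothetical; `copy_chain` only follows relationships backwards, which collects just part of a co-history.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Version:
    history: "History" = field(repr=False)  # back-reference, omitted from repr
    revision: int


@dataclass(eq=False)
class HistoryRelationship:
    copied_version: Version  # the version that was copied (e.g. A at r4)
    first_version: Version   # first version of the new history (e.g. A' at r7)


@dataclass(eq=False)
class History:
    name: str
    versions: List[Version] = field(default_factory=list)
    origin: Optional[HistoryRelationship] = None  # None: not a copy or move

    def add_version(self, revision: int) -> Version:
        version = Version(self, revision)
        self.versions.append(version)
        return version


def copy_history(source: Version, name: str, revision: int) -> History:
    """Start a new history whose first version is a copy of `source`."""
    history = History(name)
    first = history.add_version(revision)
    history.origin = HistoryRelationship(source, first)
    return history


def copy_chain(history: History) -> List[History]:
    """Follow origin links back to the original history (oldest first)."""
    chain = [history]
    while chain[-1].origin is not None:
        chain.append(chain[-1].origin.copied_version.history)
    return list(reversed(chain))
```

A history without an origin behaves exactly as in the current meta-model, which is what allows the current meta-model to coexist as a degraded interpretation of the extended one.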
4. Presentation
The uppermost layer of Shrew is the presentation layer. It is responsible for taking the data provided by the data analysis layer and displaying it in a manner that is accessible to the end user.
In Shrew, the presentation layer is twofold. First of all, the Moose reengineering
Figure 4: Shrew web frontend
Shrew provides a web-based interface built on top of the Seaside application framework [Seaside, 2007], which is considered the main interface to Shrew. A screenshot of this interface in action can be seen in Figure 4.
Project details shows information on the entire project, that is, the ProjectHistory it is associated with. This can include information such as the most active authors, the most recent changes and the age of the project.
It provides navigation to the author who committed the change and to the modified files, and it displays the commit message if the author specified one.
Browsing allows the user of the web interface to navigate through the files and folders of a particular version in a tree-like manner. It also provides navigation to earlier or later versions of a particular file and can display file contents and other detailed information.
With these views, Shrew allows the user to navigate efficiently through the history entities of the meta-model while retaining an approach that strongly resembles that of systems such as ViewVC.
5. Conclusion
1. The Subversion client is highly inefficient at the moment. Every time the client retrieves information from a repository, a system process is started that connects to the remote location and retrieves the data from the server. The connection is then torn down and the process stopped, so this sequence must be repeated for every request, causing considerable overhead.
The author of this thesis strongly suggests that a new implementation should make use of the SWIG interface provided by Subversion and discussed in section 2.1. This would require writing a language definition for SWIG, but the final result would be more maintainable, more versatile and less affected by changes in the Subversion protocol than, say, a client implementing the WebDAV protocol natively. Furthermore, the SWIG language definition could be used to create interfaces to entirely different C or C++ based applications.
2. The data analysis currently does not make use of all the information available from a Subversion repository. In particular, the meta-model does not understand file moves and copies and starts a new history whenever a file or a directory is renamed.
All in all, these current limitations encourage further work on Shrew, and they were convincing enough to keep the word “prototype” in the title of this text.
A. SeasideSCGComponents
These elements can be separated into two distinct groups: views and panes. Views define the page layout of a website and come in one-, two- and three-column versions. They provide some convenience methods for displaying various information such as a location (bread-crumb trail) and a title. Panes are intended to be used with these views and provide a generic page component. Each pane has a title and content. Specialized types of panes display tabular data or images as their content.
Developers wishing to implement their own panes can write a new class that inherits from Seaside.AbstractPane and overrides #renderContentOn:. The method takes one argument, the WARenderCanvas that Seaside uses to render the HTML components.
If the pane only needs to display simple content, it may be more convenient to use the predefined class Seaside.SimplePane, as shown in Listing 2.
exampleSimplePane
    ^ SimplePane new
        title: 'Example SimplePane';
        "The one-argument block receives the render canvas."
        contents: [ :html |
            html paragraph with: [
                html strong: 'This is an example SimplePane!'.
                html text: 'It really is, really simple.' ] ];
        yourself
In the listing above, a one-argument block is passed to contents:, which has the same effect as overriding #renderContentOn: as described above. Alternatively, contents: also accepts a Seaside WAComponent or anything else that Seaside can render directly, such as a String.
exampleTabularPane
    ^ TabularPane new
        parent: self;
        title: 'Numbers';
        rows: #(1 2 3 4 5 6);
        columns: (OrderedCollection new
            "A symbol value block is sent as a message to each row."
            add: (TableColumn new
                valueBlock: #yourself;
                title: 'original';
                "Clicking a value calls this block with the row."
                clickBlock: [ :number |
                    self inform: 'You clicked on ', number printString, '!' ];
                yourself);
            add: (TableColumn new
                valueBlock: #factorial;
                title: 'factorial';
                yourself);
            "A one-argument block computes the value from the row."
            add: (TableColumn new
                valueBlock: [ :number |
                    number even ifTrue: [ 'even' ] ifFalse: [ 'odd' ] ];
                title: 'even/odd';
                yourself);
            yourself);
        yourself
The important methods in this example are rows: and columns:. The first takes a collection of the data to be displayed; in the example above, we display a table with some integers and some information about them. The second, columns:, takes an ordered collection of TableColumn instances that define what information is displayed and how the table columns behave. The column title is defined with title:, and the value for each row is computed using valueBlock:, which accepts either a symbol, which is sent as a message to each row, or a one-argument block. Furthermore, clickBlock: takes a block; if it is set, the value for each row is displayed as a link, and the block is called with that row when the link is clicked.
There are two more methods that are not used in the example above: sortBlock: overrides the default sorting mechanism for the particular column, and formatBlock: alters the way the value is displayed. The interested reader should look at how these are initialized in TableColumn>>#initialize to understand how to use them.
References
[darcs, 2007] darcs, a free, open source source code management system, 2007. http://darcs.net/.
[Ducasse et al., 2000] Stéphane Ducasse, Michele Lanza, and Sander Tichelaar. Moose: an Extensible Language-Independent Environment for Reengineering Object-Oriented Systems. In Proceedings of CoSET ’00 (2nd International Symposium on Constructing Software Engineering Tools), June 2000.
[FisheEye, 2007] FishEye, analyze, search, share and monitor CVS and Subversion repositories, 2007. http://www.cenqua.com/fisheye/.
[SWIG, 2007] SWIG, a software development tool that connects programs written in C and C++ with a variety of high-level programming languages, 2007. http://www.swig.org/.
[Trac, 2007] Trac, an enhanced wiki and issue tracking system for software development projects, 2007. http://trac.edgewall.org/.
[Upright, 2007] Ian Upright. SWIG language bindings for Smalltalk, 2007. http://commonsmalltalk.wikispaces.com/SWIG.