Generic XML Structure Version 1.2
Version: 1.2
Version history: v 0.0 February 2012 - EvH, TP, LvW, ML - University of Antwerp
v 1.0 March 2012 - discussion with AW, JF - Lund University
v 1.1 May 2012 - discussion COST conference in Poitiers
v 1.2 July 2012 - discussion SIG Writing conference in Porto
This document describes a generic XML log format for logging human-computer
interaction related to writing. It was initiated by the developers of Scriptlog
and Inputlog. In an initial stage, both research groups discussed the best way to
guarantee an optimal exchange of logging files between both programs. This resulted in a
decision to develop an XML log standard that enables researchers to exchange log files
between the two programs, utilizing the complementarity of the analyses that both
programs offer. A further development of the XML log format into a more generic format
should create a standard for logging process information for a wider range of writing
devices (e.g., digital pens, tablets, mobile devices).
The main aim of the project is to standardize the logging of digital writing
processes. A generic XML structure (a) simplifies the interchangeability of research data, (b)
unifies the description of writing logging data, and (c) further establishes process logging as a
research method.
Firstly, this format simplifies the interchangeability of research data collected with
different logging tools. This enables researchers to exchange process data of various
logging tools for further analysis. For instance, a logging session recorded by Scriptlog can
be analyzed via Inputlog. One of the main advantages is that the complementary
functions of the logging tools are better exploited: you can run an Inputlog revision
analysis on Scriptlog data, and you can run a word-in-context analysis on Inputlog data
via the analyses module of Scriptlog. In other words, we strive to make data of the main
logging tools interchangeable. At this moment Scriptlog, Inputlog, EyeWrite, Eye&Pen and
HandSpy are involved in the project.
Secondly, the generic XML structure describes process data in a uniform and unambiguous
way. We create a common perspective for the description of writing processes using
different input devices (e.g. keyboard, speech, touchpad, digital pen) defined by a few
generic primitives.
The first part of the report describes the different components and how they are connected.
Then an overview of this framework is presented in a scheme. The second part describes
how the logged elements are represented in an XML notation.
1. Header
The header consists of:
- Metadata pertaining to different items not specifically related to a logging session,
  such as the version of the program used to log, the version of the program used to read
  and analyze the log, the processor used, a warning concerning the expected timing
  accuracy, and possibly others.
- Session data uniquely related to this log file, such as the identification of the person,
  the language used in the interaction, and a timestamp.
The header has one essential condition: a timestamp that indicates the start of the logging
process.
Users are free to add other elements as required by their specific logging program. As an
example: Inputlog needs to know the language used in the text production to perform
linguistic analysis. The format used is: a meta or session element name followed by a
series of key/value pairs that describe the feature.
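As an illustration of the header rules above, the following Python sketch builds a header from key/value pairs and enforces the one mandatory item, the start timestamp. The tag and key names used here (`header`, `meta`, `session`, `entry`, `start_time`) and the sample values are assumptions made for this example only; they are not the notation defined by this format.

```python
import xml.etree.ElementTree as ET

def build_header(meta, session):
    """Build a header element from two dicts of key/value pairs.

    All tag and key names are hypothetical placeholders.
    """
    if "start_time" not in session:
        # The only hard requirement on the header: a start timestamp.
        raise ValueError("header requires a start timestamp")
    header = ET.Element("header")
    for section_name, pairs in (("meta", meta), ("session", session)):
        section = ET.SubElement(header, section_name)
        for key, value in pairs.items():
            entry = ET.SubElement(section, "entry")
            entry.set("key", key)
            entry.set("value", str(value))
    return header

# Invented sample values, for illustration only.
header = build_header(
    meta={"logging_program": "ExampleLogger 1.0"},
    session={"start_time": "2012-07-01T09:00:00", "language": "en"},
)
header_xml = ET.tostring(header, encoding="unicode")
```

Because every feature is a plain key/value pair, a tool-specific item such as the text language can be added without changing the structure.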
When two or more log files are merged, the session data (or a selection thereof) is
used to calculate a hash tag that identifies, for every event in the merged log file, the
original file it came from. In the context of input logging, the hash tag is a short piece of
machine-readable metadata added to an event and pointing to a bundle of session facts that are
sufficient to trace back the original log file if necessary.
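The merging idea can be sketched in Python as follows. The format does not prescribe a hash function or a tag length; SHA-256 truncated to eight hexadecimal characters is an assumption of this sketch, as are the dictionary shapes and the key names `participant`, `start_time`, `start` and `source`.

```python
import hashlib

def session_hash_tag(session, keys, length=8):
    """Derive a short, stable tag from selected session key/value pairs."""
    # Sort the keys so the tag does not depend on dictionary ordering.
    canonical = "|".join(f"{k}={session[k]}" for k in sorted(keys))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:length]

def merge_logs(logs, keys):
    """Merge (session, events) pairs and stamp each event with its source tag."""
    merged, sources = [], {}
    for session, events in logs:
        tag = session_hash_tag(session, keys)
        sources[tag] = session  # lookup table back to the original session
        merged.extend({**event, "source": tag} for event in events)
    merged.sort(key=lambda e: e["start"])  # order merged events by start time
    return merged, sources

# Invented sample sessions, for illustration only.
log_a = ({"participant": "P01", "start_time": "2012-07-01T09:00:00"},
         [{"start": 0}, {"start": 250}])
log_b = ({"participant": "P01", "start_time": "2012-07-01T10:00:00"},
         [{"start": 120}])
merged, sources = merge_logs([log_a, log_b], keys=["participant", "start_time"])
```

A truncated digest keeps the per-event overhead small, at the cost of a small collision risk that a real implementation would have to weigh.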
2. Footer
The footer consists of:
- Information that is only available after a session, e.g. the product statistics of a writing
  job (number of words, sentences, etc.) or the average processor load over the session.
- Pointers to specific modules that may be used in an automated analysis process, such
  as different processing steps that need to be executed without further human
  intervention.
A footer has no prerequisites. Researchers are free to add the elements required by their
logging program in the format: module name followed by a series of key/value pairs that
describe the feature.
3. Logged events
An 'event' is the top-level entity that holds all the information concerning one or more
actions and their related parts. Although in most cases one event records one action, it is
possible to have many-action events. This happens when the subject is performing simultaneous tasks
(e.g. typing and eye tracking) or when a particular analysis task demands the merging of
some logging files on event level.
The central idea behind the structure of event logging is the distinction between:
action
properties of an action
output
A. Action
It is up to the specific logging tool to define what the minimal (atomic) action is. For a key
logger this may be a mouse click or a key press, for an eye tracking device it may be the gaze
on a screen, for a tablet the pressure of a stylus on the touch screen.
The proposed action taxonomy has four primitives and an open number of input types
derived from these primitives.
Primitives:
Click: a click is every action that can be described as a point in time and in space.
Move: a move is every action with a certain duration in time and with one or multiple
contiguous paths in space.
Placeholder: a placeholder is an empty action that either is a point or a duration in
time/space and provides contextual information about the passing of time of subsequent
events. It may be ignored in the analysis.
Other: special constraints related to other sensory information such as sound, smell or
taste may justify using an additional action primitive.
Click, Move, and Other are abstract expressions for the real action instruments. These action
instruments are defined as input types. The placeholder as such is not an instrument but it can
be typed and provided with properties to supply additional information on the flow of events.
Input types:
Input types are tightly linked to a specific technology, so it is up to the researcher to provide
the input types relevant for his/her software configuration. A placeholder action can be a work
break or a session break. The placeholder action can also appear as the result of a post-processing
procedure. For instance, the researcher may decide to ignore different focus changes to and
from external sources and replace them with a placeholder.
Examples:
Primitive    Input type - Use case     Configuration - Use case
click        touch (tap)               touch pad on a laptop
click        key (press)               keyboard of a desktop
click        mouse (click)             desktop
move         swipe (X)                 touch screen of a smartphone
move         mouse (drag & drop)       laptop
move         mouse (scroll)            desktop
move         stylus (write)            touch screen of a tablet
placeholder  merging                   fusion point of two idfx files
placeholder  work break                user takes a break
other        sound                     instruction to the user
Examples:
Primitive    Input type - Use case     Parameters - Use case
click        touch (tap)               pressure
click        key (press)               position, replay, doc length
click        mouse (click)             button left
move         swipe (X)
move         mouse (drag & drop)       pause threshold
move         mouse (scroll)            orientation, delta
move         stylus (write)            orientation, pressure
placeholder  session break             url to original idfx
placeholder  work break
other        sound                     wav-file
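The action taxonomy can be pictured as a closed set of primitives plus an open set of input types. The Python sketch below is one possible in-memory representation, not part of the format: the class and field names are hypothetical, and the parameter values are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class Primitive(Enum):
    CLICK = "click"              # a point in time and in space
    MOVE = "move"                # has a duration and one or more paths
    PLACEHOLDER = "placeholder"  # empty action; may be ignored in analysis
    OTHER = "other"              # other sensory information, e.g. sound

@dataclass
class Action:
    primitive: Primitive
    input_type: str  # open set, defined by the logging tool
    parameters: dict = field(default_factory=dict)

def analysable(actions):
    """Drop placeholders, which the analysis is allowed to ignore."""
    return [a for a in actions if a.primitive is not Primitive.PLACEHOLDER]

# Invented sample actions, for illustration only.
actions = [
    Action(Primitive.CLICK, "key", {"position": 42}),
    Action(Primitive.PLACEHOLDER, "work break"),
    Action(Primitive.MOVE, "mouse", {"orientation": "vertical", "delta": -120}),
]
kept = analysable(actions)
```

Keeping the input type as a free string mirrors the idea that each research group supplies the types relevant to its own configuration.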
B. Properties of an action
For an action to be recordable, certain hardware and software provisions need to be present.
The program needs to log position and timing to record the moments in space and time of an
event. A third element to record is the state of certain components. The state is used to give a
description of the behavior of a component in response to explicit events. For instance, when
logging a writing process on a standard computer a keyboard event might have the shift key
locked (VK_SHIFTLOCK) which in combination with the 'f' key (VK_F) produces the capital
letter 'F'. With knowledge of the state of the shift key (state: key down) the program knows
that the next letter will be a capital. Note: the key up of this action can be derived from the
timing properties.
A single point in space is defined by its X/Y coordinates; an instant in time by its start and end
moments. A series of consecutive X/Y points in space is a sequence (a path), and a duration is
the time between start and end. The duration is not recorded in the log but is calculated in the
analysis as the difference between start and end. A placeholder may contain position and/or
timing information.
Examples:
                                                        Position  Timing
Primitive    Type               Multiplicity  Sequence  X    Y    Start  End  State           Properties
click        Mouse click        No            No        √    √    √      √    key shift down  Button
move         Multi touch swipe  Yes*          Yes       √    √    √      √
click        Eye track gaze     Yes**         Yes       √    √    √      √
placeholder  Transition         No            Yes       √    √    √      √
* Five-finger gesture on an iPad
** Each eye has its own X/Y coordinates
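The position and timing properties described in this section can be sketched as follows. Only start and end are stored; the duration is derived at analysis time. The class name, the field names and the millisecond unit are assumptions of this sketch, and multiplicity (several simultaneous paths) is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ActionProperties:
    path: List[Tuple[float, float]]  # one X/Y point for a click, several for a move
    start: int                       # assumed unit: milliseconds since session start
    end: int

    @property
    def duration(self) -> int:
        # Not stored in the log; derived as the difference between end and start.
        return self.end - self.start

    @property
    def is_sequence(self) -> bool:
        # More than one consecutive X/Y point forms a path.
        return len(self.path) > 1

# Invented sample values, for illustration only.
click = ActionProperties(path=[(100.0, 200.0)], start=1500, end=1580)
swipe = ActionProperties(path=[(10.0, 10.0), (40.0, 12.0), (90.0, 15.0)],
                         start=2000, end=2350)
```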
C. Output
The output of an action is every recordable effect of an action.
Outputs are typed and have a source. The output value is the actual result of an action as it is
presented onto some form of hardware.
Examples:
Device        Output type  Source            Value
keyboard      text         keyboard          "b"
touch screen  text         virtual keyboard  "v"
smartphone    text         speech            "This is a sentence I have spoken and that is transcribed"
D. Properties of an output
Depending on the specific device used in the logging process and depending on the
recording conditions, certain properties can be used to further describe the output. An
example of such a property is the 'text_area'. This allows the logging program to define
complex documents with multiple input regions. Another property could be 'style', describing
the formatting of a text.
Examples:
Device      Output type  Source    Value                                                       Property    Property value
smartphone  text         speech    "This is a sentence I have spoken and that is transcribed"  sound file  C:/some_path/your_sentence.wav
keyboard    text         keyboard  "This is pasted text"                                       text_area   #2
keyboard    text         keyboard  "This is pasted text"                                       position    42
keyboard    text         keyboard  "This is pasted text"                                       style       bold
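To show how an output property such as 'text_area' could be used in an analysis, the Python sketch below groups output values by input region. The dictionary layout and the fallback name `default` are assumptions of this example, and the 'position' property is deliberately ignored, so the result is a simple concatenation in logged order rather than a reconstruction of the document.

```python
from collections import defaultdict

def text_per_area(outputs):
    """Concatenate output values per text_area, in logged order."""
    areas = defaultdict(str)
    for out in outputs:
        # Outputs without a text_area property go to a hypothetical default region.
        area = out.get("properties", {}).get("text_area", "default")
        areas[area] += out["value"]
    return dict(areas)

# Invented sample outputs, for illustration only.
outputs = [
    {"type": "text", "source": "keyboard", "value": "Title",
     "properties": {"text_area": "#1"}},
    {"type": "text", "source": "keyboard", "value": "Body ",
     "properties": {"text_area": "#2"}},
    {"type": "text", "source": "keyboard", "value": "text",
     "properties": {"text_area": "#2", "style": "bold"}},
]
grouped = text_per_area(outputs)
```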
The nature of the selections is signaled by a change of focus that is recorded and typed. The
target of the focus change describes the new content in view, e.g. a selection of text with a
mouse action. Additional properties may complement the focus description such as the start
and the end position of the selected text.
An insertion is a keyboard action, such as the pressing of CTRL+V, that pastes text onto the
screen. This is expressed by having an output event of the type 'text', where the pasted text is
the value of the output. The position is logged as a property of the output element.
A deletion of a selection could be the pressing of a backspace or delete key after having
selected some text. This would be expressed by having a focus change of the type 'text',
where the selection is the value of the focus change together with the beginning and end
positions of the selection as properties. The output of this event would then be an output
element with an empty value, describing that there has been output, but that the output is
empty.
A replacement is very much like a deletion, the difference being that the output in this case
would not be empty but would be a single character or a string of characters.
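The distinction between insertion, deletion and replacement can be sketched in Python. The event layout below (an optional `focus` with `start` and `end`, and an `output` with `value` and `position`) is a hypothetical in-memory form rather than the XML notation, and positions are assumed to be 0-based character offsets with an exclusive end.

```python
def classify(event):
    """Label an event as insertion, deletion or replacement."""
    value = event["output"]["value"]
    if event.get("focus") is None:
        return "insertion"  # no selection: output is inserted at a position
    # A selection followed by empty output is a deletion, otherwise a replacement.
    return "deletion" if value == "" else "replacement"

def apply_event(text, event):
    """Apply the event to a plain string and return the new text."""
    value = event["output"]["value"]
    focus = event.get("focus")
    if focus is None:
        pos = event["output"]["position"]
        return text[:pos] + value + text[pos:]
    return text[:focus["start"]] + value + text[focus["end"]:]

# Invented sample events, for illustration only.
text = "The quick fox"
paste = {"output": {"value": "brown ", "position": 10}}
delete = {"focus": {"value": "quick ", "start": 4, "end": 10},
          "output": {"value": ""}}
replace = {"focus": {"value": "quick", "start": 4, "end": 9},
           "output": {"value": "slow"}}

after_paste = apply_event(text, paste)      # "The quick brown fox"
after_delete = apply_event(text, delete)    # "The fox"
after_replace = apply_event(text, replace)  # "The slow fox"
```

Treating a deletion as a replacement by an empty string means a single code path handles both cases, which matches the description of the empty output element.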