Visual Design Methods for VR
Mike Alger
MA Moving Image
September 2015
Abstract: This paper presents some pre-visualization design methods for volumetric user
interfaces and experiences within the larger scope of a virtual reality operating system.
Initial Manifesto:
(11:05)
https://vimeo.com/116101132
Final Summary:
(17:47)
https://vimeo.com/141330081
Table of Contents
Introduction
Personal Motivation
Primary Question
Research Process
Context
Opportunity for the Workplace
Theory
Input
Input UI
Content
Environment
Icons
Buttons
Content Zones
Practical Application
Use Case: Animation Prototyping
Use Case: Zone and Environment Prototyping
Adding Depth to Monoscopic Photo Spheres
Putting these Concepts Together for a VR OS Design
Prototype and Evaluation
What's Next?
Conclusion
References
Appendix: Avatars
Avatar Creation Methods
3D Modelling
Photogrammetry
3D Scanning
Hybrid Solutions
Introduction
Personal Motivation
sand island with a single building. Godzilla emerges from the water and joins
several characters, mostly from pop culture, either standing on the island or flying
on their own clouds. Most of them are laughing and commenting on how ridiculous
the oversized Iron Giant looks perched cross-legged on his own floating yellow poof.
Their accents are Dutch, French, British, and American. As a group, we've come here
through a portal from one world and will depart through another when we get
bored.
the shoot function is replaced by talk. The application is called VR Chat and each
week a group of people will log on to see the latest avatars and tour the latest
worlds created by each other (Gaylor and Joudrey, 2015). There are other social VR
still in its infancy, only regaining popularity in recent years (Nelson, 2013), the
and many have their own projects. Gunter, for example, leads the tours and hosts a
avatars and worlds heavily themed with the cross section of biblical and digital
concepts (Tom23, 2015). Ggodin develops an application that allows people to use
their Windows interface in another environment (Godin, 2015). Jesse and Graham
are the creators of VR Chat itself (Gaylor and Joudrey, 2015). But what do I do?
production and graphic design. The marriage of these mediums is motion graphics
websites, I couldn't help but feel that I was producing content that would soon
become irrelevant, lost forever in the ether with the myriads of preceding films and
websites. I wanted to contribute to something with a more significant and lasting impact on mankind. It would seem that thing could be
virtual reality.
like participating in the internet before web designer was a position. The things
you do and techniques you use to do them for virtual reality can become the
standards that people later think of as commonplace. Although a fairly weak and
medium with such potential makes me feel like I'm contributing to something that
will have a lasting impact, for worse and better. Deciding to work with the human
perception of the moving image as it relates to virtual reality led me to ask: how can
I contribute?
Primary Question
It would seem that a large portion of virtual reality content being produced
currently is intended purely for consumption (Bye, 2015). Very few virtual reality
experiences allow the user to create something with it. I initially thought it was
strange that developers would write code, pull the headset down over their eyes to
test it, take it off to adjust their work, put it back on to test again; it seemed that
this all should be done with the headset on. If it is such a powerful and versatile
Even as the concept is suggested, the worms start to exit the can. Creating a
game, for example, may require 3D modeling, image editing, and coding. While
created that are designed for these tasks in the volume of virtual reality. A user
can't even see their keyboard to type. They're now moving their body in ways that
weren't the case with the previous mouse and keyboard system. It would seem
initially that everything we know about digital interaction design is thrown out the
window as the majority of our current interfaces are perceived as the two
One of the interesting caveats of this question is the prospect of design. It proposes
the creation of workflow - to design a process for design. In order to create this new
Research Process
Virtual reality is currently a very fast evolving topic, seeming to change by the week.
approach. While traditional methods of reading books and papers are necessary and
useful, I have also followed discussions online and read the problems and findings of developers as they
experience and document them. These sources are, of course, typically anecdotal
and purely qualitative without controlled studies or quantitative rigor. There remain
many guidelines that are generally agreed upon by the VR community without
particular proof from a study, but are apparent in practice. This is one of the
reasons that my research process has also included trying out as many experiences
as possible for myself, allowing me to form my own opinions about the validity and
testing them as well as showing them to other people has been another part of my
research process. Understanding what other people are thinking and feeling about
VR is helpful for the user experience design process for obvious reasons. Another
part of this has been my participation in meetups, hackathons, and game jams -
physical gatherings where groups discuss and create content. I have also been
giving talks presenting my own findings along the way at such events
source of information for me has been to ask questions and interview experts via
applications has also afforded me the opportunity to converse with the spectrum of
these research practices, many of my resources come from recent years and
information age.
virtual reality's current state and potential, I have found myself pursuing and
self, for which I present a few thoughts and methods for creation of personalized
avatars in the Appendix. Within the original question, however, I found that I would
first need to create methods and guidelines for VR design. I found that I needed to
start with ergonomically responsible zones for content mixed with design
workflows modified from existing mediums. It is this process that this manuscript
intends to describe.
Context
When attempting to understand and develop relevant ideas for subjects, its
context. With regard to virtual reality, one of the basic things to consider is the way
that our senses serve as the input our brain uses to construct an understanding of
the world around us. Sight, hearing, touch, smell, and taste are the most widely
accepted set of external stimuli that the human body perceives (Sense, 2015).
These senses and our reactions to them are the result of millennia of natural
selection (Darwin, 1859)* and there are several consequences of this built into our
instinct. This is all relatively common knowledge and seems like it may not need to
be reiterated here, but the important thing is to state that we, as humans, have
certain predictable outputs based on certain sets of inputs. Essentially, it's instinct.
Human nature. Certain behaviors are hard-wired into us from our ancestors as
something without having to be told (Intuitive, 2015). Interfaces are praised for
being intuitive because people will know how to use them without any time or
human instinct. A planned exhibition may use light, motion, sound, and space to
draw a person's attention through an area. A well designed website will similarly
use color, distance, and typography to clearly communicate a purpose and often
persuade some sort of action. Contrasting elements of sight like light, color, and
motion naturally draw our attention because they were necessary for our own
The same applies for contrast in sound, touch, smell, and taste. The process of
design is often the creation of methods to coax an automatic response from end
interesting vignette in the enormous subject of visual design. The GUI evolved as a
solution to understand the data and processes taking place in a computer's system.
interface is created. Its medium allows input and output that the computer
understands, but also that a human can understand, though typically with some
amount of training. The first graphical user interface to use the now-common
desktop concept was created by Xerox PARC and set the precedent for desktop
metaphors that are still used today (Thacker et al., 1979; Koved and Selker, 1999).
The main 2D analogy was a desktop with pieces of paper sitting on top of each
other and consisted of the now common elements of windows, icons, menus, and
pointer (Preece et al. 1994; Hinckley 2002). This same structure has proven useful in
personal computing for decades now, based on rectangle sections of content within
a rectangle screen. Of course, several other GUIs now exist like iPods or ATMs... the
most widely adopted of which recently may be the multitouch smartphone. Each of
these has a tailored user interface to accept physical input from a human and
Of course, this is all mentioned to come back to virtual reality and the way we
can interact with it. There are several forms of virtual reality including the cave
with projections on walls (Cruz-Neira et al., 1992; 1993) and the workbench with
stereoscopic desk projections (Kreuger et al., 1994; 1995). This paper relates to
virtual reality through head mounted displays (HMD). An HMD is like headphones
for your eyes. Headphones give your ears artificial sounds, HMDs give your eyes
artificial light (Shibata, 2002). Just like headphones can be designed to block out
outside sound, HMDs can either let light through or replace your vision entirely. At
the time of this writing, replacing vision entirely is referred to as virtual reality (VR),
while mixing real light with artificial is referred to as augmented reality (AR)
(Agarwal and Thakur 2014). Head mounted displays for VR and AR, like other forms
of virtual reality, work by presenting stereoscopic images to each eye and updating
those images as the user moves their head (Cakmakci and Rolland, 2006). As long
as the hardware and software are performing their tasks correctly, virtual objects
What is particularly impressive in use are the illusions of scale, space, and
depth that naturally occur with HMDs. The way we perceive the world optically can
be defined by a few variables describing the way our eyes receive light in what is
called the plenoptic function (Adelson and Bergen, 1991). We are accustomed to
moving through and perceiving light fields as light will be entering our pupils from
every direction no matter where we place our head. Our degrees of freedom to
perceive a scene can be defined by 10 variables: x, y, and z position; pitch, yaw, and
roll rotation; distance, horizontal, and vertical position to the point of convergence
and focus; and the size of the pupil's aperture (McGinity, 2014). Virtual reality head
position, 3 rotation, and 2 convergence variables. As the user moves and rotates
their head while looking around a scene, the image is updated for each eye
accordingly. However, the final two variables are not typically accounted for in VR
systems: focus and aperture. The eyes experience some strain over time as they
attempt to focus on the fixed screen distance while that focus does not agree
with their convergence (Hoffman et al., 2008). The 2D display screen doesn't allow
for this, though. Displays are also currently restricted to a dynamic range dictated
by their electronics, and don't afford the high dynamic ranges of light found in the
real world that would affect the pupil's aperture normally (Reinhard et al., 2010). In
the future, plenoptic lenses (Lanman and Luebke, 2013), light field tensors
(Wetzstein et al., 2012), and high dynamic range screens (Seetzen et al., 2004) may
solve these problems, but they are not currently consumer product solutions. Even
still, the effect of perception to the user with current HMDs is enough to convince
This is where a few common buzzwords of virtual reality come in: immersion and presence. Users
choose to accept the presented reality as plausible for the sake of the experience.
This may even reach a visceral level in which the body's subconscious reactions are
environment, it is much easier for them to accept these surroundings as fact, both
consciously and subconsciously (Abrash, 2014). While the terms immersion and
presence are increasingly used for hype and marketing purposes, I personally like
the way that Michael Abrash describes presence as it relates to virtual reality. He
describes the human perceptual system through the use of optical illusions, in
which our understanding of raw image data is clearly being fooled. He describes
that our other systems (hearing, proprioception, touch, etc.) are also susceptible to
It is now that all of those millennia of evolution finally come into play with
computers. It is now that the human perceptual system and the graphical user
interface really meet. What's particularly interesting here is that virtual reality as a
technology is actually older than the graphical user interface. Ivan Sutherland
created the first virtual reality and augmented reality head mounted displays in the
late 1960s (Sutherland, 1968). The first GUIs were created by Xerox in the early
movement and the updated image must meet that benchmark (Abrash, 2013).
Valve requires a frame rate of 90 frames per second for their Vive system (Faliszek,
2015). Resolution of screens for HMDs also must be extremely high to not be
noticeable/distracting; higher than high definition television packed into the size of
nausea for users. The brain senses a mismatch between the optical and vestibular
systems, assumes the body has been poisoned, and makes the user ill to eject
whatever substance has been consumed (Kennedy and Frank, 1985). Many
attribute the failure of virtual reality in the 1990s to the fact that computers were not fast enough to overcome this obviously
serious problem (Barras, 2014). However, the advance of Moore's law (Moore, 1965)
in conjunction with the proliferation of the smartphone market has facilitated the
creation of small, fast computers with extremely high resolution displays including
acceptable frame rate, developers must budget the number of polygons, scripts,
materials, etc. used in their scenes. The resolution of the displays has reached a
level that I would personally say appears like illuminated sand, but that's still not
particularly clear for reading small or far away text. There are also differing
opinions on solutions to user movement and text input. A user's viewpoint cannot
rotate or accelerate independently from their head without risking some degree of
motion sickness (Oculus, 2015a). Each of these are challenges to be aware of and
At the time of this writing, there are a few existing and announced consumer
head mounted displays and input devices that can be targeted for development.
augmented reality, but as time goes on it becomes clear that there will be no
hardware distinction between the two eventually. Putting a camera (or two) on a VR
headset makes it AR. Covering the whole field of view with pixels in an AR device
makes it VR. So, from the design perspective of a graphical user interface, many of
the same tactics can be applied to both. Many elements of an operating system
interface design for the Oculus Rift or Valve Vive could be used with the Microsoft
Hololens. The methods and types of input would be the main things that would
change the interface's design, but most seem to be adopting some form of motion
controllers for both hands in 3D space. So, while I will more often use the term
virtual reality throughout this manuscript, the same principles will typically be true
for humans that will be easy to use based on their instincts. Virtual reality interfaces
with the human perceptual system to a more intense degree, providing a more
Opportunity for the Workplace
A well designed operating system specifically for virtual reality has the potential to
be revolutionary for the digital workplace. There are droves of people who go to
work and sit in front of a computer screen. They use their operating systems to
interfaces, screen space is flat and fairly limited. This means users must spend
some of their brain power navigating the abstraction between tasks that is inherent
to the display method (Medich, 2015). The interruption of doing this breaks thought
continuity and decreases productivity (Shamim, Islam and Hossain, 2012). One way
people deal with this is by getting multiple monitors. The increased screen space
interruptions and increase productivity (NEC, 2010; Ball and North, 2005; Kang and
Stasko, 2008). That is to say, when you don't have to organize windows on top of
each other, you can get more done. The immersive volume of virtual reality entirely
surrounding the user 360° seems a natural inevitability for a maximized working
canvas. That canvas has the added benefit of z-depth. In addition to this, letting
users spatially organize their tasks can lead to an increase in productivity upwards
of 40% (Colgan, 2015). And if that weren't enough, workers in more pleasant
surroundings tend to be happier (Fisher, 2010; Gallagher, 2007) and users have the
Theory
In order for any of this to be effective, reasonable design principles need to be both
implemented and discovered. There are several existing principles for design which
can be translated from other mediums. Print design, web design, architecture,
interior design, theater, motion graphics, etc. all have elements that can be seen as
relevant and adopted. At the same time, the medium of virtual reality has
properties, like the ability for content to intersect, that are unique.
thought the easiest way to figure out what people want to do with their computers
would be to look at what the most common applications and websites are currently,
Looking at Alexa (Alexa, 2015) for the most visited websites and online lists for the
most popular mobile and desktop applications, I compiled this list, organizing them
Watching Video: YouTube, Netflix, Amazon Prime, Twitch, QuickTime, VLC, Windows Media Player, Plex
Shopping: Amazon, Taobao, Tmall, Apple, Craigslist, Flipkart, Adobe, eBay, Etsy, Walmart, Ikea
File Management: Windows Explorer, Finder, iPhoto, Dropbox, Google Play, App Store, iTunes Store, zip extractors, Contacts
Computer Management: System Preferences, Microsoft System Center, Symantec, McAfee, Remote Desktop, 1Password
Finance: Quicken, stocks, online banking, Bank of America, Chase, Wells Fargo, Mint, TurboTax, currency conversion
Games: Steam, Candy Crush, Angry Birds, Temple Run, Bioshock, Solitaire, Portal 2, Counter-Strike, Skyrim, Half-Life, GTA V, Minecraft, Sims, World of Warcraft, etc.
Word Processing: Word, Pages, Google Docs, Open Office, DayOne, Adobe Acrobat
General News: Yahoo, Live, Bing, MSN, BBC, Xinhua, NYTimes, HuffingtonPost
Content Discovery: YouTube, Reddit, Pinterest, Imgur, Diply, Youku, Tudou, Vimeo, DeviantArt, Buzzfeed
What surprised me in doing this practice was that, even though they are my
own categories, I could only come up with these 35. There weren't any other things
that I could think of that people use computers for on a regular basis. Based on
this, it seems like you could provide an operating system that does everything
comprehensively. At the same time, I understand that designing any one of these to
that level is a large task and people may want to do entirely different things in
virtual reality that have yet to be invented. I tried starting with some of the things I
A media viewer for basic content like documents, images, video, & audio
status indicators
But how are video and audio represented in 3D? Where do status indicators go that
a 3D design than its traditional 2D form? How would one interact with it? It became
very quickly clear that there were some much more basic questions to answer
before individual applications could be designed. Namely, where to put content and
how to interact with it. I was going to have to start at square one and think about
the basics of layout and design workflows repurposed for three dimensions.
Input
One of the first questions for interface design is how the user is going to be giving
information back to the system. This is typically dictated by the hardware available.
The mouse, keyboard, joystick, ATM buttons, iPod clickwheel, etc. all suggest
different types of interaction and the feedback represented on screen will change
accordingly. The most common interface for personal computing is the keyboard
and mouse combination. Before touch screens, the mouse provided a way for users
to point at and select items. The keyboard, adopted from the typewriter, provided a familiar text input
method whose role expanded to include hotkeys to choose tools and perform actions quickly as well as
modifiers like the control or alt keys. This allows users to use both hands: one
standardization for a very long time and is still arguably unsettled. Google
Cardboard uses a single button on the side of the handheld viewing box (Google,
2015). Gear VR uses a trackpad and back button (Samsung, 2015). Recently, both
Valve's Vive and Oculus Rift HMDs have been announced to have separate motion
tracked controllers (Valve, 2015; Oculus, 2015b). Development kits of the Oculus Rift
mice, trackpads, and game controllers. In the absence of inputs in the beginning,
several startup companies offered solutions including their own motion tracked
seem like a particularly ideal solution because anecdotally, people tend to put on a
VR headset and raise their hands automatically in my experience and those I have
talked to. I have witnessed many people try to touch invisible things and have been
However, for detailed and intricate tasks, hand tracking and gesture recognition
are not yet precise enough, and users can quickly become frustrated (Plafke, 2013). There is also no haptic feedback for hands in space. For
now, motion tracked controllers seem the more dependable option. They provide normal range of movement and rotation for hands as well as physical
buttons to press.
A hybrid solution may be an option in the future. Just as the left hand used a
keyboard while the right hand used a mouse, a final solution might be to have a
controller in the left hand while the right hand is tracked in space. Or a wand in the
right hand as the mouse cursor while the left hand presses modifier buttons using
the current solutions, which are optimized for right-handed users. Left-handedness
use over years. For this project, I will assume the case of a motion tracked controller
in the dominant hand with a free tracked non-dominant hand. This will allow design
Input UI
As virtual reality has existed for nearly fifty years, several interaction interface
concepts have already been created. One is to cast a ray like a laser pointer for a
cursor (Sherman and Craig, 2003). It can be difficult to hit a target at a distance, so
Menus in the past have often mimicked the 2D dropdown style, but radial
designs are also becoming more common. Which choice is best tends to depend on
where the menu is originating from. Options around a hand will often be radial
directly affecting objects at a distance, the user has a small map of the
environment. Object properties in the larger world and the world in miniature are
bound such that changing things on the map changes them in the larger
environment as well.
Some of the less solved problems are locomotion and text input. Several
methods of locomotion in virtual reality have been presented, but none of them have been standardized by
the community. The viewpoint cannot move differently from the user's head without risking the possibility of vestibulo-ocular
mismatch, and therefore nausea (Yao, 2014). Many experiences will just ignore this
and allow users to walk their game view in a first-person-shooter style. Others
attempt to mitigate the problem through the use of a cockpit, occluding a portion of
the view with more static surroundings. Variations of teleportation are also used,
either moving the user to another location very quickly or immediately. Finally, the
last way to solve abstracted locomotion is to ignore it altogether and only allow the
user to navigate the area they have available for tracking, one-to-one as it is called.
This particular solution would be fine for a room-scale tracking system like Valve's
Text input remains another unsolved mystery. Like the other user interfaces,
some solutions have been created already. Every solution relies on input method,
however. The best user to keep in mind is probably a code writing developer,
because they rely on text input. The main goals of text input would be accuracy,
speed, and comfort. If a proposed text input solution is less accurate, slower, or less
tracked keyboard can be represented in the virtual space so that the user can
touch-type as they currently do on real buttons (Sleight, 2014). The same could be
done with a multitouch surface like an iPad. But if the user is carrying a motion
controller or two, switching back and forth will not be as simple as moving from a
mouse to the keyboard as I think putting down the controllers would be annoying.
Voice recognition is possible, though not yet accurate enough for a task like coding
as I imagine saying "slash, colon, bracket" wouldn't be ideal. Writing letters in the air
would also be possible, but not as fast as 60 words per minute (Brown, 1988). Hand
and gesture tracking is also not robust enough to recognize sign language at that
speed. Another text input would be to use the controller's buttons and analog sticks
or trackpads like the Steam Controller interface (Plunkett, 2015). One solution is to
provide both hands with radial menus. If each hand has six options, rotating the
depth of multiple rings and modified with controller buttons or gestures. It would
interface. Using the primary hand as a raycast cursor, the user could swipe the
beam through a virtual keyboard, similar to the Swype keyboard style (Swype Inc.,
2015). The keyboard mode and beam on/o could be controlled by the left hand,
with buttons on a motion tracked controller or simple gestures with a tracked hand.
These are some of the text input options I've considered, but more are sure to emerge.
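One way to picture the two-handed radial idea is as a simple chord lookup: each hand selects one of six segments and the pair indexes a 36-character table. The sketch below is a hypothetical illustration in Python; the character layout, segment count, and function names are assumptions rather than a tested design.

```python
# Hypothetical sketch of two-handed radial "chord" text entry: each hand picks
# one of six radial segments and the pair indexes a 6 x 6 = 36 character table.
# The table layout below is arbitrary and purely for illustration.
CHARACTERS = "abcdefghijklmnopqrstuvwxyz0123456789"  # 36 symbols

def angle_to_segment(angle_degrees: float) -> int:
    """Quantize a stick/trackpad angle (degrees) into one of six radial segments."""
    return int(angle_degrees % 360 // 60)

def chord_to_character(left_segment: int, right_segment: int) -> str:
    """Map a (left, right) radial selection, each 0-5, to a character."""
    if not (0 <= left_segment < 6 and 0 <= right_segment < 6):
        raise ValueError("segments must be in the range 0-5")
    return CHARACTERS[left_segment * 6 + right_segment]

# Example: left stick pointing at 100 degrees, right stick pointing at 310 degrees.
print(chord_to_character(angle_to_segment(100), angle_to_segment(310)))
```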
Tool shortcuts and modiers can also be useful for virtual reality. Currently
you may hold the Shift key while dragging to keep alignment or hold the Alt key to
like the text input. Simple gestures like a C symbol, pointing, or extended thumb
would be most reliable and allow a user to modify their tools and actions on the fly.
In virtual reality, elements can intersect one another, which may initially read as a
mistake, but this behavior can be embraced as a part of the medium's strength. For
example, the slider is a common user interface, allowing a user to adjust a variable
with an icon that can be dragged through a range. In virtual reality, the zone can be
represented as a cylinder. The user can intersect their hand at whatever value they
want.
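A minimal sketch of how that cylinder slider could report a value, assuming the cylinder's axis endpoints are known in world space; the names and coordinate conventions here are illustrative only.

```python
import numpy as np

def slider_value(hand_position, axis_start, axis_end):
    """Project a hand position onto a cylindrical slider's axis and return a
    normalized value in [0, 1]. axis_start and axis_end are the world-space
    endpoints of the cylinder's axis; the hand may intersect the volume anywhere."""
    hand = np.asarray(hand_position, dtype=float)
    start = np.asarray(axis_start, dtype=float)
    end = np.asarray(axis_end, dtype=float)
    axis = end - start
    # Scalar projection of the hand onto the axis, normalized by the axis length.
    t = np.dot(hand - start, axis) / np.dot(axis, axis)
    return float(np.clip(t, 0.0, 1.0))

# Example: a 1 m slider along x, with the hand intersecting 30 cm from its start.
print(slider_value([0.3, 0.02, -0.01], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]))
```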
Content
Let's build an image of what the things in a virtual reality operating system might
look like. At first, I made a concept art illustration to communicate the idea:
applications and icons surround them. As I continued through the design process, I
realized this concept has some major fallacies. For one, if this is an operating
system designed for a work environment, 40 hours per week is an awfully long time
to be standing. It's also a lot of time to be raising and moving your arms. This would
be uncomfortable after just two hours. The initial concept also places all content at
the same distance from the user, foregoing the opportunity to use z-depth for
an interface volume would be preferable. That doesn't mean users are prohibited
from standing up or using their arms more, just that they aren't required to. In fact,
I think that room scale tracking has opened up the opportunity for the virtual office
space where you can stand up and walk around your customized environment if
you want to. This is probably good as either sitting or standing for long periods
becomes uncomfortable.
Virtual reality may also not be the best medium yet for many types of
content. Low resolution makes text difficult to read at a distance and, as mentioned
before, vergence-accommodation conflict causes eye strain and fatigue over time.
For these reasons, mobile and PC will still be preferable for tasks like reading emails
or books. Extruding text to make it 3D doesn't help with the silhouette recognition
What VR would be a better medium for, rather, are the types of content that
are inherently 3D, but are traditionally abstracted to 2D forms out of display format
necessity. Blueprints would be a classic example of this. Buildings are 3D forms, but
imaging, ski resort maps, mechanical schematics, and geology lessons all have 3D
consumption and creation of these content types that virtual reality would be the
arguably better format, with stereoscopic spatial presentation. This eliminates the
Environment
The most obvious 3D content initially may be the environment that you are in. As
stated, the user has the ability to customize their environment like a current user
can change their PC's desktop background. However, the types of customization
environment around the user. They can sit underwater, atop a skyscraper, on the
moon, in a Frank Lloyd Wright building, or amidst fields of abstract color and
particles. The environment can also be from a spherical panorama (like Google
Maps street view photos) or use elements from real life scanned objects using laser
scanners or photogrammetry.
Users also have the ability to place content within their environment, which
brings up an interesting design point. In physical reality, all objects must exist in a
document that is a piece of paper, it will exist somewhere and you will have to go to
that place if you want to retrieve that document. In traditional computer interfaces,
through the folder structure. The only persistent objects are those which are
traversing distance to its location. This concept is not particularly practical for an
By doing so, the exhibit space can be recreated and the museum's experience is
available to people worldwide. You can walk around the virtual space and find the
piece you are looking for. The concept is such a success that the entire museum's
collection is digitized. You can now virtually walk through the entire museum, but it
takes several minutes to get from one end to the other to find a desired piece. This
MoMA, all museums have their entire collections digitized, making a virtual reality
searchable criteria. What you have is a database of content with metadata. You
Thus you end up naturally with the same kind of paradigm as the current PC
desktop. Things that the user wants immediately visible or accessible exist as
physically located objects in the environment around them. Everything else exists in
and placement of these things is decided by the user, and I would expect that some
people would prefer as little as possible while others prefer what amounts to
clutter in the physical world, very much like the differences seen in both physical
desk spaces and computer desktops currently. This also lends itself to the spatial
cognition I'm sure we've all heard referred to: "I have it how I like it! I know where
everything is!"
Icons
What do those objects look like, though? Initially, the GUI relied on icons: thumbnail
images to represent file types. This has largely remained the case, but content types
like images and video will now offer a thumbnail preview of their actual content in
visually this way in virtual reality, too. But at the same time, two-dimensional
symbolic icons are more quickly recognized than three-dimensional realistic icons
(Smallman et al. 2000). The styles of icons will likely need to be reimagined to
represent the newer content types. Images, for example, can be monoscopic or
stereoscopic and intended for mapping to a plane, cube, sphere, or mesh. Video
can be all the same. Essentially, they are all textures for models. Models could be
displayed without texture, but textures can't be displayed without some sort of
model. Some sort of iconography is still necessary for abstract content types like
computer interfaces typically show all file types as the same visual size, regardless
of their size on the disk. When manually cleaning up a hard drive, finding the larger
files or sections is more difficult without a data visualization program. One idea for
a virtual reality operating system would be to have a view mode where files'
volumes are actually representative of their physical disk space. In thinking about it,
though, this probably wouldn't be good as the default view because a text
document might appear physically tiny compared to a body scan. But it would be a
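As a rough sketch of that view mode, each icon's edge could be scaled by the cube root of its byte count so that volume tracks disk space; the constants below are arbitrary assumptions for illustration.

```python
def icon_edge_length(size_bytes, meters_per_gigabyte_edge=0.3):
    """Return an edge length in meters for a cubic icon whose volume tracks the
    file's size on disk. A 1 GB file maps to a 0.3 m cube here; the constant is
    an arbitrary choice for illustration."""
    gigabytes = size_bytes / 1e9
    return meters_per_gigabyte_edge * gigabytes ** (1.0 / 3.0)

# A 4 KB text document versus a 2 GB body scan:
print(icon_edge_length(4_000))          # ~0.005 m: nearly invisible
print(icon_edge_length(2_000_000_000))  # ~0.38 m
```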
Buttons
Buttons allow a user to initiate an action. Typically, a label accompanies the button
Buttons have been implemented digitally in many forms. Most obviously, they are
design trend has favored minimalism and simple text or colored regions (Turner,
additional visual states were created like hover and pressed. These tell the user
that it is an interactable button and simulates the z-depth action of pressing it.
Button style for VR depends on the interaction method. A raycast cursor pointing at
a button in the distance will need different feedback from a button that is
placed within arm's reach and physically pressed. The implementation would be
either in the area of a console around the user, or attached to the user's arms or
controllers.
I began with some visual style mockups. A button would need to be easily
legible whether showing text or a symbol. It would need to be opaque for this
functionality as press-able and not just an image or text region. Initially, I created
The last two seemed legible while communicating the functionality simply. Next I
thought it would be necessary to communicate the states of the button to the user.
There is no haptic feedback, but visual cues and sound can be triggered. In fact,
sound may help users feel like they have touched something when they actually
several animations in After Effects. I purposely simulated the finger going beyond
the bounds of the button, intersecting it to see how this might be perceived. The
By seeing the finger go through the button, we are reminded that this isn't entirely
unnatural. It looks like a finger dipping into water. This seemed like an opportunity
design the button to mimic the act of submersion. The user pushes it through a
First, I would need a color palette to represent these elements. Because the
ethos was human nature, I decided to sample images of things found in nature that
may inspire some sort of instinctive reaction. The sight of blood is naturally
alarming, while campfires and sunsets are naturally calming, in my opinion. Purple
rarely occurs naturally and draws a lot of attention when it does. I created this
With the hope of inspiring some amount of intimacy with the interface for
the user, I chose the skin on the palms of hands as the main influence for the color
of touched surfaces. It was the combination of this and the water color that I used
communicative button for a hand tracked interface. It's legible, it communicates its
functionality and states to the user, and it feels natural because it's based on
nature.
Content Zones
The areas in space to put menus and content becomes another question. As an
example, video editors have title safe and action safe zones to make sure their
content would be seen on older screens (NAB, 2010). Theater employs the use of
foreground, midground, and background areas for artistic purposes and to help
audiences understand zones of action (Malloy, 2014). With the assumption that the
user is seated and facing forward, we can define some guideline zones for types of content. Nuances of the zones
will vary with the device and the use case.
For this practice, I am defining zones based on the Oculus Rift Developer Kit 2 (DK2)
(Oculus, 2015c). These zones will assume the user to be in a non-rotating chair
because HMDs like the Rift and Vive have wires that a user will get wrapped up in if
they rotate.
Firstly, we can define the field of view if the user is looking straight forward.
The DK2's horizontal field of view is 94.2°, based on the camera settings of Oculus
Unity assets (Oculus, 2015d). Alex Chu of Samsung research gave some useful
figures regarding distance perception in a presentation (Chu,
2014). Your eyes strain more to focus on objects as they get closer to your face until
you are eventually cross-eyed. The distance that he gives where this starts to
minimum distance of 0.75 meters (Oculus, 2015e). Between there and 10 meters is
a strong sense of stereo depth and separation between elements. This gradually
fades off and is less noticeable up to 20 meters away. After 20 meters, the stereo
effect is barely perceptible. As objects approach infinite distance, they approach a limit at which the two
screens would be identical, pixel for pixel. Infinite distance is, essentially,
monoscopic. I will explain the mathematical reasoning for this far depth horizon
soon, but this diagram illustrates the perception of depth as it relates to the DK2's field of view.
According to that same presentation, people can comfortably rotate their heads
horizontally 30° from the center and have a maximum rotation of 55°. I concluded
that rotation of 30° combined with the device's field of view gives an area in which a
user can comfortably rotate their head and see elements, 77° to the side (94°/2 +
30°). Beyond that, combining the maximum rotation of 55° with the field of view
gives an additional area where people can strain to see things in their peripheral,
but persistent content would not be comfortable to see on a regular basis, 102° to
the side (94°/2 + 55°). After that, content behind the user could only be seen if they
physically rotate their body, likely out of curiosity about the environment. By
combining these angles, we can begin to define content
zones.
Right: Combining rotation with FOV results in beginning zones for content
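The same arithmetic generalizes to other headsets. The small sketch below reuses the 30° comfortable and 55° maximum rotation figures as given assumptions and returns the comfortable and peripheral limits for an arbitrary horizontal FOV.

```python
def content_zone_angles(horizontal_fov_deg, comfortable_rotation_deg=30.0,
                        maximum_rotation_deg=55.0):
    """Reproduce the zone arithmetic above for an arbitrary HMD. Returns the
    angle from straight ahead out to which content remains comfortable, and the
    angle reachable only with peripheral vision and maximum head rotation. The
    30 and 55 degree defaults come from the presentation cited above."""
    half_fov = horizontal_fov_deg / 2.0
    comfortable_limit = half_fov + comfortable_rotation_deg
    peripheral_limit = half_fov + maximum_rotation_deg
    return comfortable_limit, peripheral_limit

# DK2: 94.2 / 2 + 30 gives ~77 degrees; 94.2 / 2 + 55 gives ~102 degrees per side.
print(content_zone_angles(94.2))
```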
meters, that radius around the user can be deemed as an area devoid of
discomfort which increases closer to the user, 0.5 meters is chosen as an easily
zone.
The no-no zone comprises the area directly around the user's head at a radius of ~0.5
meters
The far boundary for content can be determined by the physical properties of the
head mounted display. The convergence angle of the eyes is a primary reason for
the perception of stereoscopic depth (Banks et al., 2012). As objects appear farther
away from the observer, the angle to which the eyes must rotate inwards for that
Head mounted displays have exact resolutions, so each pixel represents a fixed
incremental change in rotation degree. Near the center of the display, where the
image is most clear, the incremental angle of rotation can be estimated by dividing
the field of view by the number of pixels
encompassed within it. In the case of the DK2, the horizontal resolution of 1920 can
be divided by two to get 960 pixels per eye. Dividing the field of view of 94.2° by the
960 pixels distributed through it yields a rotation of approximately 0.1° per pixel, on
average.
Eye rotation per pixel near the center of the display can be estimated
The degree of convergence rotation for the eye is directly related to an object's
perceived distance. At an infinite distance, the
left and right eye displays render the exact same image pixel for pixel, being
essentially monoscopic.
Left: Equation to calculate distance as a function of IPD and convergence rotation angle
By subtracting the average rotation angle of a single pixel, we can estimate the
maximum perceivable depth for a head mounted display. In the case of the DK2,
subtracting 0.1° yields a perceived distance of 20.34 meters for this inter-pupillary
distance. The IPD range of 52mm to 78mm yields perceived distances of 14.9 to
Left: A pixel rendered at the same coordinates for each eye is perceived at an infinite distance
Right: Moving the pixel inward by one increment yields a maximum perceivable distance of
~20 meters for this device
This can be generalized into an equation for the maximum
perceivable distance for any given head mounted display. In this equation, the
device's resolution is treated as the full resolution across both eyes, assuming no
overlap between the two eyes' views.
Equation for estimating the maximum perceivable distance for a head mounted display
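The exact form of the equation in the figure is not reproduced here, but one plausible reading of the reasoning above can be sketched as follows; the default inter-pupillary distance is an assumption, and the result should be read as an estimate on the order of 20 meters rather than an exact figure.

```python
import math

def max_perceivable_distance(horizontal_fov_deg, horizontal_resolution,
                             ipd_meters=0.064):
    """Estimate the farthest distance at which a one-pixel disparity can still be
    rendered, following the reasoning above. The resolution is the full panel
    resolution across both eyes, so each eye receives half of it. The assumed
    IPD and the exact form of the original equation are both approximations."""
    pixels_per_eye = horizontal_resolution / 2.0
    degrees_per_pixel = horizontal_fov_deg / pixels_per_eye
    # One pixel of inward convergence rotation per eye corresponds to this depth.
    return (ipd_meters / 2.0) / math.tan(math.radians(degrees_per_pixel))

# DK2: 94.2 degree FOV, 1920 pixels across both eyes -> on the order of 20 meters.
print(max_perceivable_distance(94.2, 1920))
```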
Using the same equation for other head mounted displays such as the
consumer Rift, Vive, or Gear VR, yields nearly the same 20 meter distance every
time (Further information can be found on my blog here: (Alger, 2015f)). All other
distances exist within the anti-aliasing and interpolation of a single pixel. Content
farther away produces no additional change in stereo
perception and can thus be deemed the far horizon at which meaningful content
placement ends.
But, of course, this is only the horizontal plane. This diagram is again an
overhead view, and the zones need to be extended into
three dimensional volumes. First, there is that no-no zone extending at a 0.5
meter radius from an average adult height user's eyes, but as a sphere. The DK2's
screen is rectangular and the default game camera has a vertical FOV of 106.1°;
because it has circular lenses, I'm choosing to use the narrower of the two as the
and 40° maximum because our necks get in the way. In 3D, that zone between 0.5
Office ergonomics with relation to computers has been around long enough
for some more clearly defined numbers to emerge. The recommended angle of
viewing for longer working periods tends to be between 15° and 50° downwards, as described in this
page from Dennis Ankrum's Visual Ergonomics in the Office (Ankrum, 1999). We
can slice that section out of our content zone to get an area most comfortable for
permanent content. Text for longer reading would be most comfortably placed in
this area at a distance that matches the focal distance of the device's lenses. This
would be 1.3 meters for the DK2 and likely 2.5 meters for future devices
(Answers.oculus.com, 2014).
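A small sketch of where those numbers would place the center of a reading panel relative to the eyes, assuming a y-up, z-forward coordinate convention:

```python
import math

def reading_panel_position(downward_angle_deg, focal_distance_m):
    """Return (forward, vertical) offsets in meters for the center of a reading
    panel, given a downward viewing angle and the lens focal distance discussed
    above. A y-up, z-forward convention is an assumption of this sketch."""
    angle = math.radians(downward_angle_deg)
    forward = focal_distance_m * math.cos(angle)
    vertical = -focal_distance_m * math.sin(angle)
    return forward, vertical

# Middle of the 15-50 degree comfortable range at the DK2's 1.3 m focal distance.
print(reading_panel_position(32.5, 1.3))
```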
We now have zones for content surrounding the user. Next we will dene
zones of touchable interaction. This would only be useful for interface elements
that are meant to be touched with hand tracking or motion controllers using
collisions of some sort. They would naturally be within arm's reach. If the solution
were one tracked hand and one motion controller, then the reaching area of the
tracked non-dominant hand is the one touch buttons would be used for. Reaching
the arms to their full extent on a regular basis would likely result in fatigue, as well
persistent content would be uncomfortable, we are left with an area ideal for touch
interaction.
This zone interestingly includes the user's thighs. With body or surface tracking,
interface elements could be placed on the legs as readily as on the
hands or arms. Actual physical controllers could be placed there as well like buttons
or multitouch surfaces. The same can be applied to the user's forearms. This is
currently possible with hand tracking, but not with motion controllers because the
position of the elbow is unknown. For tracked hands, the area just outside of the
silhouette is available for interface elements, but intersecting the silhouette causes
of UI just outside the arm can be seen in Leap Motion's Planetarium application
(Planetarium, 2015). For motion controllers, the area immediately around them is
available, though the radius of that is likely a design decision. An example of this is
Left: Hand tracking can use the areas around the hand and forearm silhouette as long as
the left and right silhouettes don't intersect.
useable as a guide for VR applications. The asset can be imported and dropped into
a project and interface elements can be placed in the zones. Then the guide can be
distances color coded with the zones' labels, paying attention to the extremities
where they would be less likely to work. I also added several at varying distances
Left: Template 3D file for placing content and UI. Content and workspace zones are shown
Right: Testing zones' applicability with multiple labels at varying locations for each
I found that there may be some caveats. The angles of the content, peripheral, and
curiosity zones seem appropriate. However, I could tell that the nearer labels at 20
meters were closer than others. This is probably because in the testing done by
Samsung, the compared elements occluded the same area regardless of distance
(Chu, 2014). Distances were easier to see in this application because my signs were
the same size but extending into the distance with perspective, getting smaller. I
also noticed that the hand UI zone occludes a main portion of the workspace zone -
that is to say, your buttons could get in the way of the thing you're trying to look at.
Depending on the nature of the UI, having an overlaid element may actually be of
use, but more likely, designers would usually avoid this corner of the zone.
Another caveat to this method of zones is the existence of a floor. Users feel
off balance if they appear to be floating, but a floor beneath them and a static
horizon line help stop this from happening (Ludwig, 2013; Cleworth, Horslen and
Carpenter, 2012; Messing and Durgin, 2005). Adding a floor that extends from the
user's feet to the horizon turns the entire sphere to a dome and cuts off most of the
workspace zone. The result is the user looking directly at the virtual floor in front of
them with only small elements existing in the workspace. One possible solution is to
design the working environment to have the user on a peak, slope, or cli of some
sort, extending to the horizon. This would likely be uncomfortable for people with a
fear of heights without conditioning, though (Opdyke, Williford and North, 1995).
Having the slope angle at 50° downward to match the bottom of the main content
zone may be the best middle-ground, though I haven't tested this yet.
These zones presented are specifically for a non-rotating, seated design with
the Oculus Rift DK2. Changing the criteria results in different zones. If the user is
intended to be able to rotate all the way around in a swivel chair, then the content
zones will wrap all the way around, and the main content zone will exist as a ring. If
the device is a later model with higher resolution, the 20 meter maximum for depth
information may increase. Obviously this also changes dramatically for a room-
scale tracking setup. Even so, I would argue
that creating a starting point for content zone guidelines for this application is
worthwhile.
Practical Application
Outside of the operating system concept, I have tried putting some of these
principles into application for various projects. The primary usefulness for this is
that I have come up with some design workflows that others may be able to utilize.
Building assets and coding virtual reality experiences takes a lot of time. If it is done
the same case with workflows for other designed digital mediums, especially those
then be refined and tweaked before a lot of valuable time is spent in development.
Of course, these design workflows either haven't been standardized or don't exist yet.
Some designers have talked about their processes. In Alex Chu's talk, he talks
about the system of greyboxing when designing for the Gear VR headset (Chu,
2014). He places primitive polygon shapes without textures in the environment and
tests how their placement and size feel in VR. Josh Carpenter of Mozilla described
his process of designing an interface for WebVR (Carpenter, 2015). He designs the
surface with a 0.5m radius around the user. He keeps the background transparent
with a blurred image of the VR environment to see legibility. While these are
platform. The example I will describe as well as case studies I include in Appendices
III and IV were for the Google Cardboard, Oculus Rift Developer Kit 2 with Leap
user puts their smartphone into a cardboard box with lenses (Google, 2015). The
image is updated using the gyroscope and accelerometer information from the phone
and the user can press one button. Google Cardboard has rotation tracking, but no
position tracking. The Rift DK2 was introduced earlier in this manuscript as a head
mounted display. Its position is tracked within the range of a camera facing the
user. The Leap Motion controller is a hand tracking input device that uses infrared
the front of an HMD, hands can be presented in the virtual world, as long as the
head mounted display and two motion controllers which are tracked in a room-
scale volume by two laser emitters in the room's corners (Valve, 2015). This means
users can walk across the floor within the boundaries of the room.
very different from the creation medium. For example, when creating a painting,
you make it on a canvas and it will be seen on that canvas. When designing a
VR, robust creative tools don't yet exist within the medium itself. We have to design
Use Case: Animation Prototyping
I did VR interface design work for a Berlin-based virtual reality company. Their
because the primary content is photos and videos captured with smartphones.
Features of the application are similar to YouTube, with custom content uploading,
viewing, commenting, sharing, categorization, etc. but there are other features like
multiplayer shared viewing and talking. I am not at liberty to divulge the final design, so
I will instead describe the workflow process of rapid prototyping an animated UI.
The task was to create a browsing interface. I worked with Chris Mansfield to brainstorm and create
inputs. There is the yaw, pitch, and roll rotations, and a single button. This
interaction can be expanded slightly by considering that the user can do things with
the button like holding or double clicking. A knock on the device could also be
simple scenes quickly to test our impression of them when experienced with depth.
video mockup. To do this, I used After Effects because it is useful for animating
interface elements quickly. I split content into foreground, midground, and
background; again borrowing from the popular concept in theater and cinema.
I made a master composition and set the width-to-height ratio to 2:1 for equirectangular mapping. For design guidelines, I
created a grid with vertical and horizontal lines for the sphere to represent every 5°
around the user, 360° around and 180° vertically. For the actual visualization
composition, I used a 16:9 ratio, like the HD standard resolution of 1920x1080. I divided the width in half
because it is separated between the two eyes, and it's much quicker to simulate one
eye. I nested the foreground, midground, and background
compositions within this final composition. I applied the CC Sphere effect to all of
them, changing the settings to render only the inside and removing lighting effects.
The midground composition had a radius such that the angle of the sphere that
was visible matched the HMD's field of view. This can be seen with the grid in place,
since every line represents 5°. The foreground had a slightly smaller radius and
the background slightly larger. When all three spheres are rotated together, this
offset gives a sense of parallax in the 2D video which helps viewers understand the
depth relationships better. I also added a vignette to simulate the falloff in field of view.
Right: Spheres with different radii represent the foreground, midground, and background
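The grid guide itself can also be generated programmatically rather than drawn by hand; the sketch below assumes the Pillow imaging library and an arbitrary resolution of ten pixels per degree.

```python
# Generate a 2:1 equirectangular guide grid with a line every 5 degrees,
# similar to the template described above. Requires the Pillow library.
from PIL import Image, ImageDraw

WIDTH, HEIGHT = 3600, 1800          # 2:1 ratio; 10 pixels per degree here
DEGREES_PER_LINE = 5

image = Image.new("RGB", (WIDTH, HEIGHT), "black")
draw = ImageDraw.Draw(image)

# Vertical lines: one per 5 degrees of longitude (360 degrees across the width).
for longitude in range(0, 361, DEGREES_PER_LINE):
    x = int(longitude / 360 * (WIDTH - 1))
    draw.line([(x, 0), (x, HEIGHT - 1)], fill="white", width=1)

# Horizontal lines: one per 5 degrees of latitude (180 degrees over the height).
for latitude in range(0, 181, DEGREES_PER_LINE):
    y = int(latitude / 180 * (HEIGHT - 1))
    draw.line([(0, y), (WIDTH - 1, y)], fill="white", width=1)

image.save("equirectangular_grid_guide.png")
```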
Something I realized was that the angles of the content zones can be directly
translated to 2D areas on the grid. These grid and area guides can be turned on and
off for quick reference by showing and hiding the layers in their compositions.
Using this template, I was able to simulate a hypothetical user navigating through
the interface, while elements like colors, fonts, and animation speeds remain changeable to test different styles.
The interface design is based on a locked reticle following the user's rotation.
The user can point their head at different options and a clear hover state is shown
for selectable elements, similar to the Oculus Home application for the Gear VR
HMD (Oculus Home, 2015). The user can press the button once to bring up a
content browsing menu, or hold the button down to bring up another radial
navigation interface. Letting the button go then chooses the option the user's reticle
is hovering over. There are several other aspects of the interface that the creators
have asked me not to share because the application is not out yet, so they've been omitted.
by itself, I made another composition with a section for notes to the side. It has a
representation of the Google Cardboard's button. This allows me to show when the
button is being pressed/held and leave notes for the developers about what is
happening. These are some example screenshots of the still in-process mockup:
Use Case: Zone and Environment Prototyping
I exhibited some in-progress works for an event in June 2015 called Utopia. Because most people
attending had not tried virtual reality before, it was a good opportunity to test these
workflows. I chose to create a simple button pressing interface using hand tracking.
The interface elements' positions would be based on testing the content zones
In use, I noticed some caveats of the tracked hand button interface. For one,
the user's head has to be pointed in the direction of the button they are pressing.
The hand tracking device is mounted to the front of the HMD, so it can't track
something outside of its own infrared camera's field of view. My demonstration had
an option to turn on a chair mesh with a button. Some people would look behind
themselves at the chair's location and use their proprioception to press in the air
where they remembered the button to be, in the opposite direction. The action
wouldn't take place and they would appear disappointed or confused. I also saw in
my own testing and with other people that it is a bad idea to have buttons of this
style aligned vertically. In order to press the button, many users would swing their
arm downward through it. There's no normal force to stop their hand or give
feedback, so their arm would continue in an arc through the motion past the
button. If there was another button beneath, it would often be triggered, too. Again,
One of the items the users could toggle on and off was a scaled down
hackathon project called Museum of Lies for which the summary can be seen
here: (Alger, 2015c)). I purposely placed this within the no-no zone near the user's
face. Not unexpectedly, its appearance was startling to many. Some attempted to
lean back away from it immediately. What was unexpected was how much people
liked to put their head inside of it. Similar to the world in miniature concept, this
appeared to be a miniature building model that they could go into and enjoyed.
One part of the process while making this experience was my use of
skyspheres and skydomes. The environments that users could switch between in
this experience were primarily equirectangular photos. The simplest way to display
these in virtual reality is to map them to either a cube skybox or a sphere with
the image on its inside surface. These photos are monoscopic in
nature and typically captured by taking photos in every direction from around a
single point. Besides the lack of stereoscopic depth, these images suffer from
a scale distortion: objects that were near
the camera appear too large. This is because they take up a certain angle in the
field of view which is also typically associated with a certain convergence related to
the object's distance. When the eyes are viewing a monoscopic equirectangular
photo on a sphere, they are converging at the same distance for every point in the
photo. If the skybox is set to render at an innite distance, objects in the scene will
appear to be the correct scale as their actual distance approaches innity. Objects
that are closer to the camera will look too big as they take up a wider eld of view
but also appear to be at innite distance. This is particularly obvious often when
looking down at the ground below the photo's capture position. It will appear to be
Size distortion in monoscopic photo spheres. Objects appear larger as the field of view angle
is preserved but their distance is increased to infinity. Decreasing the sphere's radius results in
One way to mitigate this is to move the geometry of the sphere closer to the user
for specific locations in the photo. This brings the convergence back to the correct
depth and eliminates the disproportionate scale illusion. I tested this by creating a
dome mesh for some of the photos to be mapped to. In order for the photo to
appear correctly, the texture's UVs must remain at the same viewing angle from the
user while moving closer to the eyes. One way to do this is to select the vertices to
be transformed and scale them downward toward the point of origin at the center
of the sphere. I tried this with a flat ground only, assuming the camera's distance to
the ground is known.
By moving points closer to the user, but keeping their angle, we can both reintroduce
convergence and reduce the size distortion. This diagram shows how it can be applied to
only the ground extending to the horizon. This doesn't account for other objects besides the ground.
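A sketch of that ground-only reprojection, assuming y-up unit-sphere vertices and a known capture height; the 20 meter cap reuses the far horizon estimated earlier, and the vertex format is an assumption of this illustration.

```python
import numpy as np

def reproject_ground_vertices(vertices, camera_height, max_radius=20.0):
    """Scale photo-sphere vertices toward the origin (the capture point) so that
    points below the horizon land on a flat ground plane, preserving their
    viewing angle as described above. vertices are treated as directions from
    the capture point (y-up is assumed); camera_height is the capture height in
    meters. Only the ground is handled; everything else keeps the far radius."""
    vertices = np.asarray(vertices, dtype=float)
    directions = vertices / np.linalg.norm(vertices, axis=1, keepdims=True)
    up_component = directions[:, 1]
    radii = np.full(len(directions), max_radius)
    below_horizon = up_component < 0
    # A ray dipping at angle theta below the horizon reaches the ground at
    # distance camera_height / sin(theta); sin(theta) equals -up_component here.
    ground_radii = camera_height / (-up_component[below_horizon])
    radii[below_horizon] = np.minimum(ground_radii, max_radius)
    return directions * radii[:, None]

# Example: a vertex 30 degrees below the horizon, captured 1.6 m above the ground.
vertex = np.array([[0.0, -np.sin(np.radians(30)), np.cos(np.radians(30))]])
print(reproject_ground_vertices(vertex, camera_height=1.6))
```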
The main problem with this that I encountered in testing was UV mapping artifacts.
As the mesh gets pulled, the squares become trapezoids and the texture, mapped
linearly across each face, becomes visibly warped. This can be corrected in
the shader (Northway, 2013) or by more clearly defining the texture space coordinates
(Everitt, 1997). One way to try to work around it is to increase the polygons. This
View from inside the dome comparing low and high poly UV distortion. While increasing the
geometry helps, it doesn't solve the problem and a custom shader to eliminate the artifact is
a better solution.
For the exhibition, I had environment photos mapped to both spheres and
domes. People who were afraid of heights were often uncomfortable with the
spheres because the ground appeared far away. Some would pull their feet up or
brace themselves when saying they didn't like it. They would often quickly switch
the environment back to one of the dome options. The dome solution was clearly
imperfect for many types of photographic content, however. For example, a beach
scene has palm trees extending from the sand. While the sand and ocean appear to
correctly map into the distance, the palm trees do not. They extend across the
ground to the horizon, where they then bend up into the sky, again looking too large.
Ideally, every pixel of the photo would be mapped to its correct depth. The
current polygon sphere model is not ideal for this, because one vertex per pixel
would mean millions of faces, far too many for a VR experience to process fast
enough. A more practical route would be to store the per-pixel depth as a texture, a depth map,
and let a shader displace each pixel accordingly. Capturing the depth map would be a matter of
3D scanning techniques such as infrared depth sensing.
[Figure: Right: depth map. White is zero distance, black is infinite distance. It's a bit difficult to see, but the nearest lamp post is the closest distinguishable feature.]
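As a rough illustration of what "every pixel mapped to its correct depth" means in practice, the sketch below (plain NumPy, not part of the exhibition project) converts an equirectangular depth map into one 3D point per pixel by multiplying each pixel's viewing direction by its decoded distance. The near/far range and the white-near, black-far encoding follow the figure description above but are otherwise assumptions.

```python
import numpy as np

def equirect_to_points(depth, near=0.5, far=100.0):
    """Turn an equirectangular depth map into one 3D point per pixel.

    depth : (H, W) array in [0, 1], where 1 (white) means nearest and
            0 (black) means effectively infinite.
    Returns an (H, W, 3) array of positions around the capture point (y up).
    """
    h, w = depth.shape
    # Longitude spans -pi..pi across the width, latitude pi/2..-pi/2 down the height.
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit viewing direction for every pixel.
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)

    # Interpret the stored value as proximity and remap to a distance in metres.
    dist = near + (1.0 - depth) * (far - near)
    return dirs * dist[..., None]

# Tiny example: a 4x8 depth map that is "near" below the horizon (ground)
# and "far" above it.
d = np.zeros((4, 8))
d[2:, :] = 1.0
points = equirect_to_points(d)
print(points.shape)   # (4, 8, 3)
print(points[3, 0])   # a nearby ground point
print(points[0, 0])   # a far-away sky point
```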
The problem with this solution is that it lacks the occluded stereo parallax
information, where one eye would be seeing things that the other cannot because
of its positional offset. So, where a pixel's position was shifted to accommodate
for convergence, its original position would be left blank. There would appear to be
a shadow tear between every near object and its backdrop. Repeating the further
pixels for one eye would be possible, but not ideal, as it's not the true information
from the environment. Capture methods incorporating lateral movement for stereo
information, such as photogrammetry and light fields, remain ideal (Wilburn et al.,
2005).
This manuscript has presented several guidelines and workflows related to virtual
reality experience and interface design. The larger goal was to apply these methods
to the use of an operating system environment. This concept can easily become
complicated by the range of possible hardware combinations, such as an
Oculus Rift and Touch controller mounted with a Leap Motion version 2 hand
tracker (Kreylos, 2014; Bedikian, 2015). It would also be possible for the software
to adapt to whichever inputs are available, but here I will
only present a concept for an operating system for a simple hardware setup.

The example I will present is a distributable operating system design for the
Vive system in an office workplace. To recap, the Vive is a head mounted display with
two tracked motion controllers and room-scale tracking of up to roughly
4.5x4.5 meters (Htcvr.com, 2015). This concept would work exactly the same with
the Oculus Rift and Touch controllers because it is also capable of room-scale
tracking (Lang, 2015); however, I haven't yet had the opportunity to try the
Touch controllers, so I will stick with the Vive, with which I have more experience, for this
thought experiment. I will also present this as a hypothetical environment where the user
has one motion controller in their dominant right hand and a tracked left hand, as mentioned
before. This way, I can illustrate interactions for both a held controller and a bare hand.
The hypothetical user has a business that provides 3D scanning services for
museums. She has an office space with a chair, in which the Vive's lighthouse
tracking system has been set up. Her current task is researching a 3D scanner to
see if her business should invest in it. Wearing the headset, she watches a 2D video
explaining the project. She is in the VR equivalent of full screen mode, with just
the video and controls. The video itself takes up 54° of her horizontal viewing
angle, the same as a standard movie theater (Imax, 2013), and is positioned at a comfortable distance.
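The relationship between that viewing angle, the panel's distance, and its physical size is simple trigonometry. This small sketch (the distances are arbitrary example values, not figures from the scenario) computes how wide the virtual screen must be to fill a 54° horizontal angle:

```python
import math

def panel_width(distance_m, horizontal_fov_deg=54.0):
    """Width a flat panel must have to fill the given horizontal angle."""
    return 2.0 * distance_m * math.tan(math.radians(horizontal_fov_deg) / 2.0)

for d in (1.3, 2.5, 5.0):
    print(f"at {d} m the video is {panel_width(d):.2f} m wide")
```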
She scrubs ahead in the video by intersecting her hand with a cylinder representing
the video's timeline. Removing her hand from the cylinder resumes playback.
She presses the pause button, which is near her left hand and styled as described
earlier. She exits out of full screen by pressing a button floating next to her
controller.
The video returns to a smaller size amidst her customized environment and
applications. She now appears to have a mountain valley in front of her. To her
sides are some of her favorite sculptures and paintings, in and on a partial building.
She points the motion controller's cursor at the scanner's user manual document,
represented as a preview thumbnail of its content. When hovered, its name appears
and the thumbnail's edges glow. She presses the thumbpad button to open it, and
its position and scale transform to the ideal content area for reading, about 1.3 meters away.
Using the thumbpad on the controller, she scrolls through the document. She
selects a point with the trigger, scrolls ahead, and makes a gesture, pointing her left
thumb to the side while selecting the second point. This is akin to holding the shift
key while clicking on a computer. When the gesture is recognized, the cursor
pointer changes shape. The gesture modifies the cursor's action, highlighting the
text between the two selected points.
She brings up the document's menu by squeezing the controller. The radial menu
protrudes from the space around it. She selects a sub-menu by moving the
controller into the desired option and releasing the squeeze, choosing the option
to speed read.
The words flash rapidly one at a time, and she adjusts the speed using the haptic
thumbpad on the motion controller. While this is going on, a radial light begins to
pulse in her peripheral vision. It is an alarm she had set earlier as a calendar alert to
remind her to meet with a prospective client in an hour. She stops the speed
reading function by pressing a button with a stop icon, similar to the video pause
earlier. Looking in the direction of the alarm's beacon reveals its title and time, as
well as options for dismiss (left), snooze (right), or details (down). Pointing the
raycast cursor at it, she pulls the trigger and swipes it to the left, dismissing it.
She looks at her left wrist, which shows the current time. Her forearm shows the
currently allocated resources per application, and the back of her hand shows
further status information. She'll need to grab lunch before the meeting. She turns to a personal assistant
application whose animated character, while still minimalistic, has elements inspired by the proportions of human children. The
goal of this is to reduce user frustration at unexpected responses. Upon directing
her gaze at it, two dash icons widen to simulate eye contact, showing that the
application is now listening to the microphone. She says "Where's a good place to
eat near the Museum of Art?" and the words she speaks appear above the assistant icon.
The assistant replies "Here's restaurants with more than five stars near Museum of
Art," and a list appears. The user says "Show them on a map." Again, the words
appear above the animated assistant icon. A 3-dimensional map of the museum's
neighborhood appears with location indicators for points of interest. The higher-rated
restaurants' labels are displayed at a higher altitude. She points the raycast
cursor at the map and pulls the trigger, holding the map as she repositions it. While she
holds it, moving her left hand further left scales it larger, and moving it towards and away
from her rotates it. She selects a restaurant by pointing her cursor at it and pulling the trigger.
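One plausible way to implement that held-map gesture (a sketch under assumed axis conventions and gain values, not the interface described above) is to record the left hand's position at grab time and map its subsequent displacement to a scale factor and a rotation:

```python
def manipulate_held_object(grab_pos, hand_pos,
                           scale_per_meter=2.0, degrees_per_meter=180.0):
    """Map the free left hand's displacement since grab time to a scale
    factor and a yaw rotation for the object held by the other hand.

    Positions are (x, y, z) in meters with x pointing to the user's right
    and z pointing away from the user; the axis convention and the gain
    values are assumptions for illustration.
    """
    dx = hand_pos[0] - grab_pos[0]   # negative = moved further left
    dz = hand_pos[2] - grab_pos[2]   # positive = pushed away from the user
    scale = max(0.1, 1.0 - dx * scale_per_meter)   # left -> larger
    yaw_degrees = dz * degrees_per_meter           # toward/away -> rotate
    return scale, yaw_degrees

# Moving the left hand 25 cm to the left scales the map up 1.5x ...
print(manipulate_held_object((0.0, 1.0, 0.3), (-0.25, 1.0, 0.3)))
# ... and pulling it 10 cm closer rotates the map by -18 degrees.
print(manipulate_held_object((0.0, 1.0, 0.3), (0.0, 1.0, 0.2)))
```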
Each time she interacts with content, the button interface in front of her changes to
the relevant application. She presses a button with her hand to get directions, and the
map displays a path from her office to the restaurant. She presses another button
to send the directions to her smartphone. Removing the head mounted display, she
heads out for lunch.
What I have just described is one potential solution for this series of tasks. The
methods I depicted may not be, and indeed very likely aren't, the most ideal. Even
changing individual aspects of the narrative alters the probable interface elements:
working at a tracked desk surface, for example, would mean all touched elements being
constrained to that surface. This narrative thought experiment simply serves as a starting
point from which user experience testing and iteration can begin. There is no doubt that
unexpected caveats and best practices would emerge from repeated, meaningful feedback.
As a personal starting point for the evaluation and progressive iteration of these
concepts, the application accompanying this paper is a modified extension of the interaction
application used for the Utopia exhibition, available at aperturesciencellc.com/vr/application.zip.
What's good about this is that several of the concepts I describe are present and can be
tried directly; interacting with the objects makes them more understandable. It's quite obviously a far cry from the
robust narrative just described, though. Hand tracking using the current Leap
Motion controller is not accurate enough for raycasting cursors without frustration,
for example. As a user experiences the project, they will undoubtedly understand
the concepts with greater depth by feeling which interactions they find more and
less comfortable.
There is little doubt that a full virtual reality operating system could be created using existing
technology. This prototype and the concepts described and illustrated are merely a
small stone near the beginning of a path. It's just the beginning, but it is a solid
start, I believe.
What's Next?
The most obvious next step is testing and adjustment for each element as it is
implemented. The goal would be to build applications that I would want to use.
Where and how to do that are unknowns, though it does seem to me that positions like
"VR designer" will soon exist in their own right.
Elements of user interface are only small parts of the larger and more important
user experience. The design process for head mounted displays is particularly
suited to creating experiences that are magical, that reshape a user's view of the world, or that can
communicate in ways no other medium can.
Conclusion
The preceding sections have considered the kinds of content virtual reality
is best suited for and the most likely ideal locations for that content, both
of which shape the ways in which we can design interfaces to interact with and modify that content using
the inputs and workflows described throughout this manuscript.
On a personal note, I do hope that this work will prove useful to others. I
explained in the introduction that I have a desire to work on something with a more
lasting application and value. I also explained that virtual reality is a rapidly evolving
topic, changing frequently. It is very possible that many of these concepts and
workflows will be rendered obsolete within even a year's time. But it does seem that
concepts like ergonomic considerations for zones of content will remain necessary even
as the technology changes.
For a year, I wasn't sure if evolving personally from a focus in motion design
to one in virtual reality was the right direction. Even after
reading, experiencing, creating, and talking to others about and in virtual reality, I
still wasn't completely sold on a career shift. It was at a weekend game jam
using the Vive system that my mind was made up. This was the first time I had used a
room-scale tracked system, and walking around the digital objects that we had just created,
I couldn't help but gain a sense of presence that
really hit home the concept of the most powerful and versatile storytelling medium
ever created.
Virtual reality experiences and volumetric interfaces do not yet have established conventions. Where writing, film,
television, radio, theater, graphic design, etc. have expected elements, head
mounted media is still finding its own. The consumer market will run virtual reality through the refining crucible of ethics, and
committees will form to ensure the mitigation of social risks. We will soon see the
first VR-related death, claims of head mounted displays causing cancer, and accusations of the technology
melting the brains of its users. Alongside this will be the immersive storytelling and
experiences that only this medium can provide.
References
Abrash, M. (2013). Why Virtual Reality Is Hard (and where it might be going). Game
Developers Conference 2013 [presentation] Available at: http://
media.steampowered.com/apps/abrashblog/MAbrash%20GDC2013.pdf
Adelson, E. and Bergen, J. (1991). The Plenoptic Function and the Elements of Early
vision. Computational Models of Visual Processing. Cambridge, MA: MIT Press.
pp. 3-20.
Agarwal, C. and Thakur, N. (2014). The Evolution and Future Scope of Augmented
Reality. IJCSI International Journal of Computer Science Issues. Volume 11, Issue
6, No 1. pp. 59.
Alexa (2015). The Top 500 Sites on the Web. [website] Available at: http://
www.alexa.com/topsites
Alger, M. (2015b). Take Care of Your Humans (with Virtual Reality!). Hemnes, Warsaw,
Poland. [presentation] Available at: http://www.meetup.com/GoMobile-with-
Design/events/220210315
Alger, M. (2015d). Insider Notes from the HTC Vive VR Jam at London's Playhubs.
[online] Road to VR. Available at: http://www.roadtovr.com/insider-notes-from-
the-htc-vive-vr-jam-at-londons-playhubs/
Alger, M. (2015f). HMD resolution and maximum depth perception. [blog] Available
at: http://mikealgermovingimage.tumblr.com/post/127113260256/hmd-
resolution-and-maximum-depth-perception
Ball, R., North, C. (2005). Effects of tiled high-resolution display on basic visualization
and navigation tasks. CHI '05 Extended Abstracts on Human Factors in
Computing Systems. pp. 1196-1199.
Banks, M., Read, J., Allison, R. and Watt, S. (2012). Stereoscopy and the Human
Visual System. SMPTE Motion Imaging Journal, 121(4), pp.24-43.
Barras, C. (2014). How Virtual Reality Overcame its 'Puke Problem'. BBC. [online]
Available at: http://www.bbc.com/future/story/20140327-virtual-realitys-puke-
problem
Brinkmann, R. (1999). The art and science of digital compositing. San Diego: Morgan
Kaufmann.
Carpenter, J. (2015). UI/UX design for WebVR with Josh Carpenter. SFHTML5.
[presentation]
Cleworth, T., Horslen, B. and Carpenter, M. (2012). Influence of Real and Virtual
Heights on Standing Balance. Gait & Posture. 36(2), pp. 172-176.
Colgan, A. (2015). Designing VR Tools: The Good, the Bad, and the Ugly. Leap Motion.
[online] Available at: http://blog.leapmotion.com/designingvrtoolsgoodbadugly/
Collet, A., Chuang M., Sweeney P., Gillett D., Evseev D., Calabrese D., Hoppe H.,
Kirk A., Sullivan. S. (2015). High-quality streamable free-viewpoint video. ACM
Transactions on Graphics. 34(4).
Cruz-Neira, C., Sandin, D., DeFanti, T., Kenyon, R. and Hart, J. (1992). The CAVE:
audio visual experience automatic virtual environment. Communications of the ACM,
35(6), pp.64-72.
Everitt, C. (1997). Getting to know the Q texture coordinate. [online] Available at:
http://www.xyzw.us/~cass/qcoord/
Faliszek, C. (2015). Vive Game Jam. Playhubs, London. [in person] 11 July 2015.
Gaylor, G. and Joudrey, J. (2015). VR Chat. v 0.8.7. [software] Available at: http://
www.vrchat.net
Imax (2013). IMAX 101: Theatre Geometry. [online] Available at: http://
www.imax.com/community/blog/imax-101-theatre-geometry-video/
Kennedy, R. and Frank, L. (1985). A Review of Motion Sickness with Special Reference
to Simulator Sickness. Westlake Village, CA: Canyon Research Group Inc.
Koved, L. and Selker, T. (1999). Room With a View (RWAV): A metaphor for interactive
computing. Yorktown Heights, N.Y.: IBM T.J. Watson Research Center.
Kreuger, W. Bohn, C., Froehlich, B., Schueth, H., Strauss, W., and Wesche, G.
(1995). The Responsive Workbench: A Virtual Work Environment. IEEE Computer. Vol.
28, No. 7. pp. 42-48.
Lanman, D. and Luebke, D. (2013). Near-Eye Light Field Displays. NVIDIA Research.
Available at: https://research.nvidia.com/sites/default/files/publications/NVIDIA-
NELD_0.pdf
Leapmotion.com (2015). Leap Motion for Virtual Reality. [website] Available at:
https://www.leapmotion.com/product/vr
Lessig, L. (1999). Code and Other Laws of Cyberspace. New York: Basic Books.
Ludwig, J. (2013). Lessons Learned Porting Team Fortress 2 to Virtual Reality. Game
Developers Conference. [presentation]
Medich, J. (2015). What Would a Truly 3D Operating System Look Like?. Leap Motion.
[online] Available at: http://blog.leapmotion.com/truly3doperatingsystemlook-
like/
Messing, R. and Durgin, F. (2005). Distance Perception and the Visual Horizon in
Head Mounted Displays. TAP, 2(3), pp.234-250.
NAB (2010). Television Safe Areas Redefined. TV TechCheck. [online] Available at:
http://www.nab.org/xert/scitech/pdfs/tv031510.pdf
NEC, (2010). Monitor Size and Aspect Ratio Productivity Research. [presentation]
Nelson, N. (2013). Is Virtual Reality Gaming Destined For A Comeback?. All Tech
Considered. NPR. Available at: http://www.npr.org/sections/alltechconsidered/
2013/06/12/191067676/is-virtual-reality-gaming-destined-for-a-comeback
Oculus (2015a). Oculus Best Practices. [online] pp. 15-16. Available at: http://
static.oculus.com/documentation/pdfs/intro-vr/latest/bp.pdf
Oculus (2015c). Oculus Rift Development Kit 2. [website] Available at: https://
www.oculus.com/en-us/dk2/
Oculus (2015d). Oculus Utilities for Unity 5. [website] Available at: https://
developer.oculus.com/downloads/game-engines/0.1.0-beta/
Oculus_Utilities_for_Unity_5/
Plafke, J. (2013) Leap Motion review: Is it time to replace the mouse?. Extreme Tech.
[online] Available at: http://www.extremetech.com/extreme/161813-leap-
motion-review/3
Plunkett, L. (2015). Valve Thinks It's Cracked Typing With A Controller. Kotaku.
[online] Available at: http://kotaku.com/valve-thinks-its-cracked-typing-with-a-
controller-1709175825
Poeter, D. (2015). How Moore's Law Changed History (and Your Smartphone). PC
Mag. [online] Available at: http://uk.pcmag.com/cpus-components-products/
41195/news/how-moores-law-changed-history-and-your-smartphone
Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S., Carey, T. (1994). Human-
Computer Interaction, Addison Wesley.
Reinhard, E., Ward, G., Pattanaik, S., Debevec, P., Heidrich, W., Myszkowski, K.
(2010). High Dynamic Range Imaging. Burlington, MA: Morgan Kaufmann/Elsevier.
pp. 239.
Seetzen, H., Heidrich, W., Stuerzlinger, W., Ward, G., Whitehead, L., Trentacoste,
M., Ghosh, A. and Vorozcovs, A. (2004). High Dynamic Range Display Systems. TOG,
23(3), p.760.
Sherman, W. and Craig, A. (2003). Understanding Virtual Reality. San Francisco, CA:
Morgan Kaufmann, pp.310-325.
Sleight, L. (2014). VRO (VR Objects). AnyAll. [website] Available at: http://
www.anyall.net/#!vro/c1igr
Smallman, H., John, M., Oonk, H. and Cowen, M. (2000). When Beauty is Only Skin
Deep: 3-D Realistic Icons are Harder to Identify than Conventional 2-D Military Symbols.
Proceedings of the Human Factors and Ergonomics Society Annual Meeting,
44(21), pp.3-480-3-483.
Stoakley, R., Conway, M., Pausch, R. (1995). Virtual Reality on a WIM: Interactive
Worlds in Miniature. University of Virginia.
Thacker, C., McCreight, E., Lampson, B., Sproull, R. and Boggs, D. (1979). Alto: A
personal computer. Computer Structures: Principles and Examples, second
edition, pp.549-572.
Turner, A. (2014). The History of Flat Design: How efficiency and minimalism turned the
digital world flat. The Next Web. [online] Available at: http://thenextweb.com/dd/
2014/03/19/history-flat-design-efficiency-minimalism-made-digital-world-flat/
Wilburn, B., Joshi, N., Vaish, V., Talvala, E., Antunez, E., Barth, A., Adams, A.,
Horowitz, M., Levoy, M. (2005). High Performance Imaging Using Large Camera
Arrays. ACM Transactions on Graphics. Vol. 24, No. 3. pp. 765-776
Yao, R. (2014). The Human Visual System and the Rift. Oculus Connect.
[presentation]
Appendix: Avatars
An avatar is a virtual representation of the user (Lessig, 1999). This is usually for the purpose of
representing the user to other participants in a multiplayer setting, but it can also
just be the body or reflection the user sees as belonging to themselves. An avatar
isn't necessary, and many experiences choose to forego the representation of the
user's own body because of proprioceptive disparity (Romo, 2015). This is the mismatch of signals to the brain, where the
internal sense of where the body parts are does not match up with where
they look like they are. Most systems don't track the whole user's body, so an avatar
won't be moving in unison with the user. Multiplayer experiences still need to
represent users somehow, and they do so to varying degrees. VR Chat allows any kind of avatar (Gaylor and Joudrey, 2015). ConVRge
allows a single object that rotates with the user's head motion (Lee and Whiting,
2015). AltSpace only allows their own avatars, which will only represent the aspects
that are known (AltspaceVR Inc., 2015). For example, if the user has eye tracking,
eye representations are shown. If they have hand tracking, hands are shown;
otherwise they are not. By default, the avatars are all similar-looking torsos with heads.

Shared virtual workspaces are an exciting prospect. Multiple people can be in the same space working on the same content at
the same time. This does already exist in 2D; for example, support services may use
screen sharing, where the other participants in these situations are shown as cursors, essentially their avatars. In virtual reality, the
representation can be much richer.
The concepts of self identity and social representation of self are interesting
to observe in how people choose to present themselves in certain ways. One could write thousands of papers on the topic, and indeed
thousands may have been written, but virtual reality provides a particularly novel context.
Participating in the VR Chat community, it would seem like every day was
Halloween, like every gathering is a low-key masquerade. Each user can choose any
name they want and don any appearance they want. A majority of users then
choose the appearance of favorite characters from pop culture, video games, comic
books, television, and movies. While most of these are humans, there are no real constraints:
a rabbit with a rainbow trail will jump around the feet of a 20 meter tall giant. One
user stuck to the identity of Mr. Whiskers, a black cat who never says anything, but
will meow or hiss and defecate on things he doesn't like. He hasn't broken character
once in the months I've seen him. Another would switch rapidly within sessions
between appearances, being a honking van with headlights, being completely invisible, or even being an entire
room himself.
What I found in returning to talk to these people each week was that I could
remember people I had met more easily if they had the same name and
appearance. This may seem obvious, but while most people keep the same name,
they change what they look like on a regular basis. The experience would be
analogous to attending a party where everyone wears nametags. You meet people,
have conversations, and relationships are established. The next week, you go back
to the same party with the same people, but they've switched skins. The personality
is attached to the nametag and not the person's appearance. You could imagine
how you would have a more difficult time remembering who someone was,
particularly if someone else now looks like they used to. Basically, the continuity of
relationships is more difficult to maintain. I realized this was the case for me, too, so I wanted to
represent myself in a way that would let people maintain their relationship continuity with me.
I tried out three methods for avatar creation from actual humans: 3D modelling,
photogrammetry, and 3D scanning. Each produces a mesh
and textures which can be viewed and animated in a game engine. Each option has
pros and cons, and I will describe my conclusions on the strengths and weaknesses of each.
3D Modelling
Basic 3D modelling is the first option. I used a free program called MakeHuman to
generate a base human mesh with adjustable features. Using reference photos, I attempted to get the model as close as I could.
Then, after exporting the result, I used Maya to tweak vertices and Photoshop to
edit textures, but the same tasks could likely be completed with free software like
Blender and Gimp. The main problem with this method, in my opinion, is my own
inability to be objective. We tend to see and process humans, especially faces,
and even more especially our own faces, with distinct scrutiny, and replicating their
exact image requires skilled artists.
Photogrammetry
Photogrammetry constructs a model from a series of photos taken around a subject at varying angles (Walford, 2007). Detail points are
then analyzed for differences between images and a point cloud is generated from
their parallax. This point cloud is then used to create a mesh, and the original
photos are applied as the texture. Agisoft's Photoscan and Autodesk's 123D Catch
are examples of programs that can do this (Agisoft, 2015; Autodesk, 2015a). I tested
the ability of 123D Catch to generate a mesh of my face using only a smartphone
camera and app. While it would also need some mesh and texture cleanup, it
provided a very good starting point, particularly having originated solely from
smartphone photos. I then had a subject pose for an attempt at a full body scan. I took photos all around the subject, again
in a fairly uncontrolled environment, as a test for consumers, since most would not have a custom studio.
Running those photos through the Photoscan software exposed several problems.
Entire sections of geometry are missing where photo edges cropped the subject, and in some places the software could
only determine a single face of geometry in a plane. The legs were also not well
determined by the software because they were not far apart and there was very
little contrast provided by the dynamic range of the camera. In order to have the
best tracking, camera settings like exposure need to be constant between photos.
The environment tested here had a high dynamic range, which caused these problems; better results would come from
controlling the environment and taking photos carefully. Including the whole
subject in every photo with a high resolution camera locked at a specific exposure,
and having the subject wear opaque clothing in an evenly lit environment with legs
and arms apart, would be ideal. It is no surprise, then, that this is the process used by
professional capture studios.
3D Scanning
The third option is 3D scanning using computer peripherals. The scanners I used for
this were the Artec MH-T, Eva, and Spider. The first problem with these is that they
are cost-prohibitive for home or hobby use. They are very accurate in detail, though.
To test avatar creation from the scan, I isolated the head to create an object for use
with ConVRge. I used a free program called MeshLab to reduce the geometry in
steps using the quadric edge collapse decimation command (MeshLab, 2015).
Many of the UVs needed to be adjusted so the texture would look more correct.
Then, using Maya, I created a normal map of the high-poly model for the low-poly
version (Autodesk, 2015b). I created a custom reflectivity map for Unity that mimics
the increased specularity of eyes, lips, forehead, nose, cheeks and hair. The final
result was a low-poly head that retains much of the scanned detail.
Hybrid Solutions
Solutions that combine these methods would also be viable. For example, the body
could be created photogrammetrically and the face 3D scanned, or a modelled body could be combined with a scanned head.