Image Processing Theory, Tools and Applications

Smartphone based Guidance System for Visually Impaired Person

Muhammad Asad
Department of Electrical and Electronics Engineering
University of Sheffield, UK
[email protected]

Waseem Ikram
Department of Electrical Engineering
National University of Computer and Emerging Sciences, Pakistan
[email protected]

978-1-4673-2584-4/12/$31.00 ©2012 IEEE

Abstract— In order to facilitate the visually impaired person in navigation, we have developed a prototype guidance system. The main assumption of this guidance system is that there are many straight paths in different real-world scenarios. These straight paths have parallel edges, which, when captured as an image, seem to converge to a single point called the vanishing point. Proper feature extraction and mathematical modelling of the captured frame lead to the detection of these parallel edges. The vanishing point is then calculated, and a decision system is formed which notifies the blind person about his/her deviation from a straight path. The scope of this system is limited to a straight path, and it has been tested in different lighting conditions and with different levels of occlusion. A laptop mounted on a 2D robotic platform was used to develop and verify the robustness of the algorithm. Finally, a smartphone-based real-time application has been implemented for this visual guidance system, in which the decision system returns an audio output to guide the visually impaired person. This application has an average execution rate of 20 frames per second, with each frame being of 320 by 240 pixel size. The system has an accuracy of 84.1% in a scenario with pedestrians and other objects, while without pedestrians it produces an accuracy of over 90%.

Index Terms— Guidance System, Real-time System, Visually Impaired, Hough Transform, Vanishing Point, Navigation

I. Introduction

Recent developments in the field of computer vision and the emergence of compact and powerful smartphones with high-resolution cameras have made possible the development of many computer vision applications on these devices. This is evident from the recent porting of many real-time computer vision libraries to different mobile phone platforms. The most prominent and powerful of all is the implementation of the OpenCV libraries for Android-based smartphones. This has provided a whole new platform for research in the field of computer vision applications on mobile phones, and has already resulted in many smartphone-based applications which use complex computer vision techniques [1][2][3].

Today, about 4% of the world's total population (285 million people) is visually impaired. Of these, 13% are blind, while the remaining 87% have low vision [4]. One of the biggest problems faced by these visually impaired persons is performing everyday tasks. For this purpose, designers, both visually impaired and sighted, have developed a number of tools for use by blind people.

The major mobility tools in use are white canes and guide dogs. White canes are claimed to be not safe enough for crossing streets and visiting other insecure places. There has also been some reluctance to use white canes, particularly by people who are not totally blind. A much bigger problem is that these canes are not reliable, as they do not give the blind person sufficient information about the surroundings. A small number of people employ guide dogs to assist in mobility, as is mostly done in western countries, but guide dogs can be expensive or unavailable to many. Also, in some cultures, dogs are socially unacceptable.

In the literature, there has been a lot of research on this topic. This includes mobile phone based systems to find crosswalks and paths [5][6][7]. Research has also been done using stereo vision cameras and mobility aids such as powered wheelchairs [8]. Work has also been done on correcting the orientation of a blind person [9][10][11]. Most of these approaches were aimed at small devices and, hence, were computationally constrained because of the limited hardware available. Therefore these applications were either not real-time or had very limited functionality. Some of these methods used colour targets for easy recognition of location. Most of the research was aimed at indoor or controlled environments, with little variation in illumination, and was sensitive to occlusion. Similar work has also been reported for robot navigation in indoor environments [12].

In this paper, we propose a guidance system which makes use of the fact that, due to the structural symmetries in man-made structures, there exist many straight paths in different real-world scenarios. These straight paths have many parallel edges, which humans use every day to perceive the 3D picture. This picture, however, is not available to completely blind or partially blind persons. These edges, when looked upon through a camera, converge to a vanishing point. We compute the deviation of a visually impaired person
according to the location of this vanishing point in the image, which is then used to guide the visually impaired person along the straight path.

The organization of this paper is as follows: Section II explains the method used to develop the guidance system for a visually impaired person. The results are presented in Sec. III, followed by the conclusion in Sec. IV.

II. Methodology

We propose a guidance system based on the fact that in almost every man-made structure there exists a geometrical symmetry. The human visual system uses these structures to perceive the location and assist the person in navigating a path. However, this is not the case with computer vision programs, where the scene is viewed as a 2D image with parallel edges converging to a point [13]. Our proposed approach simulates a system which is similar to the human visual system and provides the visually impaired person with information about their deviation from the straight path.

Fig. 1: Block diagram of the proposed method, depicting the three main blocks involved in designing the guidance system

Our proposed guidance system consists of three major blocks, which are shown in Fig. 1. These blocks are discussed in detail in the subsequent sections.

A. Feature Extraction

One of the most important steps in any computer vision application is the extraction of relevant features. This extraction step should focus both on extracting relevant information and on filtering out outliers. It can then be followed by much more complex operations on only the relevant, small set of features. An efficient feature extraction step is essential for the implementation of real-time systems. These systems have limited time to process and present results, and with a proper feature extraction technique they can be more robust and efficient. The proposed system uses several feature extraction techniques, which are discussed below.

In the proposed system, the features extracted are the edge features of the parallel lines in straight paths. It was observed during the testing phase of this system that, of all the edges detected, only a few prominent edges actually contributed towards the guidance decision, which will be presented in Section II-B. Since the system is aimed at real-time implementation, a balance between efficiency and accuracy was required. For this reason, a statistical comparison between different existing edge detectors was performed, comparing both the efficiency and the accuracy of each operator under different variations in illumination, occlusion and different levels of prominence of the paths. Each operator was run on different sets of images of size 640 by 480 pixels, and the average execution time was calculated, which is presented in Table I.

TABLE I: Average Execution Time of different edge detection operators on images of size 640 by 480 pixels

Canny's edge detector, being the most accurate edge detector, was also computationally the most expensive one. For this prototype system only the dominant edges were required, but Canny's edge detector detects edges of even the smallest details in the image. These small details therefore need to be filtered out if Canny's edge detector is used, which lowers the efficiency of the system. Other edge detection techniques like Prewitt, Sobel, LoG, Roberts and Zero Cross are very efficient; however, they are not very accurate, as they filter out a lot of the required detail. Using the edge detection technique from [14], efficient results were achieved with very little compromise in accuracy. This technique is based on morphological operators, which makes it efficient, as only the dominant edges are detected with high accuracy. From Table I, it is seen that this morphological edge detection technique is almost as efficient as Prewitt; however, it produces much better results than the other operators with similar execution times [14]. It was also observed that averaging the frame before applying the edge detection technique improved the detection of the prominent edges. A 3x3 averaging filter was used to perform this operation. The size of the image was chosen to be 320 by 240 pixels, as this size was neither too small to diminish the effect of the edge detection techniques, nor too large to affect the execution time. The images were acquired directly in grayscale format to lower the execution time of the system.

B. Feature Selection

The feature selection process is also a vital part of a real-time system. This process makes use of the observations and selects the best possible features for further processing. It uses the edge-detected output of the feature extraction step and selects the relevant lines in order to provide the guidance system with the required information. This step is further divided into two sub-steps. The first deals with mapping prominent lines into mathematical equations, and the second involves selecting the lines which are relevant to the system and also connecting lines that have become disconnected due to occlusion and illumination changes.
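The feature extraction stage described above (a 3x3 averaging filter followed by morphology-based edge detection) can be sketched in pure Python. This is a minimal illustration of the idea only: the actual system uses the multi-structure-element morphological detector of [14], and the threshold and image values here are illustrative assumptions.

```python
def smooth3x3(img):
    """3x3 averaging filter (integer mean), leaving the 1-pixel border unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            s = sum(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = s // 9
    return out

def morph_edges(img, thresh=40):
    """Morphological gradient edge map: grayscale dilation minus erosion with a
    flat 3x3 structuring element (i.e. neighbourhood max minus min), thresholded
    so that only dominant edges survive (threshold is illustrative)."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            nbhd = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            if max(nbhd) - min(nbhd) > thresh:
                edges[y][x] = 1
    return edges
```

On a synthetic frame with a dark/bright boundary, only the pixels around the boundary survive, which matches the goal of keeping the dominant edges while flat regions are discarded.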
(a) Original image (b) Edge detection using Canny's edge detector (c) Edge detection using the edge detection method from [14]

Fig. 2: Output comparison of Canny's and the morphological edge detection technique from [14]

The mapping of lines into mathematical equations is achieved using a feature extraction technique called the Hough transform. The Hough transform is used to extract features such as lines or curves in an image [15]. It relies on the basic idea that if two or more points are collinear, then they have a unique line associated with them. This line can be found by fitting all possible lines and finding the one which fits all of these points. Using the Cartesian coordinate system, the computation required for this is huge, and in some cases the number of candidate lines is infinitely large and becomes difficult to model. On the contrary, the Hough transform reduces the computational complexity of modelling mathematical equations for these lines. In this transform, each point in the Cartesian coordinate system is mapped to a parameter space defined by sinusoidal curves, which is given by:

ρ = x cos θ + y sin θ    (1)

Collinear points correspond to sinusoidal curves which intersect at some common value (ρ, θ). Computing the points where the maximum number of curves intersect gives the (ρi, θi) parameters of a line. Fig. 3 shows the results obtained by applying the Hough transform to the binary image shown in Fig. 2(c), with the points where the maximum number of curves intersect. These parameters can then be used to calculate the gradient and y-intercept of a line using eqs. (2) and (3).

mi = tan θ    (2)

ci = y − mi x    (3)

From the above equations it can also be noted that θ represents the orientation of the line, with ρ being the perpendicular distance of the given line from the origin. Further explanation of this can be found in [14].

Fig. 3: Hough transform of Fig. 2(c), with lines represented by maximum numbers of curve intersections (shown here using small white squares)

From Fig. 3, it is observed that the maximum number of lines have a θ between -70 and -40 or between 40 and 70 degrees. This is due to the detection of the prominent edges of the straight path in Fig. 2. In addition to these lines, a noticeable number of lines are seen at θ equal to 0 and 90 degrees. These lines are due to the horizontal and vertical edges of objects which do not contribute towards a continuous straight path, for example the doors and windows in this case. These lines can be discarded, and the former set of lines is selected for computing the guidance decision in the next step of this system. A major problem faced in this step is the identification of occluded edges and the interpolation of many disjoint parts of a line into one single line. This problem is solved by using the Hough transform, as even if a line is made up of many disjoint segments, the Hough transform always has a single pair (ρi, θi) corresponding to that line in the Cartesian coordinate system.

C. Guidance Decision

The vanishing point is the point to which all the edges of a straight path converge. In reality, these edges are parallel to each other; however, when looked at from a specific point, they all seem to converge to one point. Using the mathematically mapped equations from the previous step, this imaginary vanishing point can be found easily [16]. The gradient and y-intercept corresponding to each point (ρi, θi) in the Hough transform are then used to solve the equations of the different lines simultaneously.
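The Hough voting of eq. (1), which supplies the (ρi, θi) pairs used above, can be sketched in pure Python. This is a minimal accumulator with one-degree and one-pixel bins — illustrative values, not the quantisation actually used by the system:

```python
import math

def hough_peak(points, n_theta=180, rho_max=100):
    """Vote each (x, y) edge point into a (rho, theta) accumulator using
    rho = x*cos(theta) + y*sin(theta), with theta swept over [-90, 90) degrees
    in 1-degree steps, and return the (rho, theta) bin with the most votes,
    i.e. the parameters of the dominant line."""
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta_deg = t - 90                       # theta in [-90, 90) degrees
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            if -rho_max <= rho <= rho_max:
                key = (rho, theta_deg)
                acc[key] = acc.get(key, 0) + 1
    return max(acc, key=acc.get)
```

For points lying on the diagonal y = x, every point satisfies x cos(−45°) + y sin(−45°) = 0, so all their sinusoids intersect at (ρ, θ) = (0, −45°), and that bin collects the maximum number of votes.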
These intersection points are not always inside the range of the frame: the vanishing point itself is an imaginary point, so the intersection points are also imaginary and sometimes fall even outside the range of the image. This results in a cluster of many intersection points, with many points near the actual vanishing point and some outliers produced by edges which do not converge to the vanishing point. The effect of these outliers can be minimized, while finding the vanishing point, by taking the median of all these points. The median filter ranks all the points and uses the region where many intersection points exist to define the vanishing point. Figure 4 shows the calculation of the vanishing point on the frame from Fig. 2(a). The points in blue show the intersection points of the lines, and the magenta coloured point is the vanishing point, which arises from taking the median of all these intersection points.

Fig. 4: Extraction of the vanishing point from the intersection of lines. Points in blue are the intersection points, while the point in magenta colour is the vanishing point calculated by taking the median of all the intersection points

The vanishing point is then used, along with two vertical markers, to determine three different decisions. These decisions are based on the horizontal position of the vanishing point in one of the three regions defined by these two markers. This decision-making process rests on the fact that the vanishing point always stays in the middle of the path, as all the lines in a path converge to this point. In case the visually impaired person, with the camera facing directly in front of him, deviates from the straight path, the frame from his camera shifts to either side, depending on the direction of his deviation. Since the vanishing point arises from the converging edges, it always remains in the centre of the path. Hence, with a deviation in the frame, this point also shifts towards either side. This deviation of the vanishing point makes it move between the different regions defined by the vertical markers. Based on the vanishing point's location in one of these regions, the decision about the heading of the person is made, and accordingly one of three instructions is given to the visually impaired person to correct his path, i.e. go straight, go left or go right. The number of deviation markers can be raised to create more decision regions and give more specific and accurate guidance according to the transition of the vanishing point into either of these regions. This decision-making process is explained in Figure 5 with all three possible situations.

III. Results

The algorithm was tested in different lighting conditions, both outdoors and indoors. A laptop mounted on a 2-D robotic platform was used for testing and development of this algorithm. Later, a smart phone application was developed using the OpenCV libraries for Android, and the algorithm was tested. For testing, the smart phone was mounted onto the chest of a user, so that it was facing directly in the direction the person was moving in. The guidance decision was then given to the user as an audio signal. This algorithm was tested on many straight paths, with and without other pedestrians. It was observed that the algorithm correctly guided the person even when other pedestrians were partially occluding the path. The algorithm produced an average execution rate of 20 frames per second on a Galaxy Nexus S smart phone.

To test the accuracy of this algorithm in a real-world environment, image sequence #0 from [17] was used, which contained a total of 499 RGB frames from the left camera. These frames were first classified into one of the three possible decision categories using the actual location of the vanishing point. The algorithm was then run with a maximum line selection limit of 20 lines from the Hough transform. Out of 499 frames, 420 frames produced correct decisions. This gave an overall system accuracy of 84.1% on a real-world path with pedestrians. The algorithm was also tested on the Hall sequence and the Highway sequence [18]. Table II presents the system accuracy for each of these sequences. The proposed approach produced 90.4% accuracy on the Hall sequence and 95.5% accuracy on the Highway sequence. The results from some of the frames are shown in Figs. 6, 7, 8 and 9 at the end.

TABLE II: System accuracy on different sequences

IV. Conclusion

A guidance system for the visually impaired person is proposed. This system makes use of the vanishing point concept to make a guidance decision for the visually impaired person. A real-time application for this guidance system was developed on an Android smart phone.
(a) Decision -> Go Straight (b) Decision -> Turn Left (c) Decision -> Turn Right

Fig. 5: Guidance decision-making process, with two vertical decision markers (shown in black) and the relative position of the vanishing point (shown by red markers)
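The median-based vanishing point estimate and the three-region rule depicted in Fig. 5 can be sketched as follows. The marker positions (one third and two thirds of the frame width) and the mapping of regions to instructions are illustrative assumptions, as the exact marker placement is not specified:

```python
from statistics import median

def vanishing_point(intersections):
    """Robust vanishing-point estimate: the coordinate-wise median of all
    pairwise line-intersection points. Outliers from edges that do not
    converge to the vanishing point have little effect on the median."""
    xs = [p[0] for p in intersections]
    ys = [p[1] for p in intersections]
    return (median(xs), median(ys))

def guidance_decision(vp_x, frame_width=320):
    """Three-region rule: two vertical markers (here at 1/3 and 2/3 of the
    frame width -- illustrative values) split the frame, and the region
    holding the vanishing point selects the instruction."""
    left_marker, right_marker = frame_width / 3, 2 * frame_width / 3
    if vp_x < left_marker:
        return "turn left"    # VP drifted into the left region (assumed mapping)
    if vp_x > right_marker:
        return "turn right"   # VP drifted into the right region (assumed mapping)
    return "go straight"
```

With a 320-pixel-wide frame, a cluster of intersections around x ≈ 160 yields "go straight", while a single far-off outlier barely moves the median.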
The application gives an average execution rate of 20 frames per second and an accuracy of 84.1% with pedestrians and above 90% without pedestrians. This difference in accuracies is due to the fact that some sequences have a lot of occlusion due to pedestrians, and the background is also cluttered. This system uses the edges from the different surrounding structures; hence, if these edges are not present, it loses its accuracy.

This algorithm can also be used in many more applications involving the navigation of different vehicles. Future work may look into paths which are not straight, into more complex environments and into greater variations in occlusion. The concept of interpolation of lines can be used to cater for many real-world scenarios where a large part of the edge information of paths is occluded.

There exist many reliable and robust methods for calculating vanishing points in a scene [19], [20], [21]. We plan to use a similar method to further improve the results of this algorithm, making it more reliable and robust.

References

[1] W. Oui, E. Ng, and R. Khan, "An augmented reality's framework for mobile," in Information Technology and Multimedia (ICIM), 2011 International Conference on. IEEE, 2011, pp. 1–4.
[2] G. Takacs, V. Chandrasekhar, N. Gelfand, Y. Xiong, W. Chen, T. Bismpigiannis, R. Grzeszczuk, K. Pulli, and B. Girod, "Outdoors augmented reality on mobile phone using loxel-based visual feature organization," in Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval. ACM, 2008, pp. 427–434.
[3] V. Paelke and C. Reimann, "Vision-based interaction - a first glance at playing mr games in the real-world around us," in Proceedings of the 2nd International Workshop on Pervasive Gaming Applications (PerGames) at PERVASIVE, vol. 2005, 2005.
[4] http://www.who.int/mediacentre/factsheets/fs282/en/, "Statistics about blindness and eye disease," October 2011.
[5] H. Shen, K. Chan, J. Coughlan, and J. Brabyn, "A mobile phone system to find crosswalks for visually impaired pedestrians," Technology and Disability, vol. 20, no. 3, pp. 217–224, 2008.
[6] V. Ivanchenko, J. Coughlan, and H. Shen, "Staying in the crosswalk: A system for guiding visually impaired pedestrians at traffic intersections," Assistive Technology Research Series, vol. 25, p. 69, 2009.
[7] J. Coughlan, R. Manduchi, and H. Shen, "Cell phone-based wayfinding for the visually impaired," in 1st International Workshop on Mobile Vision, 2006.
[8] V. Ivanchenko, J. Coughlan, W. Gerrey, and H. Shen, "Computer vision-based clear path guidance for blind wheelchair users," in Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility. ACM, 2008, pp. 291–292.
[9] V. Ivanchenko, J. Coughlan, and H. Shen, "Crosswatch: a camera phone system for orienting visually impaired pedestrians at traffic intersections," Computers Helping People with Special Needs, pp. 1122–1128, 2008.
[10] S. Se, "Zebra-crossing detection for the partially sighted," in Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, vol. 2. IEEE, 2000, pp. 211–217.
[11] V. Ivanchenko, J. Coughlan, and H. Shen, "Detecting and locating crosswalks using a camera phone," in Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. IEEE Computer Society Conference on. IEEE, 2008, pp. 1–8.
[12] E. Bayramo, N. Andersen, N. Poulsen, J. Andersen, and O. Ravn, "Mobile robot navigation in a corridor using visual odometry," Control, 2009.
[13] L. Quan and R. Mohr, "Determining perspective structures using hierarchical hough transform," Pattern Recognition Letters, vol. 9, no. 4, pp. 279–286, 1989.
[14] Y. Zhao, W. Gui, and Z. Chen, "Edge detection based on multi-structure elements morphology," in Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, vol. 2. IEEE, 2006, pp. 9795–9798.
[15] R. Duda and P. Hart, "Use of the Hough transformation to detect lines and curves in pictures," Communications of the ACM, vol. 15, no. 1, pp. 11–15, 1972.
[16] S. Barnard, "Interpreting perspective images," Artificial Intelligence, vol. 21, no. 4, pp. 435–462, 1983.
[17] A. Ess, B. Leibe, and L. V. Gool, "Depth and appearance for mobile scene analysis," in International Conference on Computer Vision (ICCV'07), October 2007.
[18] http://trace.eas.asu.edu/yuv/, "YUV video sequences."
[19] J. Choi, W. Kim, H. Kong, and C. Kim, "Real-time vanishing point detection using the local dominant orientation signature," in 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2011. IEEE, 2011, pp. 1–4.
[20] C. Rasmussen, "Grouping dominant orientations for ill-structured road following," in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 1. IEEE, 2004, pp. I-470.
[21] J. Tardif, "Non-iterative approach for fast and accurate vanishing point detection," in Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009, pp. 1250–1257.
(a) Original image (b) Edge detection output (c) Guidance decision (vanishing point in magenta color)
Fig. 6: Result: Pedestrian Sequence (Decision: Go Straight)

(a) Original image (b) Edge detection output (c) Guidance decision (vanishing point in magenta color)
Fig. 7: Result: Pedestrian Sequence (Decision: Go Right)

(a) Original image (b) Edge detection output (c) Guidance decision (vanishing point in magenta color)
Fig. 8: Result: Hall Sequence (Decision: Go Straight)

(a) Original image (b) Edge detection output (c) Guidance decision (vanishing point in magenta color)
Fig. 9: Result: Highway Sequence (Decision: Go Straight)