Deep-Learning-Based Stair Detection Using 3D Point Cloud Data For Preventing Walking Accidents of The Visually Impaired
ABSTRACT Visually impaired individuals worldwide are at risk of accidents while walking. In particular, falling from a raised place, such as stairs, can lead to serious injury. Therefore, we attempted to determine the best accident prevention method, one that notifies visually impaired individuals of the existence of stairs, together with their height and step information, when they approach them. In this study, we investigated stair detection through deep learning. First, three-dimensional point cloud data generated from depth information are used to train a deep learning model, and stairs are then detected from the model's outputs. To make the point cloud data suitable for deep-learning-based training, we propose preprocessing stages that reduce the weight of the point cloud data. The accuracy of stair detection was 97.3%, which is the best performance among the compared conventional methods. Therefore, we confirmed the effectiveness of the proposed method.
INDEX TERMS Visually impaired support systems, depth sensor, 3D point cloud data, deep-learning,
PointNet.
support systems for the physically challenged [20], [21]. The specifications of the depth camera are listed in Table 1.

III. STAIRS DETECTION USING DEEP LEARNING OF 3D POINT CLOUDS
A. PREPARING THE DATA SET
In this study, we first generated 3D point cloud data from the depth information captured by the depth camera. Thereafter, to reduce the processing time, we down-sampled [22] the number of points in each point cloud sample to prepare a lightweight 3D point cloud dataset.

We prepared 1000 training and 500 validation samples for each of the above classes and conducted the experiments with 3000 training and 1500 validation samples in total. Fig. 4 shows sample 3D point cloud data of stairs together with their 2D image (Fig. 4(a)). Fig. 4(b) shows an example of depth data from the RealSense, while Fig. 4(c) shows the extraction of the approximate stair region by Open3D. Fig. 4(d) shows the down-sampled result of the depth image in Fig. 4(c). The down-sampling process is explained in the next sub-section.

D. DEEP LEARNING FOR 3D POINT CLOUD DATA
3D point cloud data describe a 3D shape as a set of 3D points (x, y, and z). Point cloud data have two important properties that must be considered when handling them in deep learning: order and translation invariance [24].

First, let us discuss order invariance. It is the property that the output is unchanged even if the points are input into the model in a different order. Because point cloud data have no fixed format and no order can be assigned to the individual points, the order of input to the model is arbitrary. Therefore, a point cloud of N points can be presented in N! different orders, yet the object it represents remains the same. Consequently, a deep learning model is required to output the same value for every permutation of the input point cloud. Next, we discuss translation invariance. Translation invariance is the property that the output is unchanged even if the point cloud input to the deep learning model is translated or rotated. The invariance to translation is expressed by (2).
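Before describing PointNet, a brief numerical illustration may help. The following minimal sketch (our own illustrative addition, not part of the original pipeline) checks that a symmetric aggregation, here the per-dimension maximum over all points, returns the same output for any permutation of the points, which is exactly the behavior required of a model that consumes point clouds:

import numpy as np

rng = np.random.default_rng(0)
points = rng.random((1024, 3))        # a point cloud of N = 1024 points (x, y, z)
permuted = rng.permutation(points)    # same object, different point order

# Symmetric (order-invariant) aggregation: per-dimension maximum over the points.
feat_original = points.max(axis=0)
feat_permuted = permuted.max(axis=0)

print(np.allclose(feat_original, feat_permuted))  # True: the output is unchanged by reordering

This is the same mechanism that PointNet exploits through its max-pooling layer, as described next.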
E. POINTNET
PointNet is a deep-learning model that considers the order and movement invariances described above [19]. In conventional 3D convolutional neural networks, point clouds are voxelized and each voxel layer is treated as an image that is used as the input. PointNet, by contrast, accepts point clouds directly as input, which facilitates the handling of point cloud data and resolves the shortcomings of the conventional methods.

This section describes how PointNet addresses the two properties of order and movement invariance described above. A symmetric function is a function whose value does not change even if the order of its variables is changed [19]. PointNet obtains order invariance by using a symmetric function, max-pooling, which outputs the largest element among its inputs. In other words, even if the input elements of the max-pooling are permuted, the output will be the same as before the permutation, because the function always outputs the largest element.

Next, we describe the movement invariance of PointNet, which estimates the affine transformation matrix of the input point cloud and multiplies the point cloud by this matrix to obtain approximate movement invariance. The structure of this network is illustrated in Fig. 5. The affine transformation combines rotation, translation, and scaling, and can be represented by a single 3 × 3 matrix. The matrix is estimated by T-Net [25], and by multiplying the input point cloud by this estimated matrix, the output does not change even if the point cloud is translated or rotated. Here, T-Net is a network consisting of feature extraction, max-pooling, and fully connected layers.

We now describe the flow of PointNet classification. The structure of this network is illustrated in Fig. 6.
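To make this classification flow concrete, the following is a minimal, simplified PointNet-style sketch in PyTorch. It is our own hedged example: the layer sizes, the three-class output, and the reduced T-Net are assumptions for illustration and do not reproduce the exact network used in this work. A T-Net-like branch estimates a 3 × 3 transform that is applied to the input points, a shared per-point MLP extracts features, max-pooling produces an order-invariant global feature, and fully connected layers output the class scores.

import torch
import torch.nn as nn

class TNet(nn.Module):
    """Estimates a 3 x 3 transform applied to the input points (reduced T-Net-like branch)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 256, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 9),
        )

    def forward(self, x):                         # x: (batch, 3, num_points)
        f = self.mlp(x).max(dim=2).values         # order-invariant global feature
        m = self.fc(f).view(-1, 3, 3)
        return m + torch.eye(3, device=x.device)  # bias the transform towards the identity

class PointNetClassifier(nn.Module):
    def __init__(self, num_classes=3):            # three stair-related classes assumed here
        super().__init__()
        self.tnet = TNet()
        self.feat = nn.Sequential(                 # shared per-point MLP (weights shared over points)
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(                 # classification head on the global feature
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, points):                     # points: (batch, num_points, 3)
        x = points.transpose(1, 2)                 # -> (batch, 3, num_points)
        t = self.tnet(x)                           # estimated 3 x 3 input transform
        x = torch.bmm(t, x)                        # align the point cloud
        g = self.feat(x).max(dim=2).values         # symmetric max-pooling -> order invariance
        return self.head(g)                        # class scores

# Example: classify a batch of eight down-sampled clouds of 1024 points each.
logits = PointNetClassifier(num_classes=3)(torch.rand(8, 1024, 3))

Because the only interaction across points happens in the max-pooling step, permuting the input points leaves the class scores unchanged, mirroring the order invariance discussed above.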
with a very high accuracy rate of 97.3% and exhibited the best performance compared to other conventional methods.

REFERENCES
[1] Fact Sheet Blindness and Vision Impairment, World Health Org., Geneva, Switzerland, 2019.
[2] Results of the 2016 Survey on Difficulties in Daily Life (National Survey on Children and Persons With Disabilities at Home), Department of Health and Welfare for Persons with Disabilities, Social Welfare and War Victims' Relief Bureau, Ministry of Health, Labour and Welfare, 2018.
[3] R. R. A. Bourne, S. R. Flaxman, T. Braithwaite, M. V. Cicinelli, A. Das, and J. B. Jonas, "Magnitude, temporal trends, and projections of the global prevalence of blindness and distance and near vision impairment: A systematic review and meta-analysis," Lancet Global Health, vol. 5, no. 9, pp. 888–897, Sep. 2017.
[4] (2020). International Guide Dog Federation. Guide Dogs Worldwide. [Online]. Available: https://www.guidedogs.org.U.K./about%20us/what%20we%20do/guide%20dogs%20worldwide
[5] J. Ballemans, G. I. Kempen, and G. R. Zijlstra, "Orientation and mobility training for partially-sighted older adults using an identification cane: A systematic review," Clin. Rehabil., vol. 25, no. 10, pp. 880–891, Oct. 2011.
[6] A. Nobuyuki and H. Norihisa, "Walking accident national survey to maintain visually handicapped persons walking environment," Bull. Hachinohe Inst. Technol., vol. 24, pp. 81–92, Feb. 2005.
[7] N. Abe and N. Hashimoto, "Walking accident national survey to maintain visually handicapped persons walking environment," Bull. Hachinohe Inst. Technol., vol. 24, pp. 81–92, Feb. 2004.
[8] N. Molton, S. Se, M. Brady, D. Lee, and P. Probert, "Robotic sensing for the partially sighted," Robot. Auto. Syst., vol. 26, nos. 2–3, pp. 185–201, Feb. 1999.
[9] U. Patil, A. Gujarathi, A. Kulkarni, A. Jain, L. Malke, R. Tekade, K. Paigwar, and P. Chaturvedi, "Deep learning based stair detection and statistical image filtering for autonomous stair climbing," in Proc. 3rd IEEE Int. Conf. Robotic Comput. (IRC), Feb. 2019, pp. 159–166.
[10] S. Carbonara and C. Guaragnella, "Efficient stairs detection algorithm assisted navigation for vision impaired people," in Proc. IEEE Int. Symp. Innov. Intell. Syst. Appl. (INISTA), Jun. 2014, pp. 313–318.
[11] A. Ramteke, B. Parabattina, and P. K. Das, "A neural network based technique for staircase detection using smart phone images," in Proc. 6th Int. Conf. Wireless Commun., Signal Process. Netw. (WiSPNET), Mar. 2021, pp. 374–379.
[12] E. Mihankhah, A. Kalantari, E. Aboosaeedan, H. D. Taghirad, S. Ali, and A. Moosavian, "Autonomous staircase detection and stair climbing for a tracked mobile robot using fuzzy controller," in Proc. IEEE Int. Conf. Robot. Biomimetics, Feb. 2009, pp. 1980–1985.
[13] C. Zhong, Y. Zhuang, and W. Wang, "Stairway detection using Gabor filter and FFPG," in Proc. Int. Conf. Soft Comput. Pattern Recognit. (SoCPaR), Oct. 2011, pp. 578–582.
[14] S. Murakami, M. Shimakawa, K. Kivota, and T. Kato, "Study on stairs detection using RGB-depth images," in Proc. Joint 7th Int. Conf. Soft Comput. Intell. Syst. (SCIS) 15th Int. Symp. Adv. Intell. Syst. (ISIS), Dec. 2014, pp. 699–702.
[15] R. Munoz, X. Rong, and Y. Tian, "Depth-aware indoor staircase detection and recognition for the visually impaired," in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Sep. 2016, pp. 1–6.
[16] S. Wang, H. Pan, C. Zhang, and Y. Tian, "RGB-D image-based detection of stairs, pedestrian crosswalks and traffic signs," J. Vis. Commun. Image Represent., vol. 25, no. 2, pp. 263–272, Feb. 2014.
[17] M. Hayami and M. Hild, "Detection of stairs using stereo images as a walking aid for visually impaired persons," in Proc. Conf. Inf. Process. Soc. Japan, 2010.
[18] (2021). Intel RealSense Technology. Intel RealSense Camera D400 Series Product Family Datasheet. [Online]. Available: https://www.intelrealsense.com/wp-content/uploads/2020/06/Intel-RealSense-D400-Series-Datasheet-June-2020.pdf
[19] R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas, "PointNet: Deep learning on point sets for 3D classification and segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 652–660.
[20] Y. Endo and C. Premachandra, "Development of a bathing accident monitoring system using a depth sensor," IEEE Sensors Lett., vol. 6, no. 2, pp. 1–4, Feb. 2022.
[21] Y. Ito, C. Premachandra, S. Sumathipala, H. W. H. Premachandra, and B. S. Sudantha, "Tactile paving detection by dynamic thresholding based on HSV space analysis for developing a walking support system," IEEE Access, vol. 9, pp. 20358–20367, 2021.
[22] E. Nezhadarya, E. Taghavi, R. Razani, B. Liu, and J. Luo, "Adaptive hierarchical down-sampling for point cloud classification," in Proc. CVPR, Jun. 2020, pp. 12956–12964.
[23] Z. Yang, Y. Sun, S. Liu, X. Qi, and J. Jia, "CN: Channel normalization for point cloud recognition," in Proc. ECCV, 2020, pp. 600–616.
[24] Y. Liu, C. Wang, Z. Song, and M. Wang, "Efficient global point cloud registration by matching rotation invariant features through translation search," in Proc. ECCV, 2018, pp. 448–463.
[25] J. Kossaifi, A. Bulat, G. Tzimiropoulos, and M. Pantic, "T-Net: Parametrizing fully convolutional nets with a single high-order tensor," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 7822–7831.

HARUKA MATSUMURA received the B.S. degree in electronic engineering from the Shibaura Institute of Technology, Tokyo, Japan, in 2022. Her research interests include depth image processing, visually impaired support systems, and 3D vision.

CHINTHAKA PREMACHANDRA (Senior Member, IEEE) was born in Sri Lanka. He received the B.Sc. and M.Sc. degrees from Mie University, Tsu, Japan, in 2006 and 2008, respectively, and the Ph.D. degree from Nagoya University, Nagoya, Japan, in 2011.
From 2012 to 2015, he was an Assistant Professor with the Department of Electrical Engineering, Faculty of Engineering, Tokyo University of Science, Tokyo, Japan. From 2016 to 2017, he was an Assistant Professor. From 2018 to 2022, he was an Associate Professor with the Department of Electronic Engineering, School of Engineering, Shibaura Institute of Technology, Tokyo. In 2022, he was promoted to a Professor with the Department of Electronic Engineering, Graduate School of Engineering, Shibaura Institute of Technology, where he is currently the Manager of the Image Processing and Robotic Laboratory. His research interests include AI, UAV, image processing, audio processing, intelligent transport systems (ITS), and mobile robotics.
Dr. Premachandra is a member of IEICE, Japan; SICE, Japan; and SOFT, Japan. He received the FIT Best Paper Award and the FIT Young Researchers Award from IEICE and IPSJ, Japan, in 2009 and 2010, respectively. He was a recipient of the IEEE Japan Medal, in 2022. He has served many international conferences and journals as a steering committee member and an editor, respectively. He is the Founding Chair of the International Conference on Image Processing and Robotics (ICIPRoB), which is technically co-sponsored by the IEEE.