Elsevier

Robotics and Autonomous Systems

Volume 72, October 2015, Pages 29-36

The Q-learning obstacle avoidance algorithm based on EKF-SLAM for NAO autonomous walking under unknown environments

https://doi.org/10.1016/j.robot.2015.04.003

Highlights

  • An integration of EKF-SLAM and the Q-learning algorithm for navigation is presented.

  • A clustering algorithm is applied to the laser sensor data to distinguish landmarks within one observation.

  • A fractional order PI (FOPI) controller is designed to minimize the motion deviation during NAO’s walking.

  • Simulations and experiments show that the proposed method is valid.

Abstract

SLAM and path planning are two important problems that are often addressed independently. However, both are essential for successful autonomous navigation. In this paper, we aim to integrate the two for application on a humanoid robot. The SLAM problem is solved with the EKF-SLAM algorithm, whereas the path planning problem is tackled via Q-learning. The proposed algorithm is implemented on a NAO equipped with a laser head. In order to distinguish different landmarks within one observation, we apply a clustering algorithm to the laser sensor data. A fractional order PI (FOPI) controller is also designed to minimize the motion deviation inherent in NAO’s walking behavior. The algorithm is tested in an indoor environment to assess its performance. We suggest that the new design can be reliably used for autonomous walking in an unknown environment.

Introduction

Humanoid robots have attracted significant research and public interest over the last two decades, particularly due to their “human-like” appearance [1]. The important problems that have been addressed by many researchers are the ability of a humanoid robot to navigate autonomously from point A to point B, to interact seamlessly with its environment, and to take appropriate actions in different scenarios [2], [3]. Humanoid robots are expected to share tasks in indoor and outdoor environments such as houses, offices and hospitals [4], [5]. In the past few years, many methods have been proposed to solve each of the above tasks separately, such as SLAM algorithms for recognition and localization in an unknown environment [6], [7], or algorithms based on artificial intelligence for navigation (path planning or obstacle avoidance) [8], [9]. However, it is imperative for a humanoid robot to solve all sub-tasks together in order to cope with real environments independently.

Current research suggests that reinforcement learning [10], [11], [12], [13] could provide a solution to navigation in unknown environments, albeit indirectly. The authors of [10] proposed an algorithm whereby a humanoid robot (NAO) used LandMarks to localize itself within an unknown environment. Though this design facilitated some aspects of navigation, it had several limitations, and the robot could not achieve autonomous navigation in a real environment. Reinforcement learning, the Q-learning algorithm and their applications were introduced in [11], [12], [13]. These papers did not, however, deal with the path planning problem in the context of this paper. The intention of this paper is to apply SLAM [14] and Q-learning simultaneously to address navigation and obstacle avoidance in one unified algorithm. The two methods are among the most widely used approaches for implementing autonomous robotic systems. The unified algorithm takes advantage of SLAM to make sense of the environment during the exploration phase and adds a parallel process in which a navigation policy is learned by reinforcement learning.
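To make the navigation-policy learning idea concrete, the following is a minimal sketch of tabular Q-learning for obstacle avoidance. It assumes a hypothetical 4x4 grid world; the grid size, obstacle positions, rewards and learning parameters are illustrative choices and are not taken from the paper, whose state and action definitions for NAO are not given in this excerpt.

```python
import random

# Hypothetical grid world; none of these names or values come from the paper.
GRID_SIZE = 4
START, GOAL = (0, 0), (3, 3)
OBSTACLES = {(1, 1), (2, 1), (1, 3)}
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # moves expressed as (dx, dy)


def step(state, action):
    """Apply an action; hitting a wall or an obstacle leaves the robot in place."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < GRID_SIZE and 0 <= nxt[1] < GRID_SIZE):
        return state, -1.0, False  # bumped into the boundary
    if nxt in OBSTACLES:
        return state, -5.0, False  # collision penalty
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -0.1, False        # small step cost favours short paths


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # maps (state, action_index) to a value; missing entries count as 0
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(
                q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            old = q.get((state, a), 0.0)
            # Q-learning update: move Q(s, a) towards r + gamma * max_a' Q(s', a')
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned greedy policy from START."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

In this sketch the collision penalty plays the role of obstacle avoidance and the small step cost encourages short routes. On a real robot the state would come from the SLAM estimate rather than from a known grid.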

In this study, we use the humanoid robot NAO, which is full-body functional, convenient to program and affordable [15], [16]. It is a small (56 cm) and lightweight (4.8 kg) biped robot with 25 degrees of freedom, two cameras, an inertial measuring unit, sonar sensors in its chest, and force-sensitive resistors under its feet [17]. Its low cost makes it suitable for many research and development studies. We used a nonstandard version of NAO that is equipped with a laser sensor head. We implemented EKF-SLAM to achieve navigation in general indoor environments. To compensate for deviation during walking, we designed a fractional order PI (FOPI) controller. Finally, reinforcement learning was used for path planning when the NAO was faced with unknown obstacles.
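For reference, a fractional order PI controller is commonly written in the transfer-function form below. This is the general textbook form, given here as an assumption. The gains K_p and K_i and the order λ used for NAO are not stated in this excerpt.

```latex
C(s) = K_p + \frac{K_i}{s^{\lambda}}, \qquad \lambda > 0
```

Setting λ = 1 recovers the classical PI controller, so the fractional order λ acts as one additional tuning parameter.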

The paper is organized as follows. Section 2 gives an overview of the algorithms used. Section 3 presents the path planning approach for the NAO platform. Section 4 presents the simulation results and the experiments for the integration of both algorithms. Section 5 concludes the paper.

EKF-SLAM

SLAM is a process by which a robot can build a map of an environment and use this map to deduce its location at the same time. In SLAM, both the trajectory of the platform and the locations of all landmarks are estimated online without the need for any a priori knowledge of location [1], [4]. EKF-SLAM was presented by Smith and Cheeseman in 1987 and has been used extensively since the publication of their seminal paper [1]. The main steps of SLAM include robot motion prediction, new landmarks

Obstacle modeling

In engineering applications and artificial intelligence, the mathematical models of many problems can be cast as path planning problems. The path planning problem is to find a path that avoids the obstacles. Traditional environment models include grid representation, the space graphic method, the minimum polygon method, etc. [19]. The minimum polygon method deals with obstacles directly by describing them as minimum polygons, which simplifies the solution

Simulation and experiment results

In order to evaluate the proposed method, we conducted extensive simulations and experiments. Here, we present the full EKF-SLAM for NAO with two unknown landmarks set in its area. Fig. 4 shows the scenario of this experiment, where the robot moves straight along the indicated path. The estimated positions of the robot and the landmarks in this experiment are illustrated in Fig. 4.
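The prediction step of an EKF-SLAM filter for a scenario of this kind (a robot moving straight with two landmarks in the state vector) can be sketched as follows. This is a minimal illustration that assumes a simple velocity motion model with forward speed v and turn rate w. That model, the function names and all numerical values are hypothetical and may differ from the walking model used for NAO.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def ekf_slam_predict(mu, P, v, w, dt, motion_noise):
    """One prediction step for the state [x, y, theta, l1x, l1y, l2x, l2y].

    Only the robot pose moves; the landmark estimates are left unchanged.
    """
    n = len(mu)
    x, y, theta = mu[:3]

    # Assumed motion model: drive forward at speed v while turning at rate w.
    mu_pred = list(mu)
    mu_pred[0] = x + v * dt * math.cos(theta)
    mu_pred[1] = y + v * dt * math.sin(theta)
    mu_pred[2] = theta + w * dt

    # Jacobian of the motion model: identity except for the pose block.
    F = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    F[0][2] = -v * dt * math.sin(theta)
    F[1][2] = v * dt * math.cos(theta)

    # Covariance propagation P' = F P F^T + Q, with Q acting on the pose only.
    P_pred = matmul(matmul(F, P), transpose(F))
    for i in range(3):
        P_pred[i][i] += motion_noise[i]
    return mu_pred, P_pred


if __name__ == "__main__":
    # Robot at the origin facing +x, with two landmarks already in the map.
    mu0 = [0.0, 0.0, 0.0, 1.0, 1.0, 2.0, -1.0]
    diag = [0.01, 0.01, 0.01, 1.0, 1.0, 1.0, 1.0]
    P0 = [[diag[i] if i == j else 0.0 for j in range(7)] for i in range(7)]
    mu1, P1 = ekf_slam_predict(mu0, P0, v=0.1, w=0.0, dt=1.0,
                               motion_noise=[1e-4, 1e-4, 1e-5])
    print(mu1[:3], P1[0][0], P1[1][1])
```

The pose uncertainty grows with every prediction step while the landmark block stays untouched. The subsequent observation update, not shown here, is what reduces the uncertainty again.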

The observation function in NAO can be accomplished by the laser sensor on the NAO head. There is a main module named

Conclusions

In this paper, the integration of EKF-SLAM and Q-learning was proposed for navigation of the humanoid robot NAO in unknown environments. The Q-learning algorithm was used for path planning in unknown environments, while the EKF-SLAM algorithm was used for localization and mapping of the environment. Q-learning was added to the EKF-SLAM algorithm to obtain autonomous obstacle avoidance ability in the indoor environment. The simulation and experimental results showed the proposed method could

Acknowledgments

The work was partly supported by the National Natural Science Foundation of China (Project No. 61473248), the Natural Science Foundation of Hebei Province of China (Project No. F2014203095), the China Postdoctoral Science Foundation (Project No. 2014M560196), Scholars Studying Abroad Science and Technology Activities (Project No. C201400355), and the Young Teacher of Yanshan University (Project No. 13LGA007).

References (22)

  • M.A. Akhtaruzzaman, A.A. Shafie, Evolution of humanoid robot and contribution of various countries in advancing the...
  • H. Durrant-Whyte et al., Simultaneous localization and mapping (SLAM): Part I: the essential algorithms, IEEE Robot. Autom. Mag. (2006)
  • D. Gouaillier et al., Omni-directional closed-loop walk for NAO
  • A. Hornung et al., Humanoid robot localization in complex indoor environments
  • J. Xu, C. Wei, C. Wang, et al. An approach to navigation for the humanoid robot nao in domestic environments, in:...
  • H. Durrant-Whyte et al., Simultaneous localization and mapping (SLAM): Part II: State of the art, IEEE Robot. Autom. Mag. (2006)
  • S. Oßwald et al., Learning reliable and efficient navigation with a humanoid
  • K. Macek et al., A reinforcement learning approach to obstacle avoidance of mobile robots
  • Y. Zhou et al., Self-learning in obstacle avoidance of a mobile robot via dynamic self-generated fuzzy Q-learning
  • N. Navarro-Guerrero et al., Real-world reinforcement learning for autonomous humanoid robot docking, Robot. Auton. Syst. (2012)
  • C.J.C.H. Watkins et al., Q-learning, Mach. Learn. (1992)
    Shuhuan Wen was born in Heilongjiang, China, on July 16, 1972. She received the Ph.D. degree in control theory and control engineering from the Yanshan University, Qinhuangdao, China in 2005.

    She is currently a Professor of automatic control in the Department of Electric Engineering, Yanshan University. She has coauthored one book and about 40 papers. Her research interests include humanoid robot control, force/motion control of parallel robots, fuzzy control, and 3-D object recognition and reconstruction.

    Dr. Wen was a Visiting Scholar of the Ottawa University, Carleton University and Simon Fraser University in Canada from 2011 to 2013.

    Xiao Chen was born in Hebei, China, in August 1988. She received the Master’s degree from the Department of Electric Engineering, Yanshan University in 2014.

    She has coauthored about three journal papers. Her research interests are focused on humanoid robot control, fractional systems and RL algorithms.

    Chunli Ma was born in ShanXi, China, in May 1989. She received the Bachelor’s degree from the Department of Control Engineering, Northeastern University at Qinhuangdao in 2014.

    She has coauthored one journal paper. Her research interests are focused on humanoid robot control, simultaneous localization and mapping (SLAM), and 3-D object recognition and reconstruction.

    H.K. Lam received the B.Eng. (Hons.) and Ph.D. degrees from the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, in 1995 and 2000, respectively. From 2000 to 2005, he worked with the Department of Electronic and Information Engineering at The Hong Kong Polytechnic University as Post-Doctoral Fellow and Research Fellow, respectively. He joined King’s College London as a Lecturer in 2005 and is currently a Reader.

    His current research interests include intelligent control systems and computational intelligence. He has served as a program committee member and international advisory board member for various international conferences and a reviewer for various books, international journals and international conferences. He is an associate editor for IEEE Transactions on Fuzzy Systems, IET Control Theory and Applications, International Journal of Fuzzy Systems and Neurocomputing; and guest editor for a number of international journals. He is on the editorial board of Journal of Intelligent Learning Systems and Applications, Journal of Applied Mathematics, Mathematical Problems in Engineering, Modelling and Simulation in Engineering, Annual Review of Chaos Theory, Bifurcations and Dynamical Systems and The Open Cybernetics and Systemics Journal. He is an IEEE senior member.

    He is the coeditor for two edited volumes: Control of Chaotic Nonlinear Circuits (World Scientific, 2009) and Computational Intelligence and Its Applications (World Scientific, 2012), and the coauthor of the monograph: Stability Analysis of Fuzzy-Model-Based Control Systems (Springer, 2011). His co-authored paper (J.S. Dai, H.K. Lam and S.M. Vahed, “Type Identification for Autonomous Excavation Based on Dissipation Energy”, Proceedings of the Institution of Mechanical Engineers, Part I, Journal of Systems and Control Engineering, vol. 225, no. 1, pp. 35–50, 2011) received SAGE Best Paper Award in 2011.

    Shaoyang Hua was born in Hebei, China, in January 1992. He received the Bachelor’s degree from the Department of Electric Engineering, Yanshan University in 2014.

    He has coauthored one journal paper. His research interests are focused on humanoid robot control and 3-D object recognition and reconstruction.
