Discount Factor Parametrization for Deep Reinforcement Learning for Inverted Pendulum Swing-up Control

Authors

  • Atikah Surriani, Universitas Gadjah Mada
  • Hari Maghfiroh, Universitas Sebelas Maret
  • Oyas Wahyunggoro, Universitas Gadjah Mada
  • Adha Imam Cahyadi, Universitas Gadjah Mada
  • Hanifah Rahmi Fajrin, Soon Chun Hyang University

DOI:

https://doi.org/10.12928/biste.v7i1.10268

Keywords:

Discount Factor, Single Swing-up Inverted Pendulum, Deep Reinforcement Learning (DRL), Deep Deterministic Policy Gradient (DDPG)

Abstract

This study explores the application of deep reinforcement learning (DRL) to solve the control problem of a single swing-up inverted pendulum. The primary focus is on investigating the impact of discount factor parameterization within the DRL framework. Specifically, the Deep Deterministic Policy Gradient (DDPG) algorithm is employed due to its effectiveness in handling continuous action spaces. A range of discount factor values is tested to evaluate their influence on training performance and stability. The results indicate that a discount factor of 0.99 yields the best overall performance, enabling the DDPG agent to successfully learn a stable swing-up strategy and maximize cumulative rewards. These findings highlight the critical role of the discount factor in DRL-based control systems and offer insights for optimizing learning performance in similar nonlinear control problems.
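
To make the role of the discount factor concrete, the sketch below shows where gamma enters DDPG-style training: the discounted return used to score an episode and the one-step bootstrapped critic target. This is a minimal, hypothetical Python sketch, not the authors' implementation; the helper names (q_target, mu_target) and the gamma values other than 0.99 are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code) of how the discount factor gamma
# enters DDPG-style training.

def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * r_t for one episode's reward sequence."""
    weight, total = 1.0, 0.0
    for r in rewards:
        total += weight * r
        weight *= gamma
    return total

def ddpg_critic_target(reward, next_state, done, gamma, q_target, mu_target):
    """One-step target y = r + gamma * Q'(s', mu'(s')) used in the critic loss.

    q_target and mu_target stand in for the target critic and target actor.
    """
    if done:
        return reward  # no future reward beyond a terminal state
    return reward + gamma * q_target(next_state, mu_target(next_state))

if __name__ == "__main__":
    # e.g. +1 reward for each of 200 steps the pendulum is held upright
    episode_rewards = [1.0] * 200
    for gamma in (0.90, 0.95, 0.99):  # candidate discount factors; 0.99 performed best in this study
        print(f"gamma={gamma}: discounted return = {discounted_return(episode_rewards, gamma):.2f}")
```

A larger gamma weights long-horizon rewards more heavily, which matters for swing-up control because the pendulum only reaches the upright, high-reward region after many low-reward intermediate steps.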

Published

2025-04-12

How to Cite

[1] A. Surriani, H. Maghfiroh, O. Wahyunggoro, A. I. Cahyadi, and H. R. Fajrin, “Discount Factor Parametrization for Deep Reinforcement Learning for Inverted Pendulum Swing-up Control”, Buletin Ilmiah Sarjana Teknik Elektro, vol. 7, no. 1, pp. 56–67, Apr. 2025.

Issue

Vol. 7, No. 1 (2025)

Section

Articles
