Discount Factor Parametrization for Deep Reinforcement Learning for Inverted Pendulum Swing-up Control
DOI: https://doi.org/10.12928/biste.v7i1.10268

Keywords: Discount Factor, Single Swing-up Inverted Pendulum, Deep Reinforcement Learning (DRL), Deep Deterministic Policy Gradient (DDPG)

Abstract
This study explores the application of deep reinforcement learning (DRL) to the swing-up control problem of a single inverted pendulum. The primary focus is on investigating the impact of discount factor parametrization within the DRL framework. Specifically, the Deep Deterministic Policy Gradient (DDPG) algorithm is employed because of its effectiveness in continuous action spaces. A range of discount factor values is tested to evaluate their influence on training performance and stability. The results indicate that a discount factor of 0.99 yields the best overall performance, enabling the DDPG agent to learn a stable swing-up strategy and maximize the cumulative reward. These findings highlight the critical role of the discount factor in DRL-based control systems and offer insights for optimizing learning performance in similar nonlinear control problems.
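To illustrate where this parameter enters the algorithm, the minimal sketch below (not taken from the paper) shows how the discount factor appears in the DDPG critic target. It assumes PyTorch; the layer sizes, state/action dimensions, and helper names (Critic, critic_target, target_actor, target_critic) are illustrative, while gamma = 0.99 is the best-performing value reported in the abstract.

import torch
import torch.nn as nn

GAMMA = 0.99  # discount factor reported as best-performing in this study

class Critic(nn.Module):
    # Q(s, a) approximator; dimensions and layer sizes are illustrative only.
    def __init__(self, state_dim=3, action_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def critic_target(reward, next_state, done, target_actor, target_critic):
    # Bellman target y = r + gamma * Q'(s', mu'(s')) used to train the DDPG critic.
    # 'done' is 1.0 for terminal transitions so no future value is bootstrapped.
    with torch.no_grad():
        next_action = target_actor(next_state)           # mu'(s')
        next_q = target_critic(next_state, next_action)  # Q'(s', mu'(s'))
        return reward + GAMMA * (1.0 - done) * next_q

A discount factor close to 1, such as 0.99, lets reward earned after the pendulum reaches the upright position propagate back through the swing-up phase, which is consistent with the finding that it outperformed smaller values.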

Copyright (c) 2025 Atikah Surriani, Hari Maghfiroh, Oyas Wahyunggoro, Adha Imam Cahyadi, Hanifah Rahmi Fajrin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.