Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

Published 6 May 2024 in cs.RO and cs.AI | (2405.03113v1)

Abstract: Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail. To effectively reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. By augmenting air hockey with a large family of tasks ranging from easy tasks like reaching, to challenging ones like pushing a block by hitting it with a puck, as well as goal-based and human-interactive tasks, our testbed allows a varied assessment of RL capabilities. The robot air hockey testbed also supports sim-to-real transfer with three domains: two simulators of increasing fidelity and a real robot system. Using demonstration data gathered through two teleoperation systems (a virtualized control environment and human shadowing), we assess the testbed with behavior cloning, offline RL, and RL from scratch.