
Transfer Learning for Reinforcement Learning Domains: A Survey

Published: 01 December 2009 in the Journal of Machine Learning Research, Volume 10

Abstract

The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
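To make the transfer idea concrete, below is a minimal sketch (illustrative only, not any specific method from the survey) in which tabular Q-values learned on a source task are used to initialize a related target task through hand-coded inter-task mappings. The environment interface (reset(), step(), actions) and the mapping functions chi_s and chi_a are assumptions introduced for this example.

    import random
    from collections import defaultdict

    def q_learning(env, q, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1):
        """Tabular Q-learning on a small environment exposing reset(), step(a),
        and a list of discrete actions (an assumed, hypothetical interface)."""
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                if random.random() < epsilon:
                    action = random.choice(env.actions)          # explore
                else:
                    action = max(env.actions, key=lambda a: q[(state, a)])  # exploit
                next_state, reward, done = env.step(action)
                best_next = max(q[(next_state, a)] for a in env.actions)
                q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
                state = next_state
        return q

    def transfer_q(source_q, target_states, target_actions, chi_s, chi_a):
        """Seed the target task's Q-table from a learned source-task Q-table via
        hand-coded inter-task mappings chi_s (target state -> source state) and
        chi_a (target action -> source action); both mappings are hypothetical."""
        target_q = defaultdict(float)
        for s in target_states:
            for a in target_actions:
                target_q[(s, a)] = source_q[(chi_s(s), chi_a(a))]
        return target_q

    # Sketch of the workflow, assuming source_env / target_env and the mappings exist:
    #   source_q = q_learning(source_env, defaultdict(float))   # learn the source task
    #   target_q = transfer_q(source_q, target_states, target_actions, chi_s, chi_a)
    #   q_learning(target_env, target_q)                         # target learning starts from transferred values

With a reasonable mapping, the target learner begins from informed value estimates rather than zeros, which is the kind of head start in a related but different task that transfer methods aim to provide.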

