Title: | A taxonomy for similarity metrics between Markov decision processes
|
Author: | García Polo, Francisco Javier
Visús, Álvaro
Fernández Rebollo, Fernando
|
Affiliation: | Universidade de Santiago de Compostela. Departamento de Electrónica e Computación
|
Subject: | Markov decision processes | Similarity metrics | Transfer learning |
Date of Issue: | 2022
|
Publisher: | Springer
|
Citation: | García, J., Visús, Á. & Fernández, F. A taxonomy for similarity metrics between Markov decision processes. Mach Learn 111, 4217–4247 (2022). https://doi.org/10.1007/s10994-022-06242-4
|
Abstract: | Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been tied to transfer learning. Transfer is based on the idea of reusing the knowledge acquired while learning a set of source tasks in a new learning process on a target task, assuming that the target and source tasks are close enough. In recent years, transfer learning has succeeded in making reinforcement learning (RL) algorithms more efficient (e.g., by reducing the number of samples needed to achieve (near-)optimal performance). Transfer in RL is based on the core concept of similarity: whenever the tasks are similar, the transferred knowledge can be reused to solve the target task and significantly improve the learning performance. Therefore, the selection of good metrics to measure these similarities is a critical aspect when building transfer RL algorithms, especially when this knowledge is transferred from simulation to the real world. In the literature, there are many metrics to measure the similarity between Markov decision processes (MDPs); hence, many definitions of similarity, or of its complement, distance, have been considered. In this paper, we propose a categorization of these metrics and analyze the definitions of similarity proposed so far, taking this categorization into account. We also follow this taxonomy to survey the existing literature and to suggest future directions for the construction of new metrics. |
Publisher version: | https://doi.org/10.1007/s10994-022-06242-4 |
URI: | http://hdl.handle.net/10347/29969
|
DOI: | 10.1007/s10994-022-06242-4 |
ISSN: | 0885-6125
|
E-ISSN: | 1573-0565
|
Rights: | © The Author(s) 2022. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ Attribution 4.0 International
|