Intrinsically-Motivated Reinforcement Learning

My Ph.D. thesis focused on developing flexible and robust mechanisms for autonomous agents using the computational framework of reinforcement learning (RL). Within machine learning, RL is the discipline concerned with mechanisms that allow an agent to accomplish a task through trial-and-error interactions with a dynamic and sometimes uncertain and unreliable environment. Furthermore, agents usually suffer from perceptual, motor, and adaptive limitations, i.e., they often do not have access to “all” the information required to make the best decisions and normally do not know the environment’s dynamics or the exact consequences of their actions. As a consequence, standard RL techniques pose several design challenges, especially in complex problems, which often require a great amount of fine-tuning and expert knowledge from the user.

[bibshow file=my-publications.bib show_links=1 format=custom-ieee template=custom-bibshow highlight="P. Sequeira"]

During my Ph.D., I incorporated ideas and mechanisms from biology, psychology, and related areas to augment RL agents’ perceptual and behavioral capabilities. Specifically, I focused on the reward mechanism embedded in the agent. A major challenge here is designing reward mechanisms that allow an agent to learn the task as efficiently as possible while taking its limitations into account. Another challenge is creating rewards that are generic enough to be used in a wide variety of situations, i.e., not tied to a specific task or domain.

To that effect, I followed the framework of intrinsically-motivated reinforcement learning (IMRL), an extension of RL in which an agent is rewarded for behaviors other than those strictly related to the task being accomplished, e.g., exploring or playing with elements of its environment. The work in [bibcite key=sequeira2011acii,sequeira2013phdthesis,sequeira2014adb] presents a set of four domain-independent reward features based on appraisal theories of emotions that, when combined, can guide the learning process of IMRL agents acting in different dynamic environments over which they have only limited perceptions.
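
To make the basic IMRL setup concrete, below is a minimal sketch (in Python) of a tabular Q-learning agent whose learning signal combines the task (extrinsic) reward with a weighted sum of intrinsic reward features. The feature names and the simple linear combination are illustrative assumptions for this sketch, not the exact appraisal-based features or formulation used in the papers above.

```python
import numpy as np
from collections import defaultdict

def appraisal_features(novelty, goal_progress, control, valence):
    """Hypothetical 4-dimensional appraisal-based feature vector for one transition."""
    return np.array([novelty, goal_progress, control, valence])

class IMRLAgent:
    """Tabular Q-learning agent whose learning signal is r_ext + w . phi(s, a, s')."""

    def __init__(self, n_actions, weights, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(lambda: np.zeros(n_actions))  # tabular Q-values
        self.w = np.asarray(weights, dtype=float)          # one weight per intrinsic feature
        self.alpha, self.gamma, self.eps = alpha, gamma, epsilon
        self.n_actions = n_actions

    def act(self, obs):
        # epsilon-greedy action selection over the current Q estimates
        if np.random.rand() < self.eps:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q[obs]))

    def update(self, obs, action, r_ext, features, next_obs):
        # The agent learns from the combined (extrinsic + intrinsic) reward.
        r = r_ext + float(self.w @ features)
        td_target = r + self.gamma * np.max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (td_target - self.q[obs][action])
```

In this view, choosing the weight vector over the intrinsic features amounts to adapting the agent’s motivations to a particular environment, which motivates the search procedure described next.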

I also proposed a mechanism based on genetic programming (GP) to evolve general-purpose, domain-independent intrinsic reward functions, i.e., functions that require no prior domain knowledge or manual design of the agent’s reward for particular tasks. The work in [bibcite key=sequeira2013phdthesis,sequeira2014jaamas,sequeira2016aamas,sequeira2017sas] uses information produced by any standard RL algorithm as the building blocks for GP in order to discover useful sources of information that complement the agent’s perceptual capabilities in partially-observable environments.
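
The following sketch illustrates the general flavor of this approach: candidate intrinsic reward functions are represented as small expression trees over terminals computed from a standard RL learner’s own statistics, and each candidate’s fitness is the task performance of an agent trained with it. The terminal set, operators, and fitness procedure here are hypothetical placeholders rather than the ones used in the cited work.

```python
import random

# Hypothetical building blocks derived from a standard RL learner's own statistics;
# each terminal maps a dictionary of per-transition statistics to a scalar.
TERMINALS = {
    "visit_count": lambda st: st["n_sa"],        # how often (s, a) was tried
    "value":       lambda st: st["q_sa"],        # current value estimate
    "td_error":    lambda st: abs(st["delta"]),  # magnitude of prediction error
    "ext_reward":  lambda st: st["r_ext"],       # task reward just received
}
OPERATORS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def random_tree(depth=3):
    """Grow a random reward-function expression tree (GP individual)."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(list(TERMINALS))
    op = random.choice(list(OPERATORS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, stats):
    """Compute the intrinsic reward encoded by a tree for one transition."""
    if isinstance(tree, str):
        return TERMINALS[tree](stats)
    op, left, right = tree
    return OPERATORS[op](evaluate(left, stats), evaluate(right, stats))

def fitness(tree, run_agent, n_episodes=50):
    """Fitness = task performance of an agent learning with this intrinsic reward.
    `run_agent` is a user-supplied routine that trains an agent with the given
    reward function and returns its average task return."""
    return run_agent(reward_fn=lambda stats: evaluate(tree, stats),
                     n_episodes=n_episodes)
```

Standard GP selection, crossover, and mutation over such trees would then search the space of candidate reward functions.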

Furthermore, I extended the IMRL framework to multiagent settings involving complex interdependencies between the behaviors of several interacting agents, each using a particular intrinsic reward function. In particular, the work in [bibcite key=sequeira2011icdl,sequeira2013phdthesis] presents generic social rewards inspired by natural mechanisms of reciprocation, by which animals evaluate the kindness of others’ actions and respond accordingly, which enable the emergence of socially-aware individual behaviors within competitive multiagent scenarios.
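
As a rough illustration of the reciprocity idea, the sketch below keeps a running estimate of each peer’s “kindness” (how much that peer’s recent actions helped or harmed the agent) and rewards actions toward a peer in proportion to that estimate. The kindness measure and variable names are illustrative assumptions, not the exact social reward features in the cited papers.

```python
from collections import defaultdict

class ReciprocitySocialReward:
    """Intrinsic social reward term that encourages reciprocating peers' kindness."""

    def __init__(self, decay=0.9, weight=0.5):
        self.kindness = defaultdict(float)  # running kindness estimate per peer
        self.decay = decay                  # how quickly old interactions are forgotten
        self.weight = weight                # scale of the social reward term

    def observe_peer(self, peer_id, impact_on_me):
        # impact_on_me > 0: the peer's action benefited this agent (kind);
        # impact_on_me < 0: it harmed this agent (unkind).
        self.kindness[peer_id] = (self.decay * self.kindness[peer_id]
                                  + (1 - self.decay) * impact_on_me)

    def social_reward(self, peer_id, my_impact_on_peer):
        # Reward actions whose effect on a peer matches that peer's kindness:
        # helping those who helped us, withholding help from those who did not.
        return self.weight * self.kindness[peer_id] * my_impact_on_peer
```

This social term would be added to each agent’s individual reward, so that reciprocation becomes intrinsically valuable even in competitive settings.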

Overall, the results in these articles show that a careful weighting of the reward features, i.e., assigning a different weight to each feature, successfully mitigates the agents’ limitations and allows them to learn the underlying task. In addition, depending on the characteristics and dynamics of each environment, a different feature-weighting scheme may be necessary for the agent to overcome its limitations, i.e., the agent has to be adapted to its environment. In multiagent IMRL, the proposed reward mechanism enables the emergence of “socially-aware” behaviors that drive the agents to learn strategies that trade immediate individual gains for long-run group benefits.

Related Publications

[/bibshow]