Associative Learning in Factored MDPs

Associative learning is a paradigm from behaviorism that posits that learning occurs whenever a change in behavior is observed. Classical conditioning is one of the best-known associative learning paradigms and one of the most basic survival tools found in nature: it allows organisms to expand the range of contexts in which their already-known behaviors can be applied. By associating co-occurring stimuli from the environment, an organism can trigger innate phylogenetic responses (e.g., fight-or-flight responses) in new, previously unknown situations.

[bibshow file=my-publications.bib show_links=1 format=custom-ieee template=custom-bibshow highlight="P. Sequeira"]

Inspired by the classical conditioning paradigm, I developed a mechanism for reinforcement learning (RL) agents that takes advantage of the structure of their perceptions. Because the environments that RL agents inhabit are usually dynamic and unpredictable, the idea is to provide them with mechanisms to distinguish among the features perceived from the state of their environment, focusing on those that seem most promising for achieving their goals while ignoring those that do not.

Specifically, I focused on scenarios where the state of the agent can be described by a finite set of state variables (i.e., where the state is factored). The work in [bibcite key=sequeira2010admi,sequeira2013phdthesis] presents a sensory tree based on a transactional pattern mining technique to store statistical information about the states' factors online, i.e., while the agent interacts with the environment. This tree structure was later used to build an associative metric that, combined with a traditional RL algorithm and a spreading procedure, updates the estimated values of states that are structurally similar to the current state perceived by the agent [bibcite key=sequeira2013epia,sequeira2012techrep2,sequeira2013phdthesis]. The results show that such a mechanism allows for more efficient exploration and learning of the environment and achieves results comparable to algorithms that require full knowledge of the environment's dynamics, while additionally enabling gradual, adaptive improvements as the agent learns.
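The spreading idea can be sketched roughly as follows. This is an illustrative toy only: it assumes a simple overlap-based similarity over state factors standing in for the associative metric, and a plain Q-learning update, whereas the cited work builds the metric from sensory-tree statistics gathered online.

```python
from collections import defaultdict

def similarity(s1, s2):
    """Fraction of shared factor values between two factored states.
    A simplistic stand-in for the associative metric in the text."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)

class SpreadingQLearner:
    """Q-learning with a spreading step: after the usual TD update on the
    current state, structurally similar past states receive a
    similarity-weighted share of the same TD error (illustrative sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, spread_threshold=0.5):
        self.Q = defaultdict(float)   # (state, action) -> estimated value
        self.seen = set()             # states visited so far
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.spread_threshold = spread_threshold

    def update(self, s, a, r, s_next):
        self.seen.add(s)
        best_next = max(self.Q[(s_next, b)] for b in self.actions)
        delta = r + self.gamma * best_next - self.Q[(s, a)]
        self.Q[(s, a)] += self.alpha * delta
        # Spread the TD error to structurally similar states.
        for s2 in self.seen:
            if s2 != s:
                w = similarity(s, s2)
                if w >= self.spread_threshold:
                    self.Q[(s2, a)] += self.alpha * w * delta
```

For example, after a rewarded transition from the factored state `('red', 'square', 'near')`, the previously visited state `('red', 'circle', 'near')` (sharing two of three factors) also receives part of the value update, so the agent need not revisit it to benefit from the experience.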

Interestingly, this associative metric is also able to produce behavioral phenomena associated with the classical conditioning paradigm. In particular, [bibcite key=sequeira2012techrep2,sequeira2013phdthesis] analyze the impact of the associative metric on typified conditioning experiments, showing that combining it with standard TD(0) learning replicates common phenomena described in the classical conditioning literature.
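As background for how TD(0) relates to conditioning, the sketch below shows the best-known such phenomenon, acquisition: a conditioned stimulus (CS) that reliably precedes a reward (the US) gradually acquires a positive predicted value. The setup and parameters are illustrative and not taken from the cited experiments.

```python
def td0_acquisition(trials=200, alpha=0.2, gamma=0.9):
    """Tabular TD(0) on a two-state trial: CS is presented, then the US
    (reward = 1) follows and the trial ends. Returns the learned values."""
    V = {"CS": 0.0, "US": 0.0}
    for _ in range(trials):
        # Transition CS -> US: no reward yet, bootstrap from V(US).
        delta = 0.0 + gamma * V["US"] - V["CS"]
        V["CS"] += alpha * delta
        # Transition US -> end of trial: reward is delivered.
        delta = 1.0 + gamma * 0.0 - V["US"]
        V["US"] += alpha * delta
    return V
```

Over repeated pairings V("US") approaches 1 and V("CS") approaches gamma * V("US") = 0.9: the CS alone comes to predict the upcoming reward, mirroring the acquisition of a conditioned response.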

Related Publications