Reinforcement Learning was a graduate-level course on agents that must learn, plan, and act in complex, non-deterministic environments. The course covered the core theory and approaches of Reinforcement Learning (RL), along with common software libraries and packages used to implement and test RL algorithms, and included a programming component in the form of homework assignments and a final project. Topics covered included:
Monte Carlo Methods
Temporal Difference Methods
Planning and Learning
Neural Networks for Control
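As a small illustration of the temporal difference methods listed above, here is a sketch of tabular Q-learning on a toy corridor environment. The environment, reward structure, and hyperparameters are invented for illustration and are not from the course or project.

```python
import random

# Tabular Q-learning on a tiny 5-state corridor (illustrative only).
# States 0..4; reaching state 4 ends the episode with reward 1.
N_STATES = 5
ACTIONS = [1, -1]          # move right or left
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

random.seed(0)
for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # Q-learning TD update: bootstrap on the best next-state value
        target = r + (0.0 if done else GAMMA * max(Q[(s2, x)] for x in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# The learned greedy policy moves right from every interior state
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

The key line is the TD update: the estimate Q(s, a) is nudged toward the bootstrapped target r + γ·max Q(s', ·) rather than toward a full Monte Carlo return.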
For the final project, which was required to incorporate topics from the class, my group created a stock-trading agent. We combined several extensions of the DQN algorithm with sentiment analysis, technical indicators, and autocorrelation data, and paired them with a trading policy we designed called LocalMax, yielding a method we called rTDQN.
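The DQN family that such a method builds on is centered on a temporal-difference target computed from a separate target network. Below is a sketch of the Double DQN target, one of the standard DQN extensions; the linear "networks," feature sizes, and sampled transition here are stand-ins for illustration and do not reproduce the project's actual architecture.

```python
import numpy as np

# Illustrative Double DQN target computation. The "networks" are stand-in
# linear functions; all shapes and values here are invented for the example.
rng = np.random.default_rng(0)
n_actions, n_features = 3, 4
W_online = rng.normal(size=(n_actions, n_features))   # online network weights
W_target = rng.normal(size=(n_actions, n_features))   # target network weights
gamma = 0.99

def q_values(W, state):
    # Q(s, .) for a linear function approximator
    return W @ state

# One sampled transition (state, reward, next_state, done)
s = rng.normal(size=n_features)
r, done = 0.5, False
s2 = rng.normal(size=n_features)

# Double DQN decoupling: the online net selects the next action,
# the target net evaluates it, which reduces overestimation bias
a_star = int(np.argmax(q_values(W_online, s2)))
td_target = r + (0.0 if done else gamma * q_values(W_target, s2)[a_star])
```

Vanilla DQN would instead take max over the target net's own Q-values; the decoupled selection/evaluation shown here is what distinguishes the Double DQN extension.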
The salient building blocks of rTDQN can be seen here:
An example heatmap of autocorrelation data that was fed to the agent at each time step can be seen here:
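To give a sense of the kind of statistic such a heatmap visualizes, here is a sketch of computing lagged autocorrelations over a window of log returns. The synthetic price series, window size, and lag range are illustrative assumptions; the project's exact feature construction is not reproduced here.

```python
import numpy as np

# Lagged autocorrelation of log returns over a synthetic price series
# (illustrative only; all parameters are invented for the example).
rng = np.random.default_rng(1)
prices = 100 * np.cumprod(1 + rng.normal(0, 0.01, size=256))
returns = np.diff(np.log(prices))

def autocorr(x, lag):
    # Sample autocorrelation at the given lag, normalized by total variance
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

lags = range(1, 11)
acf = [autocorr(returns, k) for k in lags]  # one row of a heatmap-style matrix
```

Stacking such rows over a sliding window of time steps produces a lag-by-time matrix, which is the natural input for a heatmap like the one shown.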
Finally, we can see that our model (bottom-right) outperformed each of the other models in isolation, and that our LocalMax policy outperformed the standard greedy policy typically used for trading agents.
Our full paper can be found below:
Homework reports with associated programming assignments were also part of this course. An example report can be found below.