Hello!
As always, I am trying to figure out how the Neocortex works in order to exploit its properties for reinforcement learning.
I believe I have finally found something substantial, at least in how complete it is. My latest model has several bizarre features. Whether or not the whole thing actually works remains to be seen as I am still in the process of coding it. This model is built of components that I have already shown work, so the question is whether the combination of these components leads to desired properties.
The fundamental idea behind this theory I came up with as a submission to Numenta’s HTM challenge. It is as follows: Every cortical column is in itself a tiny reinforcement learning agent that learns to control information flow in order to maximize a reward signal.
There are three important modules to this new system:
- A bottom-up sparse coding hierarchy
- A top-down prediction hierarchy
- The gating SDRRL units (reinforcement learners)
So, I decided to use my previous SDRRL algorithm for this task, but really any reinforcement learning agent should work.
Sparse codes are extracted in a bottom up fashion. However, unlike typical hierarchical sparse coding, the inputs from one layer to the next are modulated by the SDRRL units – this way, the column can learn to drive attention to certain inputs. Each SDRRL unit itself receives sparse codes in a local radius as input, and along with this attention gate, it has a prediction learning gate and a sparse code learning gate. This makes 3 gates in total, although the exact amount may change as I develop this theory further.
The top-down predictive hierarchy learns to predict the sparse codes of the next timestep, but its learning rate is modulated by SDRRL. This way, SDRRL can choose to only predict things that lead to higher rewards in the future – considering that some of the predicted inputs may actually be actions as well, this allows the system to select actions.
The system as a whole gates information flow in a “reinforcement-learning-modulated” fashion, so instead of the purely unsupervised learning typical associated with hierarchical sparse coding/prediction, it “bends” the process towards important information and rewarding prediction-actions.
Below is a diagram of a single column of this model:
Well, on to coding the thing! I am developing a CPU version first, and then a multithreaded/GPU version in OpenCL.
Until next time!
~CireNeikual