NeoRL: How it works

Hello everyone,

I would like to explain the algorithm behind NeoRL (GPU) and NeoRL-CPU (available here and here). In this post, I will only go over the predictive hierarchy, since the reinforcement learning is still a work in progress.

Overview

So let’s start with the basic idea behind the algorithm.

NeoRL operates on the theory that the neocortex is a bidirectional predictive hierarchy, similar to HTM (Hierarchical Temporal Memory). However, it differs from HTM in several important respects:

  • Includes temporal pooling
  • Full hierarchy support of multiple regions
  • Does not use columns for predictions; these are reserved for reinforcement learning
  • Uses spiking neurons with “explaining-away” characteristics
  • Continuous inputs and predictions

The connectivity scheme of NeoRL is sort of like that of a convolutional neural network, but the weights are not shared. Also, as mentioned before, it is a bidirectional hierarchy unlike convolutional neural networks. The basic idea is that features are extracted upwards and predictions come downwards. The predictions can then influence the feature extraction to improve the features, which in turn results in better predictions.

NeoRL uses this hierarchy to predict the next timestep. The input is a 2D field (although theoretically it can be any dimension) of scalars, and the predictions are another 2D field of the same dimensions as the input. This makes it sort of like a predictive autoencoder.

IPRSDR_diag

Why predict only one timestep? Well, for one, it’s the theoretical minimum for building a world model. Why is this? Well, if your model understands how to predict the next timestep, then it can predict the timestep after that based on the timestep it just predicted, and then predict from that, and so on.
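To make the "predict from your own prediction" idea concrete, here is a minimal sketch. The names rollout and toy_model are made up for illustration and are not part of NeoRL; toy_model simply stands in for any trained one-step predictor.

```python
from typing import Callable, List


def rollout(predict_next: Callable[[float], float], start: float, steps: int) -> List[float]:
    """Feed each one-step prediction back in as the next input."""
    state = start
    trajectory = []
    for _ in range(steps):
        state = predict_next(state)  # predict one timestep ahead
        trajectory.append(state)     # the prediction becomes the next input
    return trajectory


def toy_model(x: float) -> float:
    """Hypothetical stand-in for a learned one-step predictor: the value doubles."""
    return 2.0 * x


print(rollout(toy_model, 1.0, 4))  # [2.0, 4.0, 8.0, 16.0]
```

In practice the errors of each step compound, so long rollouts are only as good as the one-step model.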

Now, let’s go into how the spatio-temporal features are extracted (upwards flow).

Spatio-Temporal Feature Extraction

Spatio-temporal feature extraction is based on sparse coding, or SDRs (sparse distributed representations). With SDRs/sparse coding, one attempts to find a sparse (few active) set of bases that can be used to reconstruct the input. This is not unlike a sparse autoencoder; however, the way NeoRL does it is a bit different.

Below is an image of the FISTA algorithm being used to extract sparse codes.

IRSDR_codes

The algorithm used in NeoRL is similar to ISTA, but uses spiking neurons to produce time-averaged codes that are always in the [0, 1] range. I found that ISTA does not seem to bound the codes strictly enough, so I opted for something between ISTA and another algorithm called SAILnet (here).

The result works as follows:

  1. Excite neurons from reconstruction error
  2. Inhibit neurons from each other (lateral connectivity)
  3. Reconstruct the input from the average firing rates of the neurons to get a reconstruction error (input – reconstruction)
  4. Repeat steps 1-3

Like ISTA, neurons are activated off of reconstruction error, but like SAILnet the codes are formed by having a spiked neuron inhibit its neighbors. This is performed sequentially for some iterations until a stable sparse code has been formed.
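The loop described above could be sketched roughly as follows. This is not the NeoRL source: the function name infer_codes, the hard threshold, the reset-to-zero after a spike, and the clamping of the potential at zero are all assumptions made to keep the example small.

```python
from typing import List


def infer_codes(x: List[float], W: List[List[float]], L: List[List[float]],
                iterations: int = 50, threshold: float = 0.5) -> List[float]:
    """Return average firing rates (the sparse code) for input x.

    W[j][i] is the feed-forward weight from input i to neuron j.
    L[j][k] is the inhibitory lateral weight from neuron k to neuron j.
    """
    n_neurons, n_inputs = len(W), len(x)
    potential = [0.0] * n_neurons  # membrane potentials (assumed model)
    spikes = [0.0] * n_neurons     # spikes emitted on the previous iteration
    counts = [0.0] * n_neurons     # number of spikes per neuron so far
    rates = [0.0] * n_neurons      # average firing rates, always in [0, 1]
    for t in range(1, iterations + 1):
        # Reconstruct the input from the current rates and take the error.
        recon = [sum(W[j][i] * rates[j] for j in range(n_neurons))
                 for i in range(n_inputs)]
        error = [x[i] - recon[i] for i in range(n_inputs)]
        new_spikes = []
        for j in range(n_neurons):
            # Step 1: excitation driven by the reconstruction error.
            excite = sum(W[j][i] * error[i] for i in range(n_inputs))
            # Step 2: inhibition from neighbors that just spiked.
            inhibit = sum(L[j][k] * spikes[k] for k in range(n_neurons))
            potential[j] = max(0.0, potential[j] + excite - inhibit)
            if potential[j] > threshold:
                new_spikes.append(1.0)
                potential[j] = 0.0  # reset after spiking (assumption)
            else:
                new_spikes.append(0.0)
        spikes = new_spikes
        counts = [c + s for c, s in zip(counts, spikes)]
        rates = [c / t for c in counts]  # time-averaged code
    return rates


# Two neurons, each tuned to one input; only the first input is active.
W = [[1.0, 0.0], [0.0, 1.0]]
L = [[0.0, 0.5], [0.5, 0.0]]
codes = infer_codes([0.8, 0.0], W, L)
print(codes)
```

Because each code is a spike count divided by the number of iterations, it cannot leave the [0, 1] range, which is the bounding property mentioned earlier.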

Once the code has been formed, the feed-forward and lateral weights are updated through Hebbian and anti-Hebbian learning rules respectively, based on the average spiking activities and the reconstruction thereof.
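As a rough illustration of what such an update might look like, here is a sketch. The exact rules in NeoRL may differ: the feed-forward rule below is a Hebbian step on the reconstruction error, the lateral rule is borrowed from SAILnet's form (rate product minus a squared target rate), and update_weights, alpha, beta and target_rate are names I made up for this example.

```python
from typing import List


def update_weights(x: List[float], rates: List[float],
                   W: List[List[float]], L: List[List[float]],
                   alpha: float = 0.1, beta: float = 0.1,
                   target_rate: float = 0.1) -> List[float]:
    """Update W and L in place from one formed code; return the reconstruction error."""
    n_neurons, n_inputs = len(W), len(x)
    recon = [sum(W[j][i] * rates[j] for j in range(n_neurons))
             for i in range(n_inputs)]
    error = [x[i] - recon[i] for i in range(n_inputs)]
    for j in range(n_neurons):
        # Hebbian: active neurons move their weights toward what was missed.
        for i in range(n_inputs):
            W[j][i] += alpha * rates[j] * error[i]
        # Anti-Hebbian: neurons that fire together inhibit each other more.
        for k in range(n_neurons):
            if k != j:
                change = beta * (rates[j] * rates[k] - target_rate ** 2)
                L[j][k] = max(0.0, L[j][k] + change)  # keep inhibition non-negative
    return error


W = [[0.5, 0.0], [0.0, 0.5]]
L = [[0.0, 0.0], [0.0, 0.0]]
err = update_weights([1.0, 0.0], [1.0, 0.0], W, L)
print(err, W, L)
```

Here only the first neuron was active, so only its feed-forward weights move, and no lateral inhibition builds up because the two neurons never fired together.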

In order to extend this idea to the time domain, one can add an extra set of recurrent connections. These take the previous average spiking activities and feed them back into the current cycle. The layer will therefore try to form sparse codes not only of the input, but also of its own previous codes. This leads to a history compression algorithm.
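A tiny sketch of the recurrent wiring, under the assumption that the previous codes are simply appended to the current input. Here toy_encode is a hypothetical stand-in for the real sparse coder, and run_sequence is likewise an illustrative name.

```python
from typing import List


def toy_encode(inputs: List[float], weights: List[List[float]]) -> List[float]:
    """Stand-in coder: each unit's weighted sum, clamped to [0, 1]."""
    return [min(1.0, max(0.0, sum(w * v for w, v in zip(row, inputs))))
            for row in weights]


def run_sequence(sequence: List[List[float]], weights: List[List[float]],
                 n_units: int) -> List[List[float]]:
    """Encode each input together with the previous codes."""
    prev_codes = [0.0] * n_units
    history = []
    for x in sequence:
        combined = list(x) + prev_codes  # feed-forward input + recurrent input
        prev_codes = toy_encode(combined, weights)
        history.append(prev_codes)
    return history


# One input, one unit: half of the weight on the input, half on its own past code.
print(run_sequence([[1.0], [0.0], [0.0]], [[0.5, 0.5]], n_units=1))
# [[0.5], [0.25], [0.125]]
```

Even after the input goes silent, the code still carries a fading memory of it, which is the sense in which the recurrent connections compress history.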

However, there is one more trick we can do: in order to make the representation as efficient as possible, we can compress only the history/spatial features that lead to low prediction errors. This is accomplished through eligibility traces.

Eligibility traces are often used in reinforcement learning to propagate reward signals back in time to address the distal reward problem. In our case, we are using them as a replacement for backpropagation through time (BPTT), which is typically used with the LSTM algorithm. Instead of having to save a history buffer and update on that to a fixed-length horizon, we can easily propagate prediction errors to past codes with an infinite horizon (well, limited by floating-point accuracy of course).

The idea is that instead of updating the weights of the sparse coder directly, we use the weight change we would have applied to increment an eligibility trace. This trace then decays exponentially, giving newer samples more importance. Then, when the prediction error is below average, we update the weights using those traces (since the past updates were good for us).
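For a single weight, the trace mechanism could be sketched like this. The function name step_trace, the decay and learning-rate values, and the choice not to reset the trace after it is applied are assumptions for illustration only.

```python
from typing import Tuple


def step_trace(weight: float, trace: float, candidate_delta: float,
               error: float, avg_error: float,
               decay: float = 0.9, lr: float = 0.1) -> Tuple[float, float]:
    """Accumulate the would-be weight change in a decaying trace.

    The trace is applied to the weight only when the prediction error
    is below its running average.
    """
    trace = decay * trace + candidate_delta  # older contributions fade out
    if error < avg_error:
        weight += lr * trace                 # commit the remembered updates
    return weight, trace


w, tr = 0.0, 0.0
# Timestep 1: a candidate update arrives, but the prediction was poor.
w, tr = step_trace(w, tr, candidate_delta=1.0, error=1.0, avg_error=0.5)
# Timestep 2: no new candidate, but the prediction is better than average.
w, tr = step_trace(w, tr, candidate_delta=0.0, error=0.2, avg_error=0.5)
print(w, tr)
```

The update from timestep 1 is only committed at timestep 2, scaled down by the decay, which is how credit reaches past codes without storing a history buffer.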

Here is an example of how such a trace variable can look (plotted over time). It’s not shown in the image, but the trace can also be negative.

AccumulatingTrace

So that’s how a single layer performs feature extraction.

Prediction

The prediction in NeoRL is very simple. It’s essentially a multilayer perceptron with thresholded units that points in the opposite direction of the feature extraction hierarchy. It can be thought of as overlaying the feature extraction hierarchy – each prediction neuron tries to learn a mapping from local (lateral and feedback) inputs to the next state of the feature extractor neuron it is associated with.

Each layer receives local (lateral) features along with the predictions of the next higher layer (if there is one) as input. It is then trained with a standard time-delayed perceptron learning rule to produce the next SDR at each layer.
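Here is a minimal sketch of a time-delayed perceptron rule with thresholded units. To keep it short, the input is just the previous SDR (no higher-layer predictions), and the names predict and train_step, the threshold and the learning rate are assumptions rather than NeoRL's actual values.

```python
from typing import List


def predict(weights: List[List[float]], inputs: List[float],
            threshold: float = 0.5) -> List[float]:
    """Thresholded units: output 1 where the weighted sum exceeds the threshold."""
    return [1.0 if sum(w * v for w, v in zip(row, inputs)) > threshold else 0.0
            for row in weights]


def train_step(weights: List[List[float]], prev_inputs: List[float],
               target: List[float], lr: float = 0.1) -> List[float]:
    """Perceptron rule: move weights by (target - prediction) * previous input."""
    prediction = predict(weights, prev_inputs)
    for j, row in enumerate(weights):
        err = target[j] - prediction[j]
        for i in range(len(row)):
            row[i] += lr * err * prev_inputs[i]
    return prediction


# Toy sequence of SDRs that alternates between two patterns.
A, B = [1.0, 0.0], [0.0, 1.0]
sequence = [A, B] * 10
weights = [[0.0, 0.0], [0.0, 0.0]]
# Time delay: the input at time t-1 is trained to produce the SDR at time t.
for prev, nxt in zip(sequence, sequence[1:]):
    train_step(weights, prev, nxt)

print(predict(weights, A), predict(weights, B))  # [0.0, 1.0] [1.0, 0.0]
```

After a few passes the units learn that A is followed by B and B by A.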

The prediction errors are kept around to feed back to the feature extractors, as explained earlier. We keep a decaying average of the error. If the current error is less than the average, we “reward” the feature extractor; otherwise we do nothing.
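The gating itself is simple enough to sketch in a few lines. ErrorGate is a hypothetical helper, and the decay value and the zero starting average are assumptions.

```python
class ErrorGate:
    """Tracks a decaying average of the prediction error."""

    def __init__(self, decay: float = 0.9) -> None:
        self.decay = decay
        self.avg_error = 0.0  # assumed starting value

    def update(self, error: float) -> bool:
        """Return True ("reward") if the error beats the running average."""
        rewarded = error < self.avg_error
        # Exponential moving average of the error.
        self.avg_error = self.decay * self.avg_error + (1.0 - self.decay) * error
        return rewarded


gate = ErrorGate(decay=0.5)
flags = [gate.update(e) for e in (1.0, 0.2, 0.4)]
print(flags)  # [False, True, False]
```

The comparison is made before the average is updated, so each error is judged against the errors that came before it.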

NeoRL predicting a periodic function (Red – actual, blue – prediction):

neorl_test

Conclusion

NeoRL is still in early stages of development, as is NeoRL-CPU. I hope I was able to give a decent explanation of what is going on in the algorithm. I will post explanations of the reinforcement learning extension as soon as it is up and running as well. Feel free to try out the library yourself!

Until next time,

CireNeikual
