Reinforcement Learning Demo!

Hello!

I created a new reinforcement learning algorithm, and thanks to this new website, I have a three.js demo for it in this post!

The algorithm combines my one-iteration sparse distributed representation (SDR) unsupervised learning algorithm with a version of the continuous actor-critic learning automaton (CACLA) with eligibility traces.
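For readers who have not met SDRs before, here is a minimal Python sketch of what such a code looks like. It is not the encoder used in the demo: it assumes a plain top-k rule (the k units with the largest activation are switched on), and the name `encode_sdr`, the random weights, and the sizes are all made up for illustration. The unsupervised learning step is left out entirely.

```python
import random

def encode_sdr(inputs, weights, k):
    """Return a binary code with exactly k active bits.

    Assumption: the k units with the largest weighted sum win (top-k rule).
    """
    activations = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    # Indices of the k most active units.
    winners = sorted(range(len(activations)),
                     key=lambda i: activations[i], reverse=True)[:k]
    sdr = [0] * len(weights)
    for i in winners:
        sdr[i] = 1
    return sdr

rng = random.Random(0)
# 64 hidden units reading 4 inputs; both sizes are arbitrary.
weights = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(64)]
sdr = encode_sdr([0.2, -0.5, 0.9, 0.1], weights, k=6)
```

Only 6 of the 64 bits are on for any input, so two different inputs tend to share few active bits.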

It works entirely without backpropagation! It also doesn’t use stochastic sampling from a replay buffer. The SDRs ensure that there is little to no catastrophic interference. All of the weights are updated in a single pass per timestep.
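To make the "one pass per timestep" idea concrete, here is a rough Python sketch of a CACLA-style update on binary SDR features. It is a guess at the general shape, not the code behind the demo: the linear actor and critic, the accumulating traces (the actor trace in particular), the class name `LinearCacla`, and every hyperparameter value are assumptions.

```python
class LinearCacla:
    """Linear actor and critic over binary features, updated online."""

    def __init__(self, n_features, n_actions,
                 alpha=0.1, beta=0.1, gamma=0.95, lam=0.8):
        self.alpha, self.beta, self.gamma, self.lam = alpha, beta, gamma, lam
        self.v = [0.0] * n_features                      # critic weights
        self.v_trace = [0.0] * n_features                # critic eligibility trace
        self.a = [[0.0] * n_features for _ in range(n_actions)]        # actor weights
        self.a_trace = [[0.0] * n_features for _ in range(n_actions)]  # actor trace

    def value(self, phi):
        return sum(w * f for w, f in zip(self.v, phi))

    def policy(self, phi):
        return [sum(w * f for w, f in zip(row, phi)) for row in self.a]

    def update(self, phi, action, reward, phi_next):
        # Temporal-difference error for this single transition.
        delta = reward + self.gamma * self.value(phi_next) - self.value(phi)
        decay = self.gamma * self.lam
        # How far the explored action was from the actor's own output.
        explore = [a - p for a, p in zip(action, self.policy(phi))]
        for i, f in enumerate(phi):
            self.v_trace[i] = decay * self.v_trace[i] + f
            for j, e in enumerate(explore):
                self.a_trace[j][i] = decay * self.a_trace[j][i] + e * f
        for i in range(len(phi)):
            self.v[i] += self.alpha * delta * self.v_trace[i]
            if delta > 0:  # CACLA rule: the actor learns only when things went better than expected
                for j in range(len(self.a)):
                    self.a[j][i] += self.beta * self.a_trace[j][i]
        return delta

agent = LinearCacla(n_features=4, n_actions=2)
delta = agent.update(phi=[1, 0, 1, 0], action=[0.5, -0.5],
                     reward=1.0, phi_next=[0, 1, 0, 1])
```

After this one step, only the weights attached to the two active bits have moved; the weights for the inactive bits are still zero. That is the intuition for why sparse codes limit interference: each update touches a small slice of the weights. No gradients are propagated through layers and no past transitions are stored.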

This algorithm is still a bit of a prototype, but I think it works well enough to warrant a demo!

When running the demo, you can speed up time by dragging the slider in the controls menu.

The bits at the top left represent the current SDR.

The agent should learn to crawl within a few seconds with the speed turned up to max.
It may get stuck at times. If this is the case, just refresh the page!

Have fun!

2 thoughts on “Reinforcement Learning Demo!”

samim says:

October 10, 2015 at 7:43 am

nice work, looks very interesting. Any chance you'll put this nicely packaged on github? 😉

ericlaukien@gmail.com says:
  
  October 10, 2015 at 8:01 pm
  
  Hello!
  
  I have a C++ version on GitHub right now actually!
  
  https://github.com/222464/BIDInet/blob/master/BIDInet/source/deep/SelfOptimizingUnit.h
  
  It isn’t particularly nicely packaged yet though. However, the code for the agent itself should be very readable. I plan on making a little tutorial on how to make this thing, during which I will clean everything up!
  
  The C++ version fixes several bugs present in the JS version. I plan on using this tiny reinforcement learning algorithm (the aim was to use as few resources as possible) to make a larger hierarchical swarm intelligence model.
  
  Thank you for posting!
  
