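V(s) is calculated by a neural network with no hidden layers and a linear output function (in the case of my implementation on GitHub). In my experiments it worked best with no hidden layers, though I can't say exactly why. Thus it's nothing more than a linear model: since V(s) is a real-valued estimate rather than a class label, it is effectively a linear regression on the state, not a classifier.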
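To make that concrete, here is a minimal NumPy sketch of such a value function. It is not the code from my GitHub implementation: the class name `LinearValue`, the learning rate, and the choice of fitting V(s) to observed returns with a mean squared error are all assumptions made for illustration.

```python
import numpy as np


class LinearValue:
    """V(s) = w . s + b: a network with no hidden layers and a linear output."""

    def __init__(self, state_dim, lr=0.01):
        self.w = np.zeros(state_dim)  # one weight per state feature
        self.b = 0.0                  # bias term
        self.lr = lr                  # learning rate (illustrative value)

    def predict(self, states):
        """Return V(s) for a batch of states with shape (n, state_dim)."""
        return np.asarray(states, dtype=float) @ self.w + self.b

    def update(self, states, returns):
        """Take one gradient step on half the mean squared error
        between V(s) and the observed returns.
        Returns the mean squared error measured before the step."""
        states = np.asarray(states, dtype=float)
        returns = np.asarray(returns, dtype=float)
        error = self.predict(states) - returns
        self.w -= self.lr * states.T @ error / len(returns)
        self.b -= self.lr * error.mean()
        return float(np.mean(error ** 2))
```

With a model like this, the baseline-corrected signal for the policy update would be `returns - value.predict(states)`.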

I never worked through the full mathematical reason why the variance can be reduced this way, since I only needed the models to work. The derivation can be found here:

Bhatnagar, S., Sutton, R. S., Ghavamzadeh, M., and Lee, M. (2007). Incremental natural actor-critic algorithms. In Advances in Neural Information Processing Systems 20.
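For intuition rather than a proof, the variance reduction can also be checked numerically. The sketch below is a toy example of my own, not taken from the paper: a one-step problem with two actions, a sigmoid policy, and made-up rewards of about 10 and 11. The function name `gradient_samples` and the constant baseline of 10.5 (standing in for V(s)) are assumptions.

```python
import numpy as np


def gradient_samples(baseline, n=100_000, seed=0):
    """Sample one-step policy-gradient estimates
    (R - baseline) * d/dtheta log pi(a) for a two-action sigmoid policy."""
    rng = np.random.default_rng(seed)
    theta = 0.0                              # logit of choosing action 1
    p1 = 1.0 / (1.0 + np.exp(-theta))        # pi(a=1)
    actions = rng.random(n) < p1             # True means action 1
    # Hypothetical rewards: both actions pay about 10, action 1 slightly more.
    rewards = np.where(actions, 11.0, 10.0) + rng.normal(size=n)
    # d/dtheta log pi(a) is (1 - p1) for action 1 and -p1 for action 0.
    score = np.where(actions, 1.0 - p1, -p1)
    return (rewards - baseline) * score


no_baseline = gradient_samples(baseline=0.0)
with_baseline = gradient_samples(baseline=10.5)  # roughly the average reward

print("mean:", no_baseline.mean(), with_baseline.mean())
print("variance:", no_baseline.var(), with_baseline.var())
```

Both sets of samples estimate the same gradient, so the two means should agree closely, while the variance with the baseline should come out far smaller. The reason is that the large constant part of the reward, which says nothing about which action was better, no longer multiplies the score term.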

