Hello. Sorry for the late response. I was on holiday.

During training you sample a batch of transitions <s_t, a_t, r_t, s_(t+1)> from the experience replay, let's say 64 instances. You use s_(t+1) to compute the target value Q(s_(t+1)) with the target network, and you use s_t to compute the predicted value Q(s_t) with the Q-network. The objective is to minimize the difference between those two values (typically the mean squared error).

So the batch sampling from the experience replay is used by both networks.
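To make the two-network setup concrete, here is a minimal NumPy sketch of one training step. It is an illustration, not my actual implementation: it stands in Q-tables for the two networks, and the buffer contents are random toy data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: 4 states, 2 actions. Two Q-tables stand in for the
# online Q-network (theta) and the target network (theta^-).
n_states, n_actions, gamma = 4, 2, 0.99
q_net = rng.normal(size=(n_states, n_actions))   # online Q-network
target_net = q_net.copy()                        # periodically synced copy

# Experience replay: a list of (s, a, r, s_next, done) transitions.
replay = [(rng.integers(n_states), rng.integers(n_actions),
           float(rng.normal()), rng.integers(n_states), False)
          for _ in range(500)]

# Sample one minibatch of 64 transitions, as described above.
idx = rng.choice(len(replay), size=64, replace=False)
batch = [replay[i] for i in idx]
s, a, r, s_next, done = map(np.array, zip(*batch))

# Target uses s_(t+1) and the *target* network:
y = r + gamma * target_net[s_next].max(axis=1) * (~done)

# Prediction uses s_t and the online Q-network:
q_pred = q_net[s, a]

# The objective is the mean squared difference between the two;
# gradient descent would update q_net (theta) to reduce this loss.
loss = np.mean((y - q_pred) ** 2)
```

Note that the same sampled batch feeds both networks: s_t goes into the Q-network and s_(t+1) into the target network.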

I would refer you to the Github repository where I implemented deep q learning: https://github.com/artem-oppermann/Deep-Reinforcement-Learning/blob/master/src/q%20learning/dqn_model.py

About theta you are right: where I calculate y_i, that theta belongs to the target network (often written theta^- to distinguish it from the Q-network's theta).
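Written out, the target in the DQN loss is (with theta^- the target network's parameters):

```latex
y_i = r_i + \gamma \max_{a'} Q\bigl(s_{i+1}, a'; \theta^-\bigr)
```

The gradient is taken only with respect to the Q-network's theta; theta^- is held fixed and updated periodically by copying theta.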

Why experience replay works is a hard question. It can be shown mathematically, but it would take too long to show here. I would recommend you check out the original paper on Deep Q-Learning: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf