Target Network

Implements target network techniques for deep reinforcement learning. In short, the idea is to estimate reference Q-values not from the agent's current weights, but from an earlier snapshot of those weights. This decorrelates target and predicted Q-values (or state values) and makes the learning algorithm more stable.
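As a rough illustration of what "reference Q-values from a snapshot" means, the sketch below computes one-step targets of the form r + gamma * max_a Q_target(s', a) in plain Python. The function name reference_q and all numbers are hypothetical and are not part of agentnet; the Q-values are assumed to have already been predicted by the frozen snapshot.

```python
def reference_q(target_q_next, rewards, is_done, gamma=0.99):
    """One-step reference Q-values: r + gamma * max_a Q_target(s', a).

    target_q_next holds the next-state Q-values predicted by the frozen
    snapshot, one list of per-action values per transition.
    Terminal transitions (is_done) use the reward alone.
    """
    return [
        r if done else r + gamma * max(q_next)
        for q_next, r, done in zip(target_q_next, rewards, is_done)
    ]


# Made-up snapshot predictions for two next states with two actions each
target_q_next = [[1.0, 2.0], [0.5, 0.0]]
rewards = [1.0, 0.0]
is_done = [False, True]

targets = reference_q(target_q_next, rewards, is_done, gamma=0.5)
print(targets)  # [2.0, 0.0]
```

Because the snapshot is held fixed between synchronizations, these targets do not shift with every gradient step on the trained network.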

Some notable alterations of this technique:

- Standard approach with older NN snapshot: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
- Moving average of weights: http://arxiv.org/abs/1509.02971
- Double Q-learning and other clever ways of training with target network: http://arxiv.org/pdf/1509.06461.pdf
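To make the Double Q-learning variant concrete, here is a minimal sketch under the usual formulation: the trained (online) network picks the next action, and the target network evaluates it. The function double_q_reference and the numbers are hypothetical illustrations, not agentnet API.

```python
def double_q_reference(online_q_next, target_q_next, rewards, is_done, gamma=0.99):
    """Double Q-learning targets: r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    targets = []
    for q_online, q_target, r, done in zip(
        online_q_next, target_q_next, rewards, is_done
    ):
        if done:
            # terminal transition: no bootstrapped value
            targets.append(r)
            continue
        # action is selected by the online network...
        best_action = max(range(len(q_online)), key=q_online.__getitem__)
        # ...but its value is read from the target network
        targets.append(r + gamma * q_target[best_action])
    return targets


online_q_next = [[1.0, 3.0], [2.0, 0.0]]   # made-up online predictions
target_q_next = [[4.0, 2.0], [1.0, 5.0]]   # made-up snapshot predictions

double_targets = double_q_reference(
    online_q_next, target_q_next, [0.0, 1.0], [False, False], gamma=0.5
)
print(double_targets)  # [1.0, 1.5]
```

Taking the plain maximum over the snapshot's Q-values would give [2.0, 3.5] for the same inputs; separating action selection from evaluation is what tempers such optimistic estimates.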

Here we implement a generic TargetNetwork class that supports both the standard and the moving average approach through the “moving_average_alpha” parameter of “load_weights”.

class agentnet.target_network.TargetNetwork(original_network_outputs, bottom_layers=(), share_inputs=True, name='target_net.')

A generic class for target network techniques. Works by creating a deep copy of the original network and synchronizing weights through the “load_weights” method.

If you just want to duplicate lasagne layers with or without sharing params, use agentnet.utils.clone.clone_network

Parameters:

- original_network_outputs (lasagne.layers.Layer or a list/tuple of such) – original network outputs to be cloned for the target network
- bottom_layers (lasagne.layers.Layer or a list/tuple/dict of such) – the layers that should be shared between networks
- share_inputs (bool) – if True, all InputLayers will still be shared even if not mentioned in bottom_layers
Snippet:

    # build the network from lasagne.layers
    from lasagne.layers import InputLayer, DenseLayer
    from agentnet.target_network import TargetNetwork

    l_in = InputLayer([None, 10])
    l_d0 = DenseLayer(l_in, 20)
    l_d1 = DenseLayer(l_d0, 30)
    l_d2 = DenseLayer(l_d1, 40)
    other_l_d2 = DenseLayer(l_d1, 41)

    # TargetNetwork that copies all the layers BUT FOR l_in
    full_clone = TargetNetwork([l_d2, other_l_d2])
    clone_d2, clone_other_d2 = full_clone.output_layers

    # only copy l_d2 and l_d1, keep l_d0 and l_in from the original network,
    # do not clone other_l_d2
    partial_clone = TargetNetwork(l_d2, bottom_layers=(l_d0,))
    clone_d2 = partial_clone.output_layers

    # placeholder for any code that changes the original network's weights
    do_something_with_l_d2_weights()

    # synchronize parameters with the original network
    partial_clone.load_weights()

    # OR set clone_params = 0.33*original_params + (1-0.33)*previous_clone_params
    partial_clone.load_weights(0.33)

load_weights(moving_average_alpha=1)

Loads the weights from original network into target network. Should usually be called whenever you want to synchronize the target network with the one you train.

When using the moving average approach, specify the fraction of new weights to load through the moving_average_alpha parameter (e.g. moving_average_alpha=0.1).

Parameters:	moving_average_alpha – If 1, just loads the new weights. Otherwise target_weights = alpha * original_weights + (1 - alpha) * target_weights
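The update rule can be sketched in plain Python as follows. This only illustrates the arithmetic on lists of floats; blend_weights is a hypothetical helper, not the library's implementation, which operates on the networks' actual parameters.

```python
def blend_weights(original_weights, target_weights, moving_average_alpha=1):
    """Return alpha * original + (1 - alpha) * target, element by element."""
    return [
        moving_average_alpha * w_orig + (1 - moving_average_alpha) * w_target
        for w_orig, w_target in zip(original_weights, target_weights)
    ]


original = [1.0, 2.0]   # weights of the network being trained (made up)
target = [0.0, 0.0]     # current target network weights (made up)

# alpha = 1: the target simply becomes a copy of the original weights
hard = blend_weights(original, target)
print(hard)  # [1.0, 2.0]

# alpha = 0.25: the target moves a quarter of the way toward the original
soft = blend_weights(original, target, moving_average_alpha=0.25)
print(soft)  # [0.25, 0.5]
```

A small alpha applied frequently makes the target network trail the trained one smoothly, whereas alpha = 1 applied occasionally gives the standard snapshot behaviour.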