Agent and Recurrence

The Agent abstraction and core AgentNet functionality live here.

Agent

MDPAgent provides a high-level interface for quick implementation of classic MDP agents with continuous, discrete or mixed action space, arbitrary recurrent agent memory and decision making policy.

If you are up to something more sophisticated, try agentnet.agent.recurrence.Recurrence, which is a lasagne layer for custom recurrent networks.

class agentnet.agent.mdp_agent.Agent
Alias for MDPAgent

class agentnet.agent.mdp_agent.MDPAgent(observation_layers=(), agent_states={}, policy_estimators=(), action_layers=())

A generic agent within the MDP (Markov decision process) abstraction. Basically, it wraps a Recurrence layer to handle the interaction between agent and environment. Note for developers: if you want to get acquainted with this code, we suggest reading [Recurrence](http://agentnet.readthedocs.io/en/master/modules/agent.html#module-agentnet.agent.recurrence) first.

Parameters:

observation_layers (lasagne.layers.InputLayer or a list of such) – agent observation(s)
action_layers (resolver.BaseResolver child instance, any appropriate layer, or a tuple of such, that can be fed into the environment to get the next state and observation) – agent’s action(s), or whatever is fed into your environment as agent actions.
agent_states (collections.OrderedDict or dict) – OrderedDict{ memory_output: memory_input}, where
- memory_output: lasagne layer that generates the first agent state (before any interaction) and determines the new agent state given the previous agent state and an observation
- memory_input: lasagne.layers.InputLayer that is used as the “previous state” input for memory_output
policy_estimators (lasagne.Layer child instance (e.g. Q-values) or a tuple of such instances (e.g. state value + action probabilities for a2c)) – whatever determines agent policy (or whatever you want to work with later):
- Q-values (and target network Q-values) for Q-learning
- action probabilities for REINFORCE
- action probabilities and state values (also possibly target network) for actor-critic
- whatever intermediate state you want, e.g. if you want to penalize the network for activations of layer l_dense_1 later, you will need to add it to policy_estimators.

get_react_function(output_flags={}, function_flags={'allow_input_downcast': True})

Compiles and returns a function that performs one step of the agent network.

Returns:	a theano function. The returned function takes all observation inputs, followed by all agent memories. Its outputs are all actions, followed by all new agent memories. By default, the function is compiled with allow_input_downcast=True; you can override this via function_flags.
Return type:	theano.function
Example:

The regular use case would look something like this (assuming agent is an MDPAgent with a single observation, a single action and 2 memory slots):
>>> react = agent.get_react_function()
>>> action, new_mem0, new_mem1 = react(observation, mem0, mem1)
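
To make the calling convention concrete, here is a minimal plain-Python sketch of how such a function might be driven in a loop. It does not use AgentNet or Theano: `toy_react` and `run_episode` are hypothetical names invented for this illustration, and scalars stand in for the batched arrays a real react function would take. Only the argument order (observations first, then memories) and the output order (actions first, then new memories) are taken from the description above.

```python
def toy_react(observation, mem0, mem1):
    """Hypothetical stand-in for a compiled react function.

    Mimics the calling convention only: (observation, mem0, mem1) in,
    (action, new_mem0, new_mem1) out.
    """
    new_mem0 = mem0 + observation      # memory slot 0: running sum of observations
    new_mem1 = mem1 + 1                # memory slot 1: step counter
    action = 1 if new_mem0 > 0 else 0  # toy policy: sign of the running sum
    return action, new_mem0, new_mem1


def run_episode(react, observations, mem0=0.0, mem1=0):
    """Feed observations one by one, threading memories between calls."""
    actions = []
    for obs in observations:
        # New memories from this call become the previous memories of the next.
        action, mem0, mem1 = react(obs, mem0, mem1)
        actions.append(action)
    return actions, mem0, mem1


actions, final_mem0, final_mem1 = run_episode(toy_react, [1.0, -3.0, 0.5, 4.0])
print(actions, final_mem0, final_mem1)  # [1, 0, 0, 1] 2.5 4
```

The point of the sketch is the loop shape: the caller owns the memories and passes them back in on every step.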

get_zeros_like_memory(batch_size=1)

Returns a list of tensors matching initial agent memory, filled with zeros.

Parameters:	batch_size – how many parallel session memories to store
Returns:	list of numpy arrays filled with zeros, with shape/dtype matching agent memory

get_all_params(**kwargs)

Calls lasagne.layers.get_all_params(all_agent_layers, **kwargs).

get_all_param_values(**kwargs)

Calls lasagne.layers.get_all_param_values(all_agent_layers, **kwargs).

set_all_param_values(values, **kwargs)

Calls lasagne.layers.set_all_param_values(all_agent_layers, values, **kwargs).

get_sessions(environment, session_length=10, batch_size=None, initial_env_states='zeros', initial_observations='zeros', initial_hidden='zeros', experience_replay=False, unroll_scan=True, return_automatic_updates=False, optimize_experience_replay=None, **kwargs)

Returns the history of agent interaction with the environment for a given number of turns.

Parameters:

environment (BaseEnvironment) – an environment to interact with
session_length (int) – how many turns of interaction there shall be for each batch
batch_size (int or theano.tensor.TensorVariable) – number of independent sessions [number or symbolic]. Irrelevant if experience_replay=True (it will be inferred automatically); also irrelevant if there’s at least one input or if you manually set any initial_*.
experience_replay (bool) – whether or not to use experience replay. If True, assumes the environment has a pre-defined sequence of observations and actions (as env.observations etc.). The agent will then observe the sequence of observations and will be forced to take the recorded actions via get_output(..., {action_layer: recorded_action}). This saves some time by directly using environment.observations (list of sequences) instead of calling environment.get_action_results via environment.as_layers(...). Note that if this parameter is False, the agent will be allowed to pick any actions during experience replay.
unroll_scan – whether to use lasagne.utils.unroll_scan instead of theano.scan
return_automatic_updates – whether to append automatic updates to the returned tuple (as the last element)
kwargs (several kw flags (flag=value, flag2=value, ...)) – optional flags to be sent to the NN when calling get_output (e.g. deterministic=True)
initial_something (initial_env_states, initial_observations, initial_hidden) – layers providing initial values for all variables at the 0-th time step. The default ‘zeros’ means filling variables with zeros. Initial values are NOT included in history sequences.
optimize_experience_replay – deprecated, use experience_replay

Returns:	state_seq, observation_seq, hidden_seq, action_seq, policy_seq for environment state, observation, hidden state, chosen actions and agent policy respectively, each of them having dimensions of [batch_i, seq_i, ...]. Time synchronization policy: env_states[:,i] was observed as observation[:,i], BASED ON WHICH the agent generated its policy[:,i], resulting in action[:,i], and also updated its memory from hidden[:,i-1] to hidden[:,i].
Return type:	tuple of Theano tensors
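
The indexing convention above, together with the experience replay mode, can be illustrated with a plain-Python loop. This is a sketch of the bookkeeping only, not of AgentNet internals: `toy_agent_step` and `toy_replay` are hypothetical functions invented here, a single session is used instead of a batch, and plain numbers stand in for tensors.

```python
def toy_agent_step(prev_hidden, observation):
    """Hypothetical one-tick agent: returns (new_hidden, policy)."""
    new_hidden = prev_hidden + observation  # toy memory update
    policy = [new_hidden, -new_hidden]      # toy scores for 2 possible actions
    return new_hidden, policy


def toy_replay(observations, recorded_actions, initial_hidden=0):
    """Replay one recorded session, forcing the recorded actions."""
    hidden = initial_hidden
    hidden_seq, policy_seq, action_seq = [], [], []
    for obs, recorded_action in zip(observations, recorded_actions):
        # hidden[i-1] and observation[i] produce hidden[i] and policy[i]
        hidden, policy = toy_agent_step(hidden, obs)
        hidden_seq.append(hidden)           # initial_hidden itself is never stored
        policy_seq.append(policy)
        action_seq.append(recorded_action)  # the recorded action overrides any choice
    return hidden_seq, policy_seq, action_seq


hidden_seq, policy_seq, action_seq = toy_replay([1, 2, -5], [0, 1, 1])
print(hidden_seq)  # [1, 3, -2]
print(policy_seq)  # [[1, -1], [3, -3], [-2, 2]]
print(action_seq)  # [0, 1, 1]
```

Every returned sequence has exactly one entry per tick, and entry i of each sequence refers to the same turn.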

get_automatic_updates(recurrent=True)

Gets all random state updates that happened inside scan.

Parameters:	recurrent – if True, appends automatic updates from previous layers
Returns:	theano.OrderedUpdates with all automatic updates

as_recurrence(environment, session_length=10, batch_size=None, initial_env_states='zeros', initial_observations='zeros', initial_hidden='zeros', recurrence_name='AgentRecurrence', unroll_scan=True)

Returns a Recurrence lasagne layer that rolls the interaction between the agent and the environment.

Parameters:

environment (BaseEnvironment) – an environment to interact with
session_length (int) – how many turns of interaction there shall be for each batch
batch_size (int or theano.tensor.TensorVariable) – number of independent sessions [number or symbolic]. Irrelevant if there’s at least one input or if you manually set any initial_*.
initial_something (initial_env_states, initial_observations, initial_hidden) – layers providing initial values for all variables at the 0-th time step. The default ‘zeros’ means filling variables with zeros. Initial values are NOT included in history sequences.
flags – optional flags to be sent to the NN when calling get_output (e.g. deterministic=True)
unroll_scan – whether to use lasagne.utils.unroll_scan instead of theano.scan

Returns:	Recurrence instance that returns [agent memory states] + [env states] + [env_observations] + [agent policy] + [action_layers outputs], all concatenated into one list
Return type:	agentnet.agent.recurrence.Recurrence
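
Since the five groups are concatenated into one flat list, a caller has to split that list back into groups. The sketch below shows one way to do so in plain Python, assuming you already know how many entries each group contributes (for instance from how the agent and environment were built). `split_groups` is a hypothetical helper written for this illustration, not an AgentNet function, and strings stand in for the actual sequences.

```python
def split_groups(flat_outputs, group_sizes):
    """Split a flat list into consecutive groups of the given sizes."""
    if sum(group_sizes) != len(flat_outputs):
        raise ValueError("group sizes do not add up to the list length")
    groups, start = [], 0
    for size in group_sizes:
        groups.append(flat_outputs[start:start + size])
        start += size
    return groups


# Assumed layout: 2 memory slots, 1 env state, 1 observation, 1 policy, 1 action.
flat = ["mem0_seq", "mem1_seq", "state_seq", "obs_seq", "qvalues_seq", "action_seq"]
memories, env_states, observations, policies, actions = split_groups(
    flat, [2, 1, 1, 1, 1]
)
print(memories)  # ['mem0_seq', 'mem1_seq']
print(actions)   # ['action_seq']
```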

as_replay_recurrence(environment, session_length=10, initial_hidden='zeros', recurrence_name='ReplayRecurrence', unroll_scan=True)

Returns a Recurrence lasagne layer that rolls the agent over sessions pre-recorded in the environment.

Parameters:

environment (SessionBatchEnvironment or SessionPoolEnvironment) – an environment to interact with
session_length (int) – how many turns of interaction there shall be for each batch
initial_something – layers providing initial values for all variables at the 0-th time step. The default ‘zeros’ means filling variables with zeros. Initial values are NOT included in history sequences.
flags – optional flags to be sent to the NN when calling get_output (e.g. deterministic=True)
unroll_scan – whether to use lasagne.utils.unroll_scan instead of theano.scan

Returns:

an agentnet.agent.recurrence.Recurrence instance that returns

[agent memory states] + [env states] + [env_observations] + [agent policy] + [action_layers outputs], all concatenated into one list

get_agent_reaction(prev_states={}, current_observations=(), **kwargs)

Symbolic expression for a one-tick agent reaction.

Parameters:

prev_states (a dict [memory output: prev memory state value]) – values for previous states
current_observations (a list of inputs where the i-th input corresponds to the i-th input slot from self.observations) – agent observations at this step
kwargs – any flag that should be passed to the lasagne network for the lasagne.layers.get_output method

Returns:	a tuple of [actions, new agent states], where actions is a list of all action layer outputs and new_states is a list of all new_state values, with the i-th element corresponding to the i-th self.state_variables key
Return type:	tuple

Recurrence

The core AgentNet abstraction is Recurrence - a lasagne container layer that can hold an arbitrary graph and roll it for a specified number of steps.

Apart from the MDP Agent, recurrence is also useful for arbitrary recurrent constructs, e.g. convolutional RNN, attentive and/or augmented architectures, etc.

As Recurrence is a lasagne layer, one recurrence can be used as a part of computational graph of another recurrence.

class agentnet.agent.recurrence.Recurrence(input_nonsequences=OrderedDict(), input_sequences=OrderedDict(), tracked_outputs=(), state_variables=OrderedDict(), state_init='zeros', unroll_scan=True, n_steps=None, batch_size=None, mask_input=None, delayed_states=(), verify_graph=True, force_cast_types=False, name='YetAnotherRecurrence')

A generic recurrent layer that works with a custom graph. Recurrence is a lasagne layer that takes an inner graph and rolls it for several steps using scan. In turn, it can be used like any other lasagne layer, even as a part of another recurrence.

[tutorial on recurrence](https://github.com/yandexdataschool/AgentNet/blob/master/examples/Custom%20rnn%20with%20recurrence.ipynb)

param input_nonsequences:

inputs that are the same at each time tick. Technically, it’s a dictionary that maps InputLayers from the one-step graph to layers from the outer graph.

param input_sequences:

layers that represent time sequences, fed into the graph tick by tick. This has to be a dict (one-step input -> sequence layer). All such sequences are iterated over the time axis (axis=1, counting from zero), since we consider their shape to be [batch, time, whatever_else...]

param tracked_outputs:

any layers from the one-step graph whose outputs should be recorded at every time tick. Note that all state_variables are tracked separately, so there is no need to include them.

param state_variables:

a dictionary that maps next state variables to their respective previous states. Keys (new states) must be lasagne layers and values (previous states) must be InputLayers.

Note that state dtype is defined thus:

if the state key layer has output_dtype, then that type is used for the entire state

otherwise, theano.config.floatX is used

param state_init:

the default values for state_variables; in other words, the prev_state for the first iteration. By default it’s T.zeros of the appropriate shape. Can be a dict mapping state OUTPUTS to their initialisations; if so, any states not mentioned in it will be considered zeros. Can be a list of initializer layers for states in the order of dict.items(); if so, its length must match len(state_variables).

param mask_input:

Boolean mask for sequences (like the same param in lasagne.layers.RecurrentLayer). When mask==1, the next item is computed as usual. When mask==0, the next item is a copy of the previous one.

param unroll_scan:

whether or not to use lasagne.utils.unroll_scan instead of theano.scan. Note that if unroll_scan == False, one should use .get_automatic_updates after .get_output to collect automatic updates

param n_steps:
how many time steps the recurrence will roll for. If n_steps=None, tries to infer it. n_steps == None will only work when unroll_scan==False and there are at least some input sequences

param batch_size:

if the process has no inputs, this expression (int or theano scalar) defines the batch size

param delayed_states:

any states mentioned in this list will be shifted 1 turn backwards - from init to n_steps - 1. They will be padded with their initial values. This is intended to allow flipping the recurrence graph to synchronize corresponding values. E.g. for MDP, if environment reaction follows agent action, synchronize observations with actions [at i-th turn agent sees i-th observation, then chooses i-th action and gets i-th reward]
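
The shift described for delayed_states can be pictured on a single recorded sequence in plain Python. This is only a sketch of the indexing, under the reading that a sequence of new states s1..sN with initial value s0 becomes s0..s(N-1). `delay_sequence` is a hypothetical helper invented for this illustration, not part of AgentNet, and strings stand in for per-tick tensors.

```python
def delay_sequence(sequence, initial_value):
    """Shift a per-tick sequence one turn back, padding with the initial value.

    [s1, s2, ..., sN] with initial s0 becomes [s0, s1, ..., s(N-1)],
    so the length of the sequence is preserved.
    """
    if not sequence:
        return []
    return [initial_value] + list(sequence[:-1])


observations = ["obs1", "obs2", "obs3"]  # values produced after ticks 1..3
delayed = delay_sequence(observations, "obs0")
print(delayed)  # ['obs0', 'obs1', 'obs2']
```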

param verify_graph:

whether to assert that all inner graph input layers are registered for the recurrence as inputs or prev states and all inputs/prev states are actually needed to compute next states/outputs. NOT the same as theano.scan(strict=True).

param force_cast_types:

if True, automatically converts layers’ types to the declared types. Otherwise, raises an error.

returns:

a tuple of sequences with shape [batch,tick, ...]

state variable sequences in order of dict.items()

tracked_outputs in given order

WARNING! This layer has a dictionary of outputs. It shouldn’t be used further as an atomic lasagne layer. Instead, consider using my_recurrence[one_of_states_or_outputs] (see code below)
>>> import numpy as np
>>> import theano
>>> import agentnet
>>> from agentnet.memory import RNNCell
>>> from lasagne.layers import *
>>> sequence = InputLayer((None,None,3),name='input sequence')
>>> #one step
>>> inp = InputLayer((None,3))
>>> prev_rnn = InputLayer((None,10))
>>> rnn = RNNCell(prev_rnn,inp,name='rnn')
>>> #recurrence roll of the one-step graph above.
>>> rec = agentnet.Recurrence(input_sequences={inp:sequence},
...                          state_variables={rnn:prev_rnn},
...                          unroll_scan=False)
>>> weights = get_all_params(rec) #get weights
>>> print(weights)
>>> rnn_states = rec[rnn] #get rnn state sequence
>>> run = theano.function([sequence.input_var], get_output(rnn_states)) #compile applier function as any lasagne network
>>> run(np.random.randn(5,25,3)) #demo run
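
The mask_input semantics described in the parameter list above (mask==1 computes the next item, mask==0 copies the previous one) can also be mimicked without Theano. The following is a plain-Python sketch for a single sequence rather than a batch. `roll_with_mask` and `running_sum` are hypothetical names invented for this illustration and say nothing about how Recurrence implements masking internally.

```python
def roll_with_mask(step, inputs, mask, initial_state):
    """Roll `step` over one sequence, holding the state wherever mask == 0."""
    state = initial_state
    states = []
    for x, m in zip(inputs, mask):
        if m:
            state = step(state, x)  # mask == 1: compute the next state as usual
        # mask == 0: the state is left unchanged, i.e. copied from the previous tick
        states.append(state)
    return states


def running_sum(prev_state, x):
    """Toy one-step graph: the new state is the previous state plus the input."""
    return prev_state + x


states = roll_with_mask(running_sum, [1, 2, 3, 4], [1, 1, 0, 1], 0)
print(states)  # [1, 3, 3, 7]
```

At the third tick the mask is 0, so the input 3 is ignored and the state stays at 3 before the roll resumes.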

get_sequence_layers()

Returns the history of agent interaction with the environment for a given number of turns: [state_sequences], [output sequences] - a list of all state sequences and a list of all output sequences. The shape of each such sequence is [batch, tick, shape_of_one_state_or_output...]

get_one_step(prev_states={}, current_inputs={}, **get_output_kwargs)

Applies one-step recurrence.

Parameters:

prev_states – a dict {memory output: prev state} or a list of theano expressions for each prev state
current_inputs – a dictionary of inputs that maps {input layers -> theano expressions for them}. Alternatively, it can be a list where the i-th input corresponds to the i-th input slot from the concatenated nonsequences and sequences: self.input_nonsequences.keys() + self.input_sequences.keys()
get_output_kwargs – any flag that should be passed to the lasagne network for the lasagne.layers.get_output method

Returns:

new_states: a list of all new_state values, where the i-th element corresponds to the i-th self.state_variables key
new_outputs: a list of all outputs, where the i-th element corresponds to the i-th self.tracked_outputs key

get_automatic_updates(recurrent=True)

Gets all random state updates that happened inside scan.

Parameters:	recurrent – if True, appends automatic updates from previous layers
Returns:	theano.OrderedUpdates with all automatic updates

get_params(**tags)

Returns all params, including recurrent params from the one-step network.

get_output_for(inputs, accumulate_updates='warn', recurrence_flags={}, **kwargs)

Returns the history of agent interaction with the environment for a given number of turns.

Parameters:

inputs – [state init] + [input_nonsequences] + [input_sequences]. Each part is a list of theano expressions for layers in the order they were provided when creating this layer.
recurrence_flags – a set of flags to be passed to the one-step agent, e.g. {'deterministic': True}

Returns:

[state_sequences] + [output sequences] - a list of all state and all output sequences. The shape of each such sequence is [batch, tick, shape_of_one_state_or_output...]