flambe.nn.transformer_sru

Module Contents

class flambe.nn.transformer_sru.TransformerSRU(input_size: int = 512, d_model: int = 512, nhead: int = 8, num_encoder_layers: int = 6, num_decoder_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, bidirectional: bool = False, **kwargs: Dict[str, Any])[source]

Bases: flambe.nn.Module

A Transformer with an SRU replacing the FFN.

forward(self, src: torch.Tensor, tgt: torch.Tensor, src_mask: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, src_key_padding_mask: Optional[torch.Tensor] = None, tgt_key_padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None)[source]

Take in and process masked source/target sequences.

Parameters:
  • src (torch.Tensor) – the sequence to the encoder (required). shape: \((N, S, E)\).
  • tgt (torch.Tensor) – the sequence to the decoder (required). shape: \((N, T, E)\).
  • src_mask (torch.Tensor, optional) – the additive mask for the src sequence (optional). shape: \((S, S)\).
  • tgt_mask (torch.Tensor, optional) – the additive mask for the tgt sequence (optional). shape: \((T, T)\).
  • memory_mask (torch.Tensor, optional) – the additive mask for the encoder output (optional). shape: \((T, S)\).
  • src_key_padding_mask (torch.Tensor, optional) – the ByteTensor mask for src keys per batch (optional). shape: \((N, S)\).
  • tgt_key_padding_mask (torch.Tensor, optional) – the ByteTensor mask for tgt keys per batch (optional). shape: \((N, T)\).
  • memory_key_padding_mask (torch.Tensor, optional) – the ByteTensor mask for memory keys per batch (optional). shape: \((N, S)\).
Returns:

  • output (torch.Tensor) – The output sequence, shape: \((T, N, E)\).

Note: [src/tgt/memory]_mask should be filled with float('-inf') for the masked positions and float(0.0) elsewhere. These masks ensure that predictions for position i depend only on the unmasked positions j, and are applied identically for each sequence in a batch. [src/tgt/memory]_key_padding_mask should be a ByteTensor where False values are positions that will be masked with float('-inf') and True values will be left unchanged. This mask ensures that no information is taken from position i if it is masked, and provides a separate mask for each sequence in a batch.

Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of the transformer is the same as the input sequence (i.e. target) length of the decoder.

In the shapes above, S is the source sequence length, T is the target sequence length, N is the batch size, and E is the feature dimension.
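
Example (a minimal usage sketch; the tensor sizes and layer counts are illustrative, and the padding masks follow the True-means-keep convention described in the note above):

    import torch
    from flambe.nn.transformer_sru import TransformerSRU

    N, S, T, E = 4, 10, 7, 512   # batch size, source length, target length, feature size

    model = TransformerSRU(input_size=E, d_model=E, nhead=8,
                           num_encoder_layers=2, num_decoder_layers=2)

    src = torch.randn(N, S, E)   # encoder input, shape (N, S, E)
    tgt = torch.randn(N, T, E)   # decoder input, shape (N, T, E)

    # Additive causal mask for the target: float('-inf') above the diagonal,
    # float(0.0) on and below it, as described in the note above. Shape (T, T).
    tgt_mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)

    # Key padding masks: True keeps a position, False marks padding.
    src_key_padding_mask = torch.ones(N, S, dtype=torch.bool)
    tgt_key_padding_mask = torch.ones(N, T, dtype=torch.bool)

    out = model(src, tgt, tgt_mask=tgt_mask,
                src_key_padding_mask=src_key_padding_mask,
                tgt_key_padding_mask=tgt_key_padding_mask)
    # Per the Returns entry above, out has shape (T, N, E).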

class flambe.nn.transformer_sru.TransformerSRUEncoder(input_size: int = 512, d_model: int = 512, nhead: int = 8, num_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, bidirectional: bool = False, **kwargs: Dict[str, Any])[source]

Bases: flambe.nn.Module

A Transformer encoder with an SRU replacing the FFN.

forward(self, src: torch.Tensor, state: Optional[torch.Tensor] = None, mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None)[source]

Pass the input through the encoder layers in turn.

Parameters:
  • src (torch.Tensor) – The sequence to the encoder (required).
  • state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
  • mask (torch.Tensor, optional) – The mask for the src sequence (optional).
  • padding_mask (torch.Tensor, optional) – The mask for the src keys per batch (optional). Should be True for tokens to leave untouched, and False for padding tokens.
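
Example (a minimal sketch, assuming the same batch-first \((N, S, E)\) input convention as TransformerSRU above; the return value is not documented here, so it is left unpacked):

    import torch
    from flambe.nn.transformer_sru import TransformerSRUEncoder

    N, S, E = 4, 10, 512
    encoder = TransformerSRUEncoder(input_size=E, d_model=E, nhead=8, num_layers=2)

    src = torch.randn(N, S, E)                         # assumed batch-first input
    padding_mask = torch.ones(N, S, dtype=torch.bool)  # True = keep, False = padding

    encoded = encoder(src, padding_mask=padding_mask)
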
_reset_parameters(self)[source]

Initialize parameters in the transformer model.

class flambe.nn.transformer_sru.TransformerSRUDecoder(input_size: int = 512, d_model: int = 512, nhead: int = 8, num_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, **kwargs: Dict[str, Any])[source]

Bases: flambe.nn.Module

A Transformer decoder with an SRU replacing the FFN.

forward(self, tgt: torch.Tensor, memory: torch.Tensor, state: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None)[source]

Pass the inputs (and mask) through the decoder layers in turn.

Parameters:
  • tgt (torch.Tensor) – The sequence to the decoder (required).
  • memory (torch.Tensor) – The sequence from the last layer of the encoder (required).
  • state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
  • tgt_mask (torch.Tensor, optional) – The mask for the tgt sequence (optional).
  • memory_mask (torch.Tensor, optional) – The mask for the memory sequence (optional).
  • padding_mask (torch.Tensor, optional) – The mask for the tgt keys per batch (optional). Should be True for tokens to leave untouched, and False for padding tokens.
  • memory_key_padding_mask (torch.Tensor, optional) – The mask for the memory keys per batch (optional).
Returns:

The output tensor from the decoder.
Return type:

torch.Tensor
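
Example (a minimal sketch; the memory tensor is a stand-in for the encoder output, and the batch-first shapes are assumptions carried over from TransformerSRU above):

    import torch
    from flambe.nn.transformer_sru import TransformerSRUDecoder

    N, S, T, E = 4, 10, 7, 512
    decoder = TransformerSRUDecoder(input_size=E, d_model=E, nhead=8, num_layers=2)

    tgt = torch.randn(N, T, E)     # decoder input (assumed batch-first)
    memory = torch.randn(N, S, E)  # stand-in for the encoder output

    # Additive causal mask over target positions, as in the mask note above.
    tgt_mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)

    out = decoder(tgt, memory, tgt_mask=tgt_mask)  # torch.Tensor, per the return type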

_reset_parameters(self)[source]

Initialize parameters in the transformer model.

class flambe.nn.transformer_sru.TransformerSRUEncoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, bidirectional: bool = False, **kwargs: Dict[str, Any])[source]

Bases: flambe.nn.Module

A Transformer encoder layer with an SRU replacing the FFN.

forward(self, src: torch.Tensor, state: Optional[torch.Tensor] = None, src_mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None)[source]

Pass the input through the encoder layer.

Parameters:
  • src (torch.Tensor) – The sequence to the encoder layer (required).
  • state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
  • src_mask (torch.Tensor, optional) – The mask for the src sequence (optional).
  • padding_mask (torch.Tensor, optional) – The mask for the src keys per batch (optional). Should be True for tokens to leave untouched, and False for padding tokens.
Returns:

  • torch.Tensor – Output Tensor of shape [S x B x H]
  • torch.Tensor – Output state of the SRU of shape [N x B x H]
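
Example (a minimal sketch; the sequence-first [S x B x H] input shape is inferred from the documented return shapes):

    import torch
    from flambe.nn.transformer_sru import TransformerSRUEncoderLayer

    S, B, H = 10, 4, 512
    layer = TransformerSRUEncoderLayer(d_model=H, nhead=8)

    src = torch.randn(S, B, H)  # assumed sequence-first input, [S x B x H]
    output, state = layer(src)  # output: [S x B x H], SRU state: [N x B x H]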

class flambe.nn.transformer_sru.TransformerSRUDecoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, **kwargs: Dict[str, Any])[source]

Bases: flambe.nn.Module

A Transformer decoder layer with an SRU replacing the FFN.

forward(self, tgt: torch.Tensor, memory: torch.Tensor, state: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None)[source]

Pass the inputs (and mask) through the decoder layer.

Parameters:
  • tgt (torch.Tensor) – The sequence to the decoder layer (required).
  • memory (torch.Tensor) – The sequence from the last layer of the encoder (required).
  • state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
  • tgt_mask (torch.Tensor, optional) – The mask for the tgt sequence (optional).
  • memory_mask (torch.Tensor, optional) – The mask for the memory sequence (optional).
  • padding_mask (torch.Tensor, optional) – The mask for the tgt keys per batch (optional).
  • memory_key_padding_mask (torch.Tensor, optional) – The mask for the memory keys per batch (optional).
Returns:

Output Tensor of shape [S x B x H]

Return type:

torch.Tensor
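
Example (a minimal sketch mirroring the encoder layer above; the sequence-first shapes are inferred from the documented return shape):

    import torch
    from flambe.nn.transformer_sru import TransformerSRUDecoderLayer

    S, T, B, H = 10, 7, 4, 512
    layer = TransformerSRUDecoderLayer(d_model=H, nhead=8)

    tgt = torch.randn(T, B, H)     # decoder layer input (assumed sequence-first)
    memory = torch.randn(S, B, H)  # stand-in for the encoder output of length S

    out = layer(tgt, memory)       # torch.Tensor, per the documented return type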