flambe.nn.transformer_sru

Module Contents

class flambe.nn.transformer_sru.TransformerSRU(input_size: int = 512, d_model: int = 512, nhead: int = 8, num_encoder_layers: int = 6, num_decoder_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, bidirectional: bool = False, **kwargs: Dict[str, Any])[source]

Bases: flambe.nn.Module

A Transformer with an SRU replacing the FFN.

forward(self, src: torch.Tensor, tgt: torch.Tensor, src_mask: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, src_key_padding_mask: Optional[torch.Tensor] = None, tgt_key_padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None)[source]

Take in and process masked source/target sequences.

Parameters:
  • src (torch.Tensor) – the sequence to the encoder (required). shape: \((N, S, E)\).
  • tgt (torch.Tensor) – the sequence to the decoder (required). shape: \((N, T, E)\).
  • src_mask (torch.Tensor, optional) – the additive mask for the src sequence (optional). shape: \((S, S)\).
  • tgt_mask (torch.Tensor, optional) – the additive mask for the tgt sequence (optional). shape: \((T, T)\).
  • memory_mask (torch.Tensor, optional) – the additive mask for the encoder output (optional). shape: \((T, S)\).
  • src_key_padding_mask (torch.Tensor, optional) – the ByteTensor mask for src keys per batch (optional). shape: \((N, S)\).
  • tgt_key_padding_mask (torch.Tensor, optional) – the ByteTensor mask for tgt keys per batch (optional). shape: \((N, T)\).
  • memory_key_padding_mask (torch.Tensor, optional) – the ByteTensor mask for memory keys per batch (optional). shape: \((N, S)\).
Returns:

  • output (torch.Tensor) – The output sequence, shape: \((T, N, E)\).

Note: [src/tgt/memory]_mask should be filled with float('-inf') for the masked positions and float(0.0) elsewhere. These masks ensure that predictions for position i depend only on the unmasked positions j, and are applied identically for each sequence in a batch. [src/tgt/memory]_key_padding_mask should be a ByteTensor where False values are positions that will be masked with float('-inf') and True values will be left unchanged. This mask ensures that no information is taken from position i if it is masked, and provides a separate mask for each sequence in a batch.

Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of the transformer is the same as the input sequence (i.e. target) length of the decoder.

In the shapes above, S is the source sequence length, T is the target sequence length, N is the batch size, and E is the feature dimension.
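
Example (a minimal usage sketch; the tensor sizes and layer counts are illustrative, and the padding masks follow the True-means-keep convention described in the note above):

    import torch
    from flambe.nn.transformer_sru import TransformerSRU

    N, S, T, E = 4, 10, 7, 512   # batch size, source length, target length, feature size

    model = TransformerSRU(input_size=E, d_model=E, nhead=8,
                           num_encoder_layers=2, num_decoder_layers=2)

    src = torch.randn(N, S, E)   # encoder input, shape (N, S, E)
    tgt = torch.randn(N, T, E)   # decoder input, shape (N, T, E)

    # Additive causal mask for the target: float('-inf') above the diagonal,
    # float(0.0) on and below it, as described in the note above. Shape (T, T).
    tgt_mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)

    # Key padding masks: True keeps a position, False marks padding.
    src_key_padding_mask = torch.ones(N, S, dtype=torch.bool)
    tgt_key_padding_mask = torch.ones(N, T, dtype=torch.bool)

    out = model(src, tgt, tgt_mask=tgt_mask,
                src_key_padding_mask=src_key_padding_mask,
                tgt_key_padding_mask=tgt_key_padding_mask)
    # Per the Returns entry above, out has shape (T, N, E).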

class flambe.nn.transformer_sru.TransformerSRUEncoder(input_size: int = 512, d_model: int = 512, nhead: int = 8, num_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, bidirectional: bool = False, **kwargs: Dict[str, Any])[source]

Bases: flambe.nn.Module

A Transformer encoder with an SRU replacing the FFN.

forward(self, src: torch.Tensor, state: Optional[torch.Tensor] = None, mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None)[source]

Pass the input through the encoder layers in turn.

Parameters:
  • src (torch.Tensor) – The sequence to the encoder (required).
  • state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
  • mask (torch.Tensor, optional) – The mask for the src sequence (optional).
  • padding_mask (torch.Tensor, optional) – The mask for the src keys per batch (optional). Should be True for tokens to leave untouched, and False for padding tokens.
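
Example (a minimal sketch, assuming the same batch-first \((N, S, E)\) input convention as TransformerSRU above; the return value is not documented here, so it is left unpacked):

    import torch
    from flambe.nn.transformer_sru import TransformerSRUEncoder

    N, S, E = 4, 10, 512
    encoder = TransformerSRUEncoder(input_size=E, d_model=E, nhead=8, num_layers=2)

    src = torch.randn(N, S, E)                         # assumed batch-first input
    padding_mask = torch.ones(N, S, dtype=torch.bool)  # True = keep, False = padding

    encoded = encoder(src, padding_mask=padding_mask)
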
_reset_parameters(self)[source]

Initialize parameters in the transformer model.

class flambe.nn.transformer_sru.TransformerSRUDecoder(input_size: int = 512, d_model: int = 512, nhead: int = 8, num_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, **kwargs: Dict[str, Any])[source]

Bases: flambe.nn.Module

A Transformer decoder with an SRU replacing the FFN.

forward(self, tgt: torch.Tensor, memory: torch.Tensor, state: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None)[source]

Pass the inputs (and mask) through the decoder layers in turn.

Parameters:
  • tgt (torch.Tensor) – The sequence to the decoder (required).
  • memory (torch.Tensor) – The sequence from the last layer of the encoder (required).
  • state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
  • tgt_mask (torch.Tensor, optional) – The mask for the tgt sequence (optional).
  • memory_mask (torch.Tensor, optional) – The mask for the memory sequence (optional).
  • padding_mask (torch.Tensor, optional) – The mask for the tgt keys per batch (optional). Should be True for tokens to leave untouched, and False for padding tokens.
  • memory_key_padding_mask (torch.Tensor, optional) – The mask for the memory keys per batch (optional).
Returns:

The output tensor from the decoder.
Return type:

torch.Tensor
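
Example (a minimal sketch; the memory tensor is a stand-in for the encoder output, and the batch-first shapes are assumptions carried over from TransformerSRU above):

    import torch
    from flambe.nn.transformer_sru import TransformerSRUDecoder

    N, S, T, E = 4, 10, 7, 512
    decoder = TransformerSRUDecoder(input_size=E, d_model=E, nhead=8, num_layers=2)

    tgt = torch.randn(N, T, E)     # decoder input (assumed batch-first)
    memory = torch.randn(N, S, E)  # stand-in for the encoder output

    # Additive causal mask over target positions, as in the mask note above.
    tgt_mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)

    out = decoder(tgt, memory, tgt_mask=tgt_mask)  # torch.Tensor, per the return type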

_reset_parameters(self)[source]

Initialize parameters in the transformer model.

class flambe.nn.transformer_sru.TransformerSRUEncoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, bidirectional: bool = False, **kwargs: Dict[str, Any])[source]

Bases: flambe.nn.Module

A Transformer encoder layer with an SRU replacing the FFN.

forward(self, src: torch.Tensor, state: Optional[torch.Tensor] = None, src_mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None)[source]

Pass the input through the encoder layer.

Parameters:
  • src (torch.Tensor) – The sequence to the encoder layer (required).
  • state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
  • src_mask (torch.Tensor, optional) – The mask for the src sequence (optional).
  • padding_mask (torch.Tensor, optional) – The mask for the src keys per batch (optional). Should be True for tokens to leave untouched, and False for padding tokens.
Returns:

  • torch.Tensor – Output Tensor of shape [S x B x H]
  • torch.Tensor – Output state of the SRU of shape [N x B x H]
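
Example (a minimal sketch; the sequence-first [S x B x H] input shape is inferred from the documented return shapes):

    import torch
    from flambe.nn.transformer_sru import TransformerSRUEncoderLayer

    S, B, H = 10, 4, 512
    layer = TransformerSRUEncoderLayer(d_model=H, nhead=8)

    src = torch.randn(S, B, H)  # assumed sequence-first input, [S x B x H]
    output, state = layer(src)  # output: [S x B x H], SRU state: [N x B x H]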

class flambe.nn.transformer_sru.TransformerSRUDecoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, sru_dropout: Optional[float] = None, **kwargs: Dict[str, Any])[source]

Bases: flambe.nn.Module

A Transformer decoder layer with an SRU replacing the FFN.

forward(self, tgt: torch.Tensor, memory: torch.Tensor, state: Optional[torch.Tensor] = None, tgt_mask: Optional[torch.Tensor] = None, memory_mask: Optional[torch.Tensor] = None, padding_mask: Optional[torch.Tensor] = None, memory_key_padding_mask: Optional[torch.Tensor] = None)[source]

Pass the inputs (and mask) through the decoder layer.

Parameters:
  • tgt (torch.Tensor) – The sequence to the decoder layer (required).
  • memory (torch.Tensor) – The sequence from the last layer of the encoder (required).
  • state (Optional[torch.Tensor]) – Optional state from previous sequence encoding. Only passed to the SRU (not used to perform multihead attention).
  • tgt_mask (torch.Tensor, optional) – The mask for the tgt sequence (optional).
  • memory_mask (torch.Tensor, optional) – The mask for the memory sequence (optional).
  • padding_mask (torch.Tensor, optional) – The mask for the tgt keys per batch (optional).
  • memory_key_padding_mask (torch.Tensor, optional) – The mask for the memory keys per batch (optional).
Returns:

Output Tensor of shape [S x B x H]

Return type:

torch.Tensor
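
Example (a minimal sketch mirroring the encoder layer above; the sequence-first shapes are inferred from the documented return shape):

    import torch
    from flambe.nn.transformer_sru import TransformerSRUDecoderLayer

    S, T, B, H = 10, 7, 4, 512
    layer = TransformerSRUDecoderLayer(d_model=H, nhead=8)

    tgt = torch.randn(T, B, H)     # decoder layer input (assumed sequence-first)
    memory = torch.randn(S, B, H)  # stand-in for the encoder output of length S

    out = layer(tgt, memory)       # torch.Tensor, per the documented return type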