flambe.nlp.transformers

Package Contents

class flambe.nlp.transformers.PretrainedTransformerField(alias: str, cache_dir: Optional[str] = None, max_len_truncate: int = 500, add_special_tokens: bool = True, **kwargs)[source]

Bases: flambe.field.Field

Field integration of the transformers library.

Instantiate this object using any alias available in the transformers library. More information can be found here:

https://huggingface.co/transformers/
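For example, a field can be built directly from an alias string (a minimal sketch; the 'bert-base-uncased' alias is only an illustration, and any alias supported by the installed transformers version should work):

    from flambe.nlp.transformers import PretrainedTransformerField

    # Build a field backed by a pretrained tokenizer. The alias is resolved by
    # the transformers library; downloaded files go to cache_dir if provided.
    field = PretrainedTransformerField(
        alias='bert-base-uncased',
        max_len_truncate=500,      # truncate inputs longer than 500 tokens
        add_special_tokens=True,   # prepend/append the model's special tokens
    )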

padding_idx : int

Get the padding index.

Returns: The padding index in the vocabulary
Return type: int
vocab_size : int

Get the vocabulary length.

Returns: The length of the vocabulary
Return type: int
process(self, example: Union[str, Tuple[Any], List[Any], Dict[Any, Any]])

Process an example, and create a Tensor.

Parameters: example (str) – The example to process, as a single string
Returns: The processed example, tokenized and numericalized
Return type: torch.Tensor
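A minimal sketch of processing a single example (the alias and the sentence are illustrative):

    import torch
    from flambe.nlp.transformers import PretrainedTransformerField

    field = PretrainedTransformerField(alias='bert-base-uncased')

    # Tokenize and numericalize one string into a tensor of token ids.
    ids = field.process('flambe integrates pretrained transformers.')
    assert isinstance(ids, torch.Tensor)

    # Vocabulary metadata exposed by the field.
    print(field.vocab_size)   # number of entries in the pretrained vocabulary
    print(field.padding_idx)  # index reserved for padding tokens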
class flambe.nlp.transformers.PretrainedTransformerEmbedder(alias: str, cache_dir: Optional[str] = None, padding_idx: Optional[int] = None, pool: bool = False, **kwargs)[source]

Bases: flambe.nn.Module

Embedder integration of the transformers library.

Instantiate this object using any alias available in the transformers library. More information can be found here:

https://huggingface.co/transformers/
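For example (a sketch; the alias is illustrative, and pool controls whether a single pooled vector or the full sequence encoding is returned):

    from flambe.nlp.transformers import PretrainedTransformerEmbedder

    # Wrap a pretrained transformer as a flambe Module. With pool=True, the
    # forward pass returns one vector per input sequence instead of one per token.
    embedder = PretrainedTransformerEmbedder(
        alias='bert-base-uncased',
        pool=True,
        padding_idx=0,  # assumed padding index; in practice take it from the matching field
    )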

forward(self, data: torch.Tensor, token_type_ids: Optional[torch.Tensor] = None, attention_mask: Optional[torch.Tensor] = None, position_ids: Optional[torch.Tensor] = None, head_mask: Optional[torch.Tensor] = None)

Perform a forward pass through the network.

If pool is True, only the pooled output of shape [B x H] is returned. Otherwise, the full sequence encoding of shape [B x S x H] is returned.

Parameters:
  • data (torch.Tensor) – The input data of shape [B x S]
  • token_type_ids (Optional[torch.Tensor], optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token. Has shape [B x S]
  • attention_mask (Optional[torch.Tensor], optional) – FloatTensor of shape [B x S]. Masked values should be 0 for padding tokens, 1 otherwise.
  • position_ids (Optional[torch.Tensor], optional) – Indices of the position of each input sequence token in the position embeddings. Defaults to the order given in the input. Has shape [B x S].
  • head_mask (Optional[torch.Tensor], optional) – Mask to nullify selected heads of the self-attention modules. Should be 0 for heads to mask, 1 otherwise. Has shape [num_layers x num_heads]
Returns:

If pool is True, returns a tensor of shape [B x H]; otherwise, returns an encoding for each token in the sequence, of shape [B x S x H].

Return type:

torch.Tensor
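A sketch of a forward pass, assuming a padding index of 0 and the illustrative 'bert-base-uncased' alias; in practice the mask should be built from the matching field's padding_idx:

    import torch
    from flambe.nlp.transformers import PretrainedTransformerEmbedder

    embedder = PretrainedTransformerEmbedder(alias='bert-base-uncased', pool=False)

    # A batch of 2 sequences of length 6, already numericalized (0 marks padding).
    data = torch.randint(1, 1000, (2, 6))
    data[1, 4:] = 0

    # Attention mask: 1.0 for real tokens, 0.0 for padding.
    attention_mask = (data != 0).float()

    out = embedder(data, attention_mask=attention_mask)
    print(out.shape)  # [2, 6, hidden_size] since pool is False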

__getattr__(self, name: str)

Override getattr to inspect config.

Parameters: name (str) – The attribute to fetch
Returns: The attribute
Return type: Any
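Because unknown attribute lookups fall through to the underlying model configuration, hyperparameters can be read directly from the embedder (hidden_size is a BERT config attribute and is used here only as an illustration):

    from flambe.nlp.transformers import PretrainedTransformerEmbedder

    embedder = PretrainedTransformerEmbedder(alias='bert-base-uncased')

    # hidden_size is not defined on the Module itself, so the lookup is
    # forwarded to the wrapped transformer's config.
    print(embedder.hidden_size)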