flambe.nlp.transformers.openai

Integration of the pytorch_transformers openai module

Module Contents

class flambe.nlp.transformers.openai.OpenAIGPTTextField(vocab_file: str, merges_file: str, max_len: int = 100, lower: bool = False)[source]

Bases: flambe.field.TextField, pytorch_transformers.OpenAIGPTTokenizer

Perform byte-pair encoding (BPE) tokenization.

Inspired by: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/tokenization_openai.py

Note that this object requires a pretrained vocabulary.

classmethod from_alias(cls, path: str = 'openai-gpt', cache_dir: Optional[str] = None)[source]

Initialize from a pretrained tokenizer.

process(self, example: str)[source]

Process an example, and create a Tensor.

Parameters: example (str) – The example to process, as a single string
Returns: The processed example, tokenized and numericalized
Return type: torch.Tensor
class flambe.nlp.transformers.openai.OpenAIGPTEmbeddings(input_size_or_config: Union[int, pt.OpenAIGPTConfig] = 40478, embedding_size: int = 768, embedding_dropout: float = 0.1, embedding_freeze: bool = False, pad_index: int = 0, n_special: int = 0, n_positions: int = 512, initializer_range=0.02)[source]

Bases: flambe.nn.Module, pytorch_transformers.modeling_openai.OpenAIGPTPreTrainedModel

Integrate the pytorch_transformers OpenAI GPT embedding model.

This module can be used like any normal encoder, or it can be loaded with the official pretrained OpenAI models. Simply use the from_alias class method when initializing the model.

classmethod from_alias(cls, path: str = 'openai-gpt', cache_dir: Optional[str] = None)[source]

Initialize from a pretrained model.

Parameters: path (str) – Path to a pretrained model, or one of the following string aliases currently available:
  • openai-gpt
set_num_special_tokens(self, num_special_tokens)[source]

Update the input embeddings with a new embedding matrix if needed.

forward(self, data: Tensor)[source]

Performs a forward pass through the network.

Parameters: data (torch.Tensor) – The input data, as a long tensor of token indices, batch first
Returns:
  • torch.Tensor – The encoded output, as a float tensor, batch_first
  • torch.Tensor, optional – The padding mask if a pad index was given
class flambe.nlp.transformers.openai.OpenAIGPTEncoder(input_size_or_config: Union[int, pt.OpenAIGPTConfig] = 768, n_ctx: int = 512, n_layer: int = 12, n_head: int = 12, afn: Union[str, nn.Module] = 'gelu', resid_pdrop: float = 0.1, embd_pdrop: float = 0.1, attn_pdrop: float = 0.1, layer_norm_epsilon: float = 1e-05, initializer_range=0.02)[source]

Bases: flambe.nn.Module, pytorch_transformers.modeling_openai.OpenAIGPTPreTrainedModel

Integrate the pytorch_transformers OpenAI GPT encoder model.

This module can be used like any normal encoder, or it can be loaded with the official pretrained OpenAI GPT models. Simply use the from_alias class method when initializing the model.

Currently available:
  • openai-gpt

classmethod from_alias(cls, path: str = 'openai-gpt', cache_dir: Optional[str] = None)[source]

Initialize from a pretrained model.

Parameters: path (str) – Path to a pretrained model, or one of the following string aliases currently available:
  • openai-gpt
forward(self, data: Tensor, mask: Optional[Tensor] = None)[source]

Performs a forward pass through the network.

Parameters: data (torch.Tensor) – The input data, as a float tensor
Returns: The encoded output, as a float tensor, or the pooled output
Return type: torch.Tensor