flambe.nlp.transformers

Package Contents

class flambe.nlp.transformers.BERTTextField(vocab_file: str, sos_token: str = '[CLS]', eos_token: str = '[SEP]', do_lower_case: bool = False, max_len_truncate: int = 100, **kwargs)[source]

Bases: flambe.field.TextField, pytorch_transformers.BertTokenizer

Perform WordPiece tokenization.

Inspired by: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/tokenization.py.

Note that this object requires a pretrained vocabulary.

classmethod from_alias(cls, path: str = 'bert-base-cased', cache_dir: Optional[str] = None, do_lower_case: bool = False, max_len_truncate: int = 100, **kwargs)

Initialize from a pretrained tokenizer.

Parameters:path (str) – Path to a pretrained model, or one of the following string aliases currently available:
  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
  • bert-base-multilingual-uncased
  • bert-base-multilingual-cased
  • bert-base-chinese
process(self, example: str)

Process an example, and create a Tensor.

Parameters:example (str) – The example to process, as a single string
Returns:The processed example, tokenized and numericalized
Return type:torch.Tensor
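
A minimal usage sketch (flambe and its pytorch_transformers dependency are assumed installed; the example sentence is illustrative only):

    from flambe.nlp.transformers import BERTTextField

    # Load a pretrained WordPiece vocabulary through one of the aliases above.
    field = BERTTextField.from_alias('bert-base-cased', max_len_truncate=100)

    # `process` tokenizes and numericalizes a single string into a torch.Tensor
    # of token ids, using the configured sos/eos tokens ([CLS] / [SEP]).
    ids = field.process("A single example sentence.")
    print(ids.shape)
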
class flambe.nlp.transformers.BERTEmbeddings(input_size_or_config: Union[int, pt.BertConfig], embedding_size: int = 768, embedding_dropout: float = 0.1, embedding_freeze: bool = False, pad_index: int = 0, max_position_embeddings: int = 512, type_vocab_size: int = 2, **kwargs)[source]

Bases: flambe.nn.Module, pytorch_transformers.modeling_bert.BertPreTrainedModel

Integrate the pytorch_pretrained_bert BERT word embedding model.

This module can be used like any other encoder, or it can be loaded with the official pretrained BERT models. Simply use the from_pretrained class method when initializing the model.

Currently available:
  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
  • bert-base-multilingual-uncased
  • bert-base-multilingual-cased
  • bert-base-chinese

classmethod from_alias(cls, path: str = 'bert-base-cased', cache_dir: Optional[str] = None, **kwargs)

Initialize from a pretrained model.

Parameters:path (str) – Path to a pretrained model, or one of the following string aliases currently available:
  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
  • bert-base-multilingual-uncased
  • bert-base-multilingual-cased
  • bert-base-chinese
forward(self, data: Tensor)

Performs a forward pass through the network.

Parameters:data (torch.Tensor) – The input data, as a float tensor, batch first
Returns:
  • torch.Tensor – The encoded output, as a float tensor, batch_first
  • torch.Tensor, optional – The padding mask if a pad index was given
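
A sketch of pairing the embeddings with BERTTextField. It assumes the token ids produced by BERTTextField.process are valid input and that the default pad_index causes the optional padding mask to be returned, as documented above:

    from flambe.nlp.transformers import BERTTextField, BERTEmbeddings

    field = BERTTextField.from_alias('bert-base-cased')
    embedder = BERTEmbeddings.from_alias('bert-base-cased')

    # Build a batch of size 1; forward expects batch-first input.
    batch = field.process("A single example sentence.").unsqueeze(0)

    # Returns the embedded sequence plus a padding mask when a pad index is set.
    embedded, mask = embedder(batch)
    print(embedded.shape)  # (batch, seq_len, embedding_size)
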
class flambe.nlp.transformers.BERTEncoder(input_size_or_config: Union[int, pt.modeling_bert.BertConfig], hidden_size: int = 768, num_hidden_layers: int = 12, num_attention_heads: int = 12, intermediate_size: int = 3072, hidden_act: str = 'gelu', hidden_dropout_prob: float = 0.1, attention_probs_dropout_prob: float = 0.1, max_position_embeddings: int = 512, type_vocab_size: int = 2, initializer_range: float = 0.02, pool_last: bool = False, **kwargs)[source]

Bases: flambe.nn.Module, pytorch_transformers.modeling_bert.BertPreTrainedModel

Integrate the pytorch_pretrained_bert BERT encoder model.

This module can be used like any other encoder, or it can be loaded with the official pretrained BERT models. Simply use the from_pretrained class method when initializing the model.

Currently available:
  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
  • bert-base-multilingual-uncased
  • bert-base-multilingual-cased
  • bert-base-chinese

classmethod from_alias(cls, path: str = 'bert-base-cased', cache_dir: Optional[str] = None, pool_last: bool = False, **kwargs)

Initialize from a pretrained model.

Parameters:path (str) – Path to a pretrained model, or one of the following string aliases currently available:
  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
  • bert-base-multilingual-uncased
  • bert-base-multilingual-cased
  • bert-base-chinese
forward(self, data: Tensor, mask: Optional[Tensor] = None)

Performs a forward pass through the network.

Parameters:data (torch.Tensor) – The input data, as a long tensor
Returns:The encoded output, as a float tensor or the pooled output
Return type:torch.Tensor
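
A sketch of running the encoder directly on numericalized input. Per the forward signature above, data is a long tensor, and pool_last controls whether the pooled output is returned; the alias and sentence below are illustrative:

    from flambe.nlp.transformers import BERTTextField, BERTEncoder

    field = BERTTextField.from_alias('bert-base-cased')
    encoder = BERTEncoder.from_alias('bert-base-cased', pool_last=True)

    batch = field.process("Encode this sentence.").unsqueeze(0)

    # With pool_last=True the pooled output is returned; otherwise the full
    # encoded sequence (both float tensors, per the Returns section above).
    pooled = encoder(batch)
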
class flambe.nlp.transformers.OpenAIGPTTextField(vocab_file: str, merges_file: str, max_len: int = 100, lower: bool = False)[source]

Bases: flambe.field.TextField, pytorch_transformers.OpenAIGPTTokenizer

Perform Byte-Pair Encoding (BPE) tokenization.

Inspired by: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/tokenization_openai.py.

Note that this object requires a pretrained vocabulary.

classmethod from_alias(cls, path: str = 'openai-gpt', cache_dir: Optional[str] = None)

Initialize from a pretrained tokenizer.

process(self, example: str)

Process an example, and create a Tensor.

Parameters:example (str) – The example to process, as a single string
Returns:The processed example, tokenized and numericalized
Return type:torch.Tensor
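
Usage mirrors BERTTextField; a minimal sketch (the example sentence is illustrative):

    from flambe.nlp.transformers import OpenAIGPTTextField

    # Load the pretrained GPT vocabulary and merges via the alias.
    field = OpenAIGPTTextField.from_alias('openai-gpt')

    # `process` tokenizes and numericalizes a single string into a torch.Tensor.
    ids = field.process("A single example sentence.")
    print(ids.shape)
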
class flambe.nlp.transformers.OpenAIGPTEmbeddings(input_size_or_config: Union[int, pt.OpenAIGPTConfig] = 40478, embedding_size: int = 768, embedding_dropout: float = 0.1, embedding_freeze: bool = False, pad_index: int = 0, n_special: int = 0, n_positions: int = 512, initializer_range=0.02)[source]

Bases: flambe.nn.Module, pytorch_transformers.modeling_openai.OpenAIGPTPreTrainedModel

Integrate the pytorch_pretrained_bert OpenAI embedding model.

This module can be used like any other encoder, or it can be loaded with the official pretrained OpenAI models. Simply use the from_pretrained class method when initializing the model.

classmethod from_alias(cls, path: str = 'openai-gpt', cache_dir: Optional[str] = None)

Initialize from a pretrained model.

Parameters:path (str) – Path to a pretrained model, or the string alias currently available: openai-gpt
set_num_special_tokens(self, num_special_tokens)

Update the input embeddings with a new embedding matrix if needed.

forward(self, data: Tensor)

Performs a forward pass through the network.

Parameters:data (torch.Tensor) – The input data, as a float tensor, batch first
Returns:
  • torch.Tensor – The encoded output, as a float tensor, batch_first
  • torch.Tensor, optional – The padding mask if a pad index was given
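
A sketch analogous to the BERTEmbeddings example above, assuming the ids produced by OpenAIGPTTextField.process are valid input and that the default pad_index causes the optional mask to be returned:

    from flambe.nlp.transformers import OpenAIGPTTextField, OpenAIGPTEmbeddings

    field = OpenAIGPTTextField.from_alias('openai-gpt')
    embedder = OpenAIGPTEmbeddings.from_alias('openai-gpt')

    # Batch of size 1; forward expects batch-first input.
    batch = field.process("A single example sentence.").unsqueeze(0)

    # Embedded sequence plus a padding mask when a pad index is set.
    embedded, mask = embedder(batch)
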
class flambe.nlp.transformers.OpenAIGPTEncoder(input_size_or_config: Union[int, pt.OpenAIGPTConfig] = 768, n_ctx: int = 512, n_layer: int = 12, n_head: int = 12, afn: Union[str, nn.Module] = 'gelu', resid_pdrop: float = 0.1, embd_pdrop: float = 0.1, attn_pdrop: float = 0.1, layer_norm_epsilon: float = 1e-05, initializer_range=0.02)[source]

Bases: flambe.nn.Module, pytorch_transformers.modeling_openai.OpenAIGPTPreTrainedModel

Integrate the pytorch_pretrained_bert OpenAIGPT encoder model.

This module can be used like any other encoder, or it can be loaded with the official pretrained OpenAI GPT models. Simply use the from_pretrained class method when initializing the model.

Currently available: openai-gpt

classmethod from_alias(cls, path: str = 'openai-gpt', cache_dir: Optional[str] = None)

Initialize from a pretrained model.

Parameters:path (str) – Path to a pretrained model, or the string alias currently available: openai-gpt
forward(self, data: Tensor, mask: Optional[Tensor] = None)

Performs a forward pass through the network.

Parameters:data (torch.Tensor) – The input data, as a long tensor
Returns:The encoded output, as a float tensor or the pooled output
Return type:torch.Tensor
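
A sketch of running the GPT encoder on numericalized input (per the forward signature above, data is a long tensor; the sentence is illustrative):

    from flambe.nlp.transformers import OpenAIGPTTextField, OpenAIGPTEncoder

    field = OpenAIGPTTextField.from_alias('openai-gpt')
    encoder = OpenAIGPTEncoder.from_alias('openai-gpt')

    batch = field.process("Encode this sentence.").unsqueeze(0)
    encoded = encoder(batch)  # float tensor output, per the Returns section above
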
class flambe.nlp.transformers.AdamW[source]

Bases: flambe.Component, pytorch_transformers.optimization.Optimizer

class flambe.nlp.transformers.ConstantLRSchedule[source]

Bases: flambe.Component, pytorch_transformers.ConstantLRSchedule

class flambe.nlp.transformers.WarmupConstantSchedule[source]

Bases: flambe.Component, pytorch_transformers.WarmupConstantSchedule

class flambe.nlp.transformers.WarmupLinearSchedule[source]

Bases: flambe.Component, pytorch_transformers.WarmupLinearSchedule
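
These classes expose the pytorch_transformers optimizer and learning-rate schedules as flambe Components, so they can also be declared in a flambe YAML config. A hedged training-loop sketch, assuming the wrapped classes keep the pytorch_transformers constructor signatures; the lr, warmup_steps, and t_total values are illustrative only:

    from flambe.nlp.transformers import BERTEncoder, AdamW, WarmupLinearSchedule

    encoder = BERTEncoder.from_alias('bert-base-cased')

    # AdamW takes the parameters to optimize; WarmupLinearSchedule warms the
    # learning rate up for warmup_steps, then decays it linearly to 0 at t_total.
    optimizer = AdamW(encoder.parameters(), lr=2e-5)
    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=100, t_total=1000)

    for step in range(1000):
        # ... compute the loss and call loss.backward() here ...
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
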