flambe.nlp.transformers.bert

Integration of the pytorch_transformers BERT module

Module Contents

class flambe.nlp.transformers.bert.BERTTextField(vocab_file: str, sos_token: str = '[CLS]', eos_token: str = '[SEP]', do_lower_case: bool = False, max_len_truncate: int = 100, **kwargs)[source]

Bases: flambe.field.TextField, pytorch_transformers.BertTokenizer

Perform WordPiece tokenization.

Inspired by: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/tokenization.py

Note that this object requires a pretrained vocabulary.

classmethod from_alias(cls, path: str = 'bert-base-cased', cache_dir: Optional[str] = None, do_lower_case: bool = False, max_len_truncate: int = 100, **kwargs)[source]

Initialize from a pretrained tokenizer.

Parameters:path (str) – Path to a pretrained model, or one of the following string aliases currently available:
  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
  • bert-base-multilingual-uncased
  • bert-base-multilingual-cased
  • bert-base-chinese
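For example, a field backed by the cased base vocabulary can be created directly from its alias (a minimal sketch; the alias and keyword arguments mirror the signature above):

    from flambe.nlp.transformers.bert import BERTTextField

    # Downloads (or reads from cache_dir) the pretrained WordPiece
    # vocabulary for the chosen alias.
    field = BERTTextField.from_alias('bert-base-cased', max_len_truncate=100)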
process(self, example: str)[source]

Process an example, and create a Tensor.

Parameters:example (str) – The example to process, as a single string
Returns:The processed example, tokenized and numericalized
Return type:torch.Tensor
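A short usage sketch, assuming the field built above: process turns a raw string into a tensor of WordPiece token ids, wrapped in the configured sos_token and eos_token markers ([CLS] and [SEP] by default):

    import torch

    tensor = field.process("The quick brown fox jumps over the lazy dog")
    assert isinstance(tensor, torch.Tensor)  # numericalized WordPiece ids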
class flambe.nlp.transformers.bert.BERTEmbeddings(input_size_or_config: Union[int, pt.BertConfig], embedding_size: int = 768, embedding_dropout: float = 0.1, embedding_freeze: bool = False, pad_index: int = 0, max_position_embeddings: int = 512, type_vocab_size: int = 2, **kwargs)[source]

Bases: flambe.nn.Module, pytorch_transformers.modeling_bert.BertPreTrainedModel

Integrate the pytorch_transformers BERT word embedding model.

This module can be used like any normal encoder, or it can be loaded with one of the official pretrained BERT models. Simply use the from_alias class method when initializing the model.

Currently available:
  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
  • bert-base-multilingual-uncased
  • bert-base-multilingual-cased
  • bert-base-chinese

classmethod from_alias(cls, path: str = 'bert-base-cased', cache_dir: Optional[str] = None, **kwargs)[source]

Initialize from a pretrained model.

Parameters:path (str) – Path to a pretrained model, or one of the following string aliases currently available:
  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
  • bert-base-multilingual-uncased
  • bert-base-multilingual-cased
  • bert-base-chinese
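As a sketch, loading the pretrained cased base weights might look like this (the cache directory path is a hypothetical value):

    from flambe.nlp.transformers.bert import BERTEmbeddings

    # Fetch the official pretrained embedding weights for the alias,
    # storing the download under cache_dir.
    embedder = BERTEmbeddings.from_alias('bert-base-cased', cache_dir='/tmp/bert')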
forward(self, data: Tensor)[source]

Performs a forward pass through the network.

Parameters:data (torch.Tensor) – The input data, as a long tensor of token indices, batch first
Returns:
  • torch.Tensor – The encoded output, as a float tensor, batch_first
  • torch.Tensor, optional – The padding mask if a pad index was given
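A minimal forward-pass sketch, assuming the field and embedder from the earlier examples; note that the return value may be a single tensor or a pair, depending on whether a pad index was given:

    ids = field.process("Hello world").unsqueeze(0)  # add a batch dim: (1, seq_len)

    result = embedder(ids)
    # Per the Returns section above, this is either the encoded tensor
    # alone, or an (output, padding_mask) pair when a pad index is set.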
class flambe.nlp.transformers.bert.BERTEncoder(input_size_or_config: Union[int, pt.modeling_bert.BertConfig], hidden_size: int = 768, num_hidden_layers: int = 12, num_attention_heads: int = 12, intermediate_size: int = 3072, hidden_act: str = 'gelu', hidden_dropout_prob: float = 0.1, attention_probs_dropout_prob: float = 0.1, max_position_embeddings: int = 512, type_vocab_size: int = 2, initializer_range: float = 0.02, pool_last: bool = False, **kwargs)[source]

Bases: flambe.nn.Module, pytorch_transformers.modeling_bert.BertPreTrainedModel

Integrate the pytorch_transformers BERT encoder model.

This module can be used like any normal encoder, or it can be loaded with one of the official pretrained BERT models. Simply use the from_alias class method when initializing the model.

Currently available:
  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
  • bert-base-multilingual-uncased
  • bert-base-multilingual-cased
  • bert-base-chinese

classmethod from_alias(cls, path: str = 'bert-base-cased', cache_dir: Optional[str] = None, pool_last: bool = False, **kwargs)[source]

Initialize from a pretrained model.

Parameters:path (str) – Path to a pretrained model, or one of the following string aliases currently available:
  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
  • bert-base-multilingual-uncased
  • bert-base-multilingual-cased
  • bert-base-chinese
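For instance, an encoder that returns only the pooled output can be loaded as follows (a sketch based on the signature above):

    from flambe.nlp.transformers.bert import BERTEncoder

    # pool_last=True asks the encoder for the pooled output rather
    # than the full sequence of hidden states (see forward below).
    encoder = BERTEncoder.from_alias('bert-base-cased', pool_last=True)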
forward(self, data: Tensor, mask: Optional[Tensor] = None)[source]

Performs a forward pass through the network.

Parameters:
  • data (torch.Tensor) – The input data, as a long tensor
  • mask (torch.Tensor, optional) – The padding mask over the input, if any
Returns:The encoded output as a float tensor, or the pooled output if pool_last is True
Return type:torch.Tensor
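Putting the three classes together, a minimal end-to-end sketch (the aliases must match so the field's vocabulary agrees with the model weights; the tuple check follows the Returns sections above):

    import torch
    from flambe.nlp.transformers.bert import (
        BERTTextField, BERTEmbeddings, BERTEncoder,
    )

    field = BERTTextField.from_alias('bert-base-cased')
    embedder = BERTEmbeddings.from_alias('bert-base-cased')
    encoder = BERTEncoder.from_alias('bert-base-cased', pool_last=True)

    ids = field.process("flambe integrates pretrained BERT models").unsqueeze(0)
    embedded = embedder(ids)
    # Forward the padding mask on to the encoder if one was returned.
    if isinstance(embedded, tuple):
        embedded, mask = embedded
        pooled = encoder(embedded, mask=mask)
    else:
        pooled = encoder(embedded)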