flambe.nlp.language_modeling.datasets

Module Contents

class flambe.nlp.language_modeling.datasets.PTBDataset(split_by_line: bool = False, end_of_line_token: Optional[str] = '<eol>', cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)[source]

Bases: flambe.dataset.TabularDataset

The official Penn Treebank (PTB) dataset.

PTB_URL = https://raw.githubusercontent.com/yoonkim/lstm-char-cnn/master/data/ptb/[source]
_process(self, file: bytes)[source]

Process the input file.

Parameters:file (bytes) – The input file, as a byte string
Returns:List of examples, where each example is a single-element tuple containing the text.
Return type:List[Tuple[str]]
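The contract above can be sketched as a standalone function; this is a hypothetical re-implementation, not the class's actual method, assuming that `split_by_line` yields one example per non-empty line and that `end_of_line_token`, when set, is appended to each line:

```python
from typing import List, Optional, Tuple

def process_ptb(file: bytes,
                split_by_line: bool = False,
                end_of_line_token: Optional[str] = "<eol>") -> List[Tuple[str]]:
    """Hypothetical sketch of the PTBDataset._process contract:
    bytes in, list of single-element text tuples out."""
    text = file.decode("utf-8")
    if split_by_line:
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if end_of_line_token is not None:
            lines = [f"{line} {end_of_line_token}" for line in lines]
        return [(line,) for line in lines]
    # Otherwise treat the whole file as one long example.
    if end_of_line_token is not None:
        text = f" {end_of_line_token} ".join(text.splitlines())
    return [(text,)]
```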
class flambe.nlp.language_modeling.datasets.Wiki103(split_by_line: bool = False, end_of_line_token: Optional[str] = '<eol>', remove_headers: bool = False, cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)[source]

Bases: flambe.dataset.TabularDataset

The official WikiText103 dataset.

WIKI_URL = https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip[source]
_process(self, file: bytes)[source]

Process the input file.

Parameters:file (bytes) – The input file, as a byte string
Returns:List of examples, where each example is a single-element tuple containing the text.
Return type:List[Tuple[str]]
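The `remove_headers` option can be illustrated with a hypothetical sketch of the processing contract (again, not the class's actual method); WikiText marks section headers as lines wrapped in `=` signs, and the assumption here is that `remove_headers` drops such lines:

```python
from typing import List, Optional, Tuple

def process_wikitext(file: bytes,
                     split_by_line: bool = False,
                     end_of_line_token: Optional[str] = "<eol>",
                     remove_headers: bool = False) -> List[Tuple[str]]:
    """Hypothetical sketch of the Wiki103._process contract,
    with optional filtering of WikiText section headers."""
    lines = file.decode("utf-8").splitlines()
    if remove_headers:
        # WikiText headers look like " = Valkyria Chronicles III = ".
        lines = [l for l in lines if not l.strip().startswith("=")]
    lines = [l.strip() for l in lines if l.strip()]
    if end_of_line_token is not None:
        lines = [f"{l} {end_of_line_token}" for l in lines]
    if split_by_line:
        return [(l,) for l in lines]
    return [(" ".join(lines),)]
```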
class flambe.nlp.language_modeling.datasets.Enwiki8(num_eval_symbols: int = 5000000, remove_end_of_line: bool = False, cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)[source]

Bases: flambe.dataset.TabularDataset

The official Enwiki8 dataset.

ENWIKI_URL = http://mattmahoney.net/dc/enwik8.zip[source]
_process(self, file: bytes)[source]

Process the input file.

Parameters:file (bytes) – The input file, as a byte string
Returns:List of examples, where each example is a single-element tuple containing the text.
Return type:List[Tuple[str]]
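The `num_eval_symbols` parameter matches the conventional enwik8 character split, in which the last 2 × num_eval_symbols characters of the 100M-character file are held out, half for validation and half for test. A minimal sketch of that split, under the assumption that `remove_end_of_line` replaces newline characters with spaces (this helper is hypothetical, not the class's actual method):

```python
from typing import Tuple

def split_enwiki8(file: bytes,
                  num_eval_symbols: int = 5000000,
                  remove_end_of_line: bool = False) -> Tuple[str, str, str]:
    """Hypothetical sketch of the conventional enwik8 split:
    train gets everything except the final 2 * num_eval_symbols
    characters, which are divided evenly into validation and test."""
    text = file.decode("utf-8", errors="ignore")
    if remove_end_of_line:
        # Assumed behavior: collapse line breaks into spaces.
        text = text.replace("\n", " ")
    train = text[: -2 * num_eval_symbols]
    valid = text[-2 * num_eval_symbols : -num_eval_symbols]
    test = text[-num_eval_symbols:]
    return train, valid, test
```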