flambe.nlp.language_modeling.datasets

Module Contents

class flambe.nlp.language_modeling.datasets.PTBDataset(split_by_line: bool = False, end_of_line_token: Optional[str] = '<eol>', cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)[source]

Bases: flambe.dataset.TabularDataset

The official Penn Treebank (PTB) dataset.

PTB_URL = https://raw.githubusercontent.com/yoonkim/lstm-char-cnn/master/data/ptb/[source]
_process(self, file: bytes)[source]

Process the input file.

Parameters:file (bytes) – The input file, as a byte string
Returns:List of examples, where each example is a single-element tuple containing the text.
Return type:List[Tuple[str]]
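The contract above can be sketched as a standalone function; this is a hypothetical re-implementation, not the class's actual method, assuming that `split_by_line` yields one example per non-empty line and that `end_of_line_token`, when set, is appended to each line:

```python
from typing import List, Optional, Tuple

def process_ptb(file: bytes,
                split_by_line: bool = False,
                end_of_line_token: Optional[str] = "<eol>") -> List[Tuple[str]]:
    """Hypothetical sketch of the PTBDataset._process contract:
    bytes in, list of single-element text tuples out."""
    text = file.decode("utf-8")
    if split_by_line:
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if end_of_line_token is not None:
            lines = [f"{line} {end_of_line_token}" for line in lines]
        return [(line,) for line in lines]
    # Otherwise treat the whole file as one long example.
    if end_of_line_token is not None:
        text = f" {end_of_line_token} ".join(text.splitlines())
    return [(text,)]
```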
class flambe.nlp.language_modeling.datasets.Wiki103(split_by_line: bool = False, end_of_line_token: Optional[str] = '<eol>', remove_headers: bool = False, cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)[source]

Bases: flambe.dataset.TabularDataset

The official WikiText103 dataset.

WIKI_URL = https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip[source]
_process(self, file: bytes)[source]

Process the input file.

Parameters:file (bytes) – The input file, as a byte string
Returns:List of examples, where each example is a single-element tuple containing the text.
Return type:List[Tuple[str]]
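The `remove_headers` option can be illustrated with a hypothetical sketch of the processing contract (again, not the class's actual method); WikiText marks section headers as lines wrapped in `=` signs, and the assumption here is that `remove_headers` drops such lines:

```python
from typing import List, Optional, Tuple

def process_wikitext(file: bytes,
                     split_by_line: bool = False,
                     end_of_line_token: Optional[str] = "<eol>",
                     remove_headers: bool = False) -> List[Tuple[str]]:
    """Hypothetical sketch of the Wiki103._process contract,
    with optional filtering of WikiText section headers."""
    lines = file.decode("utf-8").splitlines()
    if remove_headers:
        # WikiText headers look like " = Valkyria Chronicles III = ".
        lines = [l for l in lines if not l.strip().startswith("=")]
    lines = [l.strip() for l in lines if l.strip()]
    if end_of_line_token is not None:
        lines = [f"{l} {end_of_line_token}" for l in lines]
    if split_by_line:
        return [(l,) for l in lines]
    return [(" ".join(lines),)]
```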
class flambe.nlp.language_modeling.datasets.Enwiki8(num_eval_symbols: int = 5000000, remove_end_of_line: bool = False, cache: bool = False, transform: Dict[str, Union[Field, Dict]] = None)[source]

Bases: flambe.dataset.TabularDataset

The official Enwiki8 dataset.

ENWIKI_URL = http://mattmahoney.net/dc/enwik8.zip[source]
_process(self, file: bytes)[source]

Process the input file.

Parameters:file (bytes) – The input file, as a byte string
Returns:List of examples, where each example is a single-element tuple containing the text.
Return type:List[Tuple[str]]
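The `num_eval_symbols` parameter matches the conventional enwik8 character split, in which the last 2 × num_eval_symbols characters of the 100M-character file are held out, half for validation and half for test. A minimal sketch of that split, under the assumption that `remove_end_of_line` replaces newline characters with spaces (this helper is hypothetical, not the class's actual method):

```python
from typing import Tuple

def split_enwiki8(file: bytes,
                  num_eval_symbols: int = 5000000,
                  remove_end_of_line: bool = False) -> Tuple[str, str, str]:
    """Hypothetical sketch of the conventional enwik8 split:
    train gets everything except the final 2 * num_eval_symbols
    characters, which are divided evenly into validation and test."""
    text = file.decode("utf-8", errors="ignore")
    if remove_end_of_line:
        # Assumed behavior: collapse line breaks into spaces.
        text = text.replace("\n", " ")
    train = text[: -2 * num_eval_symbols]
    valid = text[-2 * num_eval_symbols : -num_eval_symbols]
    test = text[-num_eval_symbols:]
    return train, valid, test
```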