flambe.nlp.language_modeling.sampler

Module Contents

class flambe.nlp.language_modeling.sampler.CorpusSampler(batch_size: int = 128, unroll_size: int = 128, n_workers: int = 0, pin_memory: bool = False, downsample: Optional[float] = None, drop_last: bool = True)

Bases: flambe.sampler.sampler.Sampler

Sampler for ordered iteration over a large text corpus.

The input dataset must contain a single example holding the full sequence of tokens. Batches are drawn from that sequence in order.
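To show what ordered iteration over a single token sequence usually involves, the sketch below uses the "batchify" layout common in PyTorch language-modeling examples. The corpus is cut into `batch_size` contiguous streams, and each target is the source shifted by one token. This is an assumption about the general technique, not flambe's actual implementation. `batchify` is a hypothetical helper, and plain Python lists of token ids stand in for Tensors so the sketch runs without PyTorch.

```python
from typing import List, Tuple

Streams = List[List[int]]


def batchify(tokens: List[int], batch_size: int) -> Tuple[Streams, Streams]:
    """Split one token sequence into batch_size contiguous, ordered streams.

    Hypothetical helper (not part of flambe): targets are the sources
    shifted by one token, and leftover tokens that cannot fill every
    stream equally are dropped.
    """
    # Reserve the last token so every source position has a next-token target.
    stream_len = (len(tokens) - 1) // batch_size
    sources: Streams = []
    targets: Streams = []
    for b in range(batch_size):
        start = b * stream_len
        sources.append(tokens[start:start + stream_len])
        targets.append(tokens[start + 1:start + stream_len + 1])
    return sources, targets


src, tgt = batchify(list(range(13)), batch_size=3)
print(src)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(tgt)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

Under this layout each row is read left to right. Consecutive windows taken from the same row therefore continue the same passage of text, which lets a recurrent model carry its hidden state from one batch to the next.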

static collate_fn(data: Sequence[Tuple[Tensor, Tensor]])

Create a batch from data.

Parameters: data (Sequence[Tuple[Tensor, Tensor]]) – List of (source, target) tuples.
Returns: Source and target Tensors.
Return type: Tuple[Tensor, Tensor]
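The sketch below roughly illustrates what a collate function of this shape does: it groups a sequence of (source, target) pairs into one batch of sources and one batch of targets. It assumes each pair holds one stream's window of token ids, and it uses lists instead of Tensors. The `seq_first` flag is hypothetical and shows the sequence-first layout that some PyTorch recurrent models expect. Which layout `CorpusSampler.collate_fn` actually returns is not stated here.

```python
from typing import List, Sequence, Tuple

Window = List[int]
Batch = List[List[int]]


def collate(data: Sequence[Tuple[Window, Window]], seq_first: bool = False) -> Tuple[Batch, Batch]:
    """Group (source, target) pairs into a source batch and a target batch."""
    # Unzip the pairs: one tuple of all sources, one tuple of all targets.
    sources, targets = zip(*data)
    src = [list(s) for s in sources]  # shape: (batch, unroll)
    tgt = [list(t) for t in targets]
    if seq_first:
        # Transpose to (unroll, batch) so row i holds time step i of every stream.
        src = [list(step) for step in zip(*src)]
        tgt = [list(step) for step in zip(*tgt)]
    return src, tgt


pairs = [([0, 1, 2], [1, 2, 3]), ([4, 5, 6], [5, 6, 7])]
print(collate(pairs))                  # ([[0, 1, 2], [4, 5, 6]], [[1, 2, 3], [5, 6, 7]])
print(collate(pairs, seq_first=True))  # ([[0, 4], [1, 5], [2, 6]], [[1, 5], [2, 6], [3, 7]])
```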
sample(self, data: Sequence[Sequence[Tensor]], n_epochs: int = 1)

Sample from the list of features and yield batches.

Parameters:
  • data (Sequence[Sequence[Tensor, ..]]) – The input data to sample from. Expected to contain a single example holding the token sequence.
  • n_epochs (int, optional) – The number of epochs to run in the output iterator. Use -1 to iterate indefinitely.
Yields: Iterator[Tuple[Tensor]] – A batch of data, as a tuple of Tensors.
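The `n_epochs` semantics above, including `-1` for an unbounded iterator, can be sketched as a generator that slides a window of `unroll_size` tokens across every stream. `sample_batches` is hypothetical and operates on lists of token ids that are already laid out as parallel streams. It mirrors the documented parameters but is not flambe's code. Its handling of the short final window through `drop_last` is an assumption.

```python
import itertools
from typing import Iterator, List, Tuple

Streams = List[List[int]]


def sample_batches(sources: Streams, targets: Streams, unroll_size: int,
                   n_epochs: int = 1, drop_last: bool = True) -> Iterator[Tuple[Streams, Streams]]:
    """Yield (source, target) windows in corpus order, epoch after epoch."""
    if not sources or not sources[0]:
        return  # nothing to yield; also avoids spinning forever when n_epochs == -1
    # n_epochs == -1 means iterate forever, as in the documented parameter.
    epochs = itertools.count() if n_epochs == -1 else range(n_epochs)
    length = len(sources[0])
    for _ in epochs:
        for start in range(0, length, unroll_size):
            end = start + unroll_size
            if drop_last and end > length:
                break  # skip the final, shorter window
            yield ([s[start:end] for s in sources],
                   [t[start:end] for t in targets])


src = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
tgt = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
print(len(list(sample_batches(src, tgt, unroll_size=2, n_epochs=2))))  # 4
first_five = list(itertools.islice(sample_batches(src, tgt, 2, n_epochs=-1), 5))
print(first_five[0])  # ([[0, 1], [5, 6]], [[1, 2], [6, 7]])
```

With `n_epochs=-1` the caller decides when to stop, for example with `itertools.islice` as shown or by breaking out of the training loop.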

length(self, data: Sequence[Sequence[torch.Tensor]])

Return the number of batches in the sampler.

Parameters: data (Sequence[Sequence[torch.Tensor, ..]]) – The input data to sample from.
Returns: The number of batches that would be created per epoch.
Return type: int
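Suppose the corpus is laid out with one token reserved for the shifted targets, split into `batch_size` equal streams, and read in windows of `unroll_size` tokens. Under those assumptions the per-epoch batch count reduces to integer arithmetic. The hypothetical `num_batches` function below illustrates it. The library's exact rounding may differ, as may whether `downsample` reduces the token count first.

```python
def num_batches(n_tokens: int, batch_size: int, unroll_size: int,
                drop_last: bool = True) -> int:
    """Count the windows one epoch would yield under the assumed layout."""
    # Tokens per stream after reserving one token for the shifted targets.
    stream_len = (n_tokens - 1) // batch_size
    if drop_last:
        return stream_len // unroll_size  # only full windows
    # Integer ceiling division keeps the short final window.
    return (stream_len + unroll_size - 1) // unroll_size


print(num_batches(1001, batch_size=10, unroll_size=35))                   # 2
print(num_batches(1001, batch_size=10, unroll_size=35, drop_last=False))  # 3
```

With the defaults (`batch_size=128`, `unroll_size=128`), each full batch under this layout holds 128 × 128 = 16,384 source tokens.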