CorpusSampler(batch_size: int = 128, unroll_size: int = 128, n_workers: int = 0, pin_memory: bool = False, downsample: Optional[float] = None, drop_last: bool = True)
Implement a CorpusSampler object.
This object is useful for iterating over a large corpus of text in order. It expects a dataset with a single example that contains the full sequence of tokens.
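The ordered iteration described above can be illustrated with a minimal, library-independent sketch. It assumes the common language-modeling scheme in which the token sequence is cut into `batch_size` contiguous streams and then read `unroll_size` steps at a time, with the target being the source shifted by one token. The helper names `split_into_streams` and `ordered_batches` are hypothetical, plain lists stand in for Tensors, and the actual CorpusSampler internals may differ.

```python
from typing import Iterator, List, Tuple

Batch = Tuple[List[List[int]], List[List[int]]]


def split_into_streams(tokens: List[int], batch_size: int) -> List[List[int]]:
    """Cut one long token sequence into `batch_size` contiguous, equal-length streams."""
    stream_len = len(tokens) // batch_size  # leftover tokens at the end are dropped
    return [tokens[i * stream_len:(i + 1) * stream_len] for i in range(batch_size)]


def ordered_batches(tokens: List[int], batch_size: int, unroll_size: int) -> Iterator[Batch]:
    """Yield (source, target) chunks in corpus order (hypothetical helper)."""
    streams = split_into_streams(tokens, batch_size)
    stream_len = len(streams[0])
    # Stop one token early so that every source position has a next-token target.
    for start in range(0, stream_len - 1, unroll_size):
        end = min(start + unroll_size, stream_len - 1)
        source = [stream[start:end] for stream in streams]
        target = [stream[start + 1:end + 1] for stream in streams]
        yield source, target


# A toy "corpus" of 20 token ids, read as 2 streams in chunks of 4 steps.
batches = list(ordered_batches(list(range(20)), batch_size=2, unroll_size=4))
```

Because consecutive batches continue each stream where the previous batch stopped, a recurrent model could carry its hidden state from one batch to the next under this scheme.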
collate_fn(data: Sequence[Tuple[Tensor, Tensor]])
Create a batch from data.
Parameters: data (Sequence[Tuple[Tensor, Tensor]]) – List of (source, target) tuples.
Returns: Source and target Tensors.
Return type: Tuple[Tensor, Tensor]
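As a rough illustration of the collation step, the sketch below regroups a list of (source, target) pairs into one source batch and one target batch. It is an assumption-laden stand-in: plain lists replace Tensors, `collate_pairs` is a hypothetical name, and the real method may additionally stack or transpose the Tensors.

```python
from typing import List, Sequence, Tuple

Row = List[int]  # stand-in for a 1-D Tensor of token ids


def collate_pairs(data: Sequence[Tuple[Row, Row]]) -> Tuple[List[Row], List[Row]]:
    """Regroup non-empty (source, target) pairs into (all sources, all targets)."""
    sources, targets = zip(*data)
    return list(sources), list(targets)


pairs = [([0, 1, 2], [1, 2, 3]), ([10, 11, 12], [11, 12, 13])]
source_batch, target_batch = collate_pairs(pairs)
```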
sample(self, data: Sequence[Sequence[Tensor]], n_epochs: int = 1)
Sample from the list of features and yield batches.
Parameters:
- data (Sequence[Sequence[Tensor, ..]]) – The input data to sample from.
- n_epochs (int, optional) – The number of epochs to run in the output iterator. Use -1 to run infinitely.
Yields: Iterator[Tuple[Tensor]] – A batch of data, as a tuple of Tensors.
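The `n_epochs` semantics can be sketched with a small generator: a finite count replays the batches that many times, while -1 never stops. This is only an illustration of the documented behavior, not the library's code; `repeat_epochs` is a hypothetical helper and strings stand in for batches.

```python
from itertools import islice
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def repeat_epochs(batches: List[T], n_epochs: int = 1) -> Iterator[T]:
    """Replay `batches` for `n_epochs` epochs; -1 means run forever."""
    if not batches:
        return  # nothing to yield; also avoids spinning forever when n_epochs == -1
    epoch = 0
    while n_epochs == -1 or epoch < n_epochs:
        yield from batches
        epoch += 1


two_epochs = list(repeat_epochs(["b0", "b1"], n_epochs=2))
# With n_epochs=-1 the iterator is endless, so the consumer has to cut it off.
endless_prefix = list(islice(repeat_epochs(["b0", "b1"], n_epochs=-1), 5))
```

With an infinite iterator, the training loop is responsible for deciding when to stop, for example after a fixed number of steps.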
length(self, data: Sequence[Sequence[torch.Tensor]])
Return the number of batches in the sampler.
Parameters: data (Sequence[Sequence[torch.Tensor, ..]]) – The input data to sample from.
Returns: The number of batches that would be created per epoch.
Return type: int
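One plausible way to arrive at a per-epoch batch count is sketched below. It assumes the corpus is split into `batch_size` equal streams, that one token per stream is reserved for the final target, and that `drop_last` discards a trailing chunk shorter than `unroll_size`. These are assumptions made for illustration (as is the name `batches_per_epoch`); the value returned by `length` may be computed differently.

```python
def batches_per_epoch(n_tokens: int, batch_size: int, unroll_size: int,
                      drop_last: bool = True) -> int:
    """Count the chunks of `unroll_size` steps that fit in each stream (hypothetical)."""
    stream_len = n_tokens // batch_size
    usable = max(stream_len - 1, 0)  # last token of a stream is only ever a target
    full, remainder = divmod(usable, unroll_size)
    # A shorter trailing chunk counts only when drop_last is False.
    return full if drop_last or remainder == 0 else full + 1


dropped = batches_per_epoch(20, batch_size=2, unroll_size=4, drop_last=True)
kept = batches_per_epoch(20, batch_size=2, unroll_size=4, drop_last=False)
```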