CorpusSampler(batch_size: int = 128, unroll_size: int = 128, n_workers: int = 0, pin_memory: bool = False, downsample: Optional[float] = None, drop_last: bool = True)
Implement a CorpusSampler object.
This object is useful for iterating over a large corpus of text in order. It takes as input a dataset with a single example containing the sequence of tokens, and yields batches that contain both source sequences (tensors corresponding to the corpus’s text) and the same sequences shifted by one token, which serve as targets.
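A minimal sketch of the shift-by-one batching described above, using plain Python lists in place of tensors. The row layout is an assumption for illustration: the token stream is cut into `batch_size` contiguous rows, each row is walked in windows of `unroll_size` tokens, and the target window starts one token later than the source window. `corpus_batches` is a hypothetical helper, not part of the CorpusSampler API, and it ignores `n_workers`, `pin_memory`, `downsample`, and `drop_last`.

```python
from typing import Iterator, List, Sequence, Tuple

Batch = List[List[int]]  # rows of token ids; stands in for a 2-D tensor


def corpus_batches(
    tokens: Sequence[int], batch_size: int, unroll_size: int
) -> Iterator[Tuple[Batch, Batch]]:
    """Yield (source, target) windows over a single token stream.

    Hypothetical helper for illustration; not the CorpusSampler implementation.
    """
    # Split the corpus into `batch_size` contiguous rows, dropping any
    # trailing tokens that do not fill a complete row.
    row_len = len(tokens) // batch_size
    rows = [list(tokens[r * row_len:(r + 1) * row_len]) for r in range(batch_size)]
    # The final token of each row has no successor, so windows stop one short.
    for start in range(0, row_len - 1, unroll_size):
        length = min(unroll_size, row_len - 1 - start)
        source = [row[start:start + length] for row in rows]
        # The target is the same window shifted forward by one token.
        target = [row[start + 1:start + 1 + length] for row in rows]
        yield source, target
```

With `list(range(10))`, `batch_size=2`, and `unroll_size=2`, the rows are `[0, 1, 2, 3, 4]` and `[5, 6, 7, 8, 9]`, and the first batch pairs source `[[0, 1], [5, 6]]` with target `[[1, 2], [6, 7]]`. Splitting into contiguous rows rather than interleaving tokens is a common choice because each row stays a continuous stretch of text, so a recurrent model can carry its hidden state from one window to the next.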
collate_fn(data: Sequence[Tuple[Tensor, Tensor]])
Create a batch from data.
Parameters: data (Sequence[Tuple[Tensor, Tensor]]) – List of (source, target) tuples.
Returns: Source and target Tensors.
Return type: Tuple[Tensor, Tensor]
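As a sketch of what collating means here, the hypothetical `collate` below turns a list of `(source, target)` pairs into one list of sources and one list of targets, with plain lists standing in for tensors. With real tensors one would typically stack each side along a new batch dimension with `torch.stack`; whether `collate_fn` does exactly that, or also transposes the result, is not stated above, so treat this as an assumption.

```python
from typing import List, Sequence, Tuple

Seq = List[int]  # stand-in for a 1-D tensor of token ids


def collate(data: Sequence[Tuple[Seq, Seq]]) -> Tuple[List[Seq], List[Seq]]:
    """Turn (source, target) pairs into a (sources, targets) pair.

    Hypothetical helper; mirrors stacking each side along a new batch dimension.
    """
    if not data:
        raise ValueError("cannot collate an empty batch")
    sources = [list(src) for src, _ in data]
    targets = [list(tgt) for _, tgt in data]
    # Stacking requires every sequence in the batch to have the same length.
    lengths = {len(s) for s in sources} | {len(t) for t in targets}
    if len(lengths) != 1:
        raise ValueError("all sequences in a batch must have the same length")
    return sources, targets
```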
sample(self, data: Sequence[Sequence[Tensor]], n_epochs: int = 1)
Sample from the list of features and yield batches.
Parameters:
- data (Sequence[Sequence[Tensor, ..]]) – The input data to sample from.
- n_epochs (int, optional) – The number of epochs to run in the output iterator. Use -1 to run infinitely.
Yields: Iterator[Tuple[Tensor]] – A batch of data, as a tuple of Tensors.
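The `n_epochs` contract (a fixed number of passes, or `-1` for an endless stream) can be sketched independently of the batching itself. `run_epochs` and its `make_epoch` callback are hypothetical names; the callback stands in for whatever produces one epoch's batches.

```python
import itertools
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def run_epochs(make_epoch: Callable[[], Iterable[T]], n_epochs: int = 1) -> Iterator[T]:
    """Yield every batch of `n_epochs` passes; n_epochs == -1 loops forever.

    Hypothetical helper for illustration only.
    """
    epochs = itertools.count() if n_epochs == -1 else range(n_epochs)
    for _ in epochs:
        # Rebuild the epoch each pass so iteration restarts at the corpus start.
        yield from make_epoch()
```

Because the `-1` case never terminates, a consumer of such a stream has to stop it explicitly, for example with `itertools.islice(run_epochs(make_epoch, -1), 1000)` or a step budget in the training loop.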
length(self, data: Sequence[Sequence[torch.Tensor]])
Return the number of batches in the sampler.
Parameters: data (Sequence[Sequence[torch.Tensor, ..]]) – The input data to sample from.
Returns: The number of batches that would be created per epoch.
Return type: int
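Assuming the token stream is split into `batch_size` equal rows and walked in windows of `unroll_size` tokens with targets shifted by one, the per-epoch batch count reduces to a small formula. `num_batches` is hypothetical, and its reading of `drop_last` (discard a final window shorter than `unroll_size`) is an assumption rather than documented behavior.

```python
def num_batches(n_tokens: int, batch_size: int, unroll_size: int, drop_last: bool = True) -> int:
    """Count (source, target) windows per epoch under the assumed row layout.

    Hypothetical helper; `drop_last` semantics are assumed, not documented.
    """
    row_len = n_tokens // batch_size
    # The last token of each row can only serve as a target, never a source.
    usable = max(row_len - 1, 0)
    if drop_last:
        return usable // unroll_size
    return -(-usable // unroll_size)  # ceiling division keeps the short tail window
```

For 12 tokens with `batch_size=2` and `unroll_size=2`, each row holds 6 tokens and 5 of them can start a source position, giving 2 full windows, or 3 if the short final window is kept.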