delu.data.IndexDataset
class delu.data.IndexDataset(size)
A trivial dataset that yields indices back to the user (useful for DDP).
This simple dataset is useful when:
- you need a dataloader that yields batches of indices instead of objects,
- AND you work in the Distributed Data Parallel (DDP) setup.
Note
If only the first condition is true, consider using the combination of
torch.randperm and torch.Tensor.split instead.
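For instance, a minimal sketch of that non-DDP alternative (the sizes and the loop body below are illustrative assumptions, not part of the library):

    import torch

    train_size = 123456  # assumed dataset size, for illustration
    batch_size = 123

    # Shuffle all indices once and split them into batches of indices.
    for batch_indices in torch.randperm(train_size).split(batch_size):
        ...  # batch_indices is a 1D tensor of indices into your data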
Example:

    import delu
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    train_size = 123456
    batch_size = 123

    dataset = delu.data.IndexDataset(train_size)
    # The dataset simply returns indices back.
    for i in range(train_size):
        assert dataset[i] == i

    dataloader = DataLoader(
        dataset, batch_size, sampler=DistributedSampler(dataset)
    )

    for epoch in range(n_epochs):
        for batch_indices in dataloader:
            ...
Methods
__init__(size): Initialize self.
__getitem__(index): Get the same index back.
__len__(): Get the dataset size.
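For intuition, the class behaves roughly like the following minimal sketch (derived from the method descriptions above; the class name is hypothetical and this is not the library's actual source):

    import torch.utils.data

    class IndexDatasetSketch(torch.utils.data.Dataset):
        # A rough stand-in for delu.data.IndexDataset, per the method list above.

        def __init__(self, size: int) -> None:
            # Store the dataset size.
            self._size = size

        def __getitem__(self, index: int) -> int:
            # Get the same index back.
            return index

        def __len__(self) -> int:
            # Get the dataset size.
            return self._size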