delu.data.IndexDataset#

class delu.data.IndexDataset(size)[source]#

A trivial dataset that yields indices back to the user (useful for DDP).

This simple dataset is useful when:

  1. you need a dataloader that yields batches of indices instead of objects

  2. AND you work in the Distributed Data Parallel setup

Note

If only the first condition is true, consider using the combination of torch.randperm and torch.Tensor.split instead.
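For reference, a minimal sketch of that non-DDP alternative: shuffle all indices with torch.randperm and slice them into batches with torch.Tensor.split (the sizes here are illustrative):

```python
import torch

train_size = 10
batch_size = 4

# Each batch_indices is a 1D tensor of at most batch_size shuffled indices.
for batch_indices in torch.randperm(train_size).split(batch_size):
    ...
```

Note that the last batch may be smaller than batch_size when batch_size does not divide train_size.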

Example:

import delu
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

train_size = 123456
batch_size = 123
dataset = delu.data.IndexDataset(train_size)
for i in range(train_size):
    assert dataset[i] == i
dataloader = DataLoader(
    dataset,
    batch_size,
    sampler=DistributedSampler(dataset),
)

n_epochs = 10
for epoch in range(n_epochs):
    for batch_indices in dataloader:
        ...

Methods

__init__

Initialize the dataset with the given size.

__getitem__

Get the same index back.

__len__

Get the dataset size.