delu.data.FnDataset

class delu.data.FnDataset(fn, args, transform=None)

[DEPRECATED] Create simple PyTorch datasets without classes and inheritance.

FnDataset lets you avoid implementing Dataset classes in simple cases.

Tutorial

First, a quick example. Without FnDataset:

from PIL import Image
from torch.utils.data import Dataset

class ImagesList(Dataset):
    def __init__(self, filenames, transform):
        self.filenames = filenames
        self.transform = transform

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        return self.transform(Image.open(self.filenames[index]))

dataset = ImagesList(filenames, transform)

With FnDataset:

from functools import lru_cache

import delu

dataset = delu.data.FnDataset(Image.open, filenames, transform)
# Cache images after the first load:
dataset = delu.data.FnDataset(lru_cache(None)(Image.open), filenames)

In other words, with vanilla PyTorch, creating a dataset requires inheriting from torch.utils.data.Dataset and implementing three methods:

  • __init__

  • __len__

  • __getitem__

With FnDataset, the only thing you may need to implement is the function passed as fn, which powers __getitem__. The easiest way to learn FnDataset is to go through the examples below.
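
As a rough mental model of the description above, the behavior can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the delu implementation: the class name SketchFnDataset is hypothetical, it does not inherit from torch.utils.data.Dataset, and it assumes that an integer args stands for the indices 0..args-1 while any other iterable is materialized into a list.

```python
class SketchFnDataset:
    """Hypothetical stand-in that mimics the behavior described above."""

    def __init__(self, fn, args, transform=None):
        self.fn = fn
        # Assumption: an int means "indices 0..args-1"; anything else is
        # treated as an iterable of arguments and stored as a list.
        self.args = args if isinstance(args, int) else list(args)
        self.transform = transform

    def __len__(self):
        return self.args if isinstance(self.args, int) else len(self.args)

    def __getitem__(self, index):
        if isinstance(self.args, int):
            # Raise IndexError so that plain iteration terminates.
            if not 0 <= index < self.args:
                raise IndexError(index)
            arg = index
        else:
            arg = self.args[index]
        value = self.fn(arg)
        return value if self.transform is None else self.transform(value)


# Sequence arguments: item i is fn(args[i]).
upper = SketchFnDataset(str.upper, ["a", "b"])
# Integer argument plus a transform: item i is transform(fn(i)).
scaled = SketchFnDataset(lambda i: i * 10, 3, lambda x: x * 2)
```

Here upper[1] evaluates str.upper("b"), and scaled[i] evaluates the transform applied to fn(i).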

A list of images:

dataset = delu.data.FnDataset(Image.open, filenames)
# dataset[i] returns Image.open(filenames[i])

A list of images that are cached after the first load:

from functools import lru_cache
dataset = delu.data.FnDataset(lru_cache(None)(Image.open), filenames)
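
To illustrate what the lru_cache wrapper does here, the sketch below counts how often a loader actually runs. The loader slow_load and the file names are hypothetical placeholders for Image.open and real paths, and a list comprehension stands in for indexing the dataset. Note that lru_cache keys on the call arguments, so they must be hashable (strings and pathlib.Path objects are).

```python
from functools import lru_cache

calls = []  # records every real (uncached) load


def slow_load(filename):
    # Hypothetical placeholder for an expensive call such as Image.open.
    calls.append(filename)
    return f"contents of {filename}"


cached_load = lru_cache(maxsize=None)(slow_load)
filenames = ["a.png", "b.png"]  # hypothetical file names

# Two passes over the data, as in two training epochs.
first_pass = [cached_load(name) for name in filenames]
second_pass = [cached_load(name) for name in filenames]
```

After both passes, calls contains each file name only once: the second pass is served from the cache. Because the cache is unbounded (maxsize=None), every loaded item stays in memory, so a finite maxsize may be a better fit for large datasets.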

pathlib.Path is handy for creating datasets that read from files:

from pathlib import Path

images_dir = Path(...)
dataset = delu.data.FnDataset(Image.open, images_dir.iterdir())

If you only need files with a specific extension:

dataset = delu.data.FnDataset(Image.open, images_dir.glob('*.png'))

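If you only need files with a specific extension, located in any subfolder, that also satisfy some condition:

# condition is any user-defined predicate that takes a path.
# rglob already searches recursively, so no '**/' prefix is needed.
dataset = delu.data.FnDataset(
    Image.open, (x for x in images_dir.rglob('*.png') if condition(x))
)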
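
The difference between the pathlib calls used above can be checked with a small self-contained sketch. The directory layout and file names are hypothetical and created in a temporary directory. The results are sorted because pathlib yields entries in arbitrary order, so sorting is one way to keep the index-to-file mapping reproducible when such a listing is passed as args.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    images_dir = Path(tmp)
    # Hypothetical layout: two PNGs at the top level, one in a subfolder,
    # plus a file with a different extension.
    (images_dir / "sub").mkdir()
    for name in ["b.png", "a.png", "notes.txt", "sub/c.png"]:
        (images_dir / name).touch()

    # glob matches only direct children of images_dir.
    top_level = sorted(
        p.relative_to(images_dir).as_posix() for p in images_dir.glob("*.png")
    )
    # rglob also descends into subfolders.
    recursive = sorted(
        p.relative_to(images_dir).as_posix() for p in images_dir.rglob("*.png")
    )
```

With this layout, top_level holds the two top-level PNG files, while recursive additionally includes sub/c.png.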

A segmentation dataset:

image_filenames = ...
gt_filenames = ...

def get(i):
    return Image.open(image_filenames[i]), Image.open(gt_filenames[i])

dataset = delu.data.FnDataset(get, len(image_filenames))

A dummy dataset that demonstrates how general FnDataset is:

def f(x):
    return x * 10

def g(x):
    return x * 2

dataset = delu.data.FnDataset(f, 3, g)
# dataset[i] returns g(f(i))
assert len(dataset) == 3
assert dataset[0] == 0
assert dataset[1] == 20
assert dataset[2] == 40

Methods

  • __init__: Initialize self.

  • __getitem__: Get value by index.

  • __len__: Get the dataset size.