Deprecated API#
Tip
Use the right sidebar, the search widget or Ctrl+F
to find the items you need.
- class delu.EarlyStopping[source]#
Bases: EarlyStopping
[DEPRECATED] (Renamed to delu.tools.EarlyStopping)
- forget_bad_updates() None [source]#
In the new class, see delu.tools.EarlyStopping.reset_unsuccessful_updates.
- class delu.Timer[source]#
Bases: Timer
[DEPRECATED] (Renamed to delu.tools.Timer)
- __call__() float [source]#
In the new class, see delu.tools.Timer.elapsed.
- format(format_str: str, /) str [source]#
In the new class, see the tutorial in delu.tools.Timer.
- class delu.ProgressTracker[source]#
Bases: object
Helps with early stopping and tracks the best metric value.
[DEPRECATED] (Instead, use delu.EarlyStopping and manually track the best score.)
For ProgressTracker, the greater the score, the better. At any moment, the tracker is in one of the following states:
success: the last update increased the best score
fail: the last n > patience updates did not improve the best score
neutral: neither success nor fail
Tutorial
progress = delu.ProgressTracker(2)
progress.update(-999999999)
assert progress.success  # the first update always updates the best score
progress.update(123)
assert progress.success
assert progress.best_score == 123
progress.update(0)
assert not progress.success and not progress.fail
progress.update(123)
assert not progress.success and not progress.fail
progress.update(123)  # patience is 2 and the best score is not updated for more than 2 steps
assert progress.fail
assert progress.best_score == 123  # fail doesn't affect the best score
progress.update(123)
assert progress.fail  # still no improvements
progress.forget_bad_updates()
assert not progress.fail and not progress.success
assert progress.best_score == 123
progress.update(0)
assert not progress.fail  # just 1 bad update (the patience is 2)
progress.reset()
assert not progress.fail and not progress.success
assert progress.best_score is None
- __init__(patience: int | None, min_delta: float = 0.0) None [source]#
- Parameters:
patience – the allowed number of unsuccessful updates. For example, if patience is 2, then 2 unsuccessful updates in a row is not a fail, but 3 unsuccessful updates in a row is a fail. None means “infinite patience”, i.e. the progress tracker is never in the “fail” state.
min_delta – the minimal improvement over the current best score required to count an update as a success.
Examples
progress = delu.ProgressTracker(2)
progress = delu.ProgressTracker(3, 0.1)
- property best_score: float | None#
The best score so far.
If the tracker is just created/reset, returns None.
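As suggested by the deprecation note above, the best score can also be tracked manually with plain Python. A minimal sketch (the names are illustrative, not part of any API):
best_score = None
for score in validation_scores:  # a hypothetical sequence of validation metrics
    if best_score is None or score > best_score:
        best_score = score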
- class delu.Stream[source]#
Bases: object
Smart wrapper for data loaders and iterables.
[DEPRECATED]
Stream simplifies managing loops, especially in typical Deep Learning scenarios. Stream:
manages the “epoch” and “iteration” variables
allows dumping and restoring the loop’s state: epoch, iteration, etc.
allows customizing the epoch size
allows changing the underlying data loader on the fly
enables useful patterns
Tutorial
Let’s start with the most common training loop:
loader = DataLoader(...)
iteration = 0
for epoch in range(max_epoch):
    for batch in loader:
        iteration += 1
        print('Epoch:', epoch, 'Iteration:', iteration)
        ...
Let’s enhance the loop using Stream:
stream = delu.Stream(DataLoader(...))  # (A)
for epoch in stream.epochs(max_epoch):  # (B)
    for batch in epoch:  # (C)
        print('Epoch:', stream.epoch, 'Iteration:', stream.iteration)  # (D)
        ...
Some comments for the above code:
(A) Stream is created by passing a dataloader as a single argument (in fact, you can pass any iterable object); the dataloader is accessible via Stream.loader
(B) epoch is an iterator over batches for one epoch
(C) a progress bar for batches is displayed (for the whole training loop, not just for one epoch)
(D) Stream.epoch and Stream.iteration are managed automatically
Saving the loop’s state and resuming the loop is possible with the methods Stream.state_dict and Stream.load_state_dict. In practice, it can look like this:
model = ...
optimizer = ...
stream = delu.Stream(DataLoader(...))

if load_from_checkpoint:
    checkpoint = torch.load(checkpoint_path)
    model.load_state_dict(checkpoint['model'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    stream.load_state_dict(checkpoint['stream'])

...

for epoch in stream.epochs(...):
    for batch in epoch:
        ...
    torch.save(
        {
            'model': model.state_dict(),
            'optimizer': optimizer.state_dict(),
            'stream': stream.state_dict(),
        },
        f'checkpoint_{stream.epoch}.pt'
    )
Note
Stream’s state does not include the loader’s state. See Stream.state_dict and Stream.load_state_dict for details.
In order to customize the epoch size, pass the size as the second argument:
for epoch in stream.epochs(max_epoch, custom_epoch_size):
    for batch in epoch:
        ...
Changing the underlying loader on the fly is possible at any moment (even in the middle of an epoch) via Stream.set_loader. For example:
for epoch in stream.epochs(max_epoch, custom_epoch_size):
    for batch in epoch:
        ...
        if need_new_data():
            stream.set_loader(new_loader)
If the method Stream.epochs does not fit your workflow and you want more control over the loop, there are more “low-level” methods, such as Stream.increment_epoch, Stream.data and Stream.next (in fact, Stream.epochs is just a thin wrapper around them).
For example, the most common training loop can be implemented as follows:
# A
while stream.epoch < max_epoch:
    stream.increment_epoch()
    for batch in stream.data():
        ...

# B
while stream.epoch < max_epoch:
    stream.increment_epoch()
    for _ in range(len(stream.loader)):
        batch = stream.next()  # stream.iteration is incremented automatically
        ...
The “infinite” stream of data can be implemented as follows:
for item in stream.data(float('inf')):
    ...
    if condition:  # for example: `if stream.iteration % frequency == 0`
        ...
Note
For a better technical understanding, keep in mind that Stream simply encapsulates an “infinite iterator” that is constantly moving forward. The behavior is the same for both finite and infinite iterables and can be expressed with the following loop:
while True:
    for item in loader:  # the loader which is passed to the constructor
        ...
The documentation for Stream.next and Stream.data provides helpful examples.
- __init__(loader: Iterable) None [source]#
- Parameters:
loader – any kind of iterable (DataLoader, list, iterator, generator, …)
Examples
stream = delu.Stream([0, 1, 2, 3])
stream = delu.Stream(range(10))

import itertools
stream = delu.Stream(itertools.repeat(0))

from torch.utils.data import DataLoader, TensorDataset
dataset = TensorDataset(torch.randn(10, 2))
stream = delu.Stream(DataLoader(dataset, batch_size=3, shuffle=True))
- data(n_items: int | float | None = None) Iterator [source]#
Iterate over the loader.
Under the hood, Stream.next is called, hence, Stream.iteration changes during iterations.
- Parameters:
n_items – how many items to produce. If None, interpreted as len(self.loader). If float, must be float('inf') or math.inf.
Examples
stream = delu.Stream(range(5))
assert list(stream.data()) == [0, 1, 2, 3, 4]
assert list(stream.data(3)) == [0, 1, 2]
# stream doesn't "start over"!
assert list(stream.data(3)) == [3, 4, 0]
assert list(stream.data(1)) == [1]
assert list(stream.data(2)) == [2, 3]
for x in stream.data(float('inf')):
    ...
    if stream.iteration % frequency:
        ...
- property epoch: int#
Current epoch.
Technically, the number of Stream.increment_epoch calls.
- epochs(max_epoch: int | float, epoch_size: int | float | None = None, progress_bar_config: Dict[str, Any] | None = {}) [source]#
Iterate over data epochs.
A shortcut for what is probably the most popular form of a training loop in Deep Learning (plus a progress bar):
for epoch in stream.epochs(max_epoch, epoch_size):
    for batch in epoch:
        ...

# is equivalent to:

while stream.epoch < max_epoch:
    stream.increment_epoch()
    for batch in stream.data(epoch_size):
        ...
- Parameters:
max_epoch – defines the number of epochs. The loop keeps running while self.epoch < max_epoch. If float, must be float('inf') or math.inf.
epoch_size – the number of data items in one epoch (forwarded to Stream.data).
progress_bar_config – if not None (note that the default value is {}!), a progress bar for iterations is displayed and the argument is interpreted as keyword arguments for tqdm. The following keyword arguments are added automatically if not present in progress_bar_config: initial, total (if it can be inferred from the arguments and/or from Stream.loader). If ipywidgets (https://ipywidgets.readthedocs.io) is installed and the program is running in JupyterLab (Jupyter Notebook), a pretty widget is used instead of the text-based one.
- Returns:
Iterator over iterators over data from Stream.loader.
Examples
stream = delu.Stream(range(3))
for epoch in stream.epochs(2):
    for x in epoch:
        print(x)
    print('-')
0
1
2
-
0
1
2
-
stream = delu.Stream(range(3))
for epoch in stream.epochs(3, 2):
    for x in epoch:
        print(x)
    print('-')
0
1
-
2
0
-
1
2
-
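As a side note on the progress_bar_config parameter described above, a hedged example of forwarding options to tqdm (desc is a standard tqdm keyword argument):
for epoch in stream.epochs(max_epoch, progress_bar_config={'desc': 'Train'}):
    for batch in epoch:
        ...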
- increment_epoch() None [source]#
Increment Stream.epoch.
Examples
stream = delu.Stream(range(5))
assert stream.epoch == 0
stream.increment_epoch()
assert stream.epoch == 1
stream.increment_epoch()
assert stream.epoch == 2
- property iteration: int#
Current iteration.
Technically, the number of Stream.next calls.
- load_state_dict(state_dict: Dict[str, Any]) None [source]#
Load state dictionary.
- Parameters:
state_dict – state. Must be produced by Stream.state_dict.
Note
The method does not affect the data produced by Stream.epochs, Stream.data and Stream.next (see the examples below), i.e. the method only sets some “metadata” such as the epoch, the iteration, etc. If you want to load the “state of the data stream”, you have to load the state of the corresponding random number generators separately.
Examples
stream = delu.Stream(range(10))
stream.next()
stream.increment_epoch()
assert stream.state_dict() == {'epoch': 1, 'iteration': 1}

new_stream = delu.Stream(range(10))
new_stream.load_state_dict(stream.state_dict())
assert new_stream.state_dict() == {'epoch': 1, 'iteration': 1}
assert new_stream.next() == 0
assert new_stream.state_dict() == {'epoch': 1, 'iteration': 2}
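As the note above explains, only the “metadata” is covered by the stream’s state. A hedged sketch of additionally saving and restoring the global PyTorch RNG state (which, for example, drives the default DataLoader shuffling) alongside it:
checkpoint = {
    'stream': stream.state_dict(),
    'torch_rng': torch.get_rng_state(),  # global CPU RNG state
}
...
stream.load_state_dict(checkpoint['stream'])
torch.set_rng_state(checkpoint['torch_rng'])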
- next() Any [source]#
Get the next item and increment iteration.
- Returns:
The next item.
Examples
stream = delu.Stream(range(3))
assert stream.iteration == 0
assert stream.next() == 0
assert stream.iteration == 1
assert stream.next() == 1
assert stream.next() == 2
assert stream.next() == 0
assert stream.iteration == 4
while True:
    x = stream.next()
    ...
    if stream.iteration % frequency:
        ...
- reload_iterator() None [source]#
Set the underlying iterator to iter(self.loader).
If the underlying loader is a finite iterable, the method can be used to interrupt and skip the current epoch (i.e. skip its data). If the loader is an iterator, the method does nothing.
Examples
stream = delu.Stream(range(5))
assert stream.next() == 0
assert stream.next() == 1
stream.reload_iterator()
assert stream.next() == 0

stream = delu.Stream(iter(range(5)))
assert stream.next() == 0
assert stream.next() == 1
stream.reload_iterator()
assert stream.next() == 2
- set_loader(loader: Iterable) None [source]#
Set new loader.
- Parameters:
loader –
Examples
from itertools import repeat

stream = delu.Stream(repeat(0))
for x in stream.data(5):
    print(stream.iteration, x)
    if stream.iteration == 2:
        stream.set_loader(repeat(1))
1 0
2 0
3 1
4 1
5 1
- state_dict() Dict[str, Any] [source]#
Get the stream’s state.
The result can be passed to Stream.load_state_dict. The result includes:
epoch
iteration
Note
Fields related to data (loader, iterator, etc.) are NOT included in the state. If you want to save the “state of the data stream”, you have to save the state of the corresponding random number generators separately.
- Returns:
state
Examples
stream = delu.Stream(range(10))
assert stream.state_dict() == {'epoch': 0, 'iteration': 0}

stream.next()
stream.next()
stream.increment_epoch()
assert stream.state_dict() == {'epoch': 1, 'iteration': 2}
- delu.collate(iterable: Iterable) Any [source]#
Almost an alias for torch.utils.data.dataloader.default_collate.
[DEPRECATED] (Instead, use torch.utils.data.default_collate)
Namely, the input is allowed to be any kind of iterable, not only a list. First, if the input is not a list, it is transformed to a list. Then, the list is passed to the original function and the result is returned as is.
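A minimal sketch illustrating the described behavior with the suggested replacement (the tensors are made up for the example):
import torch
from torch.utils.data import default_collate

items = (torch.tensor([i, i + 1]) for i in range(3))  # a generator, i.e. not a list
batch = default_collate(list(items))                  # what delu.collate effectively does
assert batch.shape == (3, 2)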
- delu.improve_reproducibility(base_seed: int | None, one_cuda_seed: bool = False) int [source]#
Set seeds and turn off non-deterministic algorithms.
[DEPRECATED] (Instead, use delu.random.seed and manually set flags mentioned in the PyTorch docs on reproducibility)
Do everything possible to improve reproducibility for code that relies on global random number generators. See also the note below.
Sets:
seeds in random, numpy.random, torch, torch.cuda
- Parameters:
base_seed – the argument for delu.random.seed. If None, a high-quality base seed is generated instead.
one_cuda_seed – the argument for delu.random.seed.
- Returns:
if base_seed is set to None, the generated base seed is returned; otherwise, base_seed is returned as is
- Return type:
base_seed
Note
If you don’t want to choose the base seed, but still want to have a chance to reproduce things, you can use the following pattern:
print('Seed:', delu.improve_reproducibility(None))
Note
100% reproducibility is not always possible in PyTorch. See the PyTorch documentation on reproducibility for details.
Examples
assert delu.improve_reproducibility(0) == 0
seed = delu.improve_reproducibility(None)
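A hedged sketch of the suggested replacement, assuming delu.random.seed accepts the base seed as its first argument (as described above); the flags are the ones mentioned in the PyTorch notes on reproducibility:
delu.random.seed(0)
torch.backends.cudnn.benchmark = False      # disable non-deterministic autotuning
torch.backends.cudnn.deterministic = True   # prefer deterministic cuDNN algorithms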
- class delu.evaluation[source]#
Context manager & decorator for model evaluation.
[DEPRECATED] (Instead, use model.eval() + torch.inference_mode/torch.no_grad)
This code…
with delu.evaluation(model):  # or: evaluation(model_0, model_1, ...)
    ...

@delu.evaluation(model)  # or: @evaluation(model_0, model_1, ...)
def f():
    ...
…is equivalent to the following
context = getattr(torch, 'inference_mode', torch.no_grad)

with context():
    model.eval()
    ...

@context()
def f():
    model.eval()
    ...
- Parameters:
modules –
Note
The training status of modules is undefined once a context is finished or a decorated function returns.
Warning
The function must be used in the same way as torch.no_grad and torch.inference_mode, i.e. only as a context manager or a decorator, as shown below in the examples. Otherwise, the behavior is undefined.
Warning
Contrary to torch.no_grad and torch.inference_mode, the function cannot be used to decorate generators. So, in the case of generators, you have to manually create a context:
def my_generator():
    with delu.evaluation(...):
        for a in b:
            yield c
Examples
a = torch.nn.Linear(1, 1)
b = torch.nn.Linear(2, 2)

with delu.evaluation(a):
    ...

with delu.evaluation(a, b):
    ...

@delu.evaluation(a)
def f():
    ...

@delu.evaluation(a, b)
def f():
    ...
- class delu.utils.data.Enumerate[source]#
Bases: Dataset
Make a PyTorch dataset return indices in addition to items (like enumerate, but for datasets).
[DEPRECATED]
TL;DR:
dataset[i] -> value
enumerated_dataset[i] -> (i, value)
Usage
Creating the initial non-enumerated dataset:
>>> from torch.utils.data import DataLoader, TensorDataset
>>>
>>> X = torch.arange(10).float().view(5, 2)
>>> X
tensor([[0., 1.],
        [2., 3.],
        [4., 5.],
        [6., 7.],
        [8., 9.]])
>>> Y = -10 * torch.arange(5)
>>> Y
tensor([  0, -10, -20, -30, -40])
>>>
>>> dataset = TensorDataset(X, Y)
>>> dataset[2]
(tensor([4., 5.]), tensor(-20))
The enumerated dataset returns indices in addition to items:
>>> enumerated_dataset = delu.utils.data.Enumerate(dataset)
>>> enumerated_dataset[2]
(2, (tensor([4., 5.]), tensor(-20)))
>>>
>>> for x_batch, y_batch in DataLoader(
...     dataset, batch_size=2
... ):
...     ...
...
>>> for batch_idx, (x_batch, y_batch) in DataLoader(
...     enumerated_dataset, batch_size=2
... ):
...     print(batch_idx)
tensor([0, 1])
tensor([2, 3])
tensor([4])
The original dataset and its size remain accessible:
>>> enumerated_dataset.dataset is dataset
True
>>> len(enumerated_dataset) == len(dataset)
True
- class delu.utils.data.IndexDataset[source]#
Bases: Dataset
A trivial dataset that yields indices back to the user (useful for DistributedDataParallel (DDP)).
[DEPRECATED]
This simple dataset is useful when both conditions are true:
A dataloader that yields batches of indices instead of objects is needed
The Distributed Data Parallel setup is used.
Note
If only the first condition is true, consider using the combination of torch.randperm and torch.Tensor.split instead.
Usage
>>> from torch.utils.data import DataLoader
>>> from torch.utils.data.distributed import DistributedSampler
>>>
>>> train_size = 1000
>>> batch_size = 64
>>> dataset = delu.data.IndexDataset(train_size)
>>> # The dataset is really *that* trivial:
>>> for i in range(train_size):
...     assert dataset[i] == i
>>> dataloader = DataLoader(
...     dataset,
...     batch_size,
...     sampler=DistributedSampler(dataset),
... )
>>> for epoch in range(n_epochs):
...     for batch_indices in dataloader:
...         ...
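As the Note above suggests, when DDP is not involved, shuffled index batches can be produced with torch.randperm and torch.Tensor.split; a minimal sketch:
train_size = 1000
batch_size = 64
for batch_indices in torch.randperm(train_size).split(batch_size):
    ...  # batch_indices is a 1D tensor of at most batch_size indices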
- class delu.data.IndexDataset[source]#
Bases: IndexDataset
[DEPRECATED]
- class delu.data.FnDataset[source]#
Bases: Dataset
Create simple PyTorch datasets without classes and inheritance.
[DEPRECATED]
FnDataset makes it possible to avoid implementing Dataset classes in simple cases.
Tutorial
First, a quick example. Without FnDataset:
from PIL import Image

class ImagesList(Dataset):
    def __init__(self, filenames, transform):
        self.filenames = filenames
        self.transform = transform

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        return self.transform(Image.open(self.filenames[index]))

dataset = ImagesList(filenames, transform)
With FnDataset:
dataset = delu.data.FnDataset(Image.open, filenames, transform)

# Cache images after the first load:
from functools import lru_cache
dataset = delu.data.FnDataset(lru_cache(None)(Image.open), filenames)
In other words, with vanilla PyTorch, in order to create a dataset, you have to inherit from torch.utils.data.Dataset and implement three methods:
__init__
__len__
__getitem__
With FnDataset, the only thing you may need to implement is the fn argument that will power __getitem__. The easiest way to learn FnDataset is to go through the examples below.
A list of images:
dataset = delu.data.FnDataset(Image.open, filenames)
# dataset[i] returns Image.open(filenames[i])
A list of images that are cached after the first load:
from functools import lru_cache

dataset = delu.data.FnDataset(lru_cache(None)(Image.open), filenames)
pathlib.Path is handy for creating datasets that read from files:
images_dir = Path(...)
dataset = delu.data.FnDataset(Image.open, images_dir.iterdir())
If you only need files with specific extensions:
dataset = delu.data.FnDataset(Image.open, images_dir.glob('*.png'))
If you only need files with specific extensions located in all subfolders:
dataset = delu.data.FnDataset(
    Image.open, (x for x in images_dir.rglob('**/*.png') if condition(x))
)
A segmentation dataset:
image_filenames = ...
gt_filenames = ...

def get(i):
    return Image.open(image_filenames[i]), Image.open(gt_filenames[i])

dataset = delu.data.FnDataset(get, len(image_filenames))
A dummy dataset that demonstrates that FnDataset is a very general thing:
def f(x):
    return x * 10

def g(x):
    return x * 2

dataset = delu.data.FnDataset(f, 3, g)
# dataset[i] returns g(f(i))
assert len(dataset) == 3
assert dataset[0] == 0
assert dataset[1] == 20
assert dataset[2] == 40
- __getitem__(index: int) Any [source]#
Get value by index.
See FnDataset for details.
- Parameters:
index –
- Returns:
value
- __init__( ) None [source]#
- Parameters:
fn – the function that produces values based on arguments from args
args – arguments for fn. If an iterable, but not a list, it is cast to a list. If an integer, then the behavior is the same as for list(range(args)). The size of args defines the return value of FnDataset.__len__.
transform – if present, it is applied to the return value of fn in FnDataset.__getitem__
Examples
import PIL.Image as Image
import torchvision.transforms as T

dataset = delu.data.FnDataset(Image.open, filenames, T.ToTensor())
- class delu.data.IndexLoader[source]#
Bases: object
Like DataLoader, but over indices instead of data.
[DEPRECATED] (Instead, use delu.data.IndexDataset and DataLoader)
The shuffling logic is delegated to the native PyTorch DataLoader, i.e. no custom logic is performed under the hood. The data loader which actually generates indices is available as IndexLoader.loader.
Examples
Usage for training:
train_loader = delu.data.IndexLoader(
    len(train_dataset), batch_size, shuffle=True
)
for epoch in range(n_epochs):
    for batch_idx in train_loader:
        ...
Other examples:
dataset_size = 10  # len(dataset)
for batch_idx in delu.data.IndexLoader(dataset_size, batch_size=3):
    print(batch_idx)
tensor([0, 1, 2])
tensor([3, 4, 5])
tensor([6, 7, 8])
tensor([9])
dataset_size = 10  # len(dataset)
for batch_idx in delu.data.IndexLoader(dataset_size, 3, drop_last=True):
    print(batch_idx)
tensor([0, 1, 2])
tensor([3, 4, 5])
tensor([6, 7, 8])
- __init__( ) None [source]#
- Parameters:
size – the number of items (for example, len(dataset))
args – positional arguments for torch.utils.data.DataLoader
device – if not CPU, then all indices are materialized and moved to the device at the beginning of every loop. It can be useful when the indices are applied to non-CPU data (e.g. CUDA tensors) and moving data between devices takes non-negligible time (which can happen in the case of simple and fast models like MLPs).
kwargs – keyword arguments for torch.utils.data.DataLoader
- property loader: DataLoader#
The original DataLoader.
- delu.data.make_index_dataloader(size: int, *args, **kwargs) [source]#
Make a DataLoader over indices instead of data.
[DEPRECATED] (Instead, use delu.data.IndexDataset and DataLoader)
This is just a shortcut for torch.utils.data.DataLoader(delu.data.IndexDataset(...), ...).
- Parameters:
size – the dataset size
*args – positional arguments for torch.utils.data.DataLoader
**kwargs – keyword arguments for torch.utils.data.DataLoader
Examples
Usage for training:
train_loader = delu.data.make_index_dataloader(
    len(train_dataset), batch_size, shuffle=True
)
for epoch in range(n_epochs):
    for i_batch in train_loader:
        x_batch = X[i_batch]
        y_batch = Y[i_batch]
        ...
Other examples:
dataset_size = 10  # len(dataset)
for batch_idx in delu.data.make_index_dataloader(
    dataset_size, batch_size=3
):
    print(batch_idx)
tensor([0, 1, 2])
tensor([3, 4, 5])
tensor([6, 7, 8])
tensor([9])
dataset_size = 10  # len(dataset)
for batch_idx in delu.data.make_index_dataloader(
    dataset_size, 3, drop_last=True
):
    print(batch_idx)
tensor([0, 1, 2])
tensor([3, 4, 5])
tensor([6, 7, 8])
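For reference, a sketch of the suggested replacement that produces the same index batches with delu.data.IndexDataset and the standard DataLoader:
from torch.utils.data import DataLoader

dataset_size = 10  # len(dataset)
index_loader = DataLoader(delu.data.IndexDataset(dataset_size), batch_size=3)
for batch_idx in index_loader:
    print(batch_idx)  # tensor([0, 1, 2]), tensor([3, 4, 5]), ...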
- delu.hardware.free_memory(*args, **kwargs) None [source]#
[DEPRECATED] (Instead, use delu.cuda.free_memory.)
- delu.hardware.get_gpus_info() Dict[str, Any] [source]#
Get information about GPU devices: driver version, memory, utilization etc.
[DEPRECATED] (Instead, use functions from torch.cuda)
The example below shows what kind of information is returned. All memory figures are given in bytes.
- Returns:
Information about GPU devices.
Warning
The ‘devices’ value contains information about all GPUs regardless of the value of CUDA_VISIBLE_DEVICES.
Examples
print(delu.hardware.get_gpus_info())
Output example (formatted for convenience):
{
    'driver': '440.33.01',
    'devices': [
        {
            'name': 'GeForce RTX 2080 Ti',
            'memory_total': 11554717696,
            'memory_free': 11554652160,
            'memory_used': 65536,
            'utilization': 0,
        },
        {
            'name': 'GeForce RTX 2080 Ti',
            'memory_total': 11552096256,
            'memory_free': 11552030720,
            'memory_used': 65536,
            'utilization': 0,
        },
    ],
}
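A hedged sketch of gathering similar per-device information with functions from torch.cuda (figures in bytes, as above); the driver version and utilization are not covered here and would require other tools such as nvidia-smi:
import torch

for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    free, total = torch.cuda.mem_get_info(i)  # requires an available CUDA device
    print(name, {'memory_total': total, 'memory_free': free, 'memory_used': total - free})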
- delu.nn.named_sequential( ) Sequential [source]#
A shortcut for creating torch.nn.Sequential with named modules without using collections.OrderedDict.
[DEPRECATED]
The sole purpose of this function is to improve the ergonomics and readability of this common construction.
Usage
This …
>>> # xdoctest: +SKIP
>>> m = delu.nn.named_sequential(
...     ('linear1', nn.Linear(10, 20)),
...     ('activation', nn.ReLU()),
...     ('linear2', nn.Linear(20, 1)),
... )
… is equivalent to this:
>>> # xdoctest: +SKIP
>>> from collections import OrderedDict
>>> m = torch.nn.Sequential(
...     OrderedDict(
...         [
...             ('linear1', nn.Linear(10, 20)),
...             ('activation', nn.ReLU()),
...             ('linear2', nn.Linear(20, 1)),
...         ]
...     )
... )
- Parameters:
names_and_modules – the names and the modules.