NLinear
class delu.nn.NLinear
Bases: Module
N linear layers for N inputs:
(*, *N, D1) -> (*, *N, D2)
For a tensor x of the shape (*B, *N, D1), where *B are batch dimensions, *N are object dimensions (e.g. the sequence length in NLP, or width & height in computer vision) and D1 is the current embedding size (e.g. the number of features/channels):

- applying torch.nn.Linear(D1, D2) to x means applying the same linear transformation to each of the math.prod(N) embeddings;
- applying NLinear(N, D1, D2) to x means applying a separate linear transformation to each of the math.prod(N) embeddings.

In other words, NLinear(N, D1, D2) is a collection of math.prod(N) non-shared torch.nn.Linear(D1, D2) layers.

Shape
- Input: (*, *n, in_features), where * are batch dimensions.
- Output: (*, *n, out_features).
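The idea above can be sketched in plain PyTorch: hold one weight matrix and one bias per position, and let broadcasting in a batched matmul pair each position with its own parameters. The module name NLinearSketch and the initialization scheme below are illustrative assumptions, not delu's actual implementation:

```python
import math

import torch
import torch.nn as nn


class NLinearSketch(nn.Module):
    """A minimal sketch of the NLinear idea (not delu's actual code):
    one non-shared (D1 -> D2) linear map per position in *N."""

    def __init__(self, n, d_in, d_out):
        super().__init__()
        n = (n,) if isinstance(n, int) else tuple(n)
        # One weight matrix and one bias per position:
        # shapes (*n, D1, D2) and (*n, D2).
        self.weight = nn.Parameter(torch.randn(*n, d_in, d_out) / math.sqrt(d_in))
        self.bias = nn.Parameter(torch.zeros(*n, d_out))

    def forward(self, x):
        # x: (*B, *n, D1) -> (*B, *n, 1, D1); the batched matmul against
        # weight (*n, D1, D2) broadcasts over *B, so each of the math.prod(n)
        # positions is multiplied by its own weight matrix.
        return (x.unsqueeze(-2) @ self.weight).squeeze(-2) + self.bias
```

For example, NLinearSketch(3, 4, 5) maps a (2, 3, 4) tensor to (2, 3, 5), applying a different 4-to-5 transformation at each of the 3 positions.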
Usage

Let’s consider a Transformer-like model that outputs tensors of the shape (batch_size, n_tokens, d_embedding) (in terms of NLP, n_tokens is the sequence length). The following example demonstrates how to train a separate linear transformation for each of the n_tokens embeddings using NLinear.

>>> batch_size = 2
>>> n_tokens = 3
>>> d_embedding_in = 4
>>> d_embedding_out = 5
>>> x = torch.randn(batch_size, n_tokens, d_embedding_in)
>>> x.shape
torch.Size([2, 3, 4])
>>> m = NLinear(n_tokens, d_embedding_in, d_embedding_out)
>>> m(x).shape
torch.Size([2, 3, 5])
Similarly to torch.nn.Linear, the input can have any number of batch dimensions. The number of layers n, in turn, can also be arbitrary.

>>> # Computer vision.
>>> batch_size = (2, 3)
>>> width = 4
>>> height = 5
>>> in_channels = 6
>>> out_channels = 7
>>> x = torch.randn(*batch_size, width, height, in_channels)
>>> x.shape
torch.Size([2, 3, 4, 5, 6])
>>> # The number of layers: width * height = 4 * 5 = 20
>>> m = NLinear((width, height), in_channels, out_channels)
>>> m(x).shape
torch.Size([2, 3, 4, 5, 7])
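One way to verify the "collection of non-shared linear layers" interpretation with plain torch (a sketch, not delu's internals) is to apply math.prod(N) separate torch.nn.Linear modules position by position, then reproduce the result with a single broadcast matmul over their stacked parameters:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, n_tokens, d_in, d_out = 2, 3, 4, 5
x = torch.randn(batch_size, n_tokens, d_in)

# n_tokens independent linear layers, applied position by position.
layers = [nn.Linear(d_in, d_out) for _ in range(n_tokens)]
y_loop = torch.stack([layers[i](x[:, i]) for i in range(n_tokens)], dim=1)

# The same computation as one broadcast matmul over the stacked parameters,
# which is conceptually what an NLinear-style module does in one call.
weight = torch.stack([l.weight.t() for l in layers])  # (n_tokens, d_in, d_out)
bias = torch.stack([l.bias for l in layers])          # (n_tokens, d_out)
y_vec = (x.unsqueeze(-2) @ weight).squeeze(-2) + bias

assert y_vec.shape == (batch_size, n_tokens, d_out)
assert torch.allclose(y_loop, y_vec, atol=1e-6)
```

The vectorized form avoids the Python loop over positions, which is why a single module with stacked parameters is preferable to a list of per-position torch.nn.Linear layers.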