NLinear
class delu.nn.NLinear [source]
Bases: Module
N linear layers for N inputs: (*, *N, D1) -> (*, *N, D2).

Examples of use cases:
- NLP: apply a separate linear layer to each token embedding in a sequence.
  Batch: (B, S, D) (B is the batch size, S is the sequence length, D is the embedding size).
  Module: NLinear(S, D, D).
  By contrast, torch.nn.Linear(D, D) would apply the same linear layer to all token embeddings.
- CV: apply a separate linear layer to each of the patch embeddings of an image.
  Batch: (B, W, H, C1) (B is the batch size, W and H are the image dimensions, C1 is the current number of channels).
  Module: NLinear((W, H), C1, C2).
  By contrast, torch.nn.Linear(C1, C2) would apply the same linear layer to all patch embeddings.
In other words, NLinear(N, D1, D2) is a collection of math.prod(N) non-shared torch.nn.Linear(D1, D2) layers.

Shape
  Input: (*, *n, in_features), where * are batch dimensions.
  Output: (*, *n, out_features).
Usage
Let’s consider a Transformer-like model that outputs tensors of the shape (batch_size, n_tokens, d_embedding) (in terms of NLP, n_tokens is the sequence length). The following example demonstrates how to train a separate linear transformation for each of the n_tokens embeddings using NLinear.

>>> batch_size = 2
>>> n_tokens = 3
>>> d_embedding_in = 4
>>> d_embedding_out = 5
>>> x = torch.randn(batch_size, n_tokens, d_embedding_in)
>>> x.shape
torch.Size([2, 3, 4])
>>> m = NLinear(n_tokens, d_embedding_in, d_embedding_out)
>>> m(x).shape
torch.Size([2, 3, 5])
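To make the "collection of non-shared layers" intuition concrete, here is a rough conceptual sketch of what NLinear(n_tokens, d_embedding_in, d_embedding_out) does: one independent torch.nn.Linear per token position. PerTokenLinear is a hypothetical name used only for illustration; it is not delu's actual implementation, and since its parameters are initialized independently, only the output shape (not the values) is expected to match the example above.

>>> # Conceptual sketch only (hypothetical class, not delu's implementation):
>>> # one independent torch.nn.Linear per token position.
>>> class PerTokenLinear(torch.nn.Module):
...     def __init__(self, n_tokens, d_in, d_out):
...         super().__init__()
...         self.layers = torch.nn.ModuleList(
...             torch.nn.Linear(d_in, d_out) for _ in range(n_tokens)
...         )
...     def forward(self, x):
...         # x: (batch_size, n_tokens, d_in) -> (batch_size, n_tokens, d_out)
...         return torch.stack(
...             [layer(x[:, i]) for i, layer in enumerate(self.layers)], dim=1
...         )
>>> PerTokenLinear(n_tokens, d_embedding_in, d_embedding_out)(x).shape
torch.Size([2, 3, 5])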
Similarly to torch.nn.Linear, the input can have any number of batch dimensions. The number of layers n, in turn, can also be arbitrary.

>>> # Computer vision.
>>> batch_size = (2, 3)
>>> width = 4
>>> height = 5
>>> in_channels = 6
>>> out_channels = 7
>>> x = torch.randn(*batch_size, width, height, in_channels)
>>> x.shape
torch.Size([2, 3, 4, 5, 6])
>>> # The number of layers: width * height = 4 * 5 = 20
>>> m = NLinear((width, height), in_channels, out_channels)
>>> m(x).shape
torch.Size([2, 3, 4, 5, 7])
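As a quick sanity check on the "non-shared" claim, one can count the parameters of the module from the previous example. The expected total below assumes that, by default, each of the width * height positions gets its own (in_channels x out_channels) weight and an out_channels bias, analogous to torch.nn.Linear; this is an assumption about the default configuration, not a statement of the exact internal layout.

>>> # Assumption: one (in_channels x out_channels) weight and one
>>> # out_channels bias per position, with no sharing across positions.
>>> n_params = sum(p.numel() for p in m.parameters())
>>> n_params == width * height * (in_channels * out_channels + out_channels)
True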