NLinear
class delu.nn.NLinear
Bases: Module
N separate linear layers for N embeddings:
(*, *N, D1) -> (*, *N, D2).
Usage examples covered below:
- (NLP) Training a separate linear layer for each token embedding in a sequence. By contrast, using torch.nn.Linear would mean applying the same linear layer to all tokens.
- (CV) Training a separate linear layer for each patch embedding in an image. By contrast, using torch.nn.Linear would mean applying the same linear layer to all patches (a parameter-count sketch follows this list).
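The practical difference between the two options is capacity: torch.nn.Linear shares one weight matrix across all positions, while NLinear allocates one per position. The sketch below is not from the original documentation; it assumes NLinear keeps a separate weight matrix and bias for each of the N positions, so the second count is an expectation rather than a documented guarantee:

>>> # Sketch: comparing parameter counts of a shared layer vs. NLinear.
>>> # Assumption: NLinear stores one weight matrix and bias per position.
>>> import torch
>>> from delu.nn import NLinear
>>> shared = torch.nn.Linear(6, 7)   # one layer reused for all 4 tokens
>>> separate = NLinear(4, 6, 7)      # one layer per token
>>> n_shared = sum(p.numel() for p in shared.parameters())      # 6 * 7 + 7 = 49
>>> n_separate = sum(p.numel() for p in separate.parameters())  # expected: 4 * 49 = 196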
Technically, NLinear(N, D1, D2) is just a layout of N linear layers torch.nn.Linear(D1, D2).
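For illustration, the same computation can be spelled out with an explicit torch.nn.ModuleList of N separate torch.nn.Linear layers. This is a conceptual sketch only, not delu's actual implementation (the real module's parameter layout and initialization may differ):

>>> # Conceptual sketch: the i-th linear layer handles the i-th of the N positions.
>>> import torch
>>> N, D1, D2 = 4, 6, 7
>>> layers = torch.nn.ModuleList(torch.nn.Linear(D1, D2) for _ in range(N))
>>> x = torch.randn(2, N, D1)                                       # (batch, N, D1)
>>> y = torch.stack([layers[i](x[:, i]) for i in range(N)], dim=1)  # (batch, N, D2)
>>> y.shape
torch.Size([2, 4, 7])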
Shape
- Input: (*, *n, in_features), where * are batch dimensions.
- Output: (*, *n, out_features).
Usage
(NLP) Training a separate linear layer for each of the token embeddings in a sequence:
>>> batch_size = 2
>>> sequence_length = 4
>>> d_embedding_in = 6
>>> d_embedding_out = 7
>>> x = torch.randn(batch_size, sequence_length, d_embedding_in)
>>> x.shape
torch.Size([2, 4, 6])
>>> m = NLinear(sequence_length, d_embedding_in, d_embedding_out)
>>> m(x).shape
torch.Size([2, 4, 7])
(CV) Training a separate linear layer for each of the patch embeddings in an image:
>>> # Batch dimensions can also be arbitrarily complex.
>>> batch_size = (2, 3)
>>> width = 4
>>> height = 5
>>> in_channels = 6
>>> out_channels = 7
>>> x = torch.randn(*batch_size, width, height, in_channels)
>>> x.shape
torch.Size([2, 3, 4, 5, 6])
>>> # N == width * height == 4 * 5 == 20
>>> m = NLinear((width, height), in_channels, out_channels)
>>> m(x).shape
torch.Size([2, 3, 4, 5, 7])