NLinear
class delu.nn.NLinear [source]
Bases: Module
N linear layers for N inputs: (*, *N, D1) -> (*, *N, D2).

Examples of use cases:
- NLP: apply a separate linear layer to each token embedding in a sequence.
  Batch: (B, S, D) (B is the batch size, S is the sequence length, D is the embedding size).
  Module: NLinear(S, D, D).
  By contrast, torch.nn.Linear(D, D) would apply the same linear layer to all token embeddings.
- CV: apply a separate linear layer to each of the patch embeddings of an image.
  Batch: (B, W, H, C1) (B is the batch size, W and H are the image dimensions, C1 is the current number of channels).
  Module: NLinear((W, H), C1, C2).
  By contrast, torch.nn.Linear(C1, C2) would apply the same linear layer to all patch embeddings.
In other words, NLinear(N, D1, D2) is a collection of math.prod(N) non-shared torch.nn.Linear(D1, D2) layers.

Shape
  Input: (*, *n, in_features), where * are batch dimensions.
  Output: (*, *n, out_features).
Usage
Let’s consider a Transformer-like model that outputs tensors of the shape (batch_size, n_tokens, d_embedding) (in terms of NLP, n_tokens is the sequence length). The following example demonstrates how to train a separate linear transformation for each of the n_tokens embeddings using NLinear.

>>> batch_size = 2
>>> n_tokens = 3
>>> d_embedding_in = 4
>>> d_embedding_out = 5
>>> x = torch.randn(batch_size, n_tokens, d_embedding_in)
>>> x.shape
torch.Size([2, 3, 4])
>>> m = NLinear(n_tokens, d_embedding_in, d_embedding_out)
>>> m(x).shape
torch.Size([2, 3, 5])
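To make the "collection of non-shared layers" intuition concrete, here is a rough conceptual sketch of what NLinear(n_tokens, d_embedding_in, d_embedding_out) does: one independent torch.nn.Linear per token position. PerTokenLinear is a hypothetical name used only for illustration; it is not delu's actual implementation, and since its parameters are initialized independently, only the output shape (not the values) is expected to match the example above.

>>> # Conceptual sketch only (hypothetical class, not delu's implementation):
>>> # one independent torch.nn.Linear per token position.
>>> class PerTokenLinear(torch.nn.Module):
...     def __init__(self, n_tokens, d_in, d_out):
...         super().__init__()
...         self.layers = torch.nn.ModuleList(
...             torch.nn.Linear(d_in, d_out) for _ in range(n_tokens)
...         )
...     def forward(self, x):
...         # x: (batch_size, n_tokens, d_in) -> (batch_size, n_tokens, d_out)
...         return torch.stack(
...             [layer(x[:, i]) for i, layer in enumerate(self.layers)], dim=1
...         )
>>> PerTokenLinear(n_tokens, d_embedding_in, d_embedding_out)(x).shape
torch.Size([2, 3, 5])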
Similarly to torch.nn.Linear, the input can have any number of batch dimensions. The number of layers n, in turn, can also be arbitrary.

>>> # Computer vision.
>>> batch_size = (2, 3)
>>> width = 4
>>> height = 5
>>> in_channels = 6
>>> out_channels = 7
>>> x = torch.randn(*batch_size, width, height, in_channels)
>>> x.shape
torch.Size([2, 3, 4, 5, 6])
>>> # The number of layers: width * height = 4 * 5 = 20
>>> m = NLinear((width, height), in_channels, out_channels)
>>> m(x).shape
torch.Size([2, 3, 4, 5, 7])
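As a quick sanity check on the "non-shared" claim, one can count the parameters of the module from the previous example. The expected total below assumes that, by default, each of the width * height positions gets its own (in_channels x out_channels) weight and an out_channels bias, analogous to torch.nn.Linear; this is an assumption about the default configuration, not a statement of the exact internal layout.

>>> # Assumption: one (in_channels x out_channels) weight and one
>>> # out_channels bias per position, with no sharing across positions.
>>> n_params = sum(p.numel() for p in m.parameters())
>>> n_params == width * height * (in_channels * out_channels + out_channels)
True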