NLinear#

class delu.nn.NLinear[source]#

Bases: Module

N separate linear layers for N embeddings: (*, *N, D1) -> (*, *N, D2).

Usage examples covered below:

  • (NLP) Training a separate linear layer for each token embedding in a sequence. By contrast, using torch.nn.Linear would mean applying the same linear layer to all tokens.

  • (CV) Training a separate linear layer for each patch embedding in an image. By contrast, using torch.nn.Linear would mean applying the same linear layer to all patches.

Technically, NLinear(N, D1, D2) is simply a layout of N independent linear layers torch.nn.Linear(D1, D2), one per embedding.
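
For intuition, here is a minimal sketch, not NLinear's actual implementation, expressing the same computation with N independent torch.nn.Linear layers held in a ModuleList:

>>> import torch
>>> n, d_in, d_out = 3, 4, 5
>>> layers = torch.nn.ModuleList([torch.nn.Linear(d_in, d_out) for _ in range(n)])
>>> x = torch.randn(2, n, d_in)
>>> # Apply the i-th layer to the i-th embedding and reassemble the result.
>>> y = torch.stack([layers[i](x[:, i]) for i in range(n)], dim=1)
>>> y.shape
torch.Size([2, 3, 5])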

Shape

  • Input: (*, *n, in_features), where * denotes any number of batch dimensions and *n matches the n argument.

  • Output: (*, *n, out_features).

Usage

(NLP) Training a separate linear layer for each of the token embeddings in a sequence:

>>> batch_size = 2
>>> sequence_length = 4
>>> d_embedding_in = 6
>>> d_embedding_out = 7
>>> x = torch.randn(batch_size, sequence_length, d_embedding_in)
>>> x.shape
torch.Size([2, 4, 6])
>>> m = NLinear(sequence_length, d_embedding_in, d_embedding_out)
>>> m(x).shape
torch.Size([2, 4, 7])

(CV) Training a separate linear layer for each of the patch embeddings in an image:

>>> # Batch dimensions can also be arbitrarily complex.
>>> batch_size = (2, 3)
>>> width = 4
>>> height = 5
>>> in_channels = 6
>>> out_channels = 7
>>> x = torch.randn(*batch_size, width, height, in_channels)
>>> x.shape
torch.Size([2, 3, 4, 5, 6])
>>> # N == width * height == 4 * 5 == 20
>>> m = NLinear((width, height), in_channels, out_channels)
>>> m(x).shape
torch.Size([2, 3, 4, 5, 7])

__init__(
    n: int | Tuple[int, ...],
    in_features: int,
    out_features: int,
    bias: bool = True,
    device=None,
    dtype=None,
) → None[source]#

All arguments are the same as in torch.nn.Linear except for n, which is the expected layout of the input (see the examples in NLinear).
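
A short sketch restating the two accepted forms of n from the examples above:

>>> m = NLinear(4, 6, 7)       # n as an int: a sequence of 4 positions
>>> m = NLinear((4, 5), 6, 7)  # n as a tuple: a 4x5 grid of positions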

forward(x: Tensor) → Tensor[source]#

Do the forward pass.

reset_parameters()[source]#

Reset all parameters.
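
A minimal usage sketch, reusing the module m from the examples above:

>>> m.reset_parameters()  # resets all parameters of the module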