STULayer

class rectools.models.nn.transformers.hstu.STULayer(n_factors: int, n_heads: int, linear_hidden_dim: int, attention_dim: int, session_max_len: int, relative_time_attention: bool, relative_pos_attention: bool, attn_dropout_rate: float, dropout_rate: float, epsilon: float)

Bases: Module

Encoder block from the HSTU authors' architecture, reimplemented to operate on dense tensors instead of jagged tensors.

Parameters

n_factors (int) – Latent embeddings size.
n_heads (int) – Number of attention heads.
linear_hidden_dim (int) – Size of the U and V projections.
attention_dim (int) – Size of the Q and K projections.
session_max_len (int) – Maximum length to which user sequences are padded or truncated.
relative_time_attention (bool) – Whether to use relative time attention.
relative_pos_attention (bool) – Whether to use relative positional attention.
attn_dropout_rate (float) – Probability of zeroing an attention unit.
dropout_rate (float) – Probability of zeroing a hidden unit.
epsilon (float) – A value passed to LayerNorm for numerical stability.
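
To make the roles of n_factors, n_heads, linear_hidden_dim and attention_dim concrete, here is a minimal dense sketch of an HSTU-style block in plain PyTorch. It is an illustration under stated assumptions, not the rectools implementation: the fused projection, the scaling of scores by sequence length, and all variable names are assumptions, and dropout and the relative time/positional biases (which would be added to the Q·K scores before the activation) are omitted.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes chosen only for illustration.
n_factors, n_heads = 64, 4
linear_hidden_dim, attention_dim = 32, 16
batch_size, session_max_len = 2, 5

x = torch.randn(batch_size, session_max_len, n_factors)

# Assumption: one fused projection yields U, V (linear_hidden_dim per head)
# and Q, K (attention_dim per head).
proj = torch.nn.Linear(n_factors, n_heads * (2 * linear_hidden_dim + 2 * attention_dim))
u, v, q, k = torch.split(
    F.silu(proj(x)),
    [
        n_heads * linear_hidden_dim,
        n_heads * linear_hidden_dim,
        n_heads * attention_dim,
        n_heads * attention_dim,
    ],
    dim=-1,
)


def split_heads(t: torch.Tensor, head_dim: int) -> torch.Tensor:
    # (batch, len, heads * dim) -> (batch, heads, len, dim)
    return t.reshape(batch_size, session_max_len, n_heads, head_dim).transpose(1, 2)


q_h = split_heads(q, attention_dim)
k_h = split_heads(k, attention_dim)
v_h = split_heads(v, linear_hidden_dim)

# Pointwise SiLU attention scores (no softmax), scaled by sequence length
# (an assumption), with future positions zeroed out by a causal mask.
scores = F.silu(q_h @ k_h.transpose(-1, -2)) / session_max_len
causal = torch.tril(torch.ones(session_max_len, session_max_len, dtype=torch.bool))
scores = scores.masked_fill(~causal, 0.0)

# Merge heads back: (batch, heads, len, dim) -> (batch, len, heads * dim).
attn_out = (scores @ v_h).transpose(1, 2).reshape(
    batch_size, session_max_len, n_heads * linear_hidden_dim
)

# Gate the normalized attention output with U, project back to n_factors,
# and add the residual connection.
norm = torch.nn.LayerNorm(n_heads * linear_hidden_dim, eps=1e-6)
out_proj = torch.nn.Linear(n_heads * linear_hidden_dim, n_factors)
out = out_proj(norm(attn_out) * u) + x
```

In this sketch the output keeps the input shape (batch_size, session_max_len, n_factors), so such blocks can be stacked; the eps argument of LayerNorm plays the role that epsilon plays in the constructor above.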

Methods

forward(seqs, batch, attn_mask, ...)

Forward pass through STU.

forward(seqs: Tensor, batch: Dict[str, Tensor], attn_mask: Tensor, timeline_mask: Tensor, key_padding_mask: Optional[Tensor]) → Tensor

Forward pass through STU.

Parameters

seqs (torch.Tensor) – User sequences of item embeddings.
batch (Dict[str, torch.Tensor]) – May contain payload information, in particular sequence timestamps.
attn_mask (torch.Tensor) – Mask to use in forward pass of multi-head attention as attn_mask.
timeline_mask (torch.Tensor) – Mask marking padding items.
key_padding_mask (torch.Tensor, optional) – Optional mask to use in forward pass of multi-head attention as key_padding_mask.
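
The following sketch shows one plausible way to build inputs of the kinds listed above in plain PyTorch. Several details are assumptions rather than documented behavior: left padding with item id 0, a timeline_mask of shape (batch_size, session_max_len, 1), an attn_mask that follows the torch.nn.MultiheadAttention convention (True means "do not attend"), and the payload key name "unix_ts".

```python
import torch

batch_size, session_max_len, n_factors = 2, 5, 8

# Hypothetical left-padded item ids; 0 is assumed to be the padding id.
item_ids = torch.tensor(
    [
        [0, 0, 3, 7, 2],
        [0, 5, 1, 4, 9],
    ]
)

# Item embeddings, shape (batch_size, session_max_len, n_factors).
item_emb = torch.nn.Embedding(10, n_factors, padding_idx=0)
seqs = item_emb(item_ids)

# 1.0 for real items, 0.0 for padding; the trailing dimension lets the
# mask broadcast over the embedding dimension.
timeline_mask = (item_ids != 0).unsqueeze(-1).to(seqs.dtype)
seqs = seqs * timeline_mask

# Causal mask in the torch.nn.MultiheadAttention convention:
# True marks positions that must NOT be attended to.
attn_mask = torch.triu(
    torch.ones(session_max_len, session_max_len, dtype=torch.bool), diagonal=1
)

# Payload dictionary; the key name "unix_ts" is an assumption.
batch = {
    "unix_ts": torch.tensor(
        [
            [0, 0, 100, 160, 400],
            [0, 10, 20, 50, 90],
        ]
    )
}

# Pairwise timestamp differences, the kind of quantity a relative time
# attention bias could be derived from: shape (batch, len, len).
ts = batch["unix_ts"]
time_diffs = ts.unsqueeze(-1) - ts.unsqueeze(-2)
```

Here time_diffs[b, i, j] is the time elapsed between item j and item i of sequence b; how the layer actually turns timestamps into an attention bias is not specified in this reference.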

Returns

User sequences passed through transformer layers.

Return type

torch.Tensor