STULayer

class rectools.models.nn.transformers.hstu.STULayer(n_factors: int, n_heads: int, linear_hidden_dim: int, attention_dim: int, session_max_len: int, relative_time_attention: bool, relative_pos_attention: bool, attn_dropout_rate: float, dropout_rate: float, epsilon: float)[source]

Bases: Module

Encoder block from the HSTU authors’ architecture, rewritten to operate on dense tensors instead of jagged tensors.

Parameters
  • n_factors (int) – Latent embedding size.

  • n_heads (int) – Number of attention heads.

  • linear_hidden_dim (int) – Size of U and V.

  • attention_dim (int) – Size of Q and K.

  • session_max_len (int) – Maximum user sequence length. Sequences are padded or truncated to this length.

  • relative_time_attention (bool) – Whether to use relative time attention.

  • relative_pos_attention (bool) – Whether to use relative positional attention.

  • attn_dropout_rate (float) – Probability of zeroing an attention unit.

  • dropout_rate (float) – Probability of zeroing a hidden unit.

  • epsilon (float) – A value passed to LayerNorm for numerical stability.
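To illustrate what "padded or truncated to" session_max_len means in practice, here is a minimal pure-Python sketch. It is not rectools code: the helper name pad_or_truncate, the padding id PAD_ID = 0, and the choice of left padding while keeping the most recent items are all assumptions made for illustration.

```python
from typing import List

PAD_ID = 0  # assumption: id 0 is reserved for padding (hypothetical constant)


def pad_or_truncate(item_ids: List[int], session_max_len: int) -> List[int]:
    """Return a sequence of exactly `session_max_len` item ids.

    Assumed convention (not confirmed by the source): keep the most recent
    items when the sequence is too long, and pad on the left when it is
    too short.
    """
    if session_max_len <= 0:
        raise ValueError("session_max_len must be positive")
    recent = item_ids[-session_max_len:]  # truncate: keep the last items
    n_pad = session_max_len - len(recent)  # number of padding slots needed
    return [PAD_ID] * n_pad + recent


short = pad_or_truncate([5, 7, 9], 5)
long = pad_or_truncate([1, 2, 3, 4, 5, 6], 5)
print(short)  # [0, 0, 5, 7, 9]
print(long)   # [2, 3, 4, 5, 6]
```

Every sequence then has the same length, which is what allows a batch to be stored as a dense tensor rather than a jagged one.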

Methods

forward(seqs, batch, attn_mask, ...)

Forward pass through STU.

forward(seqs: Tensor, batch: Dict[str, Tensor], attn_mask: Tensor, timeline_mask: Tensor, key_padding_mask: Optional[Tensor]) → Tensor[source]

Forward pass through STU.

Parameters
  • seqs (torch.Tensor) – User sequences of item embeddings.

  • batch (Dict[str, torch.Tensor]) – May contain payload information, in particular sequence timestamps.

  • attn_mask (torch.Tensor) – Mask to use in forward pass of multi-head attention as attn_mask.

  • timeline_mask (torch.Tensor) – Mask marking padding items.

  • key_padding_mask (torch.Tensor, optional) – Optional mask to use in forward pass of multi-head attention as key_padding_mask.

Returns

User sequences passed through the STU layer.

Return type

torch.Tensor
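The two masks above can be pictured with a small pure-Python sketch. It is not the rectools implementation: the helper names, the padding id PAD_ID = 0, and the mask conventions are assumptions. In particular, the causal mask follows the PyTorch boolean convention in which True blocks attention, and the timeline mask is assumed to be True at real items and False at padding. The tensor shapes and dtypes that forward actually expects are not stated on this page and are not shown here.

```python
from typing import List

PAD_ID = 0  # assumption: id 0 is reserved for padding (hypothetical constant)


def build_timeline_mask(session: List[int]) -> List[bool]:
    """Mark real items with True and padding with False (assumed convention)."""
    return [item_id != PAD_ID for item_id in session]


def build_causal_mask(session_max_len: int) -> List[List[bool]]:
    """Build a square mask where True means "do not attend".

    Row i is the query position and column j is the key position, so every
    position j > i (the future) is blocked.
    """
    return [
        [col > row for col in range(session_max_len)]
        for row in range(session_max_len)
    ]


timeline_mask = build_timeline_mask([0, 0, 5, 7, 9])
attn_mask = build_causal_mask(3)
print(timeline_mask)  # [False, False, True, True, True]
for row in attn_mask:
    print(row)
# [False, True, True]
# [False, False, True]
# [False, False, False]
```

In a real pipeline these would be torch.Tensor objects built once per batch and passed to forward together with seqs.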