hstu
Classes
|
HSTU model: a transformer-based sequential model with a unidirectional pointwise aggregated attention mechanism, combined with the "Shifted Sequence" training objective. |
|
HSTU model config. |
|
Computes relative time and positional attention biases for STU. |
|
Encoder block architecture from the HSTU authors, rewritten from jagged tensors to dense tensors. |
|
STULayers transformer blocks. |
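
To make the two ideas named in the model summary concrete, here is a minimal pure-Python sketch of a "Shifted Sequence" target construction and of unidirectional pointwise aggregated attention. It is an illustration under stated assumptions, not this module's API: the function names `shifted_sequence_pairs` and `pointwise_causal_attention` are hypothetical, the optional `bias` argument only stands in for the relative time and positional biases, and the SiLU activation with division by sequence length is assumed from the general HSTU description rather than taken from this code base.

```python
import math


def shifted_sequence_pairs(items):
    """Build (inputs, targets) where each target is the next item in the sequence."""
    return items[:-1], items[1:]


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def pointwise_causal_attention(q, k, v, bias=None):
    """Hypothetical sketch of unidirectional pointwise aggregated attention.

    q, k, v: lists of n vectors (lists of floats).
    bias: optional n x n matrix added to the raw scores (assumed stand-in
    for relative time/positional attention biases).
    """
    n = len(q)
    out = []
    for i in range(n):
        acc = [0.0] * len(v[0])
        # Unidirectional: position i only attends to positions j <= i.
        for j in range(i + 1):
            score = sum(a * b for a, b in zip(q[i], k[j]))
            if bias is not None:
                score += bias[i][j]
            # Pointwise weight: SiLU instead of softmax, scaled by n (assumption).
            w = silu(score) / n
            for d in range(len(acc)):
                acc[d] += w * v[j][d]
        out.append(acc)
    return out


inputs, targets = shifted_sequence_pairs([10, 20, 30, 40])
print(inputs, targets)  # [10, 20, 30] [20, 30, 40]
```

Unlike softmax attention, the weights here are computed independently per (query, key) pair and do not sum to one, so the magnitude of the aggregated output can carry information about how many past items matched.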