TransformerDataPreparatorBase
- class rectools.models.nn.transformers.data_preparator.TransformerDataPreparatorBase(session_max_len: int, batch_size: int, dataloader_num_workers: int, train_min_user_interactions: int = 2, get_val_mask_func: Optional[Callable] = None, shuffle_train: bool = True, n_negatives: Optional[int] = None, negative_sampler: Optional[TransformerNegativeSamplerBase] = None, get_val_mask_func_kwargs: Optional[Dict[str, Any]] = None, extra_cols: Optional[List[str]] = None, add_unix_ts: bool = False, **kwargs: Any)
Bases: object
Base class for data preparators. To change train/recommend dataset processing or the train/recommend dataloaders, inherit from this class and pass your custom data preparator to your model parameters.
- Parameters
session_max_len (int) – Maximum length of user sequence.
batch_size (int) – How many samples per batch to load.
dataloader_num_workers (int) – Number of loader worker processes.
item_extra_tokens (Sequence(Hashable)) – Which element to use for sequence padding.
shuffle_train (bool, default True) – If True, reshuffles data at each epoch.
train_min_user_interactions (int, default 2) – Minimum length of user sequence. Cannot be less than 2.
get_val_mask_func (Callable, default None) – Function to get validation mask.
n_negatives (optional(int), default None) – Number of negatives for BCE, gBCE and sampled_softmax losses.
negative_sampler (optional(TransformerNegativeSamplerBase), default None) – Negative sampler.
get_val_mask_func_kwargs (optional(InitKwargs), default None) – Additional keyword arguments for get_val_mask_func. Make sure all dict values have JSON serializable types.
add_unix_ts (bool, default False) – Add an extra column unix_ts that contains Column.Datetime converted to seconds since the beginning of the epoch.
extra_cols (optional(List[str]), default None) – Extra columns to keep in train and recommend datasets.
kwargs (Any) –
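The unix_ts conversion described for add_unix_ts can be pictured with a minimal standard-library sketch. This is not the RecTools implementation: the helper name to_unix_ts is hypothetical, and treating naive datetimes as UTC is an assumption made here for illustration.

```python
from datetime import datetime, timezone


def to_unix_ts(dt: datetime) -> int:
    """Convert a datetime to whole seconds since 1970-01-01 00:00:00 UTC."""
    # Assumption: naive datetimes (no tzinfo) are interpreted as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # timestamp() returns float seconds since the epoch; int() truncates.
    return int(dt.timestamp())


# Hypothetical interaction datetimes standing in for the Column.Datetime values.
interaction_times = [datetime(1970, 1, 1, 0, 0, 10), datetime(2024, 1, 1)]
unix_ts = [to_unix_ts(dt) for dt in interaction_times]
```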
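The requirement that get_val_mask_func_kwargs values be JSON serializable can be checked before constructing the preparator. The sketch below uses only the standard json module; the helper is_json_serializable and the example keys (n_last, seed, users) are hypothetical and are not part of RecTools.

```python
import json
from typing import Any, Dict


def is_json_serializable(kwargs: Dict[str, Any]) -> bool:
    """Return True if the whole kwargs dict can be dumped to JSON."""
    try:
        json.dumps(kwargs)
    except (TypeError, ValueError):
        # TypeError: unsupported value type; ValueError: e.g. circular reference.
        return False
    return True


good_kwargs = {"n_last": 1, "seed": 42}  # plain ints serialize fine
bad_kwargs = {"users": {1, 2, 3}}  # a set has no JSON representation
```

A value such as a set or a NumPy array would typically need converting to a list before being passed.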
Methods
get_dataloader_recommend(dataset, batch_size) – Construct recommend dataloader from processed dataset.
get_dataloader_train() – Construct train dataloader from processed dataset.
get_dataloader_val() – Construct validation dataloader from processed dataset.
get_known_item_ids() – Return external item ids from processed dataset in sorted order.
get_known_items_sorted_internal_ids() – Return internal item ids from processed dataset in sorted order.
process_dataset_train(dataset) – Process train dataset and save data.
transform_dataset_i2i(dataset) – Process dataset for i2i recommendations.
transform_dataset_u2i(dataset, users[, context]) – Process dataset for u2i recommendations.
Attributes
item_extra_tokens
n_item_extra_tokens – Return number of padding elements.
train_session_max_len_addition
- get_dataloader_recommend(dataset: Dataset, batch_size: int) -> DataLoader
Construct recommend dataloader from processed dataset.
- Returns
Recommend dataloader.
- Return type
DataLoader
- Parameters
dataset (Dataset) –
batch_size (int) –
- get_dataloader_train() -> DataLoader
Construct train dataloader from processed dataset.
- Returns
Train dataloader.
- Return type
DataLoader
- get_dataloader_val() -> Optional[DataLoader]
Construct validation dataloader from processed dataset.
- Returns
Validation dataloader.
- Return type
Optional(DataLoader)
- get_known_item_ids() -> ndarray
Return external item ids from processed dataset in sorted order.
- Return type
ndarray
- get_known_items_sorted_internal_ids() -> ndarray
Return internal item ids from processed dataset in sorted order.
- Return type
ndarray
- property n_item_extra_tokens: int
Return the number of padding elements.
- process_dataset_train(dataset: Dataset) -> None
Process train dataset and save data.
- Parameters
dataset (Dataset) –
- Return type
None
- transform_dataset_i2i(dataset: Dataset) -> Dataset
Process dataset for i2i recommendations. Filter out interactions and adapt id maps.
- transform_dataset_u2i(dataset: Dataset, users: Union[Sequence[Hashable], ndarray], context: Optional[DataFrame] = None) -> Dataset
Process dataset for u2i recommendations. Filter out interactions and adapt id maps. All users beyond target users for recommendations are dropped. All target users that do not have at least one known item in interactions are dropped.
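The filtering rules above can be illustrated with a small pure-Python sketch. It does not use the RecTools Dataset class and is not the library's implementation: the function filter_u2i, the tuple representation of interactions, and enumerating users in order of first appearance are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Interaction = Tuple[Hashable, Hashable]  # (external user id, external item id)


def filter_u2i(
    interactions: Sequence[Interaction],
    target_users: Sequence[Hashable],
    known_items: Sequence[Hashable],
) -> Tuple[List[Interaction], Dict[Hashable, int]]:
    """Keep target users' interactions with known items and enumerate the users."""
    targets = set(target_users)
    known = set(known_items)
    # Drop non-target users and items the model did not see during fit.
    kept = [(u, i) for u, i in interactions if u in targets and i in known]
    # Target users left without any known item never appear in `kept`,
    # so they are dropped from the user id map as well.
    user_id_map: Dict[Hashable, int] = {}
    for user, _ in kept:
        user_id_map.setdefault(user, len(user_id_map))
    return kept, user_id_map


interactions = [("u1", "i1"), ("u1", "i9"), ("u2", "i9"), ("u3", "i2")]
kept, user_id_map = filter_u2i(interactions, ["u1", "u2"], ["i1", "i2"])
```

Here u3 is dropped because it is not a target user, and u2 is dropped because its only interaction is with an item unknown to the model.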
- Parameters
dataset (Dataset) – RecTools dataset.
users (ExternalIds) – Array of external user ids to recommend for.
context (pd.DataFrame, optional, default None) – Optional DataFrame containing additional user context information (e.g., session features, demographics).
- Returns
Processed RecTools dataset. The final dataset consists only of items known to the model from fit and only of the required (and supported) target users for recommendations. The final user_id_map is an enumerated list of supported (filtered) target users. The final item_id_map is the model item_id_map constructed during training.
- Return type