TransformerDataPreparatorBase

class rectools.models.nn.transformers.data_preparator.TransformerDataPreparatorBase(session_max_len: int, batch_size: int, dataloader_num_workers: int, train_min_user_interactions: int = 2, get_val_mask_func: Optional[Callable] = None, shuffle_train: bool = True, n_negatives: Optional[int] = None, negative_sampler: Optional[TransformerNegativeSamplerBase] = None, get_val_mask_func_kwargs: Optional[Dict[str, Any]] = None, extra_cols: Optional[List[str]] = None, add_unix_ts: bool = False, **kwargs: Any)[source]

Bases: object

Base class for data preparators. To change how train/recommend datasets are processed or how train/recommend dataloaders are built, inherit from this class and pass your custom data preparator to your model parameters.

Parameters
  • session_max_len (int) – Maximum length of user sequence.

  • batch_size (int) – How many samples per batch to load.

  • dataloader_num_workers (int) – Number of loader worker processes.

  • item_extra_tokens (Sequence(Hashable)) – Which element to use for sequence padding.

  • shuffle_train (bool, default True) – If True, reshuffles data at each epoch.

  • train_min_user_interactions (int, default 2) – Minimum length of user sequence. Cannot be less than 2.

  • get_val_mask_func (Callable, default None) – Function to get validation mask.

  • n_negatives (optional(int), default None) – Number of negatives for BCE, gBCE and sampled_softmax losses.

  • negative_sampler (optional(TransformerNegativeSamplerBase), default None) – Negative sampler.

  • get_val_mask_func_kwargs (optional(InitKwargs), default None) – Additional keyword arguments for the get_val_mask_func. Make sure all dict values have JSON serializable types.

  • add_unix_ts (bool, default False) – If True, adds an extra column unix_ts containing Column.Datetime converted to seconds since the Unix epoch.

  • extra_cols (optional(List[str]), default None) – Extra columns to keep in train and recommend datasets.

  • kwargs (Any) –
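The interplay of session_max_len, train_min_user_interactions and the padding token can be illustrated with a minimal plain-Python sketch. It does not use the RecTools API: the prepare_sessions helper, the PAD token and the choice of left padding are assumptions made for illustration only, and the real preparator may build sequences differently.

```python
from typing import Dict, Hashable, List

PAD = "PAD"  # hypothetical padding token, standing in for an item_extra_tokens entry


def prepare_sessions(
    user_items: Dict[Hashable, List[Hashable]],
    session_max_len: int,
    train_min_user_interactions: int = 2,
) -> Dict[Hashable, List[Hashable]]:
    """Drop short histories, then truncate and left-pad the remaining ones."""
    if train_min_user_interactions < 2:
        raise ValueError("train_min_user_interactions cannot be less than 2")
    sessions: Dict[Hashable, List[Hashable]] = {}
    for user, items in user_items.items():
        if len(items) < train_min_user_interactions:
            continue  # too few interactions to form an input/target pair
        recent = items[-session_max_len:]  # keep the most recent interactions
        padding = [PAD] * (session_max_len - len(recent))
        sessions[user] = padding + recent  # assumption: padding goes on the left
    return sessions


history = {"u1": ["a", "b", "c", "d"], "u2": ["a"], "u3": ["b", "c"]}
sessions = prepare_sessions(history, session_max_len=3)
print(sessions)
```

Here u2 is dropped for having a single interaction, u1 is cut to its three latest items, and u3 is padded up to the session length.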


Methods

get_dataloader_recommend(dataset, batch_size)

Construct recommend dataloader from processed dataset.

get_dataloader_train()

Construct train dataloader from processed dataset.

get_dataloader_val()

Construct validation dataloader from processed dataset.

get_known_item_ids()

Return external item ids from processed dataset in sorted order.

get_known_items_sorted_internal_ids()

Return internal item ids from processed dataset in sorted order.

process_dataset_train(dataset)

Process train dataset and save data.

transform_dataset_i2i(dataset)

Process dataset for i2i recommendations.

transform_dataset_u2i(dataset, users[, context])

Process dataset for u2i recommendations.

Attributes

item_extra_tokens

n_item_extra_tokens

Return number of padding elements

train_session_max_len_addition

get_dataloader_recommend(dataset: Dataset, batch_size: int) DataLoader[source]

Construct recommend dataloader from processed dataset.

Returns

Recommend dataloader.

Return type

DataLoader

Parameters
  • dataset (Dataset) –

  • batch_size (int) –

get_dataloader_train() DataLoader[source]

Construct train dataloader from processed dataset.

Returns

Train dataloader.

Return type

DataLoader

get_dataloader_val() Optional[DataLoader][source]

Construct validation dataloader from processed dataset.

Returns

Validation dataloader.

Return type

Optional(DataLoader)
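The validation dataloader depends on the mask produced by get_val_mask_func. The sketch below shows one common choice, a leave-one-out mask that marks the latest interaction of each validation user. It is a plain-Python illustration, not the RecTools contract: this page does not spell out the function's signature, so the tuple layout, the val_users argument and the leave_one_out_mask name are all assumptions.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Interaction = Tuple[Hashable, Hashable, int]  # (user_id, item_id, timestamp)


def leave_one_out_mask(
    interactions: Sequence[Interaction], val_users: Sequence[Hashable]
) -> List[bool]:
    """Return a mask that is True for the latest interaction of each validation user."""
    val_user_set = set(val_users)
    last_index: Dict[Hashable, int] = {}  # user_id -> index of latest interaction seen
    for idx, (user, _item, ts) in enumerate(interactions):
        if user not in val_user_set:
            continue
        if user not in last_index or ts >= interactions[last_index[user]][2]:
            last_index[user] = idx
    mask = [False] * len(interactions)
    for idx in last_index.values():
        mask[idx] = True
    return mask


rows = [("u1", "a", 10), ("u1", "b", 20), ("u2", "a", 5), ("u2", "c", 7)]
mask = leave_one_out_mask(rows, val_users=["u1"])
print(mask)
```

Extra arguments such as val_users would be the kind of values supplied through get_val_mask_func_kwargs, which is why they should be JSON serializable.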

get_known_item_ids() ndarray[source]

Return external item ids from processed dataset in sorted order.

Return type

ndarray

get_known_items_sorted_internal_ids() ndarray[source]

Return internal item ids from processed dataset in sorted order.

Return type

ndarray
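The relationship between external ids, internal ids and extra tokens can be sketched in plain Python. The sketch assumes that extra tokens occupy the first internal ids, that "sorted order" means ordered by internal id, and that both getters skip the extra tokens. None of this is confirmed by this page, and build_item_id_map is a hypothetical helper.

```python
from typing import Hashable, List, Sequence


def build_item_id_map(
    extra_tokens: Sequence[Hashable], item_ids: Sequence[Hashable]
) -> List[Hashable]:
    """Return a list where the position is the internal id and the value the external id."""
    unique_items = list(dict.fromkeys(item_ids))  # deduplicate, keep first-seen order
    return list(extra_tokens) + unique_items  # assumption: extra tokens come first


extra_tokens = ("PAD",)
id_map = build_item_id_map(extra_tokens, [42, 7, 42, 19])
n_extra = len(extra_tokens)  # plays the role of n_item_extra_tokens

known_external = id_map[n_extra:]  # external ids ordered by internal id
known_internal = list(range(n_extra, len(id_map)))  # matching internal ids
print(known_external, known_internal)
```

Under these assumptions the two results are aligned position by position, so known_internal[k] is the internal id of known_external[k].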

property n_item_extra_tokens: int

Return number of padding elements

process_dataset_train(dataset: Dataset) None[source]

Process train dataset and save data.

Parameters

dataset (Dataset) –

Return type

None

transform_dataset_i2i(dataset: Dataset) Dataset[source]

Process dataset for i2i recommendations. Filter out interactions and adapt id maps.

Parameters

dataset (Dataset) – RecTools dataset.

Returns

Processed RecTools dataset. The final dataset contains only items that were known to the model during fit. The final user_id_map is the same as in the original dataset. The final item_id_map is the model item_id_map constructed during training.

Return type

Dataset

transform_dataset_u2i(dataset: Dataset, users: Union[Sequence[Hashable], ndarray], context: Optional[DataFrame] = None) Dataset[source]

Process dataset for u2i recommendations. Filter out interactions and adapt id maps. All users other than the target users for recommendations are dropped. Target users that do not have at least one known item in their interactions are also dropped.

Parameters
  • dataset (Dataset) – RecTools dataset.

  • users (ExternalIds) – Array of external user ids to recommend for.

  • context (pd.DataFrame, optional, default None) – Optional DataFrame containing additional user context information (e.g., session features, demographics).

Returns

Processed RecTools dataset. The final dataset contains only items that were known to the model during fit and only the required (and supported) target users for recommendations. The final user_id_map is an enumerated list of supported (filtered) target users. The final item_id_map is the model item_id_map constructed during training.

Return type

Dataset
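The filtering described for transform_dataset_u2i can be sketched in plain Python. This illustrates the documented behaviour only, not the library's implementation: filter_u2i and its tuple-based inputs are hypothetical stand-ins for the RecTools Dataset and its id maps.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]  # (user_id, item_id)


def filter_u2i(
    interactions: Sequence[Pair],
    target_users: Sequence[Hashable],
    known_items: Sequence[Hashable],
) -> Tuple[List[Pair], Dict[Hashable, int]]:
    """Keep target users and known items, then enumerate the surviving users."""
    targets = set(target_users)
    known = set(known_items)
    # Drop non-target users and interactions with items unseen during fit.
    kept = [(u, i) for u, i in interactions if u in targets and i in known]
    # Users left without any known item vanish here; the rest get new internal ids.
    user_id_map: Dict[Hashable, int] = {}
    for user, _item in kept:
        user_id_map.setdefault(user, len(user_id_map))
    return kept, user_id_map


rows = [("u1", "a"), ("u1", "x"), ("u2", "x"), ("u3", "b"), ("u4", "a")]
kept, user_id_map = filter_u2i(
    rows, target_users=["u1", "u2", "u3"], known_items=["a", "b"]
)
print(kept, user_id_map)
```

In this example u4 is dropped as a non-target user and u2 is dropped because its only interaction is with an unknown item, so recommendations can be produced for u1 and u3 only.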