Dataset

class rectools.dataset.dataset.Dataset(user_id_map: IdMap, item_id_map: IdMap, interactions: Interactions, user_features: Optional[Union[DenseFeatures, SparseFeatures]] = None, item_features: Optional[Union[DenseFeatures, SparseFeatures]] = None)

Bases: object

Container class for all data for a recommendation model.

It stores the mapping between external and internal ids, the user-item interactions, and the user and item features in dedicated rectools structures for convenient later use.

This is a data class, so you can instantiate it directly, but using the construct class method is recommended.


Methods

construct(interactions_df[, ...])

Class method for convenient Dataset creation.

get_user_item_matrix([include_weights])

Construct user-item CSR matrix based on interactions attribute.

Attributes

user_id_map

item_id_map

interactions

user_features

item_features

classmethod construct(interactions_df: DataFrame, user_features_df: Optional[DataFrame] = None, cat_user_features: Iterable[str] = (), make_dense_user_features: bool = False, item_features_df: Optional[DataFrame] = None, cat_item_features: Iterable[str] = (), make_dense_item_features: bool = False) -> Dataset

Class method for convenient Dataset creation.

Use it to create a dataset from raw data.

Parameters
  • interactions_df (pd.DataFrame) –

    Table where every row describes one user-item interaction. Expected columns:
    • Columns.User - user id;
    • Columns.Item - item id;
    • Columns.Weight - weight of the interaction, float; use 1 if interactions have no weight;
    • Columns.Datetime - timestamp of the interaction; assign an arbitrary value if you are not going to use it later.

  • user_features_df (pd.DataFrame, optional) – Table of explicit user features. It is used to create SparseFeatures via the from_flatten class method or DenseFeatures via the from_dataframe class method, depending on the make_dense_user_features flag. See the documentation of those methods for the expected table structure.

  • item_features_df (pd.DataFrame, optional) – Table of explicit item features. It is used to create SparseFeatures via the from_flatten class method or DenseFeatures via the from_dataframe class method, depending on the make_dense_item_features flag. See the documentation of those methods for the expected table structure.

  • cat_user_features (tp.Iterable[str], default ()) – Names of the categorical user features, passed to the SparseFeatures.from_flatten method. Used only if make_dense_user_features is False and user_features_df is not None.

  • cat_item_features (tp.Iterable[str], default ()) – Names of the categorical item features, passed to the SparseFeatures.from_flatten method. Used only if make_dense_item_features is False and item_features_df is not None.

  • make_dense_user_features (bool, default False) – Whether to create user features as dense or sparse. Used only if user_features_df is not None.
    • If False, the SparseFeatures.from_flatten method is used;
    • If True, the DenseFeatures.from_dataframe method is used.

  • make_dense_item_features (bool, default False) – Whether to create item features as dense or sparse. Used only if item_features_df is not None.
    • If False, the SparseFeatures.from_flatten method is used;
    • If True, the DenseFeatures.from_dataframe method is used.

Returns

Container with all input data, converted to rectools structures.

Return type

Dataset

get_user_item_matrix(include_weights: bool = True) -> csr_matrix

Construct user-item CSR matrix based on interactions attribute.

Interactions.get_user_item_matrix is used, see its documentation for details.

Returns a resized user-item matrix. The matrix is sized from user_id_map and item_id_map, so a user or an item that is present in an id map but absent from interactions still gets a row or column in the returned matrix.

Parameters

include_weights (bool, default True) – Whether to include interaction weights in the matrix. If False, all values in the returned matrix are equal to 1.

Returns

Resized user-item CSR matrix

Return type

csr_matrix
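
The resizing behaviour described above can be illustrated with a plain-Python sketch. This is a conceptual stand-in, not the rectools implementation: it uses dictionaries in place of IdMap and a dense list of lists in place of a CSR matrix, and all names and data are hypothetical.

```python
# Conceptual sketch only: plain-Python stand-ins, not rectools code.
user_ids = ["u1", "u2", "u3"]  # "u3" is in the id map but has no interactions
item_ids = ["i1", "i2"]
interactions = [("u1", "i1", 2.0), ("u2", "i2", 1.0), ("u1", "i2", 0.5)]

# External id -> internal index, playing the role of the id maps.
user_index = {uid: idx for idx, uid in enumerate(user_ids)}
item_index = {iid: idx for idx, iid in enumerate(item_ids)}


def build_matrix(include_weights=True):
    """Build a dense user-item matrix sized by the id maps."""
    # The shape comes from the id maps, not from the interactions.
    matrix = [[0.0] * len(item_ids) for _ in user_ids]
    for user, item, weight in interactions:
        value = weight if include_weights else 1.0
        matrix[user_index[user]][item_index[item]] = value
    return matrix


weighted = build_matrix()
binary = build_matrix(include_weights=False)
print(len(weighted), len(weighted[0]))
```

The row for "u3" exists even though that user never appears in the interactions, which mirrors how the returned matrix covers every id in the id maps.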