Dataset

class rectools.dataset.dataset.Dataset(user_id_map: IdMap, item_id_map: IdMap, interactions: Interactions, user_features: Optional[Union[DenseFeatures, SparseFeatures]] = None, item_features: Optional[Union[DenseFeatures, SparseFeatures]] = None)[source]

Bases: object

Container class for all data for a recommendation model.

It stores the internal-external id mappings, user-item interactions, and user and item features in dedicated rectools structures for convenient later use.

WARNING: Creating a Dataset object directly is strongly discouraged. Use the construct class method instead.

Methods

construct(interactions_df[, ...])

Class method for convenient Dataset creation.

get_hot_item_features()

Item features for hot items.

get_hot_user_features()

User features for hot users.

get_raw_interactions([include_weight, ...])

Return interactions as a pd.DataFrame object with internal user and item ids replaced by external ones.

get_user_item_matrix([include_weights, ...])

Construct user-item CSR matrix based on interactions attribute.

Attributes

user_id_map

item_id_map

interactions

user_features

item_features

n_hot_items

Return number of hot items in dataset.

n_hot_users

Return number of hot users in dataset.

classmethod construct(interactions_df: DataFrame, user_features_df: Optional[DataFrame] = None, cat_user_features: Iterable[str] = (), make_dense_user_features: bool = False, item_features_df: Optional[DataFrame] = None, cat_item_features: Iterable[str] = (), make_dense_item_features: bool = False) Dataset[source]

Class method for convenient Dataset creation.

Use it to create dataset from raw data.

Parameters
  • interactions_df (pd.DataFrame) –

    Table in which every row is a user-item interaction, with the following columns:
    • Columns.User - user id;

    • Columns.Item - item id;

    • Columns.Weight - weight of the interaction, float; use 1 if interactions have no weight;

    • Columns.Datetime - timestamp of the interaction; assign an arbitrary value if you are not going to use it later.

  • user_features_df (pd.DataFrame, optional) – User explicit features table. It is used to create SparseFeatures with the from_flatten class method or DenseFeatures with the from_dataframe class method, depending on the make_dense_user_features flag. See the descriptions of these methods for details on the table structure.

  • item_features_df (pd.DataFrame, optional) – Item explicit features table. It is used to create SparseFeatures with the from_flatten class method or DenseFeatures with the from_dataframe class method, depending on the make_dense_item_features flag. See the descriptions of these methods for details on the table structure.

  • cat_user_features (tp.Iterable[str], default ()) – Names of categorical user features for the SparseFeatures.from_flatten method. Used only if make_dense_user_features is False and user_features_df is not None.

  • cat_item_features (tp.Iterable[str], default ()) – Names of categorical item features for the SparseFeatures.from_flatten method. Used only if make_dense_item_features is False and item_features_df is not None.

  • make_dense_user_features (bool, default False) – Whether to create user features as dense or sparse. Used only if user_features_df is not None.
    • if False, the SparseFeatures.from_flatten method is used;

    • if True, the DenseFeatures.from_dataframe method is used.

  • make_dense_item_features (bool, default False) – Whether to create item features as dense or sparse. Used only if item_features_df is not None.
    • if False, the SparseFeatures.from_flatten method is used;

    • if True, the DenseFeatures.from_dataframe method is used.

Returns

Container with all input data, converted to rectools structures.

Return type

Dataset
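
To make the internal-external id mapping concrete, here is a minimal plain-Python sketch of the idea. It is not the rectools implementation: the helper build_id_map, the tuple layout of the rows, and the first-appearance ordering of internal ids are assumptions made for illustration only.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def build_id_map(external_ids: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign consecutive internal ids (0, 1, 2, ...) to unique external ids.

    Hypothetical helper: ordering by first appearance is an assumption.
    """
    id_map: Dict[Hashable, int] = {}
    for ext_id in external_ids:
        if ext_id not in id_map:
            id_map[ext_id] = len(id_map)
    return id_map


# Raw interactions as (user id, item id, weight) rows with external ids.
raw_rows: List[Tuple[str, str, float]] = [
    ("alice", "book", 1.0),
    ("bob", "film", 3.0),
    ("alice", "film", 2.0),
]

user_id_map = build_id_map(user for user, _, _ in raw_rows)
item_id_map = build_id_map(item for _, item, _ in raw_rows)

# The same interactions expressed with internal ids.
internal_rows = [
    (user_id_map[user], item_id_map[item], weight)
    for user, item, weight in raw_rows
]
```

Models typically work with the compact internal ids, while the id maps allow results to be translated back to the original external ids.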

get_hot_item_features() Optional[Union[DenseFeatures, SparseFeatures]][source]

Item features for hot items.

Return type

Optional[Union[DenseFeatures, SparseFeatures]]

get_hot_user_features() Optional[Union[DenseFeatures, SparseFeatures]][source]

User features for hot users.

Return type

Optional[Union[DenseFeatures, SparseFeatures]]

get_raw_interactions(include_weight: bool = True, include_datetime: bool = True) DataFrame[source]

Return interactions as a pd.DataFrame object with internal user and item ids replaced by external ones.

Parameters
  • include_weight (bool, default True) – Whether to include weight column into resulting table or not.

  • include_datetime (bool, default True) – Whether to include datetime column into resulting table or not.

Return type

pd.DataFrame
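
The conversion described above can be pictured with a small plain-Python sketch. It only models the behavior on lists of dicts; the function to_raw_rows and the dict keys are hypothetical names, and the real method works on rectools structures and returns a pd.DataFrame.

```python
from typing import Any, Dict, List, Sequence


def to_raw_rows(
    rows: List[Dict[str, Any]],
    external_user_ids: Sequence[Any],
    external_item_ids: Sequence[Any],
    include_weight: bool = True,
    include_datetime: bool = True,
) -> List[Dict[str, Any]]:
    """Replace internal ids with external ones and keep optional columns.

    Hypothetical helper: external ids are looked up by position, so the
    external id of internal id i is external_user_ids[i].
    """
    result = []
    for row in rows:
        raw = {
            "user": external_user_ids[row["user"]],
            "item": external_item_ids[row["item"]],
        }
        if include_weight:
            raw["weight"] = row["weight"]
        if include_datetime:
            raw["datetime"] = row["datetime"]
        result.append(raw)
    return result


internal_rows = [
    {"user": 0, "item": 1, "weight": 2.0, "datetime": "2023-01-01"},
    {"user": 1, "item": 0, "weight": 1.0, "datetime": "2023-01-02"},
]
external_user_ids = ["alice", "bob"]
external_item_ids = ["book", "film"]

full = to_raw_rows(internal_rows, external_user_ids, external_item_ids)
ids_only = to_raw_rows(
    internal_rows,
    external_user_ids,
    external_item_ids,
    include_weight=False,
    include_datetime=False,
)
```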

get_user_item_matrix(include_weights: bool = True, include_warm_users: bool = False, include_warm_items: bool = False) csr_matrix[source]

Construct user-item CSR matrix based on interactions attribute.

Return a resized user-item matrix. Resizing is done using user_id_map and item_id_map, so a user or an item that is absent from interactions but present in the id map still gets a row or column in the returned matrix.

Parameters
  • include_weights (bool, default True) – Whether to include interaction weights in the matrix or not. If False, all values in the returned matrix will be equal to 1.

  • include_warm_users (bool, default False) – Whether to include warm users in the matrix or not. Rows for warm users are added to the end of the matrix and contain only zeros.

  • include_warm_items (bool, default False) – Whether to include warm items in the matrix or not. Columns for warm items are added to the end of the matrix and contain only zeros.

Returns

Resized user-item CSR matrix

Return type

csr_matrix
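
The resizing behavior can be illustrated with a plain-Python sketch that builds the three CSR arrays by hand. This is an illustrative model, not the rectools code: build_csr is a hypothetical helper, and it assumes each user-item pair occurs at most once.

```python
from typing import List, Tuple


def build_csr(
    rows: List[Tuple[int, int, float]],
    n_users: int,
    n_items: int,
    include_weights: bool = True,
) -> Tuple[List[float], List[int], List[int], Tuple[int, int]]:
    """Return (data, indices, indptr, shape) for (user, item, weight) rows.

    The shape comes from the id map sizes (n_users, n_items), not from
    the ids that happen to occur in the interactions.
    """
    per_user: List[List[Tuple[int, float]]] = [[] for _ in range(n_users)]
    for user, item, weight in rows:
        per_user[user].append((item, weight if include_weights else 1.0))

    data: List[float] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for entries in per_user:
        for item, value in sorted(entries):
            indices.append(item)
            data.append(value)
        # Users without interactions repeat the previous pointer (empty row).
        indptr.append(len(indices))
    return data, indices, indptr, (n_users, n_items)


interactions = [(0, 0, 2.0), (0, 1, 5.0), (1, 1, 3.0)]

# User 2 is in the id map but has no interactions, so n_users is 3.
data, indices, indptr, shape = build_csr(interactions, n_users=3, n_items=2)
ones, _, _, _ = build_csr(interactions, n_users=3, n_items=2, include_weights=False)
```

If SciPy is available, the resulting arrays can be passed to scipy.sparse.csr_matrix((data, indices, indptr), shape=shape) to obtain an actual sparse matrix.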

property n_hot_items: int

Return number of hot items in dataset. Items with internal ids from 0 to n_hot_items - 1 are hot (they are present in interactions). Items with internal ids from n_hot_items to dataset.item_id_map.size - 1 are warm (they aren’t present in interactions, but they have features).

property n_hot_users: int

Return number of hot users in dataset. Users with internal ids from 0 to n_hot_users - 1 are hot (they are present in interactions). Users with internal ids from n_hot_users to dataset.user_id_map.size - 1 are warm (they aren’t present in interactions, but they have features).
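
The hot/warm layout of internal ids described for n_hot_items and n_hot_users can be sketched in plain Python. The function split_hot_warm is a hypothetical helper used only to show the id ranges; it does not exist in rectools.

```python
from typing import Tuple


def split_hot_warm(n_hot: int, id_map_size: int) -> Tuple[range, range]:
    """Return the internal id ranges of hot and warm entities.

    Hot ids are 0 .. n_hot - 1; warm ids are n_hot .. id_map_size - 1.
    """
    if not 0 <= n_hot <= id_map_size:
        raise ValueError("n_hot must be between 0 and id_map_size")
    return range(0, n_hot), range(n_hot, id_map_size)


# Example: 3 users appear in interactions, 2 more have features only.
hot_ids, warm_ids = split_hot_warm(n_hot=3, id_map_size=5)
hot_list = list(hot_ids)
warm_list = list(warm_ids)
```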