Dataset
- class rectools.dataset.dataset.Dataset(user_id_map: IdMap, item_id_map: IdMap, interactions: Interactions, user_features: Optional[Union[DenseFeatures, SparseFeatures]] = None, item_features: Optional[Union[DenseFeatures, SparseFeatures]] = None)
Bases: object
Container class for all data for a recommendation model.
It stores the internal-to-external id mappings, user-item interactions, and user and item features in dedicated rectools structures for convenient later use.
This is a data class, so you can create it explicitly, but using the construct method is recommended.
- Parameters
user_id_map (IdMap) – User identifiers mapping.
item_id_map (IdMap) – Item identifiers mapping.
interactions (Interactions) – User-item interactions.
user_features (DenseFeatures or SparseFeatures, optional) – User explicit features.
item_features (DenseFeatures or SparseFeatures, optional) – Item explicit features.
Methods
construct(interactions_df[, ...]) – Class method for convenient Dataset creation.
get_user_item_matrix([include_weights]) – Construct user-item CSR matrix based on interactions attribute.
Attributes
user_id_map
item_id_map
interactions
user_features
item_features
- classmethod construct(interactions_df: DataFrame, user_features_df: Optional[DataFrame] = None, cat_user_features: Iterable[str] = (), make_dense_user_features: bool = False, item_features_df: Optional[DataFrame] = None, cat_item_features: Iterable[str] = (), make_dense_item_features: bool = False) -> Dataset
Class method for convenient Dataset creation.
Use it to create a dataset from raw data.
- Parameters
interactions_df (pd.DataFrame) – Table where every row contains a user-item interaction and the columns are:
Columns.User - user id;
Columns.Item - item id;
Columns.Weight - weight of the interaction, float; use 1 if interactions have no weight;
Columns.Datetime - timestamp of the interaction; assign an arbitrary value if you are not going to use it later.
user_features_df (pd.DataFrame, optional) – User explicit features table. It is used to create SparseFeatures via the from_flatten class method or DenseFeatures via the from_dataframe class method, depending on the make_dense_user_features flag. See the descriptions of these methods for details on the table structure.
item_features_df (pd.DataFrame, optional) – Item explicit features table. It is used to create SparseFeatures via the from_flatten class method or DenseFeatures via the from_dataframe class method, depending on the make_dense_item_features flag. See the descriptions of these methods for details on the table structure.
cat_user_features (tp.Iterable[str], default ()) – Categorical user feature names for the SparseFeatures.from_flatten method. Used only if make_dense_user_features is False and user_features_df is not None.
cat_item_features (tp.Iterable[str], default ()) – Categorical item feature names for the SparseFeatures.from_flatten method. Used only if make_dense_item_features is False and item_features_df is not None.
make_dense_user_features (bool, default False) – Whether to create user features as dense or sparse. Used only if user_features_df is not None. If False, the SparseFeatures.from_flatten method is used; if True, the DenseFeatures.from_dataframe method is used.
make_dense_item_features (bool, default False) – Whether to create item features as dense or sparse. Used only if item_features_df is not None. If False, the SparseFeatures.from_flatten method is used; if True, the DenseFeatures.from_dataframe method is used.
- Returns
Container with all input data, converted to rectools structures.
- Return type
Dataset
- get_user_item_matrix(include_weights: bool = True) -> csr_matrix
Construct user-item CSR matrix based on interactions attribute.
Interactions.get_user_item_matrix is used, see its documentation for details.
Returns a resized user-item matrix. Resizing is done using user_id_map and item_id_map, so a user or an item that is absent from interactions but present in the id map still appears in the returned matrix.
- Parameters
include_weights (bool, default True) – Whether to include interaction weights in the matrix. If False, all values in the returned matrix are equal to 1.
- Returns
Resized user-item CSR matrix
- Return type
csr_matrix
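The resizing behaviour and the effect of include_weights can be illustrated without rectools. The sketch below is not the library's implementation; it only assumes numpy and scipy, and the index arrays, sizes and the build_matrix helper are hypothetical. Three users and two items are known to the id maps, but the third user has no interactions.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical sizes taken from the id maps, not from the interactions.
n_users, n_items = 3, 2

# Internal indices and weights of the observed interactions.
# User 2 is known to the id map but never interacted.
user_idx = np.array([0, 0, 1])
item_idx = np.array([0, 1, 0])
weights = np.array([1.0, 3.0, 2.0])


def build_matrix(include_weights: bool = True) -> csr_matrix:
    """Build a user-item CSR matrix sized by the id maps."""
    values = weights if include_weights else np.ones_like(weights)
    return csr_matrix(
        (values, (user_idx, item_idx)), shape=(n_users, n_items)
    )


weighted = build_matrix(include_weights=True)
binary = build_matrix(include_weights=False)

print(weighted.toarray())
print(binary.toarray())
```

Passing the shape explicitly is what keeps the empty row for the user without interactions; without it, the matrix would be sized by the largest observed index.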