Dataset
- class rectools.dataset.dataset.Dataset(user_id_map: IdMap, item_id_map: IdMap, interactions: Interactions, user_features: Optional[Union[DenseFeatures, SparseFeatures]] = None, item_features: Optional[Union[DenseFeatures, SparseFeatures]] = None)
Bases: object
Container class for all data for a recommendation model.
It stores the mapping between internal and external ids, user-item interactions, and user and item features in dedicated rectools structures for convenient later use.
WARNING: Creating a Dataset object directly is strongly discouraged. Use the construct class method instead.
- Parameters
user_id_map (IdMap) – User identifiers mapping.
item_id_map (IdMap) – Item identifiers mapping.
interactions (Interactions) – User-item interactions.
user_features (DenseFeatures or SparseFeatures, optional) – User explicit features.
item_features (DenseFeatures or SparseFeatures, optional) – Item explicit features.
Methods
construct(interactions_df[, ...]) – Class method for convenient Dataset creation.
get_hot_item_features() – Item features for hot items.
get_hot_user_features() – User features for hot users.
get_raw_interactions([include_weight, ...]) – Return interactions as a pd.DataFrame with internal user and item ids replaced by external ones.
get_user_item_matrix([include_weights, ...]) – Construct user-item CSR matrix based on interactions attribute.
Attributes
user_id_map
item_id_map
interactions
user_features
item_features
n_hot_items – Return number of hot items in dataset.
n_hot_users – Return number of hot users in dataset.
- classmethod construct(interactions_df: DataFrame, user_features_df: Optional[DataFrame] = None, cat_user_features: Iterable[str] = (), make_dense_user_features: bool = False, item_features_df: Optional[DataFrame] = None, cat_item_features: Iterable[str] = (), make_dense_item_features: bool = False) -> Dataset
Class method for convenient Dataset creation.
Use it to create dataset from raw data.
- Parameters
interactions_df (pd.DataFrame) – Table where every row contains a user-item interaction and the columns are:
- Columns.User - user id;
- Columns.Item - item id;
- Columns.Weight - weight of interaction, float, use 1 if interactions have no weight;
- Columns.Datetime - timestamp of interactions, assign a random value if you’re not going to use it later.
user_features_df (pd.DataFrame, optional) – User explicit features table. It is used to create SparseFeatures with the from_flatten class method or DenseFeatures with the from_dataframe class method, depending on the make_dense_user_features flag. See the descriptions of these methods for details on the table structure.
item_features_df (pd.DataFrame, optional) – Item explicit features table. It is used to create SparseFeatures with the from_flatten class method or DenseFeatures with the from_dataframe class method, depending on the make_dense_item_features flag. See the descriptions of these methods for details on the table structure.
cat_user_features (tp.Iterable[str], default ()) – Names of categorical user features for the SparseFeatures.from_flatten method. Used only if make_dense_user_features is False and user_features_df is not None.
cat_item_features (tp.Iterable[str], default ()) – Names of categorical item features for the SparseFeatures.from_flatten method. Used only if make_dense_item_features is False and item_features_df is not None.
make_dense_user_features (bool, default False) – Whether to create user features as dense or sparse. Used only if user_features_df is not None. If False, the SparseFeatures.from_flatten method is used; if True, the DenseFeatures.from_dataframe method is used.
make_dense_item_features (bool, default False) – Whether to create item features as dense or sparse. Used only if item_features_df is not None. If False, the SparseFeatures.from_flatten method is used; if True, the DenseFeatures.from_dataframe method is used.
- Returns
Container with all input data, converted to rectools structures.
- Return type
Dataset
- get_hot_item_features() -> Optional[Union[DenseFeatures, SparseFeatures]]
Item features for hot items.
- Return type
Optional[Union[DenseFeatures, SparseFeatures]]
- get_hot_user_features() -> Optional[Union[DenseFeatures, SparseFeatures]]
User features for hot users.
- Return type
Optional[Union[DenseFeatures, SparseFeatures]]
- get_raw_interactions(include_weight: bool = True, include_datetime: bool = True) -> DataFrame
Return interactions as a pd.DataFrame with internal user and item ids replaced by external ones.
- Parameters
include_weight (bool, default True) – Whether to include the weight column in the resulting table.
include_datetime (bool, default True) – Whether to include the datetime column in the resulting table.
- Return type
pd.DataFrame
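The id replacement described above can be illustrated with a minimal pure-Python sketch. It is not the rectools implementation: the helper `to_raw_interactions`, the tuple row layout, and the output keys `"user"`, `"item"`, `"weight"`, `"datetime"` are all hypothetical and stand in for the id maps and the DataFrame columns.

```python
from typing import Any, Dict, Hashable, List, Sequence, Tuple

# Hypothetical row layout: (internal user id, internal item id, weight, datetime)
Row = Tuple[int, int, float, str]


def to_raw_interactions(
    rows: Sequence[Row],
    user_external: Sequence[Hashable],
    item_external: Sequence[Hashable],
    include_weight: bool = True,
    include_datetime: bool = True,
) -> List[Dict[str, Any]]:
    """Replace internal ids with external ones, optionally dropping columns."""
    result = []
    for user, item, weight, dt in rows:
        # An internal id is a position in the list of external ids.
        record: Dict[str, Any] = {
            "user": user_external[user],
            "item": item_external[item],
        }
        if include_weight:
            record["weight"] = weight
        if include_datetime:
            record["datetime"] = dt
        result.append(record)
    return result


user_external = ["alice", "bob"]
item_external = ["i10", "i20", "i30"]
rows = [(0, 2, 1.0, "2023-01-01"), (1, 0, 3.0, "2023-01-02")]

raw = to_raw_interactions(rows, user_external, item_external)
ids_only = to_raw_interactions(
    rows, user_external, item_external, include_weight=False, include_datetime=False
)
```

Here `raw` keeps all four fields, while `ids_only` contains only the external user and item ids, mirroring the effect of the two flags.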
- get_user_item_matrix(include_weights: bool = True, include_warm_users: bool = False, include_warm_items: bool = False) -> csr_matrix
Construct user-item CSR matrix based on interactions attribute.
Return a resized user-item matrix. Resizing is done using user_id_map and item_id_map, so a user or an item that is absent from interactions but present in the id map still gets a row or a column in the returned matrix.
- Parameters
include_weights (bool, default True) – Whether to include interaction weights in the matrix. If False, all values in the returned matrix are equal to 1.
include_warm_users (bool, default False) – Whether to include warm users in the matrix. Rows for warm users are appended to the end of the matrix and contain only zeros.
include_warm_items (bool, default False) – Whether to include warm items in the matrix. Columns for warm items are appended to the end of the matrix and contain only zeros.
- Returns
Resized user-item CSR matrix
- Return type
csr_matrix
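To make the shape and value rules concrete, here is a minimal pure-Python sketch that builds the three CSR arrays by hand. It is an illustration under stated assumptions, not the rectools code: the helper `user_item_csr` is hypothetical, interactions are given as `(internal user id, internal item id, weight)` tuples, and each user-item pair is assumed to appear at most once.

```python
from typing import List, Tuple

# Hypothetical row layout: (internal user id, internal item id, weight)
Interaction = Tuple[int, int, float]


def user_item_csr(
    interactions: List[Interaction],
    n_rows: int,
    n_cols: int,
    include_weights: bool = True,
) -> Tuple[List[float], List[int], List[int], Tuple[int, int]]:
    """Return (data, indices, indptr, shape) describing a CSR matrix."""
    # CSR stores non-zero values row by row, so order by (user, item) first.
    ordered = sorted(interactions, key=lambda r: (r[0], r[1]))
    data: List[float] = []
    indices: List[int] = []
    counts = [0] * n_rows
    for user, item, weight in ordered:
        data.append(weight if include_weights else 1.0)
        indices.append(item)
        counts[user] += 1
    # indptr[i]:indptr[i + 1] is the slice of data/indices that belongs to row i.
    indptr = [0]
    for count in counts:
        indptr.append(indptr[-1] + count)
    return data, indices, indptr, (n_rows, n_cols)


interactions = [(0, 1, 2.0), (1, 0, 5.0), (0, 0, 1.0)]
n_hot_users, n_hot_items = 2, 2  # ids that occur in interactions
n_users, n_items = 3, 3          # id map sizes: one warm user, one warm item

hot_only = user_item_csr(interactions, n_hot_users, n_hot_items)
with_warm = user_item_csr(interactions, n_users, n_items)
binary = user_item_csr(interactions, n_hot_users, n_hot_items, include_weights=False)
```

In `with_warm` the last row has no stored values (its `indptr` slice is empty), which is how an all-zero warm row looks in CSR form. If SciPy is available, the result can be passed on as `scipy.sparse.csr_matrix((data, indices, indptr), shape=shape)`.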
- property n_hot_items: int
Return number of hot items in dataset. Items with internal ids from 0 to n_hot_items - 1 are hot (they are present in interactions). Items with internal ids from n_hot_items to dataset.item_id_map.size - 1 are warm (they aren’t present in interactions, but they have features).
- property n_hot_users: int
Return number of hot users in dataset. Users with internal ids from 0 to n_hot_users - 1 are hot (they are present in interactions). Users with internal ids from n_hot_users to dataset.user_id_map.size - 1 are warm (they aren’t present in interactions, but they have features).
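The hot/warm layout of internal ids can be sketched in plain Python. This is an illustration only, not how rectools builds its id maps: the helper `build_id_layout` is hypothetical, and the first-seen ordering inside each group is an assumption.

```python
from typing import Hashable, Iterable, List, Tuple


def build_id_layout(
    interaction_ids: Iterable[Hashable],
    feature_ids: Iterable[Hashable],
) -> Tuple[List[Hashable], int]:
    """Return (external ids ordered by internal id, number of hot ids)."""
    # dict.fromkeys de-duplicates while keeping first-seen order.
    hot = list(dict.fromkeys(interaction_ids))
    hot_set = set(hot)
    # Warm ids have features but never occur in interactions.
    warm = [x for x in dict.fromkeys(feature_ids) if x not in hot_set]
    return hot + warm, len(hot)


external_ids, n_hot = build_id_layout(
    interaction_ids=["u1", "u2", "u1", "u3"],
    feature_ids=["u2", "u4", "u5"],
)
hot_internal = list(range(n_hot))                       # 0 .. n_hot - 1
warm_internal = list(range(n_hot, len(external_ids)))   # n_hot .. size - 1
```

With this layout, checking whether an internal id is hot reduces to the comparison `internal_id < n_hot`, which is the invariant that n_hot_users and n_hot_items expose.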