Model configs and saving examples

RecTools models share a set of common methods that make it possible to run experiments from configs and simplify integration with experiment trackers (e.g. MLflow). They include:

  • from_config

  • from_params

  • get_config

  • get_params

We also allow saving and loading models with methods:

  • save

  • load

For convenience we also have common functions that do not depend on specific model class or instance. They can be used with any rectools model:

  • model_from_config

  • model_from_params

  • load_model

In this example we will show basic usage for all of these methods and common functions as well as config examples for our models.

[1]:
from datetime import timedelta
import pandas as pd

from rectools import Columns  # used by the validation mask function later in this example
from rectools.models import (
    SASRecModel,
    BERT4RecModel,
    ImplicitItemKNNWrapperModel,
    ImplicitALSWrapperModel,
    ImplicitBPRWrapperModel,
    EASEModel,
    PopularInCategoryModel,
    PopularModel,
    RandomModel,
    LightFMWrapperModel,
    PureSVDModel,
    model_from_config,
    load_model,
    model_from_params
)

Basic usage

from_config and model_from_config

from_config method allows model initialization from a dictionary of model hyper-params.

[2]:
config = {
    "popularity": "n_interactions",
    "period": timedelta(weeks=2),
}
model = PopularModel.from_config(config)

You can also use model_from_config function to initialise any rectools model.

[3]:
config = {
    "cls": "PopularModel",  # always specify "cls" for `model_from_config` function
    # "cls": "rectools.models.PopularModel",  # will work too
    "popularity": "n_interactions",
    "period": timedelta(weeks=2),
}
model = model_from_config(config)
model
[3]:
<rectools.models.popular.PopularModel at 0x7f3e981aca90>
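
To make the role of the “cls” key concrete, here is a minimal sketch of class dispatch from a config. It is not the RecTools implementation: ToyPopular, ToyRandom, REGISTRY and toy_model_from_config are hypothetical names invented for illustration, and only short class names (not full import paths) are handled.

```python
from dataclasses import dataclass


@dataclass
class ToyPopular:  # hypothetical stand-in, not a RecTools class
    popularity: str = "n_interactions"


@dataclass
class ToyRandom:  # hypothetical stand-in, not a RecTools class
    random_state: int = 0


# Hypothetical registry that maps a short class name to the class itself
REGISTRY = {cls.__name__: cls for cls in (ToyPopular, ToyRandom)}


def toy_model_from_config(config: dict):
    """Pick the class named by "cls" and pass the remaining keys to it."""
    params = dict(config)  # copy so the caller's dict is not mutated
    cls_name = params.pop("cls")  # a missing "cls" raises KeyError
    return REGISTRY[cls_name](**params)


toy = toy_model_from_config({"cls": "ToyRandom", "random_state": 32})
print(toy)
```

A class-level constructor already knows its own class, which is one plausible reason why only the class-agnostic function needs “cls” to be present.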

get_config and get_params

get_config method returns a dictionary of model hyper-params. In contrast to the previous method, here you will get a full list of model parameters, even the ones that were not specified during model initialization but instead were set to their default values.

[4]:
model.get_config()
[4]:
{'cls': rectools.models.popular.PopularModel,
 'verbose': 0,
 'popularity': <Popularity.N_INTERACTIONS: 'n_interactions'>,
 'period': datetime.timedelta(days=14),
 'begin_from': None,
 'add_cold': False,
 'inverse': False}

You can directly use output of get_config method to create new model instances using from_config method. New instances will have exactly the same hyper-params as the source model.

[5]:
source_config = model.get_config()
new_model = PopularModel.from_config(source_config)

To get model config in json-compatible format pass simple_types=True. See how popularity parameter changes for the Popular model in the example below:

[6]:
model.get_config(simple_types=True)
[6]:
{'cls': 'PopularModel',
 'verbose': 0,
 'popularity': 'n_interactions',
 'period': {'days': 14},
 'begin_from': None,
 'add_cold': False,
 'inverse': False}
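
As a quick check of what “json-compatible” means in practice, the sketch below round-trips the dictionary shown in the output above through the standard json module, and contrasts it with a raw timedelta value. It uses only the standard library and does not call RecTools; the dictionary is copied by hand from the output.

```python
import json
from datetime import timedelta

# Config copied from the `get_config(simple_types=True)` output
simple_config = {
    "cls": "PopularModel",
    "verbose": 0,
    "popularity": "n_interactions",
    "period": {"days": 14},
    "begin_from": None,
    "add_cold": False,
    "inverse": False,
}

# Simple types survive a JSON round trip unchanged
restored = json.loads(json.dumps(simple_config))
print(restored == simple_config)

# A raw timedelta value is rejected by the json module
try:
    json.dumps({"period": timedelta(days=14)})
    raw_is_serializable = True
except TypeError:
    raw_is_serializable = False
print(raw_is_serializable)
```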

The get_params method returns model hyper-parameters as a flat dictionary, which is often more convenient for experiment trackers.

Don’t forget to pass simple_types=True to make the format json-compatible. Note that you can’t initialize a new model from the output of this method.

[7]:
model.get_params(simple_types=True)
[7]:
{'cls': 'PopularModel',
 'verbose': 0,
 'popularity': 'n_interactions',
 'period.days': 14,
 'begin_from': None,
 'add_cold': False,
 'inverse': False}
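
The relation between the nested config and the flat form can be illustrated with a small helper. This is only a sketch of the idea: flatten is a hypothetical function written for this example, not part of RecTools, and it assumes that nested values are plain dicts.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dicts into one level, joining keys with "."."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the nested dict, extending the key prefix
            flat.update(flatten(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = value
    return flat


nested = {"cls": "PopularModel", "popularity": "n_interactions", "period": {"days": 14}}
print(flatten(nested))
# {'cls': 'PopularModel', 'popularity': 'n_interactions', 'period.days': 14}
```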

from_params and model_from_params

The from_params class method and the model_from_params function act exactly like from_config and model_from_config, but always expect the dict of model parameters in a “flat” form. The “flat-dict” form of configs is very useful for hyper-parameter search (e.g. with Optuna).

See example below:

[8]:
params = {
    "cls": "PopularModel",
    "popularity": "n_interactions",
    "period.days": 14,  # flat form with ``.`` as a separator
}
model = model_from_params(params)
model
[8]:
<rectools.models.popular.PopularModel at 0x7f3c4c9421f0>
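
To show why the flat form suits hyper-parameter search, here is a standard-library sketch that enumerates a small grid written directly in flat form and rebuilds the nested view of one candidate. The helpers unflatten and grid, as well as the search space itself, are hypothetical and exist only for this illustration; a real search would more likely use a tool such as Optuna.

```python
from itertools import product
from typing import Any, Dict, Iterator


def unflatten(params: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild a nested dict from keys that use "." as a separator."""
    nested: Dict[str, Any] = {}
    for key, value in params.items():
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def grid(space: Dict[str, list]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of values from a flat search space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical search space, written directly in flat form
space = {"cls": ["PopularModel"], "period.days": [7, 14], "add_cold": [False, True]}
candidates = list(grid(space))
print(len(candidates))
print(unflatten(candidates[0]))
```

Each candidate is a flat dict with the same shape as the params passed to model_from_params in the cell above.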

save, load and load_model

The save and load methods do exactly what their names suggest :) Fit the model to a dataset before saving; the fitted weights are restored when the model is loaded.

[9]:
model.save("pop_model.pkl")
[9]:
220
[10]:
loaded = PopularModel.load("pop_model.pkl")
loaded
[10]:
<rectools.models.popular.PopularModel at 0x7f3c4c942700>

You can also use load_model function to load any rectools model.

[11]:
loaded = load_model("pop_model.pkl")
loaded
[11]:
<rectools.models.popular.PopularModel at 0x7f3c4c942c10>
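
The save and load pattern itself can be sketched with a toy class. Everything here is an assumption made for illustration: ToyModel is hypothetical, and pickle is used only because the example file is named pop_model.pkl; the actual on-disk format of RecTools models is not shown in this example.

```python
import pickle
import tempfile
from pathlib import Path


class ToyModel:  # hypothetical stand-in, not a RecTools class
    def __init__(self, weights=None):
        self.weights = weights
        self.is_fitted = weights is not None

    def save(self, path) -> int:
        """Write the model state to `path` and return the number of bytes written."""
        payload = pickle.dumps({"weights": self.weights})
        return Path(path).write_bytes(payload)

    @classmethod
    def load(cls, path) -> "ToyModel":
        """Read the state back and rebuild a model of this class."""
        state = pickle.loads(Path(path).read_bytes())
        return cls(weights=state["weights"])


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "toy_model.pkl"
    n_bytes = ToyModel(weights=[0.5, 0.3, 0.2]).save(path)
    restored = ToyModel.load(path)

print(n_bytes > 0, restored.is_fitted, restored.weights)
```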

Config examples for all models

SASRec

[12]:
config = {
    "epochs": 2,
    "n_blocks": 1,
    "n_heads": 1,
    "n_factors": 64,
}

model = SASRecModel.from_config(config)
model.get_params(simple_types=True)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[12]:
{'cls': 'SASRecModel',
 'verbose': 0,
 'data_preparator_type': 'rectools.models.nn.transformers.sasrec.SASRecDataPreparator',
 'n_blocks': 1,
 'n_heads': 1,
 'n_factors': 64,
 'use_pos_emb': True,
 'use_causal_attn': True,
 'use_key_padding_mask': False,
 'dropout_rate': 0.2,
 'session_max_len': 100,
 'dataloader_num_workers': 0,
 'batch_size': 128,
 'loss': 'softmax',
 'n_negatives': 1,
 'gbce_t': 0.2,
 'lr': 0.001,
 'epochs': 2,
 'deterministic': False,
 'recommend_batch_size': 256,
 'recommend_torch_device': None,
 'train_min_user_interactions': 2,
 'item_net_block_types': ['rectools.models.nn.item_net.IdEmbeddingsItemNet',
  'rectools.models.nn.item_net.CatFeaturesItemNet'],
 'item_net_constructor_type': 'rectools.models.nn.item_net.SumOfEmbeddingsConstructor',
 'pos_encoding_type': 'rectools.models.nn.transformers.net_blocks.LearnableInversePositionalEncoding',
 'transformer_layers_type': 'rectools.models.nn.transformers.sasrec.SASRecTransformerLayers',
 'lightning_module_type': 'rectools.models.nn.transformers.lightning.TransformerLightningModule',
 'get_val_mask_func': None,
 'get_trainer_func': None,
 'data_preparator_kwargs': None,
 'transformer_layers_kwargs': None,
 'item_net_constructor_kwargs': None,
 'pos_encoding_kwargs': None,
 'lightning_module_kwargs': None}

Transformer models (SASRec and BERT4Rec) in RecTools may accept functions and classes as arguments. These types of arguments are fully compatible with RecTools configs. You can either pass them as Python objects or as strings that define their import paths.

Below is an example of both approaches:

[13]:
def leave_one_out_mask(interactions: pd.DataFrame) -> pd.Series:
    rank = (
        interactions
        .sort_values(Columns.Datetime, ascending=False, kind="stable")
        .groupby(Columns.User, sort=False)
        .cumcount()
    )
    return rank == 0

config = {
    # function to get validation mask
    "get_val_mask_func": leave_one_out_mask,
    # path to transformer layers class
    "transformer_layers_type": "rectools.models.nn.transformers.sasrec.SASRecTransformerLayers",
}

model = SASRecModel.from_config(config)
model.get_params(simple_types=True)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[13]:
{'cls': 'SASRecModel',
 'verbose': 0,
 'data_preparator_type': 'rectools.models.nn.transformers.sasrec.SASRecDataPreparator',
 'n_blocks': 2,
 'n_heads': 4,
 'n_factors': 256,
 'use_pos_emb': True,
 'use_causal_attn': True,
 'use_key_padding_mask': False,
 'dropout_rate': 0.2,
 'session_max_len': 100,
 'dataloader_num_workers': 0,
 'batch_size': 128,
 'loss': 'softmax',
 'n_negatives': 1,
 'gbce_t': 0.2,
 'lr': 0.001,
 'epochs': 3,
 'deterministic': False,
 'recommend_batch_size': 256,
 'recommend_torch_device': None,
 'train_min_user_interactions': 2,
 'item_net_block_types': ['rectools.models.nn.item_net.IdEmbeddingsItemNet',
  'rectools.models.nn.item_net.CatFeaturesItemNet'],
 'item_net_constructor_type': 'rectools.models.nn.item_net.SumOfEmbeddingsConstructor',
 'pos_encoding_type': 'rectools.models.nn.transformers.net_blocks.LearnableInversePositionalEncoding',
 'transformer_layers_type': 'rectools.models.nn.transformers.sasrec.SASRecTransformerLayers',
 'lightning_module_type': 'rectools.models.nn.transformers.lightning.TransformerLightningModule',
 'get_val_mask_func': '__main__.leave_one_out_mask',
 'get_trainer_func': None,
 'data_preparator_kwargs': None,
 'transformer_layers_kwargs': None,
 'item_net_constructor_kwargs': None,
 'pos_encoding_kwargs': None,
 'lightning_module_kwargs': None}
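
The mapping between a Python object and its import path string can be sketched with the standard library. This is an illustration of the general technique, not the RecTools implementation: object_to_path and path_to_object are hypothetical helpers, and the sketch assumes the object is a top-level attribute of an importable module.

```python
import importlib
from datetime import timedelta
from typing import Any


def object_to_path(obj: Any) -> str:
    """Build a dotted import path from an object's module and qualified name."""
    return f"{obj.__module__}.{obj.__qualname__}"


def path_to_object(path: str) -> Any:
    """Import the module part of a dotted path and fetch the final attribute."""
    module_name, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


path = object_to_path(timedelta)
print(path)
print(path_to_object(path) is timedelta)
```

A path that begins with __main__, as in the output above, can only be resolved in a process whose __main__ module defines that name, so for configs that must be loaded elsewhere it is probably safer to keep such functions in an importable module.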

BERT4Rec

[14]:
config = {
    "epochs": 2,
    "n_blocks": 1,
    "n_heads": 1,
    "n_factors": 64,
    "mask_prob": 0.2,
    "get_val_mask_func": leave_one_out_mask,  # function to get validation mask
    # path to transformer layers class
    "transformer_layers_type": "rectools.models.nn.transformers.base.PreLNTransformerLayers",
}

model = BERT4RecModel.from_config(config)
model.get_params(simple_types=True)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[14]:
{'cls': 'BERT4RecModel',
 'verbose': 0,
 'data_preparator_type': 'rectools.models.nn.transformers.bert4rec.BERT4RecDataPreparator',
 'n_blocks': 1,
 'n_heads': 1,
 'n_factors': 64,
 'use_pos_emb': True,
 'use_causal_attn': False,
 'use_key_padding_mask': True,
 'dropout_rate': 0.2,
 'session_max_len': 100,
 'dataloader_num_workers': 0,
 'batch_size': 128,
 'loss': 'softmax',
 'n_negatives': 1,
 'gbce_t': 0.2,
 'lr': 0.001,
 'epochs': 2,
 'deterministic': False,
 'recommend_batch_size': 256,
 'recommend_torch_device': None,
 'train_min_user_interactions': 2,
 'item_net_block_types': ['rectools.models.nn.item_net.IdEmbeddingsItemNet',
  'rectools.models.nn.item_net.CatFeaturesItemNet'],
 'item_net_constructor_type': 'rectools.models.nn.item_net.SumOfEmbeddingsConstructor',
 'pos_encoding_type': 'rectools.models.nn.transformers.net_blocks.LearnableInversePositionalEncoding',
 'transformer_layers_type': 'rectools.models.nn.transformers.net_blocks.PreLNTransformerLayers',
 'lightning_module_type': 'rectools.models.nn.transformers.lightning.TransformerLightningModule',
 'get_val_mask_func': '__main__.leave_one_out_mask',
 'get_trainer_func': None,
 'data_preparator_kwargs': None,
 'transformer_layers_kwargs': None,
 'item_net_constructor_kwargs': None,
 'pos_encoding_kwargs': None,
 'lightning_module_kwargs': None,
 'mask_prob': 0.2}

ItemKNN

ImplicitItemKNNWrapperModel is a wrapper.
Use “model” key in config to specify wrapped model class and params:

Specify which model you want to wrap under the “model.cls” key. Options are:

  • “TFIDFRecommender”

  • “CosineRecommender”

  • “BM25Recommender”

  • “ItemItemRecommender”

  • A path to a class (including any custom class) that can be imported, like “implicit.nearest_neighbours.TFIDFRecommender”

Specify wrapped model hyper-params under the “model” dict relevant keys.

[15]:
config = {
    "model": {
        "cls": "TFIDFRecommender",  # or "implicit.nearest_neighbours.TFIDFRecommender"
        "K": 50,
        "num_threads": 1
    }
}

model = ImplicitItemKNNWrapperModel.from_config(config)
model.get_params(simple_types=True)
[15]:
{'cls': 'ImplicitItemKNNWrapperModel',
 'verbose': 0,
 'model.cls': 'TFIDFRecommender',
 'model.K': 50,
 'model.num_threads': 1}
[16]:
params = {  # flat form
    "model.cls": "TFIDFRecommender",
    "model.K": 50,
    "model.num_threads": 1,
}
model = ImplicitItemKNNWrapperModel.from_params(params)
model.get_params(simple_types=True)
[16]:
{'cls': 'ImplicitItemKNNWrapperModel',
 'verbose': 0,
 'model.cls': 'TFIDFRecommender',
 'model.K': 50,
 'model.num_threads': 1}

iALS

ImplicitALSWrapperModel is a wrapper.
Use “model” key in config to specify wrapped model class and params:

Specify which model you want to wrap under the “model.cls” key. Since there is only one default model, you can skip this step. “implicit.als.AlternatingLeastSquares” will be used by default. Also you can pass a path to a class (including any custom class) that can be imported.

Specify wrapped model hyper-params under the “model” dict relevant keys.

Specify wrapper hyper-params under relevant config keys.

[17]:
config = {
    "model": {
        # "cls": "AlternatingLeastSquares",  # will work too
        # "cls": "implicit.als.AlternatingLeastSquares",  # will work too
        "factors": 16,
        "num_threads": 2,
        "iterations": 2,
        "random_state": 32
    },
    "fit_features_together": True,
}

model = ImplicitALSWrapperModel.from_config(config)
model.get_params(simple_types=True)
[17]:
{'cls': 'ImplicitALSWrapperModel',
 'verbose': 0,
 'model.cls': 'AlternatingLeastSquares',
 'model.factors': 16,
 'model.regularization': 0.01,
 'model.alpha': 1.0,
 'model.dtype': 'float32',
 'model.use_gpu': True,
 'model.iterations': 2,
 'model.calculate_training_loss': False,
 'model.random_state': 32,
 'fit_features_together': True,
 'recommend_n_threads': None,
 'recommend_use_gpu_ranking': None}
[18]:
params = {  # flat form
    "model.factors": 16,
    "model.iterations": 2,
    "model.random_state": 32,
    "recommend_use_gpu_ranking": False,
}
model = ImplicitALSWrapperModel.from_params(params)
model.get_params(simple_types=True)
[18]:
{'cls': 'ImplicitALSWrapperModel',
 'verbose': 0,
 'model.cls': 'AlternatingLeastSquares',
 'model.factors': 16,
 'model.regularization': 0.01,
 'model.alpha': 1.0,
 'model.dtype': 'float32',
 'model.use_gpu': True,
 'model.iterations': 2,
 'model.calculate_training_loss': False,
 'model.random_state': 32,
 'fit_features_together': False,
 'recommend_n_threads': None,
 'recommend_use_gpu_ranking': False}

BPR-MF

ImplicitBPRWrapperModel is a wrapper.
Use “model” key in config to specify wrapped model class and params:

Specify which model you want to wrap under the “model.cls” key. Since there is only one default model, you can skip this step. “implicit.bpr.BayesianPersonalizedRanking” will be used by default. Also you can pass a path to a class (including any custom class) that can be imported.

Specify wrapped model hyper-params under the “model” dict relevant keys.

Specify wrapper hyper-params under relevant config keys.

[19]:
config = {
    "model": {
        # "cls": "BayesianPersonalizedRanking",  # will work too
        # "cls": "implicit.bpr.BayesianPersonalizedRanking",  # will work too
        "factors": 16,
        "iterations": 2,
        "random_state": 32
    },
    "recommend_use_gpu_ranking": False,
}

model = ImplicitBPRWrapperModel.from_config(config)
model.get_params(simple_types=True)
[19]:
{'cls': 'ImplicitBPRWrapperModel',
 'verbose': 0,
 'model.cls': 'BayesianPersonalizedRanking',
 'model.factors': 16,
 'model.learning_rate': 0.01,
 'model.regularization': 0.01,
 'model.dtype': 'float64',
 'model.iterations': 2,
 'model.verify_negative_samples': True,
 'model.random_state': 32,
 'model.use_gpu': True,
 'recommend_n_threads': None,
 'recommend_use_gpu_ranking': False}
[20]:
params = {  # flat form
    "model.factors": 16,
    "model.iterations": 2,
    "model.random_state": 32,
    "recommend_use_gpu_ranking": False,
}
model = ImplicitBPRWrapperModel.from_params(params)
model.get_params(simple_types=True)
[20]:
{'cls': 'ImplicitBPRWrapperModel',
 'verbose': 0,
 'model.cls': 'BayesianPersonalizedRanking',
 'model.factors': 16,
 'model.learning_rate': 0.01,
 'model.regularization': 0.01,
 'model.dtype': 'float64',
 'model.iterations': 2,
 'model.verify_negative_samples': True,
 'model.random_state': 32,
 'model.use_gpu': True,
 'recommend_n_threads': None,
 'recommend_use_gpu_ranking': False}

EASE

[21]:
config = {
    "regularization": 100,
    "verbose": 1,
}

model = EASEModel.from_config(config)
model.get_params(simple_types=True)
[21]:
{'cls': 'EASEModel',
 'verbose': 1,
 'regularization': 100.0,
 'recommend_n_threads': 0,
 'recommend_use_gpu_ranking': True}

PureSVD

[22]:
config = {
    "factors": 32,
}

model = PureSVDModel.from_config(config)
model.get_params(simple_types=True)
[22]:
{'cls': 'PureSVDModel',
 'verbose': 0,
 'factors': 32,
 'tol': 0.0,
 'maxiter': None,
 'random_state': None,
 'use_gpu': False,
 'recommend_n_threads': 0,
 'recommend_use_gpu_ranking': True}

LightFM

LightFMWrapperModel is a wrapper.
Use “model” key in config to specify wrapped model class and params:

Specify which model you want to wrap under the “model.cls” key. Since there is only one default model, you can skip this step. “LightFM” will be used by default. Also you can pass a path to a class (including any custom class) that can be imported, like “lightfm.lightfm.LightFM”.

Specify wrapped model hyper-params under the “model” dict relevant keys.

Specify wrapper hyper-params under relevant config keys.

[23]:
config = {
    "model": {
        # "cls": "lightfm.lightfm.LightFM",  # will work too
        # "cls": "LightFM",  # will work too
        "no_components": 16,
        "learning_rate": 0.03,
        "random_state": 32,
        "loss": "warp"
    },
    "epochs": 2,
}

model = LightFMWrapperModel.from_config(config)
model.get_params(simple_types=True)
[23]:
{'cls': 'LightFMWrapperModel',
 'verbose': 0,
 'model.cls': 'LightFM',
 'model.no_components': 16,
 'model.k': 5,
 'model.n': 10,
 'model.learning_schedule': 'adagrad',
 'model.loss': 'warp',
 'model.learning_rate': 0.03,
 'model.rho': 0.95,
 'model.epsilon': 1e-06,
 'model.item_alpha': 0.0,
 'model.user_alpha': 0.0,
 'model.max_sampled': 10,
 'model.random_state': 32,
 'epochs': 2,
 'num_threads': 1,
 'recommend_n_threads': None,
 'recommend_use_gpu_ranking': True}
[24]:
params = {  # flat form
    "model.no_components": 16,
    "model.learning_rate": 0.03,
    "model.random_state": 32,
    "model.loss": "warp",
    "epochs": 2,
}

model = LightFMWrapperModel.from_params(params)
model.get_params(simple_types=True)
[24]:
{'cls': 'LightFMWrapperModel',
 'verbose': 0,
 'model.cls': 'LightFM',
 'model.no_components': 16,
 'model.k': 5,
 'model.n': 10,
 'model.learning_schedule': 'adagrad',
 'model.loss': 'warp',
 'model.learning_rate': 0.03,
 'model.rho': 0.95,
 'model.epsilon': 1e-06,
 'model.item_alpha': 0.0,
 'model.user_alpha': 0.0,
 'model.max_sampled': 10,
 'model.random_state': 32,
 'epochs': 2,
 'num_threads': 1,
 'recommend_n_threads': None,
 'recommend_use_gpu_ranking': True}

Random

[28]:
config = {
    "random_state": 32,
}

model = RandomModel.from_config(config)
model.get_params(simple_types=True)
[28]:
{'cls': 'RandomModel', 'verbose': 0, 'random_state': 32}