Example of model selection using cross-validation with RecTools
- CV split
- Training a variety of models
- Measuring a variety of metrics
[1]:
from pprint import pprint
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
from implicit.nearest_neighbours import TFIDFRecommender, BM25Recommender
from implicit.als import AlternatingLeastSquares
from rectools import Columns
from rectools.dataset import Dataset
from rectools.metrics import Precision, Recall, MeanInvUserFreq, Serendipity, calc_metrics
from rectools.models import ImplicitItemKNNWrapperModel, RandomModel, PopularModel
from rectools.model_selection import TimeRangeSplitter, cross_validate
Load data
[2]:
%%time
!wget -q https://files.grouplens.org/datasets/movielens/ml-1m.zip -O ml-1m.zip
!unzip -o ml-1m.zip
!rm ml-1m.zip
Archive: ml-1m.zip
inflating: ml-1m/movies.dat
inflating: ml-1m/ratings.dat
inflating: ml-1m/README
inflating: ml-1m/users.dat
CPU times: user 53 ms, sys: 40.2 ms, total: 93.2 ms
Wall time: 3.15 s
[3]:
%%time
ratings = pd.read_csv(
"ml-1m/ratings.dat",
sep="::",
engine="python", # Because of 2-chars separators
header=None,
names=[Columns.User, Columns.Item, Columns.Weight, Columns.Datetime],
)
print(ratings.shape)
ratings.head()
(1000209, 4)
CPU times: user 4.51 s, sys: 198 ms, total: 4.71 s
Wall time: 4.76 s
[3]:
 | user_id | item_id | weight | datetime |
---|---|---|---|---|
0 | 1 | 1193 | 5 | 978300760 |
1 | 1 | 661 | 3 | 978302109 |
2 | 1 | 914 | 3 | 978301968 |
3 | 1 | 3408 | 4 | 978300275 |
4 | 1 | 2355 | 5 | 978824291 |
[4]:
ratings["user_id"].nunique(), ratings["item_id"].nunique()
[4]:
(6040, 3706)
[5]:
ratings["weight"].value_counts()
[5]:
4 348971
3 261197
5 226310
2 107557
1 56174
Name: weight, dtype: int64
[6]:
ratings["datetime"] = pd.to_datetime(ratings["datetime"], unit="s")  # Unix seconds -> timestamps
print("Time period")
ratings["datetime"].min(), ratings["datetime"].max()
Time period
[6]:
(Timestamp('2000-04-25 23:05:32'), Timestamp('2003-02-28 17:49:50'))
Create Dataset
The Dataset class is a wrapper for interactions. User and item features can also be added (see the next examples for details).
[7]:
%%time
dataset = Dataset.construct(ratings)
CPU times: user 54.7 ms, sys: 15.5 ms, total: 70.3 ms
Wall time: 70.1 ms
Prepare cross-validation splitter
We’ll use the last three 2-week periods to validate our models.
[8]:
n_splits = 3
splitter = TimeRangeSplitter(
test_size="14D",
n_splits=n_splits,
filter_already_seen=True,
filter_cold_items=True,
filter_cold_users=True,
)
[9]:
splitter.get_test_fold_borders(dataset.interactions)
[9]:
[(Timestamp('2003-01-18 00:00:00', freq='14D'),
Timestamp('2003-02-01 00:00:00', freq='14D')),
(Timestamp('2003-02-01 00:00:00', freq='14D'),
Timestamp('2003-02-15 00:00:00', freq='14D')),
(Timestamp('2003-02-15 00:00:00', freq='14D'),
Timestamp('2003-03-01 00:00:00', freq='14D'))]
For test folds, the left border is always included in the fold and the right border is excluded.
Train folds have no left border, and their right border is always excluded.
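The border rule above can be sketched in plain Python. This is a minimal illustration, not RecTools' implementation: the helper names (`make_fold_borders`, `in_test_fold`, `in_train_fold`) are hypothetical, and it assumes the right edge of the last fold is midnight of the day after the last interaction, which is consistent with the borders printed above.

```python
from datetime import datetime, timedelta


def make_fold_borders(last_ts, test_size_days, n_splits):
    """Return [(start, end), ...] for the last `n_splits` test folds (hypothetical helper)."""
    # Assumption: the right edge is midnight of the day after the last interaction.
    last_day = datetime(last_ts.year, last_ts.month, last_ts.day)
    right_edge = last_day + timedelta(days=1)
    step = timedelta(days=test_size_days)
    borders = []
    for i in range(n_splits, 0, -1):
        start = right_edge - i * step
        borders.append((start, start + step))
    return borders


def in_test_fold(ts, start, end):
    # Left border included, right border excluded: start <= ts < end.
    return start <= ts < end


def in_train_fold(ts, test_start):
    # No left border: everything strictly before the test fold starts.
    return ts < test_start


# Last interaction time taken from the "Time period" output above.
borders = make_fold_borders(datetime(2003, 2, 28, 17, 49, 50), test_size_days=14, n_splits=3)
for start, end in borders:
    print(start.date(), "->", end.date())
```

Because the intervals are half-open, an interaction that happens exactly at a border belongs to one test fold only, and no interaction is in both the train and test parts of the same split.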
Train models
[10]:
# Take a few simple models to compare
models = {
"random": RandomModel(random_state=42),
"popular": PopularModel(),
"most_raited": PopularModel(popularity="sum_weight"),
"tfidf_k=5": ImplicitItemKNNWrapperModel(model=TFIDFRecommender(K=5)),
"tfidf_k=10": ImplicitItemKNNWrapperModel(model=TFIDFRecommender(K=10)),
# NOTE: the key below says k=10, but K=5 is what is passed to BM25Recommender
"bm25_k=10_k1=0.05_b=0.1": ImplicitItemKNNWrapperModel(model=BM25Recommender(K=5, K1=0.05, B=0.1)),
}
# We will calculate several classic (precision@k and recall@k) and "beyond accuracy" metrics
metrics = {
"prec@1": Precision(k=1),
"prec@10": Precision(k=10),
"recall": Recall(k=10),
"novelty": MeanInvUserFreq(k=10),
"serendipity": Serendipity(k=10),
}
K_RECS = 10
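To make the metric names concrete, here is a minimal per-user sketch of precision@k, recall@k and a mean inverse user frequency style novelty score. It follows the usual textbook definitions and is an assumption about what the metrics measure, not a copy of RecTools' code; edge-case handling in the library may differ. All function and variable names are hypothetical, and the toy data is made up.

```python
from math import log2


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that appear in the user's test interactions."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Share of the user's test interactions that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def mean_inv_user_freq(recommended, item_user_counts, n_users, k):
    """Average of -log2(share of users who interacted with the item) over the top-k items."""
    top = recommended[:k]
    return sum(-log2(item_user_counts[item] / n_users) for item in top) / len(top)


# Toy data (made up): one user's ranked recommendations and test interactions.
recommended = [10, 20, 30, 40]
relevant = {20, 40, 50}
item_user_counts = {10: 4, 20: 2, 30: 1, 40: 8}  # users per item in train

print(precision_at_k(recommended, relevant, k=2))
print(recall_at_k(recommended, relevant, k=4))
print(mean_inv_user_freq(recommended, item_user_counts, n_users=8, k=4))
```

Under this definition, rarely consumed items contribute large values to the novelty score, so a recommender that draws items uniformly at random scores much higher on novelty than a popularity-based one.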
[11]:
%%time
# For each fold, generate the train and test parts of the dataset,
# then fit every model, generate recommendations and calculate metrics
cv_results = cross_validate(
dataset=dataset,
splitter=splitter,
models=models,
metrics=metrics,
k=K_RECS,
filter_viewed=True,
)
CPU times: user 14.2 s, sys: 714 ms, total: 14.9 s
Wall time: 14.9 s
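The procedure described in the comment above (split, fit, recommend, score) can be pictured with a toy loop in plain Python. This is only a sketch under simplifying assumptions, not how `cross_validate` is implemented: it uses a single popularity "model", one metric, integer timestamps and made-up data, and every name in it (`fit_popular`, `recommend`, `toy_cross_validate`) is hypothetical.

```python
from collections import Counter


def fit_popular(train):
    """Toy 'model': items ranked by interaction count (ties broken by item id)."""
    counts = Counter(item for _, item, _ in train)
    return sorted(counts, key=lambda item: (-counts[item], item))


def recommend(ranked_items, seen, k):
    # Skip items the user already interacted with in train (similar in spirit to filter_viewed=True).
    return [item for item in ranked_items if item not in seen][:k]


def toy_cross_validate(interactions, fold_borders, k):
    """interactions: list of (user, item, time); fold_borders: list of (start, end)."""
    results = []
    for i_split, (start, end) in enumerate(fold_borders):
        train = [row for row in interactions if row[2] < start]
        test = [row for row in interactions if start <= row[2] < end]

        seen = {}
        for user, item, _ in train:
            seen.setdefault(user, set()).add(item)
        train_items = {item for _, item, _ in train}

        # Drop cold users, cold items and already seen pairs from the test part.
        relevant = {}
        for user, item, _ in test:
            if user in seen and item in train_items and item not in seen[user]:
                relevant.setdefault(user, set()).add(item)

        ranked = fit_popular(train)
        precisions = [
            len(set(recommend(ranked, seen[user], k)) & rel) / k
            for user, rel in relevant.items()
        ]
        score = sum(precisions) / len(precisions) if precisions else 0.0
        results.append({"model": "popular", "i_split": i_split, f"prec@{k}": score})
    return results


# Made-up interactions: (user, item, integer time).
interactions = [
    ("u1", "a", 1), ("u2", "a", 2), ("u2", "b", 3), ("u3", "c", 4),
    ("u1", "b", 5), ("u3", "b", 6),
    ("u1", "c", 7), ("u4", "a", 8),
]
results = toy_cross_validate(interactions, fold_borders=[(5, 7), (7, 9)], k=1)
for row in results:
    print(row)
```

Each fold produces one row per model, which is the same long layout as the per-fold metrics table shown further down. In the second toy fold, user `u4` has no train history and is dropped from the test part as a cold user.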
We can inspect some statistics for each split:
[12]:
pd.DataFrame(cv_results["splits"])
[12]:
 | i_split | start | end | train | train_users | train_items | test | test_users | test_items |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 2003-01-18 | 2003-02-01 | 998083 | 6040 | 3706 | 630 | 75 | 540 |
1 | 1 | 2003-02-01 | 2003-02-15 | 998713 | 6040 | 3706 | 899 | 57 | 704 |
2 | 2 | 2003-02-15 | 2003-03-01 | 999612 | 6040 | 3706 | 597 | 66 | 501 |
The main result is the metrics for each model and fold:
[13]:
pd.DataFrame(cv_results["metrics"])
[13]:
 | model | i_split | prec@1 | prec@10 | recall | novelty | serendipity |
---|---|---|---|---|---|---|---|
0 | random | 0 | 0.000000 | 0.000000 | 0.000000 | 6.539622 | 0.000000 |
1 | popular | 0 | 0.053333 | 0.024000 | 0.037410 | 1.580736 | 0.000123 |
2 | most_raited | 0 | 0.053333 | 0.026667 | 0.042251 | 1.592543 | 0.000151 |
3 | tfidf_k=5 | 0 | 0.053333 | 0.021333 | 0.023866 | 2.361189 | 0.000465 |
4 | tfidf_k=10 | 0 | 0.026667 | 0.021333 | 0.039926 | 2.137451 | 0.000327 |
5 | bm25_k=10_k1=0.05_b=0.1 | 0 | 0.026667 | 0.029333 | 0.046645 | 1.781881 | 0.000271 |
6 | random | 1 | 0.000000 | 0.001754 | 0.017544 | 6.489885 | 0.000054 |
7 | popular | 1 | 0.052632 | 0.057895 | 0.015707 | 1.588414 | 0.000183 |
8 | most_raited | 1 | 0.035088 | 0.056140 | 0.009919 | 1.600628 | 0.000155 |
9 | tfidf_k=5 | 1 | 0.052632 | 0.057895 | 0.048591 | 2.326116 | 0.002616 |
10 | tfidf_k=10 | 1 | 0.052632 | 0.052632 | 0.010033 | 2.143504 | 0.000917 |
11 | bm25_k=10_k1=0.05_b=0.1 | 1 | 0.070175 | 0.059649 | 0.010321 | 1.809416 | 0.000341 |
12 | random | 2 | 0.000000 | 0.004545 | 0.000956 | 6.535055 | 0.000608 |
13 | popular | 2 | 0.045455 | 0.042424 | 0.039984 | 1.656638 | 0.000450 |
14 | most_raited | 2 | 0.045455 | 0.039394 | 0.024521 | 1.668108 | 0.000332 |
15 | tfidf_k=5 | 2 | 0.090909 | 0.050000 | 0.039606 | 2.378988 | 0.001500 |
16 | tfidf_k=10 | 2 | 0.060606 | 0.051515 | 0.053531 | 2.206921 | 0.001346 |
17 | bm25_k=10_k1=0.05_b=0.1 | 2 | 0.090909 | 0.039394 | 0.038426 | 1.901316 | 0.000397 |
Now let’s aggregate the metrics across folds and compare the models.
[14]:
pivot_results = (
pd.DataFrame(cv_results["metrics"])
.drop(columns="i_split")
.groupby(["model"], sort=False)
.agg(["mean", "std"])
)
mean_metric_subset = [(metric, "mean") for metric in pivot_results.columns.levels[0]]
(
pivot_results.style
.highlight_min(subset=mean_metric_subset, color='lightcoral', axis=0)
.highlight_max(subset=mean_metric_subset, color='lightgreen', axis=0)
)
[14]:
model | prec@1 mean | prec@1 std | prec@10 mean | prec@10 std | recall mean | recall std | novelty mean | novelty std | serendipity mean | serendipity std |
---|---|---|---|---|---|---|---|---|---|---|
random | 0.000000 | 0.000000 | 0.002100 | 0.002292 | 0.006167 | 0.009865 | 6.521521 | 0.027493 | 0.000220 | 0.000336 |
popular | 0.050473 | 0.004360 | 0.041440 | 0.016969 | 0.031034 | 0.013335 | 1.608596 | 0.041782 | 0.000252 | 0.000174 |
most_raited | 0.044625 | 0.009151 | 0.040734 | 0.014782 | 0.025564 | 0.016192 | 1.620426 | 0.041491 | 0.000213 | 0.000103 |
tfidf_k=5 | 0.065625 | 0.021900 | 0.043076 | 0.019239 | 0.037354 | 0.012516 | 2.355431 | 0.026902 | 0.001527 | 0.001076 |
tfidf_k=10 | 0.046635 | 0.017747 | 0.041827 | 0.017757 | 0.034497 | 0.022252 | 2.162626 | 0.038480 | 0.000863 | 0.000512 |
bm25_k=10_k1=0.05_b=0.1 | 0.062584 | 0.032787 | 0.042792 | 0.015441 | 0.031797 | 0.019048 | 1.830871 | 0.062541 | 0.000337 | 0.000063 |
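As a sanity check on the aggregation, the `mean` and `std` columns can be recomputed by hand for one model. The sketch below uses only the standard library and the three per-fold `prec@1` values of the `popular` model copied from the per-fold table. It relies on the fact that pandas' `std` uses the sample standard deviation (`ddof=1`) by default, which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# prec@1 of the "popular" model on folds 0, 1 and 2 (copied from the per-fold table).
popular_prec_at_1 = [0.053333, 0.052632, 0.045455]

fold_mean = mean(popular_prec_at_1)
fold_std = stdev(popular_prec_at_1)  # sample std: n - 1 in the denominator

print(f"mean={fold_mean:.6f} std={fold_std:.6f}")
```

With only three folds the standard deviation is a rough estimate, so differences between models that are smaller than the reported `std` should be read with caution.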