Example of model selection using cross-validation with RecTools

  • CV split

  • Training a variety of models

  • Measuring a variety of metrics

[1]:
from pprint import pprint

import numpy as np
import pandas as pd

from tqdm.auto import tqdm

from implicit.nearest_neighbours import TFIDFRecommender, BM25Recommender
from implicit.als import AlternatingLeastSquares

from rectools import Columns
from rectools.dataset import Dataset
from rectools.metrics import Precision, Recall, MeanInvUserFreq, Serendipity, calc_metrics
from rectools.models import ImplicitItemKNNWrapperModel, RandomModel, PopularModel
from rectools.model_selection import TimeRangeSplitter, cross_validate

Load data

[2]:
%%time
!wget -q https://files.grouplens.org/datasets/movielens/ml-1m.zip -O ml-1m.zip
!unzip -o ml-1m.zip
!rm ml-1m.zip
Archive:  ml-1m.zip
  inflating: ml-1m/movies.dat
  inflating: ml-1m/ratings.dat
  inflating: ml-1m/README
  inflating: ml-1m/users.dat
CPU times: user 53 ms, sys: 40.2 ms, total: 93.2 ms
Wall time: 3.15 s
[3]:
%%time
ratings = pd.read_csv(
    "ml-1m/ratings.dat",
    sep="::",
    engine="python",  # needed because of the 2-char separator
    header=None,
    names=[Columns.User, Columns.Item, Columns.Weight, Columns.Datetime],
)
print(ratings.shape)
ratings.head()
(1000209, 4)
CPU times: user 4.51 s, sys: 198 ms, total: 4.71 s
Wall time: 4.76 s
[3]:
   user_id  item_id  weight   datetime
0        1     1193       5  978300760
1        1      661       3  978302109
2        1      914       3  978301968
3        1     3408       4  978300275
4        1     2355       5  978824291
[4]:
ratings["user_id"].nunique(), ratings["item_id"].nunique()
[4]:
(6040, 3706)
[5]:
ratings["weight"].value_counts()
[5]:
4    348971
3    261197
5    226310
2    107557
1     56174
Name: weight, dtype: int64
[6]:
ratings["datetime"] = pd.to_datetime(ratings["datetime"] * 10 ** 9)
print("Time period")
ratings["datetime"].min(), ratings["datetime"].max()
Time period
[6]:
(Timestamp('2000-04-25 23:05:32'), Timestamp('2003-02-28 17:49:50'))

Create a Dataset object. It’s a wrapper for interactions. User and item features can also be added (see the next examples for details).

[7]:
%%time
dataset = Dataset.construct(ratings)
CPU times: user 54.7 ms, sys: 15.5 ms, total: 70.3 ms
Wall time: 70.1 ms
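
As an aside (not run in this notebook), user or item features could be attached at construction time. A minimal hedged sketch, assuming RecTools' long-format feature tables with "id", "feature" and "value" columns; the feature values below are invented purely for illustration:

# Hypothetical sketch: attach user features when constructing the Dataset.
# The ids must exist in the interactions; the values here are made up.
user_features = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "feature": ["gender", "age", "gender", "age"],
        "value": ["F", "1", "M", "56"],
    }
)
dataset_with_features = Dataset.construct(
    ratings,
    user_features_df=user_features,
    cat_user_features=["gender", "age"],
)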

Prepare cross-validation splitter

We’ll use the last three 2-week periods to validate our models.

[8]:
n_splits = 3

splitter = TimeRangeSplitter(
    test_size="14D",
    n_splits=n_splits,
    filter_already_seen=True,
    filter_cold_items=True,
    filter_cold_users=True,
)
[9]:
splitter.get_test_fold_borders(dataset.interactions)
[9]:
[(Timestamp('2003-01-18 00:00:00', freq='14D'),
  Timestamp('2003-02-01 00:00:00', freq='14D')),
 (Timestamp('2003-02-01 00:00:00', freq='14D'),
  Timestamp('2003-02-15 00:00:00', freq='14D')),
 (Timestamp('2003-02-15 00:00:00', freq='14D'),
  Timestamp('2003-03-01 00:00:00', freq='14D'))]

For test folds, the left border is always included in the fold and the right border is excluded.

Train folds have no left border, and the right border is always excluded.
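
A quick way to verify these border semantics (not part of the original notebook; this sketch assumes the splitter yields (train_ids, test_ids, fold_info) tuples with "start" and "end" timestamps in fold_info, matching the split stats that cross_validate reports below):

# Check the border semantics on every fold (a hedged sketch, not original notebook code)
for train_ids, test_ids, fold_info in splitter.split(dataset.interactions):
    df = dataset.interactions.df
    train, test = df.iloc[train_ids], df.iloc[test_ids]
    assert fold_info["start"] <= test["datetime"].min()  # left border included in test
    assert test["datetime"].max() < fold_info["end"]     # right border excluded from test
    assert train["datetime"].max() < fold_info["start"]  # train ends strictly before the test fold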

Train models

[10]:
# Take a few simple models to compare
models = {
    "random": RandomModel(random_state=42),
    "popular": PopularModel(),
    "most_rated": PopularModel(popularity="sum_weight"),
    "tfidf_k=5": ImplicitItemKNNWrapperModel(model=TFIDFRecommender(K=5)),
    "tfidf_k=10": ImplicitItemKNNWrapperModel(model=TFIDFRecommender(K=10)),
    "bm25_k=5_k1=0.05_b=0.1": ImplicitItemKNNWrapperModel(model=BM25Recommender(K=5, K1=0.05, B=0.1)),
}

# We will calculate several classic (precision@k and recall@k) and "beyond accuracy" metrics
metrics = {
    "prec@1": Precision(k=1),
    "prec@10": Precision(k=10),
    "recall": Recall(k=10),
    "novelty": MeanInvUserFreq(k=10),
    "serendipity": Serendipity(k=10),
}

K_RECS = 10
[11]:
%%time

# For each fold generate train and test part of dataset
# Then fit every model, generate recommendations and calculate metrics

cv_results = cross_validate(
    dataset=dataset,
    splitter=splitter,
    models=models,
    metrics=metrics,
    k=K_RECS,
    filter_viewed=True,
)
CPU times: user 14.2 s, sys: 714 ms, total: 14.9 s
Wall time: 14.9 s

We can get some split stats

[12]:
pd.DataFrame(cv_results["splits"])
[12]:
   i_split       start         end   train  train_users  train_items  test  test_users  test_items
0        0  2003-01-18  2003-02-01  998083         6040         3706   630          75         540
1        1  2003-02-01  2003-02-15  998713         6040         3706   899          57         704
2        2  2003-02-15  2003-03-01  999612         6040         3706   597          66         501

And the main result is the metrics table

[13]:
pd.DataFrame(cv_results["metrics"])
[13]:
                      model  i_split    prec@1   prec@10    recall   novelty  serendipity
0                    random        0  0.000000  0.000000  0.000000  6.539622     0.000000
1                   popular        0  0.053333  0.024000  0.037410  1.580736     0.000123
2                most_rated        0  0.053333  0.026667  0.042251  1.592543     0.000151
3                 tfidf_k=5        0  0.053333  0.021333  0.023866  2.361189     0.000465
4                tfidf_k=10        0  0.026667  0.021333  0.039926  2.137451     0.000327
5    bm25_k=5_k1=0.05_b=0.1        0  0.026667  0.029333  0.046645  1.781881     0.000271
6                    random        1  0.000000  0.001754  0.017544  6.489885     0.000054
7                   popular        1  0.052632  0.057895  0.015707  1.588414     0.000183
8                most_rated        1  0.035088  0.056140  0.009919  1.600628     0.000155
9                 tfidf_k=5        1  0.052632  0.057895  0.048591  2.326116     0.002616
10               tfidf_k=10        1  0.052632  0.052632  0.010033  2.143504     0.000917
11   bm25_k=5_k1=0.05_b=0.1        1  0.070175  0.059649  0.010321  1.809416     0.000341
12                   random        2  0.000000  0.004545  0.000956  6.535055     0.000608
13                  popular        2  0.045455  0.042424  0.039984  1.656638     0.000450
14               most_rated        2  0.045455  0.039394  0.024521  1.668108     0.000332
15                tfidf_k=5        2  0.090909  0.050000  0.039606  2.378988     0.001500
16               tfidf_k=10        2  0.060606  0.051515  0.053531  2.206921     0.001346
17   bm25_k=5_k1=0.05_b=0.1        2  0.090909  0.039394  0.038426  1.901316     0.000397

Let’s now aggregate the metrics over folds and compare the models

[14]:

pivot_results = (
    pd.DataFrame(cv_results["metrics"])
    .drop(columns="i_split")
    .groupby(["model"], sort=False)
    .agg(["mean", "std"])
)
mean_metric_subset = [(metric, "mean") for metric in pivot_results.columns.levels[0]]
(
    pivot_results.style
    .highlight_min(subset=mean_metric_subset, color='lightcoral', axis=0)
    .highlight_max(subset=mean_metric_subset, color='lightgreen', axis=0)
)
[14]:
                           prec@1              prec@10                recall              novelty            serendipity
                             mean       std       mean       std       mean       std       mean       std         mean       std
model
random                   0.000000  0.000000   0.002100  0.002292   0.006167  0.009865   6.521521  0.027493     0.000220  0.000336
popular                  0.050473  0.004360   0.041440  0.016969   0.031034  0.013335   1.608596  0.041782     0.000252  0.000174
most_rated               0.044625  0.009151   0.040734  0.014782   0.025564  0.016192   1.620426  0.041491     0.000213  0.000103
tfidf_k=5                0.065625  0.021900   0.043076  0.019239   0.037354  0.012516   2.355431  0.026902     0.001527  0.001076
tfidf_k=10               0.046635  0.017747   0.041827  0.017757   0.034497  0.022252   2.162626  0.038480     0.000863  0.000512
bm25_k=5_k1=0.05_b=0.1   0.062584  0.032787   0.042792  0.015441   0.031797  0.019048   1.830871  0.062541     0.000337  0.000063
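
As a possible follow-up (not in the original notebook), a single winner can be picked by the mean of a target metric; with the numbers above this would select tfidf_k=5 for prec@10:

# Hypothetical follow-up: select the model with the best mean value of a target metric
best_by_prec10 = pivot_results[("prec@10", "mean")].idxmax()
print(best_by_prec10)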
