Examples of calculating different metrics with RecTools

  • Initializing different metrics

  • Calculating the value of a single metric

  • Calculating metric values per user

  • Calculating values of multiple metrics using a single function

[2]:
import numpy as np
import pandas as pd

from implicit.nearest_neighbours import TFIDFRecommender

from rectools import Columns
from rectools.dataset import Dataset
from rectools.metrics import (
    Precision,
    Accuracy,
    NDCG,
    IntraListDiversity,
    Serendipity,
    calc_metrics,
)
from rectools.metrics.distances import PairwiseHammingDistanceCalculator
from rectools.models import ImplicitItemKNNWrapperModel

Load data

[3]:
%%time
!wget -q https://files.grouplens.org/datasets/movielens/ml-1m.zip -O ml-1m.zip
!unzip -o ml-1m.zip
!rm ml-1m.zip
Archive:  ml-1m.zip
  inflating: ml-1m/movies.dat
  inflating: ml-1m/ratings.dat
  inflating: ml-1m/README
  inflating: ml-1m/users.dat
CPU times: user 39.5 ms, sys: 44.8 ms, total: 84.3 ms
Wall time: 3.22 s
[4]:
%%time
ratings = pd.read_csv(
    "ml-1m/ratings.dat",
    sep="::",
    engine="python",  # Because of 2-chars separators
    header=None,
    names=[Columns.User, Columns.Item, Columns.Weight, Columns.Datetime],
)
print(ratings.shape)
ratings.head()
(1000209, 4)
CPU times: user 3.51 s, sys: 270 ms, total: 3.78 s
Wall time: 3.77 s
[4]:
user_id item_id weight datetime
0 1 1193 5 978300760
1 1 661 3 978302109
2 1 914 3 978301968
3 1 3408 4 978300275
4 1 2355 5 978824291
[5]:
ratings["datetime"] = pd.to_datetime(ratings["datetime"] * 10 ** 9)
ratings["datetime"].min(), ratings["datetime"].max()
[5]:
(Timestamp('2000-04-25 23:05:32'), Timestamp('2003-02-28 17:49:50'))
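The multiplication by 10 ** 9 converts the Unix timestamps from seconds into the nanosecond integers that pd.to_datetime expects by default; the unit argument is an equivalent, more explicit way to do the same conversion:

```python
import pandas as pd

# pd.to_datetime interprets plain integers as nanoseconds since the epoch,
# hence the `* 10 ** 9` above; `unit="s"` performs the same conversion.
ts = pd.to_datetime(978300760, unit="s")  # timestamp of the first rating row
print(ts)  # 2000-12-31 22:12:40
```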
[6]:
%%time
movies = pd.read_csv(
    "ml-1m/movies.dat",
    sep="::",
    engine="python",  # Because of 2-chars separators
    header=None,
    names=[Columns.Item, "title", "genres"],
    encoding_errors="ignore",
)
print(movies.shape)
movies.head()
(3883, 3)
CPU times: user 9.53 ms, sys: 518 µs, total: 10 ms
Wall time: 9.36 ms
[6]:
item_id title genres
0 1 Toy Story (1995) Animation|Children's|Comedy
1 2 Jumanji (1995) Adventure|Children's|Fantasy
2 3 Grumpier Old Men (1995) Comedy|Romance
3 4 Waiting to Exhale (1995) Comedy|Drama
4 5 Father of the Bride Part II (1995) Comedy

Build model

[7]:
# Split once by train and test to demonstrate how different metrics work
split_dt = pd.Timestamp("2003-02-01")
df_train = ratings.loc[ratings["datetime"] < split_dt]
df_test = ratings.loc[ratings["datetime"] >= split_dt]
[8]:
%%time

# Prepare dataset, fit model and generate recommendations
dataset = Dataset.construct(df_train)
model = ImplicitItemKNNWrapperModel(TFIDFRecommender(K=10))
model.fit(dataset)
recos = model.recommend(
    users=ratings[Columns.User].unique(),
    dataset=dataset,
    k=10,
    filter_viewed=True,
)
CPU times: user 4.77 s, sys: 257 ms, total: 5.02 s
Wall time: 1.31 s

Calculate metrics

Metrics initialization

To calculate a metric, you first need to create its object.

Most metrics have a k parameter: the number of top recommendations that will be used in the calculation.

Some metrics have additional parameters.
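As a hand-rolled sketch of what the k parameter means (toy data, not using RecTools): precision@k is the share of a user's top-k recommended items that appear in the test interactions, so the same recommendation list can score differently for different k.

```python
# Toy illustration of the `k` parameter: only the top-k items count.
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

recommended = [10, 20, 30, 40, 50]  # ranked recommendations for one user
relevant = {20, 50, 70}             # items the user interacted with in test

print(precision_at_k(recommended, relevant, k=2))  # 0.5: only item 20 hits
print(precision_at_k(recommended, relevant, k=5))  # 0.4: items 20 and 50 hit
```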

Simple metrics

[7]:
precision = Precision(k=10)
accuracy_1 = Accuracy(k=1)
accuracy_10 = Accuracy(k=10)
serendipity = Serendipity(k=10)

Metric with simple additional parameter

[8]:
ndcg = NDCG(k=10, log_base=3)

Metric with complex additional parameter

To calculate any diversity metric (e.g. IntraListDiversity) you need a way to measure the distance between items.

For example, you can use the Hamming distance.

As item features, let's use movie genres.
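A minimal sketch of the Hamming distance on such binary genre vectors: it simply counts the positions where two vectors differ. The six-genre vectors below are a toy subset chosen to match the genres of Toy Story and Jumanji.

```python
import numpy as np

# Toy genre axes: [Action, Adventure, Animation, Children's, Comedy, Fantasy]
toy_story = np.array([0, 0, 1, 1, 1, 0])  # Animation | Children's | Comedy
jumanji = np.array([0, 1, 0, 1, 0, 1])    # Adventure | Children's | Fantasy

# Hamming distance: number of positions where the two vectors differ
hamming = int(np.sum(toy_story != jumanji))
print(hamming)  # 4: the films share only the Children's genre
```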

[9]:
movies["genre"] = movies["genres"].str.split("|")
genre_exploded = movies[["item_id", "genre"]].set_index("item_id").explode("genre")
genre_dummies = pd.get_dummies(genre_exploded, prefix="", prefix_sep="").groupby("item_id").sum()
genre_dummies.head()
[9]:
Action Adventure Animation Children's Comedy Crime Documentary Drama Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western
item_id
1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0
4 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
[10]:
distance_calculator = PairwiseHammingDistanceCalculator(genre_dummies)
ild = IntraListDiversity(k=10, distance_calculator=distance_calculator)

Single metric calculation

The easiest way to calculate a metric is to use its calc method.

Every metric has one, but the arguments differ.

If you need a metric value for every user, use the calc_per_user method.

[11]:
precision_value = precision.calc(reco=recos, interactions=df_test)
print(f"precision: {precision_value}")

precision_per_user = precision.calc_per_user(reco=recos, interactions=df_test)
print("\nprecision per user:")
display(precision_per_user.head())

print("Values are equal? ", precision_per_user.mean() == precision_value)
precision: 0.06464646464646465

precision per user:
user_id
195    0.3
229    0.0
343    0.0
349    0.0
398    0.5
dtype: float64
Values are equal?  True
[12]:
# Catalog is the set of items we are able to recommend.
# Not every item from the train dataset necessarily appears in the recommendation lists.
catalog = df_train[Columns.Item].unique()
print("Accuracy@1: ", accuracy_1.calc(reco=recos, interactions=df_test, catalog=catalog))
print("Accuracy@10: ", accuracy_10.calc(reco=recos, interactions=df_test, catalog=catalog))
Accuracy@1:  0.9956908534890186
Accuracy@10:  0.9935730756022174
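A toy back-of-the-envelope calculation (hypothetical per-user numbers, assuming the standard confusion-matrix definition of accuracy) shows why these values are so close to 1: with a catalog of thousands of items and only k recommendations per user, true negatives dominate the count.

```python
# Hypothetical toy numbers, roughly at the ML-1M scale:
catalog_size = 3500   # items in the train catalog
k = 10                # recommendations per user
relevant = 20         # test interactions per user
tp = 1                # hits among the top-k

fp = k - tp                       # recommended but not relevant
fn = relevant - tp                # relevant but not recommended
tn = catalog_size - tp - fp - fn  # everything else is a true negative

# accuracy = (TP + TN) / catalog size; TN (3471 here) dwarfs all other terms
accuracy = (tp + tn) / catalog_size
print(round(accuracy, 4))  # 0.992, high even with a single hit
```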
[13]:
serendipity_value = serendipity.calc(
    reco=recos,
    interactions=df_test,
    prev_interactions=df_train,
    catalog=catalog
)
print("Serendipity: ", serendipity_value)
Serendipity:  2.3436131849908687e-05
[14]:
print("NDCG: ", ndcg.calc(reco=recos, interactions=df_test))
NDCG:  0.06808226116073855
[15]:
%%time
print("ILD: ", ild.calc(reco=recos))
ILD:  3.1908278145695363
CPU times: user 2.1 s, sys: 556 ms, total: 2.66 s
Wall time: 2.64 s

Multiple metrics calculation

It is possible to calculate a bunch of metrics with a single function: calc_metrics.

It also optimizes performance: if several metrics share the same intermediate calculations, those calculations are performed only once.

[16]:
metrics = {
    "precision": precision,
    "accuracy@1": accuracy_1,
    "accuracy@10": accuracy_10,
    "ndcg": ndcg,
    "serendipity": serendipity,
    "diversity": ild,
}

# Some arguments can be omitted if they are not needed for metrics calculation
calc_metrics(
    metrics,
    reco=recos,
    interactions=df_test,
    prev_interactions=df_train,
    catalog=catalog
)
[16]:
{'precision': 0.06464646464646465,
 'accuracy@10': 0.9935730756022174,
 'accuracy@1': 0.9956908534890186,
 'ndcg': 0.06808226116073855,
 'diversity': 3.1908278145695363,
 'serendipity': 2.3436131849908687e-05}