{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Examples of calculating different metrics with RecTools\n",
"\n",
"**Table of Contents**\n",
"\n",
"* Load and preprocess data: Movielens\n",
"* Build model\n",
"* Calculate metrics\n",
" * Metrics initialization\n",
" * Single metric calculation\n",
" * Per user metric calculation\n",
" * Multiple metrics calculation with one function\n",
"\n",
"#### We provide all types of metrics to measure model performance from different aspects\n",
"\n",
"- Classification:\n",
" - **HitRate**, **Precision** (with an **R-Precision** variant that divides by the minimum of k and the number of user test items), **Recall**, **Accuracy**, **MCC**, **F1Beta**\n",
"\n",
"- Ranking:\n",
" - **MRR**, **MAP** (with an option to divide by k or by the number of user test items), **NDCG** (with an option to select the log base)\n",
"\n",
"- Advanced AUC based ranking:\n",
" - **PartialAUC**, **PAP** (Partial AUC + Precision joint metric)\n",
"\n",
"- Beyond Accuracy:\n",
" - **Serendipity**, **MeanInvUserFreq** (mean inverse user frequency, used to calculate \"novelty\"), **Intra-List Diversity** (based on some meta features of items)\n",
"\n",
"- Popularity bias:\n",
" - **AvgRecPopularity**\n",
"\n",
"- Recommendations data quality:\n",
" - **SufficientReco** (share of filled recommendations in the list), **UnrepeatedReco** (share of unique items recommended for each user), **CoveredUsers** (share of test users that have at least one recommendation)\n",
"\n",
"- Between-model comparison:\n",
" - **Intersection** (share of common user-item pairs between the recommendations of different models)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/dmtikhonov/git_project/metrics/RecTools/.venv/lib/python3.10/site-packages/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.\n",
" warnings.warn(\n"
]
}
],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"from implicit.nearest_neighbours import TFIDFRecommender\n",
"\n",
"from rectools import Columns\n",
"from rectools.dataset import Dataset\n",
"from rectools.metrics import (\n",
" Precision,\n",
" NDCG,\n",
" AvgRecPopularity,\n",
" Intersection,\n",
" HitRate,\n",
" SufficientReco,\n",
" DebiasConfig,\n",
" IntraListDiversity,\n",
" Serendipity,\n",
" calc_metrics,\n",
")\n",
"from rectools.metrics.distances import PairwiseHammingDistanceCalculator\n",
"from rectools.models import ImplicitItemKNNWrapperModel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load and preprocess data: Movielens"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Archive: ml-1m.zip\n",
" inflating: ml-1m/movies.dat \n",
" inflating: ml-1m/ratings.dat \n",
" inflating: ml-1m/README \n",
" inflating: ml-1m/users.dat \n",
"CPU times: user 125 ms, sys: 55.1 ms, total: 180 ms\n",
"Wall time: 5.83 s\n"
]
}
],
"source": [
"%%time\n",
"!wget -q https://files.grouplens.org/datasets/movielens/ml-1m.zip -O ml-1m.zip\n",
"!unzip -o ml-1m.zip\n",
"!rm ml-1m.zip"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(1000209, 4)\n",
"CPU times: user 2.27 s, sys: 71.3 ms, total: 2.34 s\n",
"Wall time: 2.35 s\n"
]
},
{
"data": {
"text/plain": [
" user_id item_id weight datetime\n",
"0 1 1193 5 978300760\n",
"1 1 661 3 978302109\n",
"2 1 914 3 978301968\n",
"3 1 3408 4 978300275\n",
"4 1 2355 5 978824291"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"ratings = pd.read_csv(\n",
" \"ml-1m/ratings.dat\",\n",
" sep=\"::\",\n",
" engine=\"python\", # Because of 2-chars separators\n",
" header=None,\n",
" names=[Columns.User, Columns.Item, Columns.Weight, Columns.Datetime],\n",
")\n",
"print(ratings.shape)\n",
"ratings.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Timestamp('2000-04-25 23:05:32'), Timestamp('2003-02-28 17:49:50'))"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ratings[\"datetime\"] = pd.to_datetime(ratings[\"datetime\"] * 10 ** 9)\n",
"ratings[\"datetime\"].min(), ratings[\"datetime\"].max()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(3883, 3)\n",
"CPU times: user 6.26 ms, sys: 1.53 ms, total: 7.79 ms\n",
"Wall time: 8.6 ms\n"
]
},
{
"data": {
"text/plain": [
" item_id title genres\n",
"0 1 Toy Story (1995) Animation|Children's|Comedy\n",
"1 2 Jumanji (1995) Adventure|Children's|Fantasy\n",
"2 3 Grumpier Old Men (1995) Comedy|Romance\n",
"3 4 Waiting to Exhale (1995) Comedy|Drama\n",
"4 5 Father of the Bride Part II (1995) Comedy"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"movies = pd.read_csv(\n",
" \"ml-1m/movies.dat\",\n",
" sep=\"::\",\n",
" engine=\"python\", # Because of 2-chars separators\n",
" header=None,\n",
" names=[Columns.Item, \"title\", \"genres\"],\n",
" encoding_errors=\"ignore\",\n",
")\n",
"print(movies.shape)\n",
"movies.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build model"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Split once into train and test to demonstrate how different metrics work\n",
"split_dt = pd.Timestamp(\"2003-02-01\")\n",
"df_train = ratings.loc[ratings[\"datetime\"] < split_dt]\n",
"df_test = ratings.loc[ratings[\"datetime\"] >= split_dt]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.02 s, sys: 40.5 ms, total: 1.06 s\n",
"Wall time: 1.08 s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"# Prepare dataset, fit model and generate recommendations\n",
"dataset = Dataset.construct(df_train)\n",
"model = ImplicitItemKNNWrapperModel(TFIDFRecommender(K=10))\n",
"model.fit(dataset)\n",
"recos = model.recommend(\n",
" users=ratings[Columns.User].unique(),\n",
" dataset=dataset,\n",
" k=10,\n",
" filter_viewed=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Calculate metrics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Metrics initialization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To calculate a metric, first create its object.\n",
"\n",
"Most metrics have a `k` parameter: the number of top recommendations used for the metric calculation. Some metrics have additional parameters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Simple metrics\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"serendipity = Serendipity(k=10)\n",
"precision = Precision(k=10, r_precision=True) # r_precision means division by min(k, n_user_test_items)\n",
"ndcg = NDCG(k=10, log_base=3)"
]
},
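{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the `r_precision` and `log_base` options more concrete, here is a minimal pure-Python sketch for a single user. It is not the RecTools implementation: the helper names, the toy data and the choice of an ideal list with hits on all `k` positions are assumptions made for illustration.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def precision_at_k(recommended, relevant, k, r_precision=False):\n",
"    # Count the top-k recommendations that appear among the user's test items\n",
"    hits = sum(1 for item in recommended[:k] if item in relevant)\n",
"    # Assumption: R-Precision divides by min(k, number of test items)\n",
"    denominator = min(k, len(relevant)) if r_precision else k\n",
"    return hits / denominator\n",
"\n",
"\n",
"def dcg_at_k(recommended, relevant, k, log_base=2):\n",
"    # Binary relevance: every hit at position `rank` adds 1 / log_base(rank + 1)\n",
"    return sum(\n",
"        1 / math.log(rank + 1, log_base)\n",
"        for rank, item in enumerate(recommended[:k], start=1)\n",
"        if item in relevant\n",
"    )\n",
"\n",
"\n",
"def ndcg_at_k(recommended, relevant, k, log_base=2):\n",
"    # Assumption: the ideal list has a hit on every one of the k positions\n",
"    ideal = sum(1 / math.log(rank + 1, log_base) for rank in range(1, k + 1))\n",
"    return dcg_at_k(recommended, relevant, k, log_base) / ideal\n",
"\n",
"\n",
"recommended = [10, 20, 30, 40, 50]  # toy top-5 list for one user\n",
"relevant = {20, 50, 99}  # toy test items of the same user\n",
"\n",
"print(precision_at_k(recommended, relevant, k=5))  # 2 hits / 5\n",
"print(precision_at_k(recommended, relevant, k=5, r_precision=True))  # 2 hits / min(5, 3)\n",
"print(ndcg_at_k(recommended, relevant, k=5, log_base=3))\n",
"```\n",
"\n",
"With 2 hits in the top 5 and 3 test items, plain precision divides by 5, while the R-Precision variant divides by min(5, 3) = 3."
]
},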
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Metric with complex additional parameter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To calculate any diversity metric (e.g. `IntraListDiversity`), you need to measure the distance between items.\n",
"\n",
"For example, you can use Hamming distance.\n",
"\n",
"As features, let's use movie genres."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Action Adventure Animation Children's Comedy Crime Documentary \\\n",
"item_id \n",
"1 0 0 1 1 1 0 0 \n",
"2 0 1 0 1 0 0 0 \n",
"3 0 0 0 0 1 0 0 \n",
"4 0 0 0 0 1 0 0 \n",
"5 0 0 0 0 1 0 0 \n",
"\n",
" Drama Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi \\\n",
"item_id \n",
"1 0 0 0 0 0 0 0 0 \n",
"2 0 1 0 0 0 0 0 0 \n",
"3 0 0 0 0 0 0 1 0 \n",
"4 1 0 0 0 0 0 0 0 \n",
"5 0 0 0 0 0 0 0 0 \n",
"\n",
" Thriller War Western \n",
"item_id \n",
"1 0 0 0 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"5 0 0 0 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies[\"genre\"] = movies[\"genres\"].str.split(\"|\")\n",
"genre_exploded = movies[[\"item_id\", \"genre\"]].set_index(\"item_id\").explode(\"genre\")\n",
"genre_dummies = pd.get_dummies(genre_exploded, prefix=\"\", prefix_sep=\"\").groupby(\"item_id\").sum()\n",
"genre_dummies.head()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"distance_calculator = PairwiseHammingDistanceCalculator(genre_dummies)\n",
"ild = IntraListDiversity(k=10, distance_calculator=distance_calculator)"
]
},
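{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what a Hamming-distance-based diversity measures, the sketch below works on the genre rows of items 1, 2 and 3 from the `genre_dummies` output above. It is a simplified, hypothetical version: the helper names are invented here, and it assumes that the diversity of one list is the mean distance over all item pairs, which may differ from the exact aggregation used by `IntraListDiversity`.\n",
"\n",
"```python\n",
"from itertools import combinations\n",
"\n",
"# Genre indicator rows of items 1, 2 and 3, in the column order of `genre_dummies`\n",
"features = {\n",
"    1: (0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),\n",
"    2: (0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),\n",
"    3: (0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0),\n",
"}\n",
"\n",
"\n",
"def hamming_distance(a, b):\n",
"    # Number of positions in which two equally long vectors differ\n",
"    return sum(x != y for x, y in zip(a, b))\n",
"\n",
"\n",
"def intra_list_diversity(items):\n",
"    # Assumption: diversity of one list is the mean distance over all item pairs\n",
"    pairs = list(combinations(items, 2))\n",
"    return sum(hamming_distance(features[i], features[j]) for i, j in pairs) / len(pairs)\n",
"\n",
"\n",
"print(hamming_distance(features[1], features[2]))  # 4\n",
"print(intra_list_diversity([1, 2, 3]))  # (4 + 3 + 5) / 3 = 4.0\n",
"```\n",
"\n",
"Items 1 and 2 share only the Children's genre, so they differ in four genre columns; a list of movies with identical genres would get a diversity of 0 under this definition."
]
},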
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Single metric calculation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The easiest way to calculate a metric is to use its `calc` method.\n",
"\n",
"Every metric has this method, but the required arguments differ between metrics."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"precision: 0.08501683501683503\n"
]
}
],
"source": [
"precision_value = precision.calc(reco=recos, interactions=df_test)\n",
"print(f\"precision: {precision_value}\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Serendipity: 2.3436131849908687e-05\n"
]
}
],
"source": [
"catalog = df_train[Columns.Item].unique()\n",
"\n",
"serendipity_value = serendipity.calc(\n",
" reco=recos,\n",
" interactions=df_test,\n",
" prev_interactions=df_train,\n",
" catalog=catalog\n",
")\n",
"print(\"Serendipity: \", serendipity_value)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NDCG: 0.06808226116073855\n"
]
}
],
"source": [
"print(\"NDCG: \", ndcg.calc(reco=recos, interactions=df_test))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ILD: 3.1908278145695363\n",
"CPU times: user 460 ms, sys: 39.5 ms, total: 499 ms\n",
"Wall time: 501 ms\n"
]
}
],
"source": [
"%%time\n",
"print(\"ILD: \", ild.calc(reco=recos))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Per user metric calculation\n",
"If you need the metric value for every user, use the `calc_per_user` method."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"precision per user:\n"
]
},
{
"data": {
"text/plain": [
"user_id\n",
"195 0.3\n",
"229 0.0\n",
"343 0.0\n",
"349 0.0\n",
"398 0.5\n",
"dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Values are equal? True\n"
]
}
],
"source": [
"precision_per_user = precision.calc_per_user(reco=recos, interactions=df_test)\n",
"print(\"\\nprecision per user:\")\n",
"display(precision_per_user.head())\n",
"\n",
"print(\"Values are equal? \", np.isclose(precision_per_user.mean(), precision_value))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multiple metrics calculation with one function"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is possible to calculate several metrics at once with a single function: `calc_metrics`.\n",
"\n",
"It includes an important performance optimisation: if several metrics share the same calculations, those calculations are performed only once."
]
},
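{
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea of sharing calculations between metrics can be sketched in plain Python. This is only an illustration of the principle, not how `calc_metrics` is implemented: `hits_per_user`, `calc_many` and the toy data are hypothetical, and the sketch averages over the users present in the recommendations.\n",
"\n",
"```python\n",
"def hits_per_user(recos, test_items, k):\n",
"    # Shared step: count how many of each user's top-k items are in their test items\n",
"    return {\n",
"        user: sum(item in test_items.get(user, set()) for item in items[:k])\n",
"        for user, items in recos.items()\n",
"    }\n",
"\n",
"\n",
"def calc_many(recos, test_items, k):\n",
"    hits = hits_per_user(recos, test_items, k)  # computed once, reused by both metrics\n",
"    n_users = len(hits)\n",
"    return {\n",
"        \"hit_rate\": sum(h > 0 for h in hits.values()) / n_users,\n",
"        \"precision\": sum(h / k for h in hits.values()) / n_users,\n",
"    }\n",
"\n",
"\n",
"toy_recos = {1: [10, 20, 30], 2: [40, 50, 60]}  # user -> ranked recommendations\n",
"toy_test = {1: {20, 30}, 2: {99}}  # user -> test items\n",
"\n",
"print(calc_many(toy_recos, toy_test, k=3))\n",
"```\n",
"\n",
"Both metrics are derived from the same per-user hit counts, so the comparison of recommendations with test items happens only once."
]
},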
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'hit_rate@10': 0.1717171717171717,\n",
" 'hit_rate_debiased@10': 0.16161616161616163,\n",
" 'ndcg@10': 0.06808226116073855,\n",
" 'pop_bias@10': 0.0017362993523658327,\n",
" 'diversity@10': 3.1908278145695363,\n",
" 'serendipity@10': 2.3436131849908687e-05,\n",
" 'intersection@10_same_model': 1.0,\n",
" 'sufficient@10': 1.0,\n",
" 'sufficient@20': 0.5}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Here we provide a debias config for one of the metrics in addition to calculating its regular value\n",
"# Check our \"Debiased metrics calculation user guide\" for more info\n",
"\n",
"metrics = {\n",
" \"hit_rate@10\": HitRate(k=10),\n",
" \"hit_rate_debiased@10\": HitRate(k=10, debias_config=DebiasConfig(iqr_coef=1.5, random_state=32)),\n",
" \"sufficient@10\": SufficientReco(k=10, deep=True),\n",
" \"sufficient@20\": SufficientReco(k=20, deep=True),\n",
" \"pop_bias@10\": AvgRecPopularity(k=10, normalize=True),\n",
" \"ndcg@10\": ndcg,\n",
" \"serendipity@10\": serendipity,\n",
" \"diversity@10\": ild,\n",
" \"intersection@10\": Intersection(k=10)\n",
"}\n",
"\n",
"\n",
"# Some arguments can be omitted if they are not needed for metrics calculation.\n",
"calc_metrics(\n",
" metrics,\n",
" reco=recos,\n",
" interactions=df_test, # needed for all `TruePositive` based metrics\n",
" prev_interactions=df_train, # needed for serendipity\n",
" catalog=catalog, # needed for serendipity\n",
" ref_reco={\"same_model\": recos}, # needed for intersection; usually these are recos from a different model\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}