Simple example of building recommendations with RecTools
Building a simple model
Visually checking recommendations
[ ]:
import numpy as np
import pandas as pd
from implicit.nearest_neighbours import TFIDFRecommender
from rectools import Columns
from rectools.dataset import Dataset
from rectools.models import ImplicitItemKNNWrapperModel
Load data
[2]:
%%time
!wget -q https://files.grouplens.org/datasets/movielens/ml-1m.zip -O ml-1m.zip
!unzip -o ml-1m.zip
!rm ml-1m.zip
Archive: ml-1m.zip
inflating: ml-1m/movies.dat
inflating: ml-1m/ratings.dat
inflating: ml-1m/README
inflating: ml-1m/users.dat
CPU times: user 134 ms, sys: 415 ms, total: 548 ms
Wall time: 4.39 s
[2]:
%%time
ratings = pd.read_csv(
    "ml-1m/ratings.dat",
    sep="::",
    engine="python",  # The python engine is required for the multi-character separator
    header=None,
    names=[Columns.User, Columns.Item, Columns.Weight, Columns.Datetime],
)
print(ratings.shape)
ratings.head()
(1000209, 4)
CPU times: user 5.76 s, sys: 409 ms, total: 6.17 s
Wall time: 6.16 s
[2]:
| | user_id | item_id | weight | datetime |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |
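The `datetime` column holds raw integers. MovieLens timestamps are seconds since the Unix epoch, so they can be made readable for inspection with `pd.to_datetime`. The following is a minimal sketch on a hand-copied sample of the first three rows; the `sample` frame and the `rated_at` column name are hypothetical, and the notebook itself does not perform this conversion.

```python
import pandas as pd

# Hypothetical sample: the first three rows of the ratings table
sample = pd.DataFrame(
    {
        "user_id": [1, 1, 1],
        "item_id": [1193, 661, 914],
        "weight": [5, 3, 3],
        "datetime": [978300760, 978302109, 978301968],
    }
)

# MovieLens stores timestamps as seconds since the Unix epoch (UTC)
sample["rated_at"] = pd.to_datetime(sample["datetime"], unit="s")
print(sample[["item_id", "rated_at"]])
```

The conversion is kept in a separate column here so the original integers stay available.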
[3]:
%%time
movies = pd.read_csv(
    "ml-1m/movies.dat",
    sep="::",
    engine="python",  # The python engine is required for the multi-character separator
    header=None,
    names=[Columns.Item, "title", "genres"],
    encoding_errors="ignore",
)
print(movies.shape)
movies.head()
(3883, 3)
CPU times: user 9.55 ms, sys: 1.62 ms, total: 11.2 ms
Wall time: 10.4 ms
[3]:
| | item_id | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |
Build model
[4]:
# Prepare a dataset to build a model
dataset = Dataset.construct(ratings)
[5]:
%%time
# Fit model and generate recommendations for all users
model = ImplicitItemKNNWrapperModel(TFIDFRecommender(K=10))
model.fit(dataset)
recos = model.recommend(
    users=ratings[Columns.User].unique(),
    dataset=dataset,
    k=10,
    filter_viewed=True,
)
CPU times: user 6.05 s, sys: 274 ms, total: 6.32 s
Wall time: 1.42 s
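To give a feel for what an item-to-item nearest-neighbour model does, here is a toy sketch in plain NumPy. It is not the `implicit` implementation: it uses unweighted cosine similarity, skips the TF-IDF weighting and the top-`K` neighbour truncation, and the `interactions` matrix and `recommend_for_user` helper are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are items, 1 = interaction
interactions = np.array(
    [
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 1, 1],
    ],
    dtype=float,
)

# Cosine similarity between item columns
norms = np.linalg.norm(interactions, axis=0)
item_sim = (interactions.T @ interactions) / np.outer(norms, norms)
np.fill_diagonal(item_sim, 0.0)  # an item should not recommend itself


def recommend_for_user(user_row, k=2):
    """Score each item by its summed similarity to the user's items."""
    scores = item_sim @ user_row
    scores[user_row > 0] = -np.inf  # analogous to filter_viewed=True
    order = np.argsort(-scores)
    return [int(i) for i in order[:k] if np.isfinite(scores[i])]


print(recommend_for_user(interactions[0], k=1))
```

For the first user, who interacted with items 0 and 1, item 2 scores highest because it shares a user with both of them.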
[6]:
# Sample of recommendations, sorted by relevance (rank) within each user
recos.head()
[6]:
| | user_id | item_id | score | rank |
|---|---|---|---|---|
| 0 | 1 | 364 | 20.436578 | 1 |
| 1 | 1 | 1196 | 15.716834 | 2 |
| 2 | 1 | 318 | 15.625371 | 3 |
| 3 | 1 | 2096 | 14.876911 | 4 |
| 4 | 1 | 2571 | 12.718620 | 5 |
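Before inspecting recommendations by eye, a couple of mechanical checks can catch obvious problems: with `filter_viewed=True` no recommended item should appear in the user's history, and ranks should run from 1 upward within each user. The sketch below shows one way to check this with pandas; the `interactions` and `recommendations` frames are hypothetical stand-ins for `ratings` and `recos`.

```python
import pandas as pd

# Hypothetical toy data standing in for `ratings` and `recos`
interactions = pd.DataFrame({"user_id": [1, 1, 2], "item_id": [10, 20, 10]})
recommendations = pd.DataFrame(
    {"user_id": [1, 1, 2, 2], "item_id": [30, 40, 20, 30], "rank": [1, 2, 1, 2]}
)

# Inner join on (user_id, item_id): every row here is an already-seen item
# that was recommended anyway
overlap = recommendations.merge(interactions, on=["user_id", "item_id"], how="inner")
print(len(overlap))

# Ranks should be 1..n with no gaps within each user
ranks_ok = (
    recommendations.groupby("user_id")["rank"]
    .apply(lambda r: sorted(r) == list(range(1, len(r) + 1)))
    .all()
)
print(ranks_ok)
```

The same two checks should apply unchanged to the real frames, since they only rely on the `user_id`, `item_id` and `rank` columns.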
Check recommendations
[7]:
# Pick a user and look at their viewing history and recommendations
user_id = 3883
user_viewed = ratings.query("user_id == @user_id").merge(movies, on="item_id")
user_recos = recos.query("user_id == @user_id").merge(movies, on="item_id")
[8]:
# History, restricted to films the user liked (rating above 3)
user_viewed.query("weight > 3")
[8]:
| | user_id | item_id | weight | datetime | title | genres |
|---|---|---|---|---|---|---|
| 0 | 3883 | 2997 | 5 | 967134212 | Being John Malkovich (1999) | Comedy |
| 2 | 3883 | 1265 | 5 | 967134285 | Groundhog Day (1993) | Comedy\|Romance |
| 4 | 3883 | 2858 | 5 | 965822230 | American Beauty (1999) | Comedy\|Drama |
| 10 | 3883 | 2369 | 4 | 965822136 | Desperately Seeking Susan (1985) | Comedy\|Romance |
| 14 | 3883 | 3189 | 4 | 965822296 | My Dog Skip (1999) | Comedy |
| 16 | 3883 | 1784 | 4 | 965822136 | As Good As It Gets (1997) | Comedy\|Drama |
| 17 | 3883 | 2599 | 4 | 967134250 | Election (1999) | Comedy |
| 18 | 3883 | 34 | 4 | 967134285 | Babe (1995) | Children's\|Comedy\|Drama |
[9]:
# Recommendations
user_recos.sort_values("rank")
[9]:
| | user_id | item_id | score | rank | title | genres |
|---|---|---|---|---|---|---|
| 0 | 3883 | 2396 | 13.991358 | 1 | Shakespeare in Love (1998) | Comedy\|Romance |
| 1 | 3883 | 2762 | 10.249648 | 2 | Sixth Sense, The (1999) | Thriller |
| 2 | 3883 | 318 | 7.728188 | 3 | Shawshank Redemption, The (1994) | Drama |
| 3 | 3883 | 608 | 7.617913 | 4 | Fargo (1996) | Crime\|Drama\|Thriller |
| 4 | 3883 | 356 | 5.674010 | 5 | Forrest Gump (1994) | Comedy\|Romance\|War |
| 5 | 3883 | 2395 | 5.508895 | 6 | Rushmore (1998) | Comedy |
| 6 | 3883 | 223 | 5.398012 | 7 | Clerks (1994) | Comedy |
| 7 | 3883 | 593 | 5.335058 | 8 | Silence of the Lambs, The (1991) | Drama\|Thriller |
| 8 | 3883 | 296 | 4.828189 | 9 | Pulp Fiction (1994) | Crime\|Drama |
| 9 | 3883 | 2959 | 4.615653 | 10 | Fight Club (1999) | Drama |
This is a simple example: we used only ratings to train the model, and we prepared recommendations only for users who have already rated movies. Some models also let you use explicit features, e.g. user age or item genre, and some can generate recommendations for users who have not rated any movies yet. See the documentation for details.
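As an illustration of the explicit-features case, item genres first have to be turned from the pipe-separated strings in `movies` into one row per item and genre. The sketch below does this with pandas on a hand-copied sample. The `movies_sample` frame is hypothetical, and the `id` / `feature` / `value` column layout is an assumption about the long format RecTools expects for features; check the RecTools documentation before relying on it.

```python
import pandas as pd

# Hypothetical sample: the first three rows of the movies table
movies_sample = pd.DataFrame(
    {
        "item_id": [1, 2, 3],
        "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
        "genres": [
            "Animation|Children's|Comedy",
            "Adventure|Children's|Fantasy",
            "Comedy|Romance",
        ],
    }
)

# One row per (item, genre) pair in a long id / feature / value layout
item_features = (
    movies_sample.assign(value=movies_sample["genres"].str.split("|"))
    .explode("value")
    .rename(columns={"item_id": "id"})
    .assign(feature="genre")[["id", "feature", "value"]]
    .reset_index(drop=True)
)
print(item_features)

# Assumption (not taken from this notebook): a frame like this might be passed as
# Dataset.construct(ratings, item_features_df=item_features, cat_item_features=["genre"])
```

Keeping one genre per row avoids having to decide on a fixed set of genre columns up front.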