LastNSplitter

class rectools.model_selection.last_n_split.LastNSplitter(n: int, n_splits: int = 1, filter_cold_users: bool = True, filter_cold_items: bool = True, filter_already_seen: bool = True)

Bases: Splitter

Splitter for cross-validation by the leave-one-out / leave-k-out scheme (recent activity). For each fold, the last n interactions of each user go to test and all of that user's earlier interactions go to train. Cross-validation is achieved by sliding this window back over each user's interaction history.
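The sliding-window scheme described above can be sketched in plain pandas. This is an illustrative sketch, not the library's implementation: the function name last_n_folds and the plain column names "user_id", "item_id" and "datetime" are assumptions made for the example, and no filtering is applied.

```python
import pandas as pd


def last_n_folds(df, n, n_splits, user_col="user_id", time_col="datetime"):
    """Yield (train_idx, test_idx) positional index arrays, oldest fold first."""
    # Rank each user's interactions from the most recent (rank 1) backwards.
    inv_rank = df.groupby(user_col)[time_col].rank(method="first", ascending=False)
    for i in reversed(range(n_splits)):
        low, high = i * n, (i + 1) * n
        # Test: the window of n interactions with ranks in (low, high].
        test_mask = (inv_rank > low) & (inv_rank <= high)
        # Train: everything the same user did before that window.
        train_mask = inv_rank > high
        yield train_mask.to_numpy().nonzero()[0], test_mask.to_numpy().nonzero()[0]


df = pd.DataFrame(
    [
        [1, 1, "2021-09-01"],  # 0
        [1, 2, "2021-09-02"],  # 1
        [1, 1, "2021-09-03"],  # 2
        [1, 2, "2021-09-04"],  # 3
        [1, 2, "2021-09-05"],  # 4
        [2, 1, "2021-08-20"],  # 5
        [2, 2, "2021-08-21"],  # 6
        [2, 2, "2021-08-22"],  # 7
    ],
    columns=["user_id", "item_id", "datetime"],
).astype({"datetime": "datetime64[ns]"})

folds = [(tr.tolist(), te.tolist()) for tr, te in last_n_folds(df, n=2, n_splits=2)]
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
```

With n=2 and n_splits=2 on this data, the sketch yields the same index sets as the unfiltered splitter in the Examples section.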

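This technique may be used for sequential recommendation scenarios and is common in research papers on sequential recommendation. However, it does not fully prevent data leakage from the future: the split is made per user rather than at a single global point in time, so train can contain interactions that are later than some test interactions of other users.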
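One way to see the leak is to check whether any train interaction is later than the earliest test interaction. The sketch below does this for the last fold of the data used in the Examples section (train [0, 1, 2, 5], test [3, 4, 6, 7]). The variable names are hypothetical and the check is not part of the library.

```python
import pandas as pd

# Timestamps of interactions 0..7 from the Examples section.
times = pd.to_datetime(
    [
        "2021-09-01", "2021-09-02", "2021-09-03", "2021-09-04", "2021-09-05",  # user 1
        "2021-08-20", "2021-08-21", "2021-08-22",  # user 2
    ]
)

train_idx = [0, 1, 2, 5]
test_idx = [3, 4, 6, 7]

# Earliest moment at which a prediction is evaluated in this fold.
earliest_test = times[test_idx].min()

# Train interactions that happened after that moment come "from the future".
leaking = [i for i in train_idx if times[i] > earliest_test]
print(leaking)
```

Here user 1's September interactions are in train while user 2's August interactions are in test, so a model evaluated on user 2 has already seen later data.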

Cold users, cold items, and already seen items can optionally be excluded from test.

Parameters
  • n (int) – Number of interactions for each user that will be included in test.

  • n_splits (int, default 1) – Number of test folds.

  • filter_cold_users (bool, default True) – If True, users that are not present in train will be excluded from test. WARNING: both cold and warm users will be excluded from test.

  • filter_cold_items (bool, default True) – If True, items that are not present in train will be excluded from test. WARNING: both cold and warm items will be excluded from test.

  • filter_already_seen (bool, default True) – If True, pairs (user, item) that are present in train will be excluded from test.
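The effect of the three filter flags on a single fold can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the helper filter_test, its argument names, and the plain column names "user_id" and "item_id" are hypothetical.

```python
import numpy as np
import pandas as pd


def filter_test(df, train_idx, test_idx, cold_users=True, cold_items=True, already_seen=True):
    """Return the test positions that survive the requested filters."""
    train = df.iloc[train_idx]
    test = df.iloc[test_idx]
    keep = np.ones(len(test), dtype=bool)
    if cold_users:
        # Drop test rows whose user never appears in train.
        keep &= test["user_id"].isin(train["user_id"]).to_numpy()
    if cold_items:
        # Drop test rows whose item never appears in train.
        keep &= test["item_id"].isin(train["item_id"]).to_numpy()
    if already_seen:
        # Drop test rows whose (user, item) pair already appears in train.
        train_pairs = pd.MultiIndex.from_frame(train[["user_id", "item_id"]])
        test_pairs = pd.MultiIndex.from_frame(test[["user_id", "item_id"]])
        keep &= ~test_pairs.isin(train_pairs)
    return np.asarray(test_idx)[keep]


# (user, item) pairs of interactions 0..7 from the Examples section.
df = pd.DataFrame(
    [[1, 1], [1, 2], [1, 1], [1, 2], [1, 2], [2, 1], [2, 2], [2, 2]],
    columns=["user_id", "item_id"],
)

# First fold of the example: train [0], test [1, 2, 5].
only_users = filter_test(df, [0], [1, 2, 5], True, False, False).tolist()
only_items = filter_test(df, [0], [1, 2, 5], False, True, False).tolist()
only_seen = filter_test(df, [0], [1, 2, 5], False, False, True).tolist()
print(only_users, only_items, only_seen)
```

Each flag only ever removes rows from test; train is left unchanged in this sketch.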

Examples

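>>> import pandas as pd
>>> from rectools import Columns
>>> from rectools.dataset import Interactions
>>> from rectools.model_selection import LastNSplitter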
>>> df = pd.DataFrame(
...     [
...         [1, 1, 1, "2021-09-01"], # 0
...         [1, 2, 1, "2021-09-02"], # 1
...         [1, 1, 1, "2021-09-03"], # 2
...         [1, 2, 1, "2021-09-04"], # 3
...         [1, 2, 1, "2021-09-05"], # 4
...         [2, 1, 1, "2021-08-20"], # 5
...         [2, 2, 1, "2021-08-21"], # 6
...         [2, 2, 1, "2021-08-22"], # 7
...     ],
...     columns=[Columns.User, Columns.Item, Columns.Weight, Columns.Datetime],
... ).astype({Columns.Datetime: "datetime64[ns]"})
>>> interactions = Interactions(df)
>>>
>>> # No filtering: user 2 appears in the first test fold although absent from train.
>>> splitter = LastNSplitter(2, 2, False, False, False)
>>> for train_ids, test_ids, _ in splitter.split(interactions):
...     print(train_ids, test_ids)
[0] [1 2 5]
[0 1 2 5] [3 4 6 7]
>>>
>>> # filter_cold_users=True drops interaction 5: user 2 is not in the first train fold.
>>> splitter = LastNSplitter(2, 2, True, False, False)
>>> for train_ids, test_ids, _ in splitter.split(interactions):
...     print(train_ids, test_ids)
[0] [1 2]
[0 1 2 5] [3 4 6 7]

Methods

filter(interactions, collect_fold_stats, ...)

Filter train and test indexes from one fold based on the filter_cold_users, filter_cold_items, and filter_already_seen class fields.

split(interactions[, collect_fold_stats])

Split interactions into folds and apply filtration to the result.