TimeRangeSplit
- class rectools.model_selection.time_split.TimeRangeSplit(date_range: Sequence[Union[date, datetime]], filter_cold_users: bool = True, filter_cold_items: bool = True, filter_already_seen: bool = True)
Bases: object
Splitter for cross-validation by time.
Generates train and test folds by time. Cold users, cold items and already seen items can optionally be excluded from the test folds.
- Parameters
date_range (array-like(date|datetime)) – Ordered test fold borders. The left border is included in a fold and the right border is excluded. Interactions before a fold's left border are used for train. Interactions on or after the last border are not used. Can be easily generated with [pd.date_range](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html).
filter_cold_users (bool, default True) – If True, users that are not in train will be excluded from test.
filter_cold_items (bool, default True) – If True, items that are not in train will be excluded from test.
filter_already_seen (bool, default True) – If True, pairs (user, item) that are in train will be excluded from test.
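To make the border semantics concrete, the following is a minimal standard-library sketch of how ordered borders map to half-open test intervals. It is an illustration only, not the library's implementation, and the helper name `fold_intervals` is hypothetical.

```python
from datetime import date, timedelta


def fold_intervals(borders):
    """Pair consecutive borders into half-open [left, right) test intervals.

    Hypothetical helper for illustration; not part of rectools.
    """
    return list(zip(borders[:-1], borders[1:]))


# Three daily borders produce two test folds.
borders = [date(2021, 9, 4) + timedelta(days=i) for i in range(3)]
intervals = fold_intervals(borders)

for left, right in intervals:
    # Train for a fold is everything strictly before its left border.
    print(f"train: before {left}, test: [{left}, {right})")
```

With N borders this yields N - 1 candidate test folds, which is why a date range of three days gives two folds.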
Examples
>>> from datetime import date
>>> import pandas as pd
>>> from rectools import Columns
>>> from rectools.model_selection.time_split import TimeRangeSplit
>>> df = pd.DataFrame(
...     [
...         [1, 2, "2021-09-01"],  # 0
...         [2, 1, "2021-09-02"],  # 1
...         [2, 3, "2021-09-03"],  # 2
...         [3, 2, "2021-09-03"],  # 3
...         [3, 3, "2021-09-04"],  # 4
...         [3, 4, "2021-09-04"],  # 5
...         [1, 2, "2021-09-05"],  # 6
...         [4, 2, "2021-09-05"],  # 7
...         [4, 2, "2021-09-06"],  # 8
...     ],
...     columns=[Columns.User, Columns.Item, Columns.Datetime],
... ).astype({Columns.Datetime: "datetime64[ns]"})
>>> date_range = pd.date_range(date(2021, 9, 4), date(2021, 9, 6))
>>>
>>> trs = TimeRangeSplit(date_range, False, False, False)
>>> for train_ids, test_ids, _ in trs.split(df):
...     print(train_ids, test_ids)
[0 1 2 3] [4 5]
[0 1 2 3 4 5] [6 7]
>>>
>>> trs = TimeRangeSplit(date_range, True, True, True)
>>> for train_ids, test_ids, _ in trs.split(df):
...     print(train_ids, test_ids)
[0 1 2 3] [4]
[0 1 2 3 4 5] []
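The effect of the three filters in the example above can be traced by hand. The sketch below re-creates a single fold in plain Python on the same nine interactions. It is a hypothetical re-creation inferred from the parameter descriptions, not the library's code, and `split_fold` and its argument names are made up for illustration.

```python
from datetime import date

# (user, item, day) rows; the list position plays the role of the row number.
rows = [
    (1, 2, date(2021, 9, 1)),  # 0
    (2, 1, date(2021, 9, 2)),  # 1
    (2, 3, date(2021, 9, 3)),  # 2
    (3, 2, date(2021, 9, 3)),  # 3
    (3, 3, date(2021, 9, 4)),  # 4
    (3, 4, date(2021, 9, 4)),  # 5
    (1, 2, date(2021, 9, 5)),  # 6
    (4, 2, date(2021, 9, 5)),  # 7
    (4, 2, date(2021, 9, 6)),  # 8
]


def split_fold(rows, left, right, cold_users=True, cold_items=True, seen=True):
    """Hypothetical re-creation of one fold; not the rectools implementation."""
    # Train: everything strictly before the fold's left border.
    train = [i for i, (_, _, day) in enumerate(rows) if day < left]
    train_users = {rows[i][0] for i in train}
    train_items = {rows[i][1] for i in train}
    train_pairs = {rows[i][:2] for i in train}

    test = []
    for i, (user, item, day) in enumerate(rows):
        if not (left <= day < right):
            continue  # outside the half-open test interval
        if cold_users and user not in train_users:
            continue  # user never appears in train
        if cold_items and item not in train_items:
            continue  # item never appears in train
        if seen and (user, item) in train_pairs:
            continue  # pair was already seen in train
        test.append(i)
    return train, test


fold_1 = split_fold(rows, date(2021, 9, 4), date(2021, 9, 5))
fold_2 = split_fold(rows, date(2021, 9, 5), date(2021, 9, 6))
print(fold_1)  # ([0, 1, 2, 3], [4])
print(fold_2)  # ([0, 1, 2, 3, 4, 5], [])
```

In the first fold, row 5 is dropped because item 4 is cold. In the second fold, row 6 is dropped because the pair (1, 2) is already in train, and row 7 is dropped because user 4 is cold.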
Methods
get_n_splits(df) – Return real number of folds.
split(df[, collect_fold_stats]) – Split interactions into folds.
- get_n_splits(df: DataFrame) → int
Return real number of folds.
- Parameters
df (DataFrame) – User-item interactions.
- Return type
int
- split(df: DataFrame, collect_fold_stats: bool = False) → Iterator[Tuple[ndarray, ndarray, Dict[str, Any]]]
Split interactions into folds.
- Parameters
df (pd.DataFrame) – User-item interactions. Required columns: Columns.User, Columns.Item, Columns.Datetime.
collect_fold_stats (bool, default False) – If True, add statistics to the fold info, such as the sizes of the train and test parts and the numbers of users and items.
- Returns
Yields tuples with train part row numbers, test part row numbers and fold info.
- Return type
iterator(array, array, dict)