macrosynergy.panel.make_zn_scores
Module for calculating z-scores for a panel around a neutral level (“zn scores”).
- make_zn_scores(df, xcat=None, cids=None, start=None, end=None, blacklist=None, sequential=True, min_obs=261, iis=True, neutral='zero', est_freq='D', thresh=None, upfront_thresh=None, pan_weight=1, postfix='ZN', ffill=0, unscore=False)
Computes z-scores for a panel around a neutral level (“zn scores”).
- Parameters:
df (DataFrame) – standardized JPMaQS DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
xcat (str or List[str]) – extended category (or list of categories) for which zn-scores are calculated. If a list is provided, scores are computed separately for each category and the combined standardized DataFrame is returned.
cids (List[str]) – cross sections for which zn-scores are calculated; default is all cross sections available for the category.
start (str) – earliest date in ISO format. Default is None and earliest date in df is used.
end (str) – latest date in ISO format. Default is None and latest date in df is used.
blacklist (dict) – cross-sections with date ranges that should be excluded from the calculation of zn-scores. This means that not only are there no zn-score values calculated for these periods, but also that they are not used for the scoring of other periods.
sequential (bool) – if True (default) score parameters (neutral level and mean absolute deviation) are estimated sequentially with concurrently available information only.
min_obs (int) – the minimum number of observations required to calculate zn-scores. Default is 261. The parameter is only applicable if the “sequential” parameter is set to True. Otherwise the neutral level and the mean absolute deviation are both computed in-sample and will use the full sample.
iis (bool) – if True (default) zn-scores are also calculated for the initial sample period defined by min_obs on an in-sample basis to avoid losing history. This is irrelevant if sequential is set to False.
neutral (str, Number) – method to determine neutral level. Default is ‘zero’. Alternatives are ‘mean’, ‘median’ or a number.
est_freq (str) – the frequency at which mean absolute deviations or means are re-estimated. The options are daily, weekly, monthly and quarterly: “D”, “W”, “M”, “Q”. Default is daily. Re-estimation is performed at period end.
thresh (float) – threshold value beyond which scores are winsorized, i.e. capped at that threshold. The threshold is the maximum absolute score value that the function is allowed to produce. The minimum threshold is 1 mean absolute deviation. Default is None.
upfront_thresh (float) – threshold value beyond which the original input data are winsorized, i.e. capped or floored at that threshold on the positive or negative side. Default is None. The threshold limits the values of the original data in their native units to avoid large outliers compromising subsequent operations.
pan_weight (float) – weight of panel (versus individual cross section) for calculating the z-score parameters, i.e. the neutral level and the mean absolute deviation. Default is 1, i.e. panel data are the basis for the parameters. Lowest possible value is 0, i.e. parameters are all specific to cross section.
postfix (str) – string appended to category name for output; default is “ZN”.
ffill (int, default 0) – Forward fills the trailing NaN values in the input DataFrame. The parameter specifies the number of periods to fill. If set to 0, no forward fill is performed.
unscore (bool, default False) – If True, the function will apply the specified threshold to z-scores, but return values on the original scale. The thresh parameter will determine the z-score limits, and the winsorized values will be converted back to the original scale before being returned.
- Returns:
standardized DataFrame with the zn-scores of the chosen category: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
- Return type:
DataFrame
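The interplay of sequential, min_obs, iis, neutral and thresh described above can be sketched for a single series in plain Python. This is an illustrative sketch under stated assumptions, not the library's implementation: the helper name zn_scores_sketch is hypothetical, the expanding sample is assumed to include the current observation, and panel weighting, blacklisting and re-estimation frequency are ignored.

```python
from statistics import mean, median


def zn_scores_sketch(values, neutral="zero", min_obs=3, iis=True, thresh=None):
    """Sequentially score one series around a neutral level (illustration only)."""

    def neutral_level(sample):
        # Mirrors the documented options: 'zero', 'mean', 'median' or a number.
        if neutral == "zero":
            return 0.0
        if neutral == "mean":
            return mean(sample)
        if neutral == "median":
            return median(sample)
        return float(neutral)

    scores = []
    for t, x in enumerate(values):
        n_avail = t + 1  # assumption: the current observation is in the sample
        if n_avail < min_obs:
            if not iis or len(values) < min_obs:
                scores.append(None)  # not enough history and no in-sample fill
                continue
            sample = values[:min_obs]  # initial period scored in-sample
        else:
            sample = values[:n_avail]  # expanding sample up to date t

        level = neutral_level(sample)
        # Mean absolute deviation around the neutral level.
        mad = mean([abs(v - level) for v in sample])
        if mad == 0:
            scores.append(None)
            continue

        score = (x - level) / mad
        if thresh is not None:
            # Winsorize: cap the absolute score at the threshold.
            score = max(-thresh, min(thresh, score))
        scores.append(score)
    return scores


if __name__ == "__main__":
    series = [1.0, -1.0, 2.0, -2.0, 6.0]
    print(zn_scores_sketch(series, min_obs=3, iis=True, thresh=2.0))
    print(zn_scores_sketch(series, min_obs=3, iis=False))
```

With iis=False the first min_obs - 1 entries are left empty, which is the loss of history that the iis option is meant to avoid.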
Note
The blacklist argument is a dictionary with cross-sections as keys and tuples of start and end dates of the blacklist periods in ISO format as values. If one cross section has multiple blacklist periods, numbers are added to the keys (e.g. TRY_1, TRY_2).
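A blacklist dictionary of the shape described in this note might look as follows. The dates are made up for illustration, and the helper is_blacklisted is hypothetical: it only shows how numbered keys map back to one cross section, assuming that both range endpoints are inclusive and that cross-section identifiers contain no underscore.

```python
from datetime import date

# Made-up blacklist periods: keys are cross sections (numbered when one cross
# section has several periods), values are (start, end) dates in ISO format.
blacklist = {
    "TRY_1": ("2000-01-03", "2003-12-31"),
    "TRY_2": ("2018-01-01", "2019-06-30"),
    "BRL": ("2012-12-03", "2013-09-30"),
}


def is_blacklisted(cid, day, blacklist):
    """Return True if `day` falls in any blacklist period of cross section `cid`."""
    for key, (start, end) in blacklist.items():
        base = key.split("_")[0]  # "TRY_1" -> "TRY" (assumed key convention)
        if base != cid:
            continue
        # Assumption: start and end dates are both part of the excluded period.
        if date.fromisoformat(start) <= day <= date.fromisoformat(end):
            return True
    return False


if __name__ == "__main__":
    print(is_blacklisted("TRY", date(2018, 6, 1), blacklist))
    print(is_blacklisted("TRY", date(2010, 1, 4), blacklist))
```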
- expanding_stat(df, dates_iter, stat='mean', sequential=True, min_obs=261, iis=True)
Compute specified statistic based on an expanding sample.
- Parameters:
df (DataFrame) – Daily-frequency time series DataFrame.
dates_iter (DatetimeIndex) – controls the frequency at which the neutral level and mean absolute deviation are calculated.
stat (str, Number) – statistical method to be applied. This is typically ‘mean’ or ‘median’.
sequential (bool) – if True (default) the statistic is estimated sequentially. If this is set to False, a single value is calculated per time series, based on the full sample.
min_obs (int) – minimum required observations for calculation of the statistic in days.
iis (bool) – if set to True, the values of the initial interval determined by min_obs will be estimated in-sample, based on the full initial sample.
- Returns:
Time series dataframe of the chosen statistic across all columns
- Return type: