macrosynergy.management.simulate.simulate_quantamental_data
Module with functionality for generating mock quantamental data for testing purposes.
- simulate_ar(nobs, mean=0, sd_mult=1, ar_coef=0.75)
Create an autocorrelated data series as a NumPy array.
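- Parameters:
nobs (int) – number of observations.
mean (float) – mean of the series. Default is 0.
sd_mult (float) – standard deviation multiplier of the series. Default is 1.
ar_coef (float) – autoregression coefficient, between 0 and 1. Default is 0.75.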
- Returns:
autocorrelated data series.
- Return type:
np.ndarray
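The kind of series described above can be illustrated with a minimal NumPy sketch of an AR(1) process, x(t) = ar_coef · x(t-1) + e(t) with standard normal shocks, rescaled so that the sample has the requested mean and a standard deviation equal to sd_mult. The helper name ar1_series, its seed argument and the rescaling step are assumptions for illustration; simulate_ar itself may generate or scale the series differently.

```python
from typing import Optional

import numpy as np


def ar1_series(nobs: int, mean: float = 0.0, sd_mult: float = 1.0,
               ar_coef: float = 0.75, seed: Optional[int] = None) -> np.ndarray:
    """Hypothetical AR(1) generator; not the library's simulate_ar."""
    if nobs < 2:
        raise ValueError("nobs must be at least 2 to rescale the series")
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(nobs)
    ser = np.empty(nobs)
    ser[0] = shocks[0]
    # Each value keeps a fraction ar_coef of the previous value plus a new shock.
    for t in range(1, nobs):
        ser[t] = ar_coef * ser[t - 1] + shocks[t]
    # Standardise the sample, then impose the requested mean and scale.
    ser = (ser - ser.mean()) / ser.std()
    return mean + sd_mult * ser
```

Standardising after the recursion makes the sample mean and standard deviation exact rather than only correct in expectation, which is convenient for tests but removes sampling variation in those two moments.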
- dataframe_generator(df_cids, df_xcats, cid, xcat)
Helper function used to construct the quantamental DataFrame.
- temporary_seed(seed)
A context manager that temporarily sets the seed for both NumPy and Python’s random module.
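A context manager of this kind can be sketched with contextlib: save the global states of NumPy's legacy generator and of Python's random module, seed both, and restore the saved states on exit. The name seeded and the choice to skip seeding when seed is None are assumptions; this is a stand-in for illustration, not the source of temporary_seed.

```python
import random
from contextlib import contextmanager
from typing import Iterator, Optional

import numpy as np


@contextmanager
def seeded(seed: Optional[int]) -> Iterator[None]:
    """Hypothetical stand-in for temporary_seed: seed, run, then restore."""
    # Save the global states of both generators before touching them.
    np_state = np.random.get_state()
    py_state = random.getstate()
    if seed is not None:
        np.random.seed(seed)
        random.seed(seed)
    try:
        yield
    finally:
        # Restore the previous states so code outside the block is unaffected.
        np.random.set_state(np_state)
        random.setstate(py_state)
```

Usage: two blocks entered with the same seed produce the same draws, and the draws made after the block continue exactly where they would have without it.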
- make_qdf(df_cids, df_xcats, back_ar=0, seed=None)
Make a quantamental DataFrame with basic columns: ‘cid’, ‘xcat’, ‘real_date’, ‘value’.
- Parameters:
df_cids (pd.DataFrame) – DataFrame with parameters by cid. Row indices are cross-sections. Columns are: ‘earliest’: string of the earliest date (ISO) for which country values are available; ‘latest’: string of the latest date (ISO) for which country values are available; ‘mean_add’: float of country-specific addition to any category’s mean; ‘sd_mult’: float of country-specific multiplier of a category’s standard deviation.
df_xcats (pd.DataFrame) – DataFrame with parameters by xcat. Row indices are categories. Columns are: ‘earliest’: string of the earliest date (ISO) for which category values are available; ‘latest’: string of the latest date (ISO) for which category values are available; ‘mean_add’: float of category-specific addition to the mean; ‘sd_mult’: float of category-specific multiplier of the category’s standard deviation; ‘ar_coef’: float between 0 and 1 denoting the autocorrelation of the category; ‘back_coef’: float, coefficient with which the communal (mean 0, SD 1) background factor is added to category values.
back_ar (float) – float between 0 and 1 denoting the autocorrelation of the background factor. Default is zero.
seed (int) – seed for random number generation. Default is None.
- Returns:
basic quantamental DataFrame according to specifications.
- Return type:
pd.DataFrame
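To make the two parameter tables concrete, the sketch below builds a df_cids and a df_xcats with the documented columns and feeds them to toy_qdf, a hypothetical simplified generator. How toy_qdf combines the columns (business days in the overlap of the two date windows, summed mean_add values, multiplied sd_mult values, one AR(1) series per cid/xcat pair and a single shared background factor scaled by back_coef) is an assumption for illustration, not the implementation of make_qdf.

```python
from typing import Optional

import numpy as np
import pandas as pd

# Parameter tables in the documented layout: cross-sections / categories as row index.
df_cids = pd.DataFrame(
    index=["AUD", "USD"],
    columns=["earliest", "latest", "mean_add", "sd_mult"],
    data=[["2010-01-01", "2020-12-31", 0.5, 2.0],
          ["2012-01-01", "2020-12-31", 0.0, 1.0]],
)
df_xcats = pd.DataFrame(
    index=["XR", "CRY"],
    columns=["earliest", "latest", "mean_add", "sd_mult", "ar_coef", "back_coef"],
    data=[["2011-01-01", "2020-10-30", 0.0, 1.0, 0.0, 1.0],
          ["2010-01-01", "2020-10-30", 1.0, 2.0, 0.95, 0.5]],
)


def ar1(n: int, coef: float, rng: np.random.Generator) -> np.ndarray:
    # Plain AR(1) recursion with standard normal shocks.
    out = np.empty(n)
    prev = 0.0
    for t, e in enumerate(rng.standard_normal(n)):
        prev = coef * prev + e
        out[t] = prev
    return out


def toy_qdf(df_cids: pd.DataFrame, df_xcats: pd.DataFrame,
            back_ar: float = 0.0, seed: Optional[int] = None) -> pd.DataFrame:
    """Hypothetical simplified generator; not macrosynergy's make_qdf."""
    rng = np.random.default_rng(seed)
    # One communal background factor (mean 0, SD 1) shared by all series.
    all_dates = pd.bdate_range(
        min(df_cids["earliest"].min(), df_xcats["earliest"].min()),
        max(df_cids["latest"].max(), df_xcats["latest"].max()),
    )
    back = ar1(len(all_dates), back_ar, rng)
    back = pd.Series((back - back.mean()) / back.std(), index=all_dates)

    frames = []
    for cid, crow in df_cids.iterrows():
        for xcat, xrow in df_xcats.iterrows():
            # ISO date strings compare correctly as plain strings.
            dates = pd.bdate_range(max(crow["earliest"], xrow["earliest"]),
                                   min(crow["latest"], xrow["latest"]))
            noise = ar1(len(dates), xrow["ar_coef"], rng)
            noise = (noise - noise.mean()) / noise.std()
            value = (crow["mean_add"] + xrow["mean_add"]
                     + crow["sd_mult"] * xrow["sd_mult"] * noise
                     + xrow["back_coef"] * back.loc[dates].to_numpy())
            frames.append(pd.DataFrame({"cid": cid, "xcat": xcat,
                                        "real_date": dates, "value": value}))
    return pd.concat(frames, ignore_index=True)
```

With macrosynergy installed, the same two tables can be passed to the documented function instead, for example make_qdf(df_cids, df_xcats, back_ar=0.75, seed=42).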
- make_qdf_black(df_cids, df_xcats, blackout)
Make a quantamental DataFrame with basic columns: ‘cid’, ‘xcat’, ‘real_date’, ‘value’. In this DataFrame, the ‘value’ column consists of binary values denoting whether the cross-section is active on the corresponding dates.
- Parameters:
df_cids (pd.DataFrame) – DataFrame with parameters by cid. Row indices are cross-sections. Columns are: ‘earliest’: string of the earliest date (ISO) for which country values are available; ‘latest’: string of the latest date (ISO) for which country values are available; ‘mean_add’: float of country-specific addition to any category’s mean; ‘sd_mult’: float of country-specific multiplier of a category’s standard deviation.
df_xcats (pd.DataFrame) – DataFrame with parameters by xcat. Row indices are categories. Columns are: ‘earliest’: string of the earliest date (ISO) for which category values are available; ‘latest’: string of the latest date (ISO) for which category values are available; ‘mean_add’: float of category-specific addition to the mean; ‘sd_mult’: float of category-specific multiplier of the category’s standard deviation; ‘ar_coef’: float between 0 and 1 denoting the autocorrelation of the category; ‘back_coef’: float, coefficient with which the communal (mean 0, SD 1) background factor is added to category values.
blackout (dict) – Dictionary defining the blackout periods for each cross-section. The expected form of the dictionary is: {‘AUD’: (Timestamp(‘2000-01-13 00:00:00’), Timestamp(‘2000-01-13 00:00:00’)), ‘USD_1’: (Timestamp(‘2000-01-03 00:00:00’), Timestamp(‘2000-01-05 00:00:00’)), ‘USD_2’: (Timestamp(‘2000-01-09 00:00:00’), Timestamp(‘2000-01-10 00:00:00’)), ‘USD_3’: (Timestamp(‘2000-01-12 00:00:00’), Timestamp(‘2000-01-12 00:00:00’))}. The values of the dictionary are tuples consisting of the start and end dates of the respective blackout period. A cross-section can have more than one blackout period for a single category, in which case its keys are indexed (for example ‘USD_1’, ‘USD_2’, ‘USD_3’) to distinguish the periods.
- Returns:
basic quantamental DataFrame according to specifications with binary values.
- Return type:
pd.DataFrame
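The indexed-key convention for blackout can be made concrete with a small sketch that groups the example dictionary by cross-section and flags the dates falling inside any of that cross-section's blackout periods. The helper names periods_by_cid and in_blackout, the inclusive treatment of start and end dates, and the rule that a trailing "_<number>" marks a period index are assumptions; how make_qdf_black maps such flags onto its binary ‘value’ column is not reproduced here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

import pandas as pd

Period = Tuple[pd.Timestamp, pd.Timestamp]

blackout = {
    "AUD": (pd.Timestamp("2000-01-13"), pd.Timestamp("2000-01-13")),
    "USD_1": (pd.Timestamp("2000-01-03"), pd.Timestamp("2000-01-05")),
    "USD_2": (pd.Timestamp("2000-01-09"), pd.Timestamp("2000-01-10")),
    "USD_3": (pd.Timestamp("2000-01-12"), pd.Timestamp("2000-01-12")),
}


def periods_by_cid(blackout: Dict[str, Period]) -> Dict[str, List[Period]]:
    """Group indexed keys ('USD_1', 'USD_2', ...) under their cross-section."""
    grouped: Dict[str, List[Period]] = defaultdict(list)
    for key, period in blackout.items():
        cid = re.sub(r"_\d+$", "", key)  # drop a trailing period index, if any
        grouped[cid].append(period)
    return dict(grouped)


def in_blackout(dates: pd.DatetimeIndex, periods: List[Period]) -> pd.Series:
    """Boolean flag per date: True if the date lies in any (inclusive) period."""
    flag = pd.Series(False, index=dates)
    for start, end in periods:
        flag |= (dates >= start) & (dates <= end)
    return flag
```

Applied to business days from 2000-01-03 to 2000-01-14, the three USD periods flag five days: 3 to 5 January, 10 January (9 January is a Sunday) and 12 January.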
- generate_lines(sig_len, style='linear')
Returns a NumPy array of a line with a given length.
- Parameters:
sig_len (int) – The number of elements in the returned array.
style (str) – The style of the line. Default is ‘linear’. Current choices are: linear, decreasing-linear, sharp-hill, four-bit-sine, sine, cosine, sawtooth. Prefixing the style with “inv” or “inverted” returns the inverted version of that line; for example, ‘inv-sawtooth’ or ‘inverted sawtooth’ returns the inverted sawtooth line. ‘any’ returns a random line, and ‘all’ returns a list of all the available styles.
- Returns:
A NumPy array of the line. If style is ‘all’, a list (of strings) of all the available styles is returned instead. NOTE: It is indeed redundant to request an “inverted linear” or “inverted decreasing-linear” line; these options exist only for completeness and readability.
- Return type:
Union[np.ndarray, List[str]]
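A few of these styles, and the effect of an "inv"/"inverted" prefix, can be approximated with NumPy as below. The exact shapes, scaling and the inversion rule (reflecting values between their minimum and maximum) are assumptions, and four-bit-sine, ‘any’ and ‘all’ are left out; under this particular rule, inverting ‘linear’ reproduces ‘decreasing-linear’, which is consistent with the note above that such requests are redundant.

```python
import numpy as np

_PREFIXES = ("inverted-", "inverted ", "inv-", "inv ")


def line_sketch(sig_len: int, style: str = "linear") -> np.ndarray:
    """Hypothetical approximation of some generate_lines styles."""
    # Handle the inverted variants by building the base line and reflecting it.
    for prefix in _PREFIXES:
        if style.startswith(prefix):
            base = line_sketch(sig_len, style[len(prefix):])
            # Assumed inversion rule: reflect values between their min and max.
            return base.max() + base.min() - base
    x = np.arange(sig_len, dtype=float)
    if style == "linear":
        return x + 1
    if style == "decreasing-linear":
        return sig_len - x
    if style == "sharp-hill":
        # Rises linearly to a single peak in the middle, then falls back.
        return np.minimum(x, sig_len - 1 - x)
    if style == "sine":
        return np.sin(2 * np.pi * x / sig_len)
    if style == "cosine":
        return np.cos(2 * np.pi * x / sig_len)
    if style == "sawtooth":
        # Four rising ramps over the length of the series.
        return x % max(sig_len // 4, 1)
    raise ValueError(f"style not covered by this sketch: {style!r}")
```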
- make_test_df(cids=['AUD', 'CAD', 'GBP'], xcats=['XR', 'CRY'], tickers=None, metrics=['value'], start='2010-01-01', end='2020-12-31', style='any')
Generates a test DataFrame with predefined values, meant for testing purposes only. The function generates a standard quantamental DataFrame in which the value column is populated with simple lines or waves that are easy to identify and differentiate in a plot.
- Parameters:
cids (List[str]) – A list of strings for cids.
xcats (List[str]) – A list of strings for xcats.
tickers (List[str]) – A list of strings for tickers. If provided, cids and xcats will be ignored.
metrics (List[str]) – A list of strings for metrics.
start (str) – An ISO-formatted date string.
end (str) – An ISO-formatted date string.
style (str) – A string that specifies the type of line to generate. Current choices are: ‘linear’, ‘decreasing-linear’, ‘sharp-hill’, ‘four-bit-sine’, ‘sine’, ‘cosine’, ‘sawtooth’, ‘any’. See macrosynergy.management.simulate.simulate_quantamental_data.generate_lines().
- Return type:
pd.DataFrame
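The relationship between cids, xcats and tickers can be sketched as follows: without tickers, every cid is paired with every xcat; with tickers, each ticker is split back into a cid and an xcat, and one column is created per metric. The function name toy_test_df, the split at the first underscore (the usual cid_xcat ticker convention, so ‘USD_FXXR_NSA’ gives xcat ‘FXXR_NSA’), the business-day calendar and filling every series with the same increasing line (ignoring style) are all assumptions for illustration, not the behaviour of make_test_df.

```python
from typing import Optional, Sequence

import numpy as np
import pandas as pd


def toy_test_df(cids: Sequence[str] = ("AUD", "CAD", "GBP"),
                xcats: Sequence[str] = ("XR", "CRY"),
                tickers: Optional[Sequence[str]] = None,
                metrics: Sequence[str] = ("value",),
                start: str = "2010-01-01",
                end: str = "2020-12-31") -> pd.DataFrame:
    """Hypothetical skeleton of a test QDF; not macrosynergy's make_test_df."""
    if tickers is None:
        tickers = [f"{cid}_{xcat}" for cid in cids for xcat in xcats]
    dates = pd.bdate_range(start, end)
    # A simple increasing line makes each series easy to spot in a plot.
    line = np.arange(1, len(dates) + 1, dtype=float)
    frames = []
    for ticker in tickers:
        # Assumed ticker convention: '<cid>_<xcat>', split at the first '_'.
        cid, xcat = ticker.split("_", 1)
        frame = pd.DataFrame({"cid": cid, "xcat": xcat, "real_date": dates})
        for metric in metrics:
            frame[metric] = line
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```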
- simulate_returns_and_signals(cids=['AUD', 'CAD', 'GBP', 'USD'], xcat='EQ', return_suffix='XR', signal_suffix='_CSIG_STRAT', years=20, sigma_eta=0.01, sigma_0=0.1, start=None, end=None)
Simulate returns and signals.
Equations for return and signal generation:
$$r_{t+1,i} = \sigma_{t+1,i}\left(\alpha_{t+1,i} + \beta_{t+1,i}\, r^{b}_{t+1} + \epsilon_{t+1,i}\right), \qquad \epsilon_{t+1,i} \sim N(0, 1)$$
$$\ln \sigma_{t+1,i} = \ln \sigma_{t,i} + \eta_{t+1,i}, \qquad \eta_{t+1,i} \sim N(0, \sigma_{\eta}^{2})$$
$$\alpha_{t+1,i} = \mathrm{signal}_{t,i} + \eta^{\alpha}_{t+1,i}, \qquad \eta^{\alpha}_{t+1,i} \sim N(0, \sigma_{\alpha}^{2})$$
$$\beta_{t+1,i} = \beta_{t,i} + \eta^{\beta}_{t+1,i}, \qquad \eta^{\beta}_{t+1,i} \sim N(0, \sigma_{\beta}^{2})$$
$$r^{b}_{t+1} = \mu + \eta^{rb}_{t+1}, \qquad \eta^{rb}_{t+1} \sim N(0, \sigma_{rb}^{2})$$
$\mathrm{signal}_{t,i}$ = … mean zero, but persistence….
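The equations can be traced step by step in a short NumPy simulation: log volatilities and betas follow random walks, alpha is the previous period's signal plus noise, and a common benchmark return enters through each asset's beta. Because the signal process is only outlined ("mean zero, but persistence"), the sketch substitutes a stationary AR(1) signal with persistence rho, a hypothetical choice. The function name simulate_sketch, the parameters n_assets, n_periods, sigma_alpha, sigma_beta, mu, sigma_rb and rho, the starting betas of 1, and reading sigma_0 as the starting volatility are all assumptions; the function is not simulate_returns_and_signals and does not produce its output format.

```python
from typing import Tuple

import numpy as np


def simulate_sketch(n_assets: int = 4, n_periods: int = 250,
                    sigma_eta: float = 0.01, sigma_0: float = 0.1,
                    sigma_alpha: float = 0.01, sigma_beta: float = 0.01,
                    mu: float = 0.0, sigma_rb: float = 1.0,
                    rho: float = 0.9, seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical simulation of the return equations.

    Returns (returns, signals), both of shape (n_periods, n_assets), where
    row t of signals is the signal that feeds row t of returns.
    """
    rng = np.random.default_rng(seed)
    log_sigma = np.full(n_assets, np.log(sigma_0))  # assumed starting volatility
    beta = np.ones(n_assets)                        # assumed starting betas
    # Hypothetical persistent, mean-zero signal: stationary AR(1) with unit variance.
    signal = rng.standard_normal(n_assets)
    returns = np.empty((n_periods, n_assets))
    signals = np.empty((n_periods, n_assets))
    for t in range(n_periods):
        signals[t] = signal
        # ln(sigma) and beta follow random walks.
        log_sigma = log_sigma + sigma_eta * rng.standard_normal(n_assets)
        beta = beta + sigma_beta * rng.standard_normal(n_assets)
        # alpha(t+1) is the signal at t plus noise.
        alpha = signal + sigma_alpha * rng.standard_normal(n_assets)
        rb = mu + sigma_rb * rng.standard_normal()  # common benchmark return
        eps = rng.standard_normal(n_assets)
        returns[t] = np.exp(log_sigma) * (alpha + beta * rb + eps)
        # Advance the signal one step for the next period.
        signal = rho * signal + np.sqrt(1 - rho**2) * rng.standard_normal(n_assets)
    return returns, signals
```

Storing the signal in the same row as the return it predicts keeps the t versus t+1 timing of the alpha equation explicit, so a simple correlation between the two arrays measures the signal's predictive power.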