macrosynergy.management.simulate
- make_qdf(df_cids, df_xcats, back_ar=0, seed=None)
Make quantamental DataFrame with basic columns: ‘cid’, ‘xcat’, ‘real_date’, ‘value’.
- Parameters:
df_cids (pd.DataFrame) – DataFrame with parameters by cid. Row indices are cross-sections. Columns are: ‘earliest’: string of earliest date (ISO) for which country values are available; ‘latest’: string of latest date (ISO) for which country values are available; ‘mean_add’: float of country-specific addition to any category’s mean; ‘sd_mult’: float of country-specific multiplier of a category’s standard deviation.
df_xcats (pd.DataFrame) – DataFrame with parameters by xcat. Row indices are categories. Columns are: ‘earliest’: string of earliest date (ISO) for which category values are available; ‘latest’: string of latest date (ISO) for which category values are available; ‘mean_add’: float of category-specific addition; ‘sd_mult’: float of category-specific multiplier of the category’s standard deviation; ‘ar_coef’: float between 0 and 1 denoting set autocorrelation of the category; ‘back_coef’: float, coefficient with which communal (mean 0, SD 1) background factor is added to category values.
back_ar (float) – float between 0 and 1 denoting set auto-correlation of the background factor. Default is zero.
seed (int) – seed for random number generation. Default is None.
- Returns:
basic quantamental DataFrame according to specifications.
- Return type:
pd.DataFrame
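A minimal usage sketch, assuming pandas is installed and that make_qdf can be imported from the module named at the top of this page. The cross-sections, categories, dates, and parameter values are made up for illustration. The import is guarded so that the two parameter tables can still be built and inspected where the package is not available.

```python
import pandas as pd

cids = ["AUD", "CAD", "GBP"]
xcats = ["XR", "CRY"]

# One row per cross-section; columns as listed for df_cids.
df_cids = pd.DataFrame(
    index=cids, columns=["earliest", "latest", "mean_add", "sd_mult"]
)
df_cids.loc["AUD"] = ["2010-01-01", "2020-12-31", 0.5, 2.0]
df_cids.loc["CAD"] = ["2011-01-01", "2020-11-30", 0.0, 1.0]
df_cids.loc["GBP"] = ["2012-01-01", "2020-11-30", -0.2, 0.5]

# One row per category; columns as listed for df_xcats.
df_xcats = pd.DataFrame(
    index=xcats,
    columns=["earliest", "latest", "mean_add", "sd_mult", "ar_coef", "back_coef"],
)
df_xcats.loc["XR"] = ["2010-01-01", "2020-12-31", 0.0, 1.0, 0.0, 0.3]
df_xcats.loc["CRY"] = ["2011-01-01", "2020-10-30", 1.0, 2.0, 0.9, 0.5]

try:
    from macrosynergy.management.simulate import make_qdf

    # back_ar and seed values are arbitrary illustrative choices.
    dfd = make_qdf(df_cids, df_xcats, back_ar=0.75, seed=42)
except ImportError:
    dfd = None  # package not installed; only the inputs are available
```

Fixing the seed should make repeated calls produce the same simulated values, which is convenient in unit tests.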
- make_test_df(cids=['AUD', 'CAD', 'GBP'], xcats=['XR', 'CRY'], tickers=None, metrics=['value'], start='2010-01-01', end='2020-12-31', style='any')
Generates a test dataframe with pre-defined values. These values are meant to be used for testing purposes only. The function generates a standard quantamental dataframe in which the value column is populated with pre-defined values. These values are simple lines or waves that are easy to identify and differentiate in a plot.
- Parameters:
cids (List[str]) – A list of strings for cids.
xcats (List[str]) – A list of strings for xcats.
tickers (List[str]) – A list of strings for tickers. If provided, cids and xcats will be ignored.
metrics (List[str]) – A list of strings for metrics.
start (str) – An ISO-formatted date string.
end (str) – An ISO-formatted date string.
style (str) – A string that specifies the type of line to generate. Current choices are: ‘linear’, ‘decreasing-linear’, ‘sharp-hill’, ‘four-bit-sine’, ‘sine’, ‘cosine’, ‘sawtooth’, ‘any’. See macrosynergy.management.simulate.simulate_quantamental_data.generate_lines().
- Return type:
- dataframe_generator(df_cids, df_xcats, cid, xcat)
Helper function used to construct the quantamental DataFrame.
- generate_lines(sig_len, style='linear')
Returns a numpy array of a line with a given length.
- Parameters:
sig_len (int) – The number of elements in the returned array.
style (str) – The style of the line. Default ‘linear’. Current choices are: linear, decreasing-linear, sharp-hill, four-bit-sine, sine, cosine, sawtooth. Adding “inv” or “inverted” to the style will return the inverted version of that line. For example, ‘inv-sawtooth’ or ‘inverted sawtooth’ will return the inverted sawtooth line. ‘any’ will return a random line. ‘all’ will return a list of all the available styles.
- Returns:
A numpy array of the line. If style is ‘all’, then a list (of strings) of all the available styles is returned. NOTE: It is redundant to request an “inverted linear” or “inverted decreasing-linear” line. They are only there for completeness and readability.
- Return type:
Union[np.ndarray, List[str]]
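The style-name handling described above can be sketched in plain Python. This is not the library's implementation: sketch_line is a hypothetical helper that covers only four of the styles, returns a list rather than a numpy array, scales values to the range 0 to 1, and assumes that "inverted" means mirroring the line around its midpoint. All of these are assumptions made for illustration.

```python
import math


def sketch_line(sig_len, style="linear"):
    """Return a list of sig_len floats in [0, 1] tracing a simple shape."""
    if sig_len < 2:
        raise ValueError("sig_len must be at least 2")
    # Normalise the name so 'inverted sawtooth' and 'inv-sawtooth' both
    # resolve to the base style 'sawtooth' with the inversion flag set.
    name = style.strip().lower().replace(" ", "-")
    inverted = False
    for prefix in ("inverted-", "inv-"):
        if name.startswith(prefix):
            name, inverted = name[len(prefix):], True
            break
    t = [i / (sig_len - 1) for i in range(sig_len)]  # 0.0 .. 1.0
    shapes = {
        "linear": lambda x: x,
        "decreasing-linear": lambda x: 1.0 - x,
        "sawtooth": lambda x: (4 * x) % 1.0,  # four ramps (assumed count)
        "sine": lambda x: 0.5 + 0.5 * math.sin(2 * math.pi * x),
    }
    if name not in shapes:
        raise ValueError(f"unknown style: {style!r}")
    line = [shapes[name](x) for x in t]
    # Assumed meaning of inversion: mirror around the midpoint of [0, 1].
    return [1.0 - v for v in line] if inverted else line
```

Under this assumed definition, an inverted linear line is identical to a decreasing-linear one.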
- make_qdf_black(df_cids, df_xcats, blackout)
Make quantamental DataFrame with basic columns: ‘cid’, ‘xcat’, ‘real_date’, ‘value’. In this DataFrame, the ‘value’ column consists of binary values denoting whether the cross-section is active for the corresponding dates.
- Parameters:
df_cids (pd.DataFrame) – DataFrame with parameters by cid. Row indices are cross-sections. Columns are: ‘earliest’: string of earliest date (ISO) for which country values are available; ‘latest’: string of latest date (ISO) for which country values are available; ‘mean_add’: float of country-specific addition to any category’s mean; ‘sd_mult’: float of country-specific multiplier of a category’s standard deviation.
df_xcats (pd.DataFrame) – DataFrame with parameters by xcat. Row indices are categories. Columns are: ‘earliest’: string of earliest date (ISO) for which category values are available; ‘latest’: string of latest date (ISO) for which category values are available; ‘mean_add’: float of category-specific addition; ‘sd_mult’: float of category-specific multiplier of the category’s standard deviation; ‘ar_coef’: float between 0 and 1 denoting set autocorrelation of the category; ‘back_coef’: float, coefficient with which communal (mean 0, SD 1) background factor is added to category values.
blackout (dict) – Dictionary defining the blackout periods for each cross-section. The expected form of the dictionary is: {‘AUD’: (Timestamp(‘2000-01-13 00:00:00’), Timestamp(‘2000-01-13 00:00:00’)), ‘USD_1’: (Timestamp(‘2000-01-03 00:00:00’), Timestamp(‘2000-01-05 00:00:00’)), ‘USD_2’: (Timestamp(‘2000-01-09 00:00:00’), Timestamp(‘2000-01-10 00:00:00’)), ‘USD_3’: (Timestamp(‘2000-01-12 00:00:00’), Timestamp(‘2000-01-12 00:00:00’))}. The values of the dictionary are tuples consisting of the start and end date of the respective blackout period. Each cross-section may have more than one blackout period on a single category, so the keys for such a cross-section carry a numeric suffix (‘USD_1’, ‘USD_2’, …) that numbers its periods.
- Returns:
basic quantamental DataFrame according to specifications with binary values.
- Return type:
pd.DataFrame
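The layout of the blackout dictionary can be illustrated with a short, dependency-free sketch. It uses datetime objects in place of pandas Timestamps, and the two helper functions are hypothetical names introduced here, not part of the library. Period bounds are treated as inclusive, which is an assumption suggested by the single-day periods in the example dictionary.

```python
from datetime import datetime

blackout = {
    "AUD": (datetime(2000, 1, 13), datetime(2000, 1, 13)),
    "USD_1": (datetime(2000, 1, 3), datetime(2000, 1, 5)),
    "USD_2": (datetime(2000, 1, 9), datetime(2000, 1, 10)),
    "USD_3": (datetime(2000, 1, 12), datetime(2000, 1, 12)),
}


def periods_by_cid(blackout):
    """Group (start, end) tuples by cross-section, dropping any '_<n>' suffix."""
    grouped = {}
    for key, period in blackout.items():
        cid, _, suffix = key.partition("_")
        if suffix and not suffix.isdigit():
            raise ValueError(f"unexpected key: {key!r}")
        grouped.setdefault(cid, []).append(period)
    return grouped


def in_blackout(grouped, cid, date):
    """True if date lies inside any blackout period of cid (bounds inclusive)."""
    return any(start <= date <= end for start, end in grouped.get(cid, []))


grouped = periods_by_cid(blackout)
print(in_blackout(grouped, "USD", datetime(2000, 1, 4)))
```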
- simulate_ar(nobs, mean=0, sd_mult=1, ar_coef=0.75)
Create an autocorrelated data series as a numpy array.
- Parameters:
- Returns:
autocorrelated data series.
- Return type:
np.ndarray
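As an illustration of what an autocorrelated series with these four arguments might look like, here is a first-order autoregressive (AR(1)) sketch using only the standard library. It is not the library's code: sketch_ar1 and its seed argument are hypothetical, the result is a list rather than a numpy array, and the choice to scale innovations so that the series has unit variance before sd_mult is applied is an assumption.

```python
import math
import random


def sketch_ar1(nobs, mean=0.0, sd_mult=1.0, ar_coef=0.75, seed=None):
    """Return nobs values of an AR(1) series with the given mean and scale."""
    if not 0 <= ar_coef < 1:
        raise ValueError("ar_coef must be in [0, 1)")
    rng = random.Random(seed)
    # Scale innovations so the stationary series has unit variance.
    innov_sd = math.sqrt(1.0 - ar_coef ** 2)
    x = rng.gauss(0.0, 1.0)  # start from the stationary distribution
    series = []
    for _ in range(nobs):
        series.append(mean + sd_mult * x)
        x = ar_coef * x + rng.gauss(0.0, innov_sd)
    return series


sample = sketch_ar1(5000, ar_coef=0.75, seed=1)
```

With a long enough sample, the lag-1 autocorrelation of the output should sit close to ar_coef.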
- simulate_returns_and_signals(cids=['AUD', 'CAD', 'GBP', 'USD'], xcat='EQ', return_suffix='XR', signal_suffix='_CSIG_STRAT', years=20, sigma_eta=0.01, sigma_0=0.1, start=None, end=None)
Simulate returns and signals
Equations for return and signal generation:
r(t+1,i) = sigma(t+1,i)*(alpha(t+1,i) + beta(t+1,i)*rb(t+1) + epsilon(t+1,i))
epsilon(t+1,i) ~ N(0, 1)
ln(sigma(t+1,i)) = ln(sigma(t,i)) + eta(t+1,i), eta(t+1,i) ~ N(0, sigma_eta^2)
alpha(t+1,i) = signal(t,i) + eta_alpha(t+1,i), eta_alpha(t+1,i) ~ N(0, sigma_alpha^2)
beta(t+1,i) = beta(t,i) + eta_beta(t+1,i), eta_beta(t+1,i) ~ N(0, sigma_beta^2)
rb(t+1) = mu + eta_rb(t+1), eta_rb(t+1) ~ N(0, sigma_rb^2)
signal(t, i) = … mean zero, but persistence….
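The recursion in the equations above can be traced for a single cross-section with a standard-library sketch. Several inputs are assumptions, because the page does not give them: the values of sigma_alpha, sigma_beta, mu, and sigma_rb, the starting beta of 1, the reading of sigma_0 as the initial volatility, and above all the signal process, which is modelled here as a zero-mean AR(1) with hypothetical parameters rho and sigma_signal. The function name and seed argument are also hypothetical.

```python
import math
import random


def sketch_returns_and_signal(
    nobs,
    sigma_eta=0.01,
    sigma_0=0.1,  # assumed to be the initial volatility
    sigma_alpha=0.1,  # assumed value
    sigma_beta=0.01,  # assumed value
    mu=0.0,  # assumed value
    sigma_rb=1.0,  # assumed value
    rho=0.9,  # assumed AR(1) persistence of the signal
    sigma_signal=0.1,  # assumed signal innovation SD
    seed=None,
):
    """Return (returns, signals, sigmas); returns[k] follows signals[k]."""
    rng = random.Random(seed)
    log_sigma, beta, signal = math.log(sigma_0), 1.0, 0.0
    returns, signals, sigmas = [], [], []
    for _ in range(nobs):
        signals.append(signal)  # signal(t, i), known before r(t+1, i)
        log_sigma += rng.gauss(0.0, sigma_eta)  # log-volatility random walk
        sigma = math.exp(log_sigma)
        alpha = signal + rng.gauss(0.0, sigma_alpha)
        beta += rng.gauss(0.0, sigma_beta)  # beta random walk
        rb = mu + rng.gauss(0.0, sigma_rb)  # benchmark return
        eps = rng.gauss(0.0, 1.0)
        returns.append(sigma * (alpha + beta * rb + eps))
        sigmas.append(sigma)
        signal = rho * signal + rng.gauss(0.0, sigma_signal)
    return returns, signals, sigmas


rets, sigs, vols = sketch_returns_and_signal(250, seed=0)
```

Modelling the log of sigma as a random walk keeps the volatility strictly positive at every step.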
- class VintageData(ticker, cutoff='2020-12-31', release_lags=[15, 30], number_firsts=24, shortest=36, freq='M', start_value=100, trend_ar=5, sd_ar=3.4641016151377544, seasonal=None, added_dates=12)
Bases:
object
Creates a standardized dataframe of single-ticker vintages. This class creates standardized grade 1 and grade 2 vintage data.
- Parameters:
ticker (str) – ticker name
cutoff (str) – last possible release date. The format must be ‘%Y-%m-%d’. All other dates are calculated from this one. Default is end 2020.
release_lags (list) – list of integers in ascending order denoting lags of the first, second etc. release in (calendar) days. Default is first release after 15 days and revision after 30 days. If days fall on weekend they will be delayed to Monday.
number_firsts (int) – number of first-release vintages in the simulated data set. Default is 24.
shortest (int) – number of observations in the first (shortest) vintage. Default is 36.
freq (str) – letter denoting the frequency of the vintage data. Must be one of ‘M’ (monthly, default), ‘Q’ (quarterly) or ‘W’ (weekly).
start_value (float) – expected first value of the random series. Default is 100.
trend_ar (float) – annualized trend. Default is 5% linear drift per year, applied to the start value. If the start value is not positive, the linear trend is added as a number.
sd_ar (float) – annualized standard deviation. Default is sqrt(12).
seasonal (float) – adds a seasonal pattern (applying a linear factor from low to high through the year), with the value denoting the average % seasonal factor through the year. Default is None. The seasonal pattern only makes sense for values that are strictly positive and are interpreted as indices.
added_dates (int) – number of added first release dates, used for grade 2 dataframe generation. Default is 12.
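The release_lags rule, calendar-day lags with weekend dates delayed to Monday, can be sketched with the standard library. The helper sketch_release_dates is a hypothetical name, and measuring each lag from the observation date is an assumption made for the illustration.

```python
from datetime import date, timedelta


def sketch_release_dates(obs_date, release_lags=(15, 30)):
    """Return one release date per lag, moving weekend dates to the next Monday."""
    releases = []
    for lag in release_lags:
        d = obs_date + timedelta(days=lag)
        if d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            d += timedelta(days=7 - d.weekday())
        releases.append(d)
    return releases


# 2020-12-18 + 15 days is Saturday 2021-01-02; + 30 days is Sunday 2021-01-17.
print(sketch_release_dates(date(2020, 12, 18)))
# -> [datetime.date(2021, 1, 4), datetime.date(2021, 1, 18)]
```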
- static date_check(date_string)
Validates that the date passed is a valid timestamp expression and converts it to the required form ‘%Y-%m-%d’.
- Parameters:
date_string (str) – valid date expression. For instance, “1st January, 2000.”
- Raises:
TypeError – if the date_string is not a string.
ValueError – if the date_string is not in the correct format.
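The split between the two exceptions can be shown with a small standard-library sketch. It is not the method's actual code: sketch_date_check is a hypothetical name, and the short tuple of accepted formats is an assumption, so the real method may well accept looser expressions than this sketch does.

```python
from datetime import datetime

# Hypothetical set of accepted input formats, chosen only for illustration.
_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%d/%m/%Y")


def sketch_date_check(date_string):
    """Return date_string normalised to '%Y-%m-%d', or raise TypeError/ValueError."""
    if not isinstance(date_string, str):
        raise TypeError("date_string must be a string")
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(date_string.strip(), fmt)
        except ValueError:
            continue  # try the next candidate format
        return parsed.strftime("%Y-%m-%d")
    raise ValueError(f"cannot interpret {date_string!r} as a date")
```

Checking the type first keeps the two failure modes separate: a non-string never reaches the parser, so any ValueError always refers to the content of a string.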
- seasonal_adj(obs_dates, seas_factors, values)
Method used to seasonally adjust the series. Economic data can vary according to the season.
- Parameters:
- Returns:
returns a list of values which have been adjusted seasonally
- Return type:
List[float]
- make_graded(grading, upgrades=[])
Simulates an explicitly graded dataframe with a column ‘grading’.
- make_grade2()
Method used to construct a dataframe that consists of each respective observation date and the corresponding release date(s) (the release dates are computed using the observation date and the time-period(s) specified in the field “release_lags”).
- Returns:
Will return the DataFrame with the additional columns.
- Return type:
pd.DataFrame