macrosynergy.management.utils

get_cid(ticker)

Returns the cross-sectional identifier (cid) from a ticker.

Parameters:: ticker (str) – The ticker to be converted.
Returns:: The cross-sectional identifier.
Return type:: str

get_xcat(ticker)

Returns the category (xcat) from a ticker.

Parameters:: ticker (str) – The ticker to be converted.
Returns:: The category.
Return type:: str

split_ticker(ticker, mode)

Returns either the cross-sectional identifier (cid) or the category (xcat) from a ticker. The function is overloaded to accept either a single ticker or an iterable (e.g. list, tuple, pd.Series, np.array) of tickers.

Parameters:

ticker (str or Iterable[str]) – The ticker, or iterable of tickers, to be converted.
mode (str) – The mode to be used. Must be either “cid” or “xcat”.

Returns:

The cross-sectional identifier or category.

Return type:

str
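A minimal sketch of the splitting logic described above, assuming a JPMaQS ticker has the form `<cid>_<xcat>` with the cid ending at the first underscore (so `USD_GDPPC_SA` yields cid `USD` and xcat `GDPPC_SA`). `split_ticker_sketch` is a hypothetical name; this illustrates the idea and is not the library's source.

```python
from typing import Iterable, List, Union


def split_ticker_sketch(
    ticker: Union[str, Iterable[str]], mode: str
) -> Union[str, List[str]]:
    """Return the cid or xcat part of one ticker, or of each ticker in an iterable."""
    if mode not in ("cid", "xcat"):
        raise ValueError("mode must be either 'cid' or 'xcat'")
    # Iterables (lists, tuples, Series, arrays) are handled element-wise.
    if not isinstance(ticker, str):
        return [split_ticker_sketch(t, mode) for t in ticker]
    if "_" not in ticker:
        raise ValueError(f"Invalid ticker: {ticker!r}")
    # Assumption: the cid is everything before the first underscore.
    cid, xcat = ticker.split("_", 1)
    return cid if mode == "cid" else xcat


print(split_ticker_sketch("USD_GDPPC_SA", "cid"))  # USD
print(split_ticker_sketch(["USD_GDPPC_SA", "GBP_GDPPC_SA"], "xcat"))  # ['GDPPC_SA', 'GDPPC_SA']
```

Splitting at the first underscore only keeps multi-part categories such as `GDPPC_SA` intact.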

is_valid_iso_date(date)

Return type:: bool

convert_iso_to_dq(date)

Return type:: str

convert_dq_to_iso(date)

Return type:: str
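The three date helpers above are listed without descriptions. The sketch below shows one plausible reading, assuming “DQ” denotes the compact `YYYYMMDD` date string used by DataQuery; that format and the `_sketch` function names are assumptions rather than the library's documented behaviour.

```python
from datetime import datetime


def is_valid_iso_date_sketch(date: str) -> bool:
    """True if `date` parses as a real calendar date in YYYY-MM-DD form."""
    if not isinstance(date, str):
        return False
    try:
        datetime.strptime(date, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def convert_iso_to_dq_sketch(date: str) -> str:
    """'2023-01-31' -> '20230131' (assumed DQ format: YYYYMMDD)."""
    if not is_valid_iso_date_sketch(date):
        raise ValueError(f"Not a valid ISO date: {date!r}")
    return datetime.strptime(date, "%Y-%m-%d").strftime("%Y%m%d")


def convert_dq_to_iso_sketch(date: str) -> str:
    """'20230131' -> '2023-01-31' (assumed DQ format: YYYYMMDD)."""
    return datetime.strptime(date, "%Y%m%d").strftime("%Y-%m-%d")


print(convert_iso_to_dq_sketch("2023-01-31"))  # 20230131
print(convert_dq_to_iso_sketch("20230131"))  # 2023-01-31
```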

form_full_url(url, params={})

Forms a full URL from a base URL and a dictionary of parameters. Useful for logging and debugging.

Parameters:

url (str) – base URL.
params (dict) – dictionary of parameters.

Returns:

full URL

Return type:

str
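A sketch of how a base URL and a parameter dictionary can be combined with the standard-library `urllib.parse.urlencode`. The function name is hypothetical and the library may encode values differently; the sketch also uses `None` instead of a mutable `{}` default.

```python
from typing import Optional
from urllib.parse import urlencode


def form_full_url_sketch(url: str, params: Optional[dict] = None) -> str:
    """Append URL-encoded query parameters to a base URL."""
    if not params:
        return url
    # Use '&' if the base URL already carries a query string.
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}{urlencode(params)}"


print(form_full_url_sketch("https://example.com/api", {"a": 1, "b": "x y"}))
# https://example.com/api?a=1&b=x+y
```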

common_cids(df, xcats)

Returns a list of cross-sectional identifiers (cids) for which the specified categories (xcats) are available.

Parameters:

df (pd.DataFrame) – Standardized JPMaQS DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
xcats (List[str]) – A list with at least two categories whose cross-sectional identifiers are being considered.

Returns:

List of cross-sectional identifiers for which all categories in xcats are available.

Return type:

List[str]
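A minimal pandas sketch of the intersection logic, assuming a category counts as “available” for a cid when at least one of its rows has a non-missing value (an assumption; the library may only check for row presence). `common_cids_sketch` is a hypothetical name.

```python
import pandas as pd


def common_cids_sketch(df: pd.DataFrame, xcats: list) -> list:
    """Sorted cids with at least one non-missing value for every category in xcats."""
    if len(xcats) < 2:
        raise ValueError("xcats must contain at least two categories")
    # Assumption: "available" means at least one row with a non-NaN value.
    avail = df[df["xcat"].isin(xcats) & df["value"].notna()]
    cid_sets = [set(avail.loc[avail["xcat"] == xc, "cid"]) for xc in xcats]
    return sorted(set.intersection(*cid_sets))


df = pd.DataFrame(
    {
        "cid": ["USD", "USD", "GBP", "EUR", "EUR"],
        "xcat": ["CPI", "GDP", "CPI", "CPI", "GDP"],
        "real_date": pd.to_datetime(["2024-01-02"] * 5),
        "value": [1.0, 2.0, 3.0, 4.0, None],
    }
)
print(common_cids_sketch(df, ["CPI", "GDP"]))  # ['USD']
```

Here GBP lacks GDP entirely and EUR only has a missing GDP value, so only USD qualifies.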

generate_random_date(start='1990-01-01', end='2020-01-01')

Generates a random date between two dates.

Parameters:

start (str) – The start date, in the ISO format (YYYY-MM-DD).
end (str) – The end date, in the ISO format (YYYY-MM-DD).

Returns:

The random date.

Return type:

str

get_dict_max_depth(d)

Returns the maximum depth of a dictionary.

Parameters:: d (dict) – The dictionary to be searched.
Returns:: The maximum depth of the dictionary.
Return type:: int
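Depth conventions vary, so the sketch below fixes one explicitly (an assumption, not necessarily the library's): a non-dict value has depth 0 and a dictionary has depth 1 plus the depth of its deepest value. `get_dict_max_depth_sketch` is hypothetical.

```python
from typing import Any


def get_dict_max_depth_sketch(d: Any) -> int:
    """Depth of nested dicts: a non-dict counts as 0, a dict as 1 + its deepest value."""
    if not isinstance(d, dict):
        return 0
    # default=0 makes an empty dict count as a single level.
    return 1 + max((get_dict_max_depth_sketch(v) for v in d.values()), default=0)


print(get_dict_max_depth_sketch({"a": 1, "b": {"c": {"d": 2}}}))  # 3
```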

rec_search_dict(d, key, match_substring=False, match_type=None)

Recursively searches a dictionary for a key and returns the value associated with it.

Parameters:

d (dict) – The dictionary to be searched.
key (str) – The key to be searched for.
match_substring (bool) – If True, the function will return the value of the first key that contains the substring specified by the key parameter. If False, the function will return the value of the first key that matches the key parameter exactly. Default is False.
match_type (Any) – If not None, the function will look for a key that matches the search parameters and has the specified type. Default is None.

Returns:

The value associated with the key, or None if the key is not found.

Return type:

Any
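A depth-first sketch of the search described above. The traversal order (each key is checked before recursing into its value, in insertion order), the use of `isinstance` for `match_type`, and the name `rec_search_dict_sketch` are assumptions; note that a stored value of `None` cannot be told apart from “not found”.

```python
from typing import Any, Optional


def rec_search_dict_sketch(
    d: dict, key: str, match_substring: bool = False, match_type: Optional[type] = None
) -> Any:
    """Return the value of the first matching key found depth-first, else None."""
    if not isinstance(d, dict):
        return None
    for k, v in d.items():
        if match_substring:
            name_ok = isinstance(k, str) and key in k
        else:
            name_ok = k == key
        type_ok = match_type is None or isinstance(v, match_type)
        if name_ok and type_ok:
            return v
        # Recurse into nested dictionaries before moving on to the next key.
        if isinstance(v, dict):
            found = rec_search_dict_sketch(v, key, match_substring, match_type)
            if found is not None:
                return found
    return None


cfg = {"auth": {"client_id": "abc", "timeout": 30}, "timeout_s": 10}
print(rec_search_dict_sketch(cfg, "timeout"))  # 30
print(rec_search_dict_sketch(cfg, "id", match_substring=True))  # abc
```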

class Timer

Bases: object

timer()

Return type:: Tuple[float, float]

lap()

Return type:: float

check_package_version(required_version)

standardise_dataframe(df)

Applies the standard JPMaQS Quantamental DataFrame format to a DataFrame.

Parameters:

df (pd.DataFrame) – The DataFrame to be standardized.

Raises:

TypeError – If the input is not a pandas DataFrame.
ValueError – If the input DataFrame is not in the correct format.

Returns:

The standardized DataFrame.

Return type:

pd.DataFrame

drop_nan_series(df, column='value', raise_warning=False)

Drops any series that consist entirely of NaNs. Raises a UserWarning if any series are dropped and raise_warning is True.

Parameters:

df (pd.DataFrame) – The dataframe to be cleaned.
column (str) – The column to be used as the value column, defaults to “value”.
raise_warning (bool) – Whether to raise a warning if any series are dropped.

Raises:

TypeError – If the input is not a pandas DataFrame.
ValueError – If the input DataFrame is not in the correct format.

Returns:

The cleaned DataFrame.

Return type:

pd.DataFrame | QuantamentalDataFrame
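A pandas sketch of the cleaning step, assuming a series is identified by its (`cid`, `xcat`) pair; `drop_nan_series_sketch` is a hypothetical name and the warning text is illustrative.

```python
import warnings

import numpy as np
import pandas as pd


def drop_nan_series_sketch(
    df: pd.DataFrame, column: str = "value", raise_warning: bool = False
) -> pd.DataFrame:
    """Drop every (cid, xcat) series whose `column` is NaN on all dates."""
    # For each row: is its whole (cid, xcat) series NaN in `column`?
    all_nan = df[column].isna().groupby([df["cid"], df["xcat"]]).transform("all")
    if raise_warning and all_nan.any():
        dropped = sorted(set(zip(df.loc[all_nan, "cid"], df.loc[all_nan, "xcat"])))
        warnings.warn(f"Dropping all-NaN series: {dropped}", UserWarning)
    return df.loc[~all_nan].reset_index(drop=True)


df = pd.DataFrame(
    {
        "cid": ["USD", "USD", "GBP", "GBP"],
        "xcat": ["CPI"] * 4,
        "real_date": pd.to_datetime(["2024-01-01", "2024-01-02"] * 2),
        "value": [1.0, np.nan, np.nan, np.nan],
    }
)
print(drop_nan_series_sketch(df))  # only the two USD rows remain
```

A series with some NaNs (USD here) is kept; only fully empty series are removed.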

qdf_to_ticker_df(df, value_column='value')

Converts a standardized JPMaQS DataFrame to a wide format DataFrame with each column representing a ticker.

Parameters:

df (pd.DataFrame) – A standardised quantamental dataframe.
value_column (str) – The column to be used as the value column, defaults to “value”. If the specified column is not present in the DataFrame, a column named “value” will be used. If there is no column named “value”, the first column in the DataFrame will be used instead.

Returns:

The converted DataFrame.

Return type:

pd.DataFrame
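A sketch of the long-to-wide conversion using `DataFrame.pivot`, assuming tickers are formed as `cid + "_" + xcat`. The fallback rules for a missing `value_column` described above are not reproduced; `qdf_to_ticker_df_sketch` is hypothetical.

```python
import pandas as pd


def qdf_to_ticker_df_sketch(df: pd.DataFrame, value_column: str = "value") -> pd.DataFrame:
    """Long (cid, xcat, real_date, value) -> wide (real_date index x ticker columns)."""
    tickers = df["cid"] + "_" + df["xcat"]
    wide = (
        df.assign(ticker=tickers)
        .pivot(index="real_date", columns="ticker", values=value_column)
        .sort_index()
    )
    wide.columns.name = None  # drop the "ticker" axis label left by pivot
    return wide


qdf = pd.DataFrame(
    {
        "cid": ["USD", "USD", "GBP"],
        "xcat": ["GDPPC_SA"] * 3,
        "real_date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
        "value": [1.0, 2.0, 3.0],
    }
)
wide = qdf_to_ticker_df_sketch(qdf)
print(wide)  # GBP_GDPPC_SA is NaN on 2024-01-02
```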

ticker_df_to_qdf(df, metric='value')

Converts a wide format DataFrame (with each column representing a ticker) to a standardized JPMaQS DataFrame.

Parameters:: df (pd.DataFrame) – A wide format DataFrame.
Returns:: The converted DataFrame.
Return type:: pd.DataFrame
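The reverse direction can be sketched with `DataFrame.melt`, assuming the index holds the dates, each column name is a `<cid>_<xcat>` ticker split at its first underscore, and NaN cells (for example padding introduced by a pivot) are dropped. These choices and the name `ticker_df_to_qdf_sketch` are assumptions.

```python
import pandas as pd


def ticker_df_to_qdf_sketch(df: pd.DataFrame, metric: str = "value") -> pd.DataFrame:
    """Wide (date index x ticker columns) -> long cid/xcat/real_date/<metric>."""
    long = (
        df.rename_axis("real_date")
        .reset_index()
        .melt(id_vars="real_date", var_name="ticker", value_name=metric)
        .dropna(subset=[metric])  # assumption: empty cells are not kept
    )
    # Assumption: split each ticker at its first underscore into cid and xcat.
    parts = long["ticker"].str.split("_", n=1, expand=True)
    long["cid"], long["xcat"] = parts[0], parts[1]
    return (
        long[["cid", "xcat", "real_date", metric]]
        .sort_values(["cid", "xcat", "real_date"])
        .reset_index(drop=True)
    )


wide = pd.DataFrame(
    {"USD_GDPPC_SA": [1.0, 2.0], "GBP_GDPPC_SA": [3.0, float("nan")]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)
print(ticker_df_to_qdf_sketch(wide))
```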

concat_single_metric_qdfs(df_list, errors='ignore')

Combines a list of Quantamental DataFrames into a single DataFrame.

Parameters:

df_list (List[QuantamentalDataFrame]) – A list of Quantamental DataFrames.
errors (str) – The error handling method to use. If ‘raise’, then invalid items in the list will raise an error. If ‘ignore’, then invalid items will be ignored. Default is ‘ignore’.

Returns:

The combined DataFrame.

Return type:

QuantamentalDataFrame

apply_slip(df, slip, cids=None, xcats=None, tickers=None, metrics=['value'], extend_dates=False, raise_error=True)

Applies a “slip” to the DataFrame for the given cross-sections and categories, on the given metrics. A slip shifts the specified category n days forwards in time, where n is the slip value. This is identical to a lag, but is measured in days, and must always be applied before any resampling.

Parameters:

df (QuantamentalDataFrame) – DataFrame to which the slip is applied.
slip (int) – Slip to be applied.
cids (List[str]) – List of cross-sections.
xcats (List[str]) – List of target categories.
tickers (List[str]) – List of tickers to which the slip is applied.
metrics (List[str]) – List of metrics to which the slip is applied.
extend_dates (bool) – If True, includes the dates added by the slip in the DataFrame. If False, only the input dates are included. Default is False.
raise_error (bool) – If True, raises an error if the slip cannot be applied to all xcats in the target DataFrame. If False, raises a warning instead.

Raises:

TypeError – If the provided parameters are not of the expected type.
ValueError – If the provided parameters are semantically incorrect.

Returns:

DataFrame with the slip applied.

Return type:

QuantamentalDataFrame
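A minimal sketch of a slip as a per-ticker shift, assuming one row per business day per ticker so that shifting by `slip` rows moves values `slip` business days forward. Only non-negative integer slips and a category filter are handled; `cids`, `tickers`, `extend_dates` and `raise_error` are left out, and `apply_slip_sketch` is hypothetical.

```python
from typing import Iterable, List

import pandas as pd


def apply_slip_sketch(
    df: pd.DataFrame, slip: int, xcats: List[str], metrics: Iterable[str] = ("value",)
) -> pd.DataFrame:
    """Shift `metrics` of the given xcats forward by `slip` rows within each ticker."""
    if not isinstance(slip, int) or slip < 0:
        raise ValueError("this sketch only accepts non-negative integer slips")
    out = df.sort_values(["cid", "xcat", "real_date"]).reset_index(drop=True)
    mask = out["xcat"].isin(xcats)
    for m in metrics:
        # groupby(...).shift keeps each ticker's history separate.
        out.loc[mask, m] = out.loc[mask].groupby(["cid", "xcat"])[m].shift(slip)
    return out


dates = pd.bdate_range("2024-01-01", periods=4)
df = pd.concat(
    [
        pd.DataFrame({"cid": "USD", "xcat": "CPI", "real_date": dates, "value": [1.0, 2.0, 3.0, 4.0]}),
        pd.DataFrame({"cid": "USD", "xcat": "GDP", "real_date": dates, "value": [10.0, 20.0, 30.0, 40.0]}),
    ],
    ignore_index=True,
)
out = apply_slip_sketch(df, 1, ["CPI"])  # CPI becomes [NaN, 1, 2, 3]; GDP unchanged
```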

downsample_df_on_real_date(df, groupby_columns=[], freq='M', agg='mean')

Downsamples a JPMaQS DataFrame.

Parameters:

df (pd.DataFrame) – standardized JPMaQS DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and at least one column with values of interest.
groupby_columns (List) – a list of columns used to group the DataFrame.
freq (str) – target frequency of the downsampled DataFrame. The native frequency of the datetimes in ‘real_date’ is business daily; downsampling options include weekly (‘W’), monthly (‘M’, the default) and quarterly (‘Q’).
agg (str) – aggregation method. Must be one of “mean” (default), “median”, “min”, “max”, “first” or “last”.

Returns:

the downsampled DataFrame.

Return type:

pd.DataFrame
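A sketch of period aggregation per (`cid`, `xcat`) using pandas periods. Labelling each period by its calendar end date is an assumption (the library may use the last business day or another convention), `groupby_columns` is fixed to `cid` and `xcat`, and `downsample_sketch` is hypothetical.

```python
import pandas as pd

ALLOWED_AGGS = ("mean", "median", "min", "max", "first", "last")


def downsample_sketch(df: pd.DataFrame, freq: str = "M", agg: str = "mean") -> pd.DataFrame:
    """Aggregate each (cid, xcat) series to one row per period of `freq`."""
    if agg not in ALLOWED_AGGS:
        raise ValueError(f"agg must be one of {ALLOWED_AGGS}")
    # Map each date to its period (e.g. month) and label it with the period's end date.
    period_end = df["real_date"].dt.to_period(freq).dt.end_time.dt.normalize()
    return (
        df.assign(real_date=period_end)
        .groupby(["cid", "xcat", "real_date"], as_index=False)["value"]
        .agg(agg)
    )


df = pd.DataFrame(
    {
        "cid": "USD",
        "xcat": "CPI",
        "real_date": pd.to_datetime(["2024-01-15", "2024-01-31", "2024-02-10"]),
        "value": [1.0, 3.0, 5.0],
    }
)
monthly = downsample_sketch(df)  # 2024-01-31 -> 2.0, 2024-02-29 -> 5.0
```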

update_df(df, df_add, xcat_replace=False)

Append a standard DataFrame to a standard base DataFrame with ticker replacement on the intersection.

Parameters:

df (pd.DataFrame) – standardised base JPMaQS DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
df_add (pd.DataFrame) – another standardised JPMaQS DataFrame, with the latest values, to be added with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’, and ‘value’. Columns that are present in the base DataFrame but not in the appended DataFrame will be populated with NaN values.
xcat_replace (bool) – if True, all series belonging to the categories in the added DataFrame will be replaced, rather than just the added tickers. Default is False.

Returns:

standardised DataFrame with the latest values of the modified or newly defined tickers added.

Return type:

pd.DataFrame

Note

Tickers are combinations of cross-sections and categories.
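A sketch of the replacement rule using `pd.concat`: overlapping tickers (or, with `xcat_replace=True`, whole categories) are removed from the base before the new rows are appended, and columns missing from `df_add` become NaN. The final sort order and the name `update_df_sketch` are assumptions.

```python
import pandas as pd


def update_df_sketch(df: pd.DataFrame, df_add: pd.DataFrame, xcat_replace: bool = False) -> pd.DataFrame:
    """Replace overlapping tickers (or whole categories) in df with the rows of df_add."""
    if xcat_replace:
        # Drop every series whose category appears in df_add.
        keep = ~df["xcat"].isin(df_add["xcat"].unique())
    else:
        # Drop only the tickers (cid + xcat pairs) present in df_add.
        base_tickers = df["cid"] + "_" + df["xcat"]
        add_tickers = (df_add["cid"] + "_" + df_add["xcat"]).unique()
        keep = ~base_tickers.isin(add_tickers)
    out = pd.concat([df.loc[keep], df_add], ignore_index=True)
    return out.sort_values(["cid", "xcat", "real_date"]).reset_index(drop=True)


base = pd.DataFrame(
    {"cid": ["USD", "GBP"], "xcat": ["CPI", "CPI"],
     "real_date": pd.to_datetime(["2024-01-01"] * 2), "value": [1.0, 2.0]}
)
add = pd.DataFrame(
    {"cid": ["USD"], "xcat": ["CPI"], "real_date": pd.to_datetime(["2024-01-01"]), "value": [9.0]}
)
print(update_df_sketch(base, add))  # GBP keeps 2.0, USD becomes 9.0
```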

update_tickers(df, df_add)

Updates the aggregate DataFrame at the ticker level.

Parameters:

df (pd.DataFrame) – aggregate DataFrame used to store all tickers.
df_add (pd.DataFrame) – DataFrame with the latest values.

update_categories(df, df_add)

Updates the DataFrame at the category level.

Parameters:

df (pd.DataFrame) – base DataFrame.
df_add (pd.DataFrame) – appended DataFrame.

reduce_df(df, xcats=None, cids=None, start=None, end=None, blacklist=None, out_all=False, intersect=False)

Filter DataFrame by xcats and cids and notify about missing xcats and cids.

Parameters:

df (pd.DataFrame) – standardized JPMaQS DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
xcats (Union[str, List[str]]) – extended categories to be filtered on. Default is all in the DataFrame.
cids (List[str]) – cross-sections to be filtered on. Default is all in the DataFrame.
start (str) – string representing the earliest date. Default is None.
end (str) – string representing the latest date. Default is None.
blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross-section has several blacklist periods, append numbers to the cross-section code.
out_all (bool) – if True, the function returns the reduced DataFrame together with the selected/available xcats and cids. Default is False, i.e. only the DataFrame is returned.
intersect (bool) – if True only retains cids that are available for all xcats. Default is False.

Returns:

reduced DataFrame with duplicates removed or, if out_all is True, the reduced DataFrame together with the available and selected xcats and cids.

Return type:

pd.DataFrame
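A sketch of the main filters (categories, cross-sections, date window, blacklist). The blacklist is assumed to map keys such as `"BRL"` or `"BRL_1"` to a `(start, end)` pair of dates; that format, the omission of `out_all`, `intersect` and the missing-item notifications, and the name `reduce_df_sketch` are assumptions.

```python
from typing import Dict, List, Optional, Tuple, Union

import pandas as pd


def reduce_df_sketch(
    df: pd.DataFrame,
    xcats: Union[str, List[str], None] = None,
    cids: Optional[List[str]] = None,
    start: Optional[str] = None,
    end: Optional[str] = None,
    blacklist: Optional[Dict[str, Tuple[str, str]]] = None,
) -> pd.DataFrame:
    """Filter a long-format DataFrame by category, cross-section, dates and blacklist."""
    out = df
    if xcats is not None:
        xcats = [xcats] if isinstance(xcats, str) else list(xcats)
        out = out[out["xcat"].isin(xcats)]
    if cids is not None:
        out = out[out["cid"].isin(cids)]
    if start is not None:
        out = out[out["real_date"] >= pd.Timestamp(start)]
    if end is not None:
        out = out[out["real_date"] <= pd.Timestamp(end)]
    for key, (bl_start, bl_end) in (blacklist or {}).items():
        # Assumed key format: "BRL", or "BRL_1", "BRL_2", ... for several periods.
        cid = key.split("_", 1)[0]
        in_period = out["real_date"].between(pd.Timestamp(bl_start), pd.Timestamp(bl_end))
        out = out[~((out["cid"] == cid) & in_period)]
    return out.drop_duplicates().reset_index(drop=True)


dates = pd.to_datetime(["2020-01-01", "2020-06-01", "2021-01-01"])
df = pd.DataFrame(
    {"cid": ["BRL"] * 3 + ["USD"] * 3, "xcat": "FXXR",
     "real_date": list(dates) * 2, "value": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}
)
out = reduce_df_sketch(df, xcats="FXXR", start="2020-03-01",
                       blacklist={"BRL_1": ("2020-05-01", "2020-07-01")})
```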

reduce_df_by_ticker(df, ticks=None, start=None, end=None, blacklist=None)

Filter a DataFrame by tickers and, optionally, by date range and blacklist.

Parameters:

df (pd.DataFrame) – standardized DataFrame with the following columns: ‘cid’, ‘xcat’, ‘real_date’.
ticks (List[str]) – tickers (combinations of cross-sections and base categories).
start (str) – string in ISO 8601 format representing the earliest date. Default is None.
end (str) – string in ISO 8601 format representing the latest date. Default is None.
blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross-section has several blacklist periods, append numbers to the cross-section code.

Returns:

reduced DataFrame with duplicates removed.

Return type:

pd.DataFrame

categories_df(df, xcats, cids=None, val='value', start=None, end=None, blacklist=None, years=None, freq='M', lag=0, fwin=1, xcat_aggs=['mean', 'mean'])

Create a custom DataFrame of explanatory and dependent categories with the appropriate frequency and, if applicable, lags.

Parameters:

df (pd.DataFrame) – standardized JPMaQS DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and at least one column with values of interest.
xcats (List[str]) – extended categories involved in the custom DataFrame. The last category in the list represents the dependent variable, and the (n - 1) preceding categories are the explanatory variable(s).
cids (List[str]) – cross-sections to be included. Default is all in the DataFrame.
val (str) – name of column that contains the values of interest. Default is ‘value’.
start (str) – earliest date in ISO 8601 format. Default is None, i.e. earliest date in DataFrame is used.
end (str) – latest date in ISO 8601 format. Default is None, i.e. latest date in DataFrame is used.
blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross section has several blacklist periods append numbers to the cross section code.
years (int) – number of years over which data are aggregated. Supersedes the “freq” parameter and does not allow lags. Default is None, i.e. no multi-year aggregation.
freq (str) – letter denoting the frequency at which the series are to be sampled. Must be one of ‘D’, ‘W’, ‘M’, ‘Q’, ‘A’. Default is ‘M’. Sampled dates are always the last business day of the respective period.
lag (int) – lag (delay of arrival) of explanatory category(s) in periods as set by freq. Default is 0.
fwin (int) – forward moving average window of the dependent (last) category. Default is 1, i.e. no average. Note: This parameter is used mainly for target returns as dependent variable.
xcat_aggs (List[str]) – exactly two aggregation methods. Default is ‘mean’ for both. The first method is applied to all explanatory categories and the second to the dependent category.

Returns:

custom DataFrame with category columns. N.B.: The number of explanatory categories that can be included is not restricted and will be appended column-wise to the returned DataFrame. The order of the DataFrame’s columns will reflect the order of the categories list.

Return type:

pd.DataFrame

categories_df_aggregation_helper(dfx, xcat_agg)

Helper method to down-sample each category in the DataFrame by aggregating over the intermediary dates according to a prescribed method.

Parameters:

dfx (pd.DataFrame) – standardised DataFrame defined exclusively on a single category.
xcat_agg (str) – aggregation method associated with the respective category.

weeks_btwn_dates(start_date, end_date)

Returns the number of business weeks between two dates.

Return type:: int

months_btwn_dates(start_date, end_date)

Returns the number of months between two dates.

Return type:: int

years_btwn_dates(start_date, end_date)

Returns the number of years between two dates.

Return type:: int

quarters_btwn_dates(start_date, end_date)

Returns the number of quarters between two dates.

Return type:: int

get_eops(dates=None, start_date=None, end_date=None, freq='M')

Returns a series of end-of-period dates for a given frequency. Dates can be passed as a series, index, a generic iterable or as a start and end date.

Parameters:

freq (str) – The frequency string. Must be one of “D”, “W”, “M”, “Q”, “A”.
dates (pd.DatetimeIndex | pd.Series | Iterable[pd.Timestamp]) – The dates to be used to generate the end-of-period dates. Can be passed as a series, index, a generic iterable or as a start and end date.
start_date (str | pd.Timestamp) – The start date. Must be passed if dates is not passed.
end_date (str | pd.Timestamp) – The end date. Must be passed if dates is not passed.

Return type:

Series

get_sops(dates=None, start_date=None, end_date=None, freq='M')

Returns a series of start-of-period dates for a given frequency. Dates can be passed as a series, index, a generic iterable or as a start and end date.

Parameters:

freq (str) – The frequency string. Must be one of “D”, “W”, “M”, “Q”, “A”.
dates (pd.DatetimeIndex | pd.Series | Iterable[pd.Timestamp]) – The dates to be used to generate the start-of-period dates. Can be passed as a series, index, a generic iterable or as a start and end date.
start_date (str | pd.Timestamp) – The start date. Must be passed if dates is not passed.
end_date (str | pd.Timestamp) – The end date. Must be passed if dates is not passed.

Return type:

Series
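A sketch of both helpers for the `dates` input path, assuming the end (start) of a period is the last (first) date actually present in the input for that period. For the `start_date`/`end_date` path, a business-day range such as `pd.bdate_range(start_date, end_date)` could be passed instead. The `_sketch` names are hypothetical.

```python
import pandas as pd


def get_eops_sketch(dates, freq: str = "M") -> pd.Series:
    """Last date present in each period of `freq` (e.g. each month)."""
    s = pd.Series(pd.DatetimeIndex(dates)).sort_values()
    return s.groupby(s.dt.to_period(freq)).max().reset_index(drop=True)


def get_sops_sketch(dates, freq: str = "M") -> pd.Series:
    """First date present in each period of `freq`."""
    s = pd.Series(pd.DatetimeIndex(dates)).sort_values()
    return s.groupby(s.dt.to_period(freq)).min().reset_index(drop=True)


dates = pd.bdate_range("2024-01-25", "2024-02-06")
print(get_eops_sketch(dates).tolist())  # 2024-01-31 and 2024-02-06
print(get_sops_sketch(dates).tolist())  # 2024-01-25 and 2024-02-01
```

Because only observed dates are used, a partial final month ends on its last available date rather than the calendar month end.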

merge_categories(df, xcats=None, new_xcat=None, cids=None, hierarchy=None, backfill=False, start=None)

Merges categories into a new category, given a list of categories to be merged. The merging is done in a preferred order, i.e. the first category in the list will be the preferred value for each real_date and if the first category does not have a value for a given real_date, the next category in the list will be used, etc…

Parameters:

df (pd.DataFrame) – standardized JPMaQS DataFrame with the columns ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
xcats (List[str]) – extended categories to be merged, in preferred order. Alias for hierarchy; provide one or the other.
new_xcat (str) – name of the new category to be created.
cids (List[str], optional) – cross sections to be included. Default is all in the DataFrame.
hierarchy (List[str], optional) – alias for xcats. Provided for parity with the previous extend_history API.
backfill (bool, optional) – If True, the new xcat is backfilled with its first valid value to the date specified by start. Default is False.
start (str, optional) – ISO date. If backfill is True, the first valid value is propagated back to this date. If backfill is False and start is provided, the output is trimmed to dates >= start.

Returns:

DataFrame with the merged category.

Return type:

pd.DataFrame
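The preference order can be sketched with `Series.combine_first`: earlier categories keep their values and later ones only fill dates where all preferred categories are missing. `backfill`, `start` and `cids` are omitted, and `merge_categories_sketch` is a hypothetical name.

```python
from typing import List

import pandas as pd


def merge_categories_sketch(df: pd.DataFrame, xcats: List[str], new_xcat: str) -> pd.DataFrame:
    """For each (cid, real_date), take the value of the first xcat in `xcats` that has one."""
    wide = (
        df[df["xcat"].isin(xcats)]
        .dropna(subset=["value"])
        .pivot_table(index=["cid", "real_date"], columns="xcat", values="value", aggfunc="first")
    )
    merged = None
    for xc in xcats:  # preferred order: earlier categories win
        if xc not in wide.columns:
            continue
        merged = wide[xc] if merged is None else merged.combine_first(wide[xc])
    if merged is None:
        raise ValueError("None of the requested categories are present in df")
    out = merged.dropna().rename("value").reset_index()
    out["xcat"] = new_xcat
    return out[["cid", "xcat", "real_date", "value"]]


df = pd.DataFrame(
    {
        "cid": "USD",
        "xcat": ["PREF", "PREF", "BACKUP", "BACKUP"],
        "real_date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"]),
        "value": [1.0, float("nan"), 20.0, 30.0],
    }
)
merged = merge_categories_sketch(df, ["PREF", "BACKUP"], "MERGED")
```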

estimate_release_frequency(timeseries=None, df_wide=None, atol=None, rtol=None)

Estimates the release frequency of a timeseries by inferring the frequency of its index. Before calling pd.infer_freq, the function drops NaNs and rounds values as specified by the tolerance parameters, so that “duplicate” values can be dropped.

Parameters:

timeseries (pd.Series, optional) – The timeseries to be used to estimate the release frequency. Exactly one of timeseries or df_wide must be passed.
df_wide (pd.DataFrame, optional) – The wide DataFrame to be used to estimate the release frequency. This mode processes each column of the DataFrame as a timeseries. Exactly one of timeseries or df_wide must be passed.
atol (float, optional) – The absolute tolerance for the difference between two values. If None, no rounding is applied.
rtol (float, optional) – The relative tolerance for the difference between two values. If None, no rounding is applied.

Returns:

The estimated release frequency. If df_wide is passed, a dictionary with the column names as keys and the estimated frequencies as values is returned.

Return type:

str or dict
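A sketch of the procedure for a single series with an absolute tolerance: drop NaNs, snap values to a grid of width `atol`, keep only dates on which the value changes, then call `pd.infer_freq`. Handling of `rtol` and `df_wide` is omitted, and `estimate_release_frequency_sketch` is hypothetical.

```python
from typing import Optional

import numpy as np
import pandas as pd


def estimate_release_frequency_sketch(
    timeseries: pd.Series, atol: Optional[float] = None
) -> Optional[str]:
    """Infer the frequency of the dates on which the (rounded) value changes."""
    s = timeseries.dropna()
    if atol is not None:
        # Snap values to a grid of width atol so tiny revisions look unchanged.
        s = (s / atol).round() * atol
    # Keep the first observation and every one that differs from its predecessor.
    releases = s[s.diff().ne(0)]
    if len(releases) < 3:
        return None  # pd.infer_freq needs at least three dates
    return pd.infer_freq(releases.index)


# Daily series whose level changes once a month, plus alternating 1e-6 noise.
idx = pd.date_range("2024-01-01", "2024-04-15", freq="D")
noise = np.where(np.arange(len(idx)) % 2 == 0, 0.0, 1e-6)
ts = pd.Series(idx.month.to_numpy(dtype=float) + noise, index=idx)
print(estimate_release_frequency_sketch(ts))  # D (noise counts as a change)
print(estimate_release_frequency_sketch(ts, atol=0.01))  # MS
```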

rotate_cid_xcat(df, direction, xcat_template, fixed_value)

Rotate a panel DataFrame between cid-per-row and xcat-per-row representations.

Two directions are supported:

“to_xcats”: for each row, replaces “cid” with a per-stock xcat derived from xcat_template (substituting the cid value into the “{cid}” placeholder) and sets “cid” to fixed_value.
“to_cids”: the inverse — extracts the stock identifier from “xcat” using the template as a regex, writes it into “cid”, and replaces “xcat” with fixed_value.

Parameters:

df (pd.DataFrame or QuantamentalDataFrame) – Panel DataFrame with at least “cid” and “xcat” columns.
direction (str) – Transformation direction: “to_xcats” or “to_cids”.
xcat_template (str) – Template string containing the placeholder “{cid}” that maps between a stock identifier and an xcat name, e.g. “EQXR_{cid}_NSA”.
fixed_value (str) – Value assigned to the column being collapsed. When direction is “to_xcats”, all rows will have cid set to fixed_value; when direction is “to_cids”, all rows will have xcat set to fixed_value.

Returns:

A copy of df with “cid” and “xcat” updated according to direction.

Return type:

pd.DataFrame

Raises:

ValueError – If direction is not “to_xcats” or “to_cids”.
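A sketch of both directions, assuming the template contains exactly one `{cid}` placeholder. For `"to_cids"` the template is turned into a regular expression by escaping the text around the placeholder and capturing whatever fills it. `rotate_cid_xcat_sketch` and the fixed value `"US"` in the example are hypothetical.

```python
import re

import pandas as pd


def rotate_cid_xcat_sketch(
    df: pd.DataFrame, direction: str, xcat_template: str, fixed_value: str
) -> pd.DataFrame:
    """Move stock identifiers between the 'cid' and 'xcat' columns."""
    out = df.copy()
    if direction == "to_xcats":
        out["xcat"] = [xcat_template.replace("{cid}", c) for c in out["cid"]]
        out["cid"] = fixed_value
    elif direction == "to_cids":
        prefix, suffix = xcat_template.split("{cid}", 1)
        pattern = re.compile(re.escape(prefix) + "(.+)" + re.escape(suffix))
        cids = []
        for x in out["xcat"]:
            m = pattern.fullmatch(x)
            if m is None:
                raise ValueError(f"xcat {x!r} does not match template {xcat_template!r}")
            cids.append(m.group(1))
        out["cid"] = cids
        out["xcat"] = fixed_value
    else:
        raise ValueError("direction must be 'to_xcats' or 'to_cids'")
    return out


panel = pd.DataFrame(
    {"cid": ["AAPL", "MSFT"], "xcat": ["EQXR", "EQXR"],
     "real_date": pd.to_datetime(["2024-01-02"] * 2), "value": [0.1, 0.2]}
)
rotated = rotate_cid_xcat_sketch(panel, "to_xcats", "EQXR_{cid}_NSA", "US")
restored = rotate_cid_xcat_sketch(rotated, "to_cids", "EQXR_{cid}_NSA", "EQXR")
```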

create_delta_data(df, return_density_stats=False, score_by='diff')

Creates a dictionary of dataframes with the changes in the information state for each ticker in the QuantamentalDataFrame. Optionally, returns a DataFrame with the statistics for change frequency, density and date range for each ticker.

Parameters:

df (QuantamentalDataFrame) – The QuantamentalDataFrame to calculate the changes for.
return_density_stats (bool) – If True, returns a DataFrame with the density stats for each ticker.
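score_by (str) – The method to use for scoring. If “diff” (default), the score is calculated based on the difference between the information state changes. If “level”, the score is calculated based on the value (‘level’) of the information state change.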

Returns:

A dictionary of DataFrames with the changes in the information state for each ticker.

Return type:

Union[Dict[str, pd.DataFrame], pd.DataFrame]

calculate_score_on_sparse_indicator(isc, std='std', halflife=None, min_periods=10, isc_version=0, iis=False, custom_method=None, custom_method_kwargs={}, volatility_forecast=True)

Calculate scores on a sparse indicator.

Parameters:

isc (Dict[str, pd.DataFrame]) – A dictionary of DataFrames with the changes in the information state for each ticker.
std (str) – The method to use for calculating the standard deviation. Supported methods are std, abs, exp and exp_abs. See the documentation for VolatilityEstimationMethods for more information.
halflife (int) – The halflife of the exponential weighting. Only used with exp and exp_abs methods. Default is None.
min_periods (int) – The minimum number of periods required for the calculation. Default is 10.
isc_version (int) – The version of the information state changes to use. If set to 0 (default), only the first version is used. If set to any other positive integer, all versions are used.
iis (bool) – if True, zn-scores are also calculated for the initial sample period defined by min_periods, on an in-sample basis, to avoid losing history. Default is False.
custom_method (Callable) – A custom method to use for calculating the standard deviation. Must have the signature custom_method(s: pd.Series, **kwargs) -> pd.Series.
custom_method_kwargs (Dict) – Keyword arguments to pass to the custom method.
volatility_forecast (bool) – If True (default), the volatility forecast is shifted one period forward to align with the information state changes.

Returns:

A dictionary of DataFrames with the changes in the information state for each ticker.

Return type:

Dict[str, pd.DataFrame]

sparse_to_dense(isc, value_column, min_period, max_period, postfix=None, metrics=['eop', 'grading'], thresh=None)

Convert a dictionary of DataFrames with changes in the information state to a dense DataFrame (QuantamentalDataFrame).

Parameters:

isc (Dict[str, pd.DataFrame]) – A dictionary of DataFrames with the changes in the information state for each ticker.
value_column (str) – The name of the column to use as the value.
min_period (pd.Timestamp) – The minimum period to include in the DataFrame.
max_period (pd.Timestamp) – The maximum period to include in the DataFrame.
postfix (str) – A postfix to append to the xcat column. Default is None.
metrics (Optional[List[str]]) – A list of metrics to include in the DataFrame. Default is [“eop”, “grading”]. Use metrics=None to include all available (non-value) metrics; use metrics=[] to include none (the value column only).
thresh (Union[Tuple[float, float], float]) – A float or a tuple of two floats to winsorise the data to. Default is None. If a single float is provided, it is used for both lower and upper bounds, as (-thresh, thresh). If a tuple is provided, it is used as (thresh[0], thresh[1]).

Returns:

A DataFrame with the dense information state.

Return type:

pd.DataFrame
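A sketch of the densification step for one ticker, assuming the information state is carried forward on a business-day grid from each release until the next one, then winsorised as described for `thresh`. `sparse_to_dense_sketch` is hypothetical and works on a single `pd.Series` of releases rather than the `isc` dictionary.

```python
from typing import Optional, Tuple, Union

import pandas as pd


def sparse_to_dense_sketch(
    sparse: pd.Series,
    min_period: str,
    max_period: str,
    thresh: Optional[Union[float, Tuple[float, float]]] = None,
) -> pd.Series:
    """Forward-fill a sparse release series onto a business-day grid, then winsorise."""
    grid = pd.bdate_range(min_period, max_period)
    # reindex(method="ffill") carries each release forward, even if released off-grid.
    dense = sparse.sort_index().reindex(grid, method="ffill")
    if thresh is not None:
        if isinstance(thresh, (int, float)):
            lower, upper = -thresh, thresh
        else:
            lower, upper = thresh
        dense = dense.clip(lower=lower, upper=upper)
    return dense


releases = pd.Series([1.0, 5.0], index=pd.to_datetime(["2024-01-03", "2024-01-08"]))
dense = sparse_to_dense_sketch(releases, "2024-01-01", "2024-01-10", thresh=3)
# NaN before the first release; 5.0 is clipped to 3.0
```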

temporal_aggregator_exponential(df, halflife=5, winsorise=None)

Temporal aggregator using exponential moving average.

Parameters:

df (QuantamentalDataFrame) – The QuantamentalDataFrame to aggregate.
halflife (int) – The halflife of the exponential moving average.
winsorise (float) – The value to winsorise the data to. Default is None.

Returns:

A QuantamentalDataFrame with the aggregated values.

Return type:

QuantamentalDataFrame

temporal_aggregator_period(isc, start, end, winsorise=10, postfix='_NCSUM')

Temporal aggregator over periods of changes in the information state.

Parameters:

isc (Dict[str, pd.DataFrame]) – A dictionary of DataFrames with the changes in the information state for each ticker.
start (pd.Timestamp) – The start date of the period to aggregate.
end (pd.Timestamp) – The end date of the period to aggregate.
winsorise (int) – The value to winsorise the data to. Default is 10.
postfix (str) – A postfix to append to the xcat column. Default is “_NCSUM”.

Returns:

A QuantamentalDataFrame with the aggregated values.

Return type:

QuantamentalDataFrame

temporal_aggregator_mean(df, window=21, winsorise=None)

Temporal aggregator using a rolling mean.

Parameters:

df (QuantamentalDataFrame) – The QuantamentalDataFrame to aggregate.
window (int) – The window size for the rolling mean.
winsorise (float) – The value to winsorise the data to. Default is None.

Returns:

A QuantamentalDataFrame with the aggregated values.

Return type:

QuantamentalDataFrame
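The exponential and rolling-mean aggregators above can be sketched per (`cid`, `xcat`) with pandas `ewm` and `rolling`. Winsorising the inputs before aggregation, `min_periods=1` for the rolling mean, and the name `temporal_aggregate_sketch` are assumptions.

```python
from typing import Optional

import pandas as pd


def temporal_aggregate_sketch(
    df: pd.DataFrame,
    method: str = "mean",
    window: int = 21,
    halflife: float = 5,
    winsorise: Optional[float] = None,
) -> pd.DataFrame:
    """Rolling-mean or exponential moving average of 'value' within each ticker."""
    out = df.sort_values(["cid", "xcat", "real_date"]).reset_index(drop=True)
    vals = out["value"]
    if winsorise is not None:
        # Assumption: the inputs are clipped to [-winsorise, winsorise] first.
        vals = vals.clip(lower=-winsorise, upper=winsorise)
    grouped = vals.groupby([out["cid"], out["xcat"]])
    if method == "mean":
        out["value"] = grouped.transform(lambda s: s.rolling(window, min_periods=1).mean())
    elif method == "exponential":
        out["value"] = grouped.transform(lambda s: s.ewm(halflife=halflife).mean())
    else:
        raise ValueError("method must be 'mean' or 'exponential'")
    return out


df = pd.DataFrame(
    {"cid": "USD", "xcat": "X",
     "real_date": pd.bdate_range("2024-01-01", periods=4), "value": [1.0, 2.0, 3.0, 4.0]}
)
print(temporal_aggregate_sketch(df, window=2)["value"].tolist())  # [1.0, 1.5, 2.5, 3.5]
```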

class InformationStateChanges(min_period=None, max_period=None)

Bases: object

Class to hold information state changes for a set of tickers. InformationStateChanges show only data releases where there is an update in the indicator’s value, grading or eop_lag. This offers a more compact representation of the data, where only releases which add information are retained.

Initialize using the from_qdf class method to create an InformationStateChanges object from a QuantamentalDataFrame. The calculate_score method can be used to calculate scores for the information state changes.

Example initialization:

from macrosynergy.download import JPMaQSDownload
from macrosynergy.management import InformationStateChanges

tickers = ["USD_GDPPC_SA", "GBP_GDPPC_SA"]

with JPMaQSDownload(client_id="cl_id", client_secret="cl_secret") as jpmaqs:
    df = jpmaqs.download(tickers=tickers, metrics="all")

isc = InformationStateChanges.from_qdf(df)
usd_gdppc_isc = isc["USD_GDPPC_SA"]

Parameters:

min_period (pd.Timestamp) – The minimum period to include in the InformationStateChanges object.
max_period (pd.Timestamp) – The maximum period to include in the InformationStateChanges object.

Note

Instantiate using the from_qdf or from_isc_df class methods. This class is subscriptable, i.e. isc[“ticker”] will return the DataFrame for the given ticker.
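The compaction idea described for this class can be sketched for one ticker: keep a row only when any tracked column differs from the previous row. The sketch assumes an `eop` date column has already been derived (the class itself works from `eop_lag`), and `information_state_changes_sketch` is a hypothetical helper, not part of the class.

```python
from typing import Sequence

import pandas as pd


def information_state_changes_sketch(
    history: pd.DataFrame, tracked: Sequence[str] = ("value", "grading", "eop")
) -> pd.DataFrame:
    """Keep the rows of one ticker's history on which any tracked column changes."""
    cols = [c for c in tracked if c in history.columns]
    h = history.dropna(subset=["value"]).sort_values("real_date")
    # A row is kept if any tracked column differs from the previous row;
    # the first row is always kept because its shifted values are missing.
    changed = h[cols].ne(h[cols].shift()).any(axis=1)
    return h.loc[changed].reset_index(drop=True)


days = pd.bdate_range("2024-01-01", periods=5)
history = pd.DataFrame(
    {
        "real_date": days,
        "value": [1.0, 1.0, 1.2, 1.2, 1.2],
        "grading": [1.0, 1.0, 1.0, 1.0, 2.0],
        "eop": pd.to_datetime(["2023-12-31"] * 2 + ["2024-01-31"] * 3),
    }
)
isc_rows = information_state_changes_sketch(history)  # rows 0, 2 and 4 survive
```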

keys()

The tickers in the InformationStateChanges object.

Returns:: A view of the tickers in the InformationStateChanges object.
Return type:: KeysView

values()

Extract the DataFrames from the InformationStateChanges object.

Returns:: A view of the DataFrames in the InformationStateChanges object.
Return type:: ValuesView

items()

Iterate through (ticker, DataFrame) pairs in the InformationStateChanges object.

Returns:: A view of the (ticker, DataFrame) pairs in the InformationStateChanges object.
Return type:: ItemsView

classmethod from_qdf(df, norm=True, annualize_by_release_frequency=None, score_by='diff', zscore_freq_window=3, zscore_freqs_allowed=('D', 'W', 'M', 'Q', 'A'), **kwargs)

Create an InformationStateChanges object from a QuantamentalDataFrame.

Parameters:

df (QuantamentalDataFrame) – The QuantamentalDataFrame to create the InformationStateChanges object from. This DataFrame must contain a value column. Additionally, the eop_lag column is required to calculate the correct eop and version information; if it is not provided, the information state is assumed to be based on the value only. The grading column is optional and will be preserved in the output if provided.
norm (bool) – If True, calculate the score for the information state changes.
annualize_by_release_frequency (bool) – If True, annualize the score by the inferred release frequency. Default is None, where it follows the behaviour of norm (i.e. annualize_by_release_frequency is set to True if norm is True and False otherwise).
score_by (str) – The method to use for scoring. If “diff” (default), the score is calculated based on the difference between the information state changes. If “level”, the score is calculated based on the value (‘level’) of the information state change.
zscore_freq_window (int) – rolling-median window passed to infer_release_frequency as part of annualize_by_release_frequency. Default 3.
zscore_freqs_allowed (Tuple[str, ...]) – candidate frequency labels for infer_release_frequency as part of annualize_by_release_frequency. Default (“D”, “W”, “M”, “Q”, “A”).
**kwargs (Any) – Additional keyword arguments to pass to the calculate_score method. Please refer to InformationStateChanges.calculate_score() for more information.

Returns:

An InformationStateChanges object.

Return type:

InformationStateChanges

classmethod from_isc_df(df, ticker, value_column='value', eop_column='eop', grading_column='grading', real_date_column='real_date', norm=True, **kwargs)

Create an InformationStateChanges object from a DataFrame.

Parameters:

df (pd.DataFrame) – The DataFrame to create the InformationStateChanges object from.
ticker (str) – The ticker to create the InformationStateChanges object for.
value_column (str) – The name of the column to use as the value.
eop_column (str) – The name of the column to use as the end of period date.
grading_column (str) – The name of the column to use as the grading.
real_date_column (str) – The name of the column to use as the real date.
norm (bool) – If True, calculate the score for the information state changes.
**kwargs (Any) – Additional keyword arguments to pass to the calculate_score method. Please refer to InformationStateChanges.calculate_score() for more information.

Returns:

An InformationStateChanges object.

Return type:

InformationStateChanges

to_qdf(value_column='value', postfix=None, metrics=['eop', 'grading'], thresh=None)

Convert the InformationStateChanges object to a QuantamentalDataFrame.

Parameters:

value_column (str) – The name of the column to use as the value.
postfix (str) – A postfix to append to the xcat column. Default is None.
metrics (List[str]) – A list of metrics to include in the DataFrame. Default is [“eop”, “grading”]. Use metrics=None to include all available (non-value) metrics; use metrics=[] to include none (the value column only).
thresh (Union[Tuple[float, float], float]) – A float or a tuple of two floats to winsorise the data to. Default is None. If a single float is provided, it is used for both lower and upper bounds, as (-thresh, thresh). If a tuple is provided, it is used as (thresh[0], thresh[1]).

Returns:

A DataFrame with the information state changes.

Return type:

pd.DataFrame

annualize_by_release_frequency(zscore_freq_window=3, zscore_freqs_allowed=('D', 'W', 'M', 'Q', 'A'), thresh=None)

Annualize each value by a time-varying weight inferred from its release cadence.

Multiplies each value by sqrt(1 / ANNUALIZATION_FACTORS[freq]), where freq is the contemporaneous release frequency inferred per observation from the eop cadence (see infer_release_frequency()). The weight is time-varying: a series whose cadence changes (e.g. quarterly -> monthly) is weighted quarterly before the break and monthly after it.

Parameters:

zscore_freq_window (int) – rolling-median window passed to infer_release_frequency. Default 3.
zscore_freqs_allowed (Tuple[str, ...]) – candidate frequency labels. Default (“D”, “W”, “M”, “Q”, “A”).
thresh (Union[Tuple[float, float], float]) – Winsorise the zscore before weighting. Default None (no winsorisation). A scalar clips to (-thresh, thresh); a tuple clips to its (min, max), so order does not matter.

Notes

Tickers without a zscore column, or whose release frequency cannot be inferred (fewer than two distinct eop dates), trigger a warning and are skipped instead of raising an error, so this stays safe on the default from_qdf(norm=True) path even when some tickers have too few releases to weight.

Return type:: QuantamentalDataFrame
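A sketch of the time-varying weighting, applying the formula observation by observation. The factor table below is hypothetical (periods per year); the library's `ANNUALIZATION_FACTORS` values are not given here and may differ. A tuple `thresh` is sorted first, matching the note that its order does not matter.

```python
import math
from typing import List, Optional, Sequence, Tuple, Union

# Hypothetical periods-per-year table; the library's ANNUALIZATION_FACTORS may differ.
FACTORS_SKETCH = {"D": 252, "W": 52, "M": 12, "Q": 4, "A": 1}


def annualize_sketch(
    zscores: Sequence[float],
    freqs: Sequence[str],
    thresh: Optional[Union[float, Tuple[float, float]]] = None,
) -> List[float]:
    """Winsorise each zscore, then multiply it by sqrt(1 / factor[freq]) of its own observation."""
    if thresh is None:
        lower, upper = -math.inf, math.inf
    elif isinstance(thresh, (int, float)):
        lower, upper = -thresh, thresh
    else:
        lower, upper = sorted(thresh)  # tuple order does not matter
    return [
        min(max(z, lower), upper) * math.sqrt(1 / FACTORS_SKETCH[f])
        for z, f in zip(zscores, freqs)
    ]


# A series whose cadence switches from quarterly to monthly releases.
print(annualize_sketch([2.0, 2.0, 2.0], ["Q", "Q", "M"]))
```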

to_dict(ticker)

Return type:: Dict[str, Union[List[Tuple[str, float, str, float]], Tuple[str, str, str], str]]

to_json(ticker)

Return type:: str

get_releases(from_date=Timestamp('2026-07-09 00:00:00'), to_date=Timestamp('2026-07-10 00:00:00'), excl_xcats=None, latest_only=True)

Get the latest releases for the InformationStateChanges object.

Parameters:

from_date (pd.Timestamp) – The start date of the period to get releases for.
to_date (pd.Timestamp) – The end date of the period to get releases for.
excl_xcats (List[str]) – A list of xcats to exclude from the releases.
latest_only (bool) – If True, only the latest release for each ticker is returned. Default is True.

Returns:

A DataFrame with the latest releases for each ticker. If latest_only is False, all releases within the date range are returned.

Return type:

pd.DataFrame
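The filtering logic described by these parameters can be sketched on a plain long-format table. This is an illustration only: it assumes a hypothetical DataFrame with columns "ticker", "real_date" and "value", and tickers in the cid_xcat convention. The function get_releases_sketch is hypothetical and does not reflect how InformationStateChanges stores its data.

```python
import pandas as pd


def get_releases_sketch(releases, from_date, to_date, excl_xcats=None, latest_only=True):
    """Hypothetical sketch of date-range, xcat and latest-only filtering."""
    # Keep releases inside the inclusive date window.
    mask = releases["real_date"].between(from_date, to_date)
    if excl_xcats:
        # Assumes tickers are "<cid>_<xcat>"; xcat is everything after the first "_".
        xcat = releases["ticker"].str.split("_", n=1).str[1]
        mask &= ~xcat.isin(excl_xcats)
    out = releases.loc[mask].sort_values(["ticker", "real_date"])
    if latest_only:
        # After sorting by date, the last row per ticker is its latest release.
        out = out.groupby("ticker").tail(1)
    return out.reset_index(drop=True)


releases = pd.DataFrame(
    {
        "ticker": ["USD_CPI", "USD_CPI", "USD_GDP"],
        "real_date": pd.to_datetime(["2024-01-10", "2024-02-10", "2024-02-01"]),
        "value": [1.0, 2.0, 3.0],
    }
)
print(get_releases_sketch(releases, pd.Timestamp("2024-01-01"), pd.Timestamp("2024-03-01"), excl_xcats=["GDP"]))
```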

temporal_aggregator_period(winsorise=10, start=None, end=None)[source]#

Temporal aggregator over periods of changes in the information state.

Parameters:

winsorise (int) – The value to winsorise the data to. Default is 10.
start (pd.Timestamp) – The start date of the period to aggregate.
end (pd.Timestamp) – The end date of the period to aggregate.

Returns:

A QuantamentalDataFrame with the aggregated values.

Return type:

QuantamentalDataFrame

calculate_score(std='std', halflife=None, min_periods=10, isc_version=0, iis=False, custom_method=None, custom_method_kwargs={}, volatility_forecast=True, score_by='diff')[source]#

Calculate score on sparse indicator for the InformationStateChanges object.

Parameters:

std (str) – The method to use for calculating the standard deviation. Supported methods are std, abs, exp and exp_abs. See the documentation for StandardDeviationMethods for more information.
halflife (int) – The halflife of the exponential weighting. Only used with exp and exp_abs methods. Default is None.
min_periods (int) – The minimum number of periods required for the calculation. Default is 10.
isc_version (int) – The version of the information state changes to use. If set to 0 (default), only the first version is used. If set to any other positive integer, all versions are used.
iis (bool) – If True, zn-scores are also calculated for the initial sample period defined by min_periods, on an in-sample basis, to avoid losing history. Default is False.
custom_method (Callable) – A custom method to use for calculating the standard deviation. Must have the signature custom_method(s: pd.Series, **kwargs) -> pd.Series.
custom_method_kwargs (Dict) – Keyword arguments to pass to the custom method.
volatility_forecast (bool) – If True (default), the volatility forecast is shifted one period forward to align with the information state changes.
score_by (str) – The method to use for scoring. If “diff” (default), the score is calculated based on the difference between the information state changes. If “level”, the score is calculated based on the value (‘level’) of the information state change.

Returns:

The InformationStateChanges object with the scores

Return type:

InformationStateChanges
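To make the roles of score_by, min_periods and volatility_forecast concrete, here is a deliberately simplified sketch for one ticker. It assumes the "std" method is a plain expanding standard deviation and ignores halflife, isc_version, iis and custom methods; the library's StandardDeviationMethods may compute volatility differently. The function score_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def score_sketch(values, min_periods=10, score_by="diff", volatility_forecast=True):
    """Hypothetical sketch: scale changes (or levels) by an expanding std."""
    # "diff" scores the change between consecutive information states; "level" the value itself.
    x = values.diff() if score_by == "diff" else values.astype(float)
    # Expanding standard deviation; NaN until min_periods non-NaN observations exist.
    vol = x.expanding(min_periods=min_periods).std()
    if volatility_forecast:
        # Use only information available before each observation.
        vol = vol.shift(1)
    return x / vol


values = pd.Series(np.arange(30, dtype=float) ** 2)
print(score_sketch(values).tail())
```

Shifting the volatility by one period means each score is scaled only by dispersion observed before that observation, which avoids look-ahead in the denominator.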

expanding_mean_with_nan(dfw, absolute=False)[source]#

Calculate the expanding mean of a DataFrame’s values across rows, handling NaN values.

This function computes the expanding (cumulative) mean of all elements in the DataFrame dfw, row-by-row. NaN values are ignored in the summation, ensuring they do not affect the calculation. If absolute is set to True, it uses the absolute values of elements for the expanding mean calculation. The function returns a list of expanding mean values, with each element corresponding to the expanding mean up to that row.

Parameters:

dfw (pd.DataFrame) – A DataFrame with a datetime index (or convertible to datetime) and numeric data across its columns. The index is expected to represent timestamps.
absolute (bool, optional) – If True, computes the expanding mean using the absolute values of the DataFrame’s elements, by default False.

Returns:

A list containing the expanding mean for each row of the DataFrame.

Return type:

List[np.float64]

Raises:

TypeError – If dfw is not a DataFrame, if its index cannot be converted to timestamps, or if absolute is not a boolean.
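The calculation described above amounts to a running sum of all non-NaN elements divided by a running count of them. The sketch below follows that description but is not the library's code. It skips the index-to-timestamp check, and the name expanding_mean_with_nan_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def expanding_mean_with_nan_sketch(dfw, absolute=False):
    """Hypothetical sketch: expanding mean over all elements, row by row, ignoring NaN."""
    if not isinstance(dfw, pd.DataFrame):
        raise TypeError("dfw must be a pandas DataFrame")
    if not isinstance(absolute, bool):
        raise TypeError("absolute must be a boolean")
    values = dfw.to_numpy(dtype=float)
    if absolute:
        values = np.abs(values)
    # Per-row sum and count of non-NaN elements.
    row_sums = np.nansum(values, axis=1)
    row_counts = np.sum(~np.isnan(values), axis=1)
    # Cumulate across rows; rows with no data so far yield NaN (0 / 0).
    with np.errstate(invalid="ignore", divide="ignore"):
        means = np.cumsum(row_sums) / np.cumsum(row_counts)
    return list(means)


dfw = pd.DataFrame({"a": [1.0, 3.0], "b": [np.nan, 5.0]}, index=pd.to_datetime(["2024-01-01", "2024-01-02"]))
print(expanding_mean_with_nan_sketch(dfw))
```

The second row's value is the mean of every non-NaN element seen so far, (1 + 3 + 5) / 3, not the mean of the second row alone.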

ewm_sum(df, halflife)[source]#

Compute the exponentially weighted moving sum of a DataFrame.

Parameters:

df (pd.DataFrame) – DataFrame in the wide format for which to calculate the exponentially weighted sum.
halflife (Number) – The halflife of the exponential decay.
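An exponentially weighted moving sum can be computed with the recursion s_t = x_t + d * s_{t-1}, where d is the per-period decay. The sketch below assumes d = 0.5 ** (1 / halflife) and treats NaN as contributing nothing. Both choices are assumptions, so the library's handling of missing values or weight normalisation may differ. The function ewm_sum_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def ewm_sum_sketch(df, halflife):
    """Hypothetical sketch: exponentially weighted moving sum per column."""
    decay = 0.5 ** (1.0 / halflife)  # weight halves every `halflife` periods
    values = df.fillna(0.0).to_numpy(dtype=float)  # NaN contributes nothing (assumption)
    out = np.empty_like(values)
    running = np.zeros(values.shape[1])
    for i, row in enumerate(values):
        # Decay the previous sum, then add the current observation.
        running = row + decay * running
        out[i] = running
    return pd.DataFrame(out, index=df.index, columns=df.columns)


print(ewm_sum_sketch(pd.DataFrame({"a": [1.0, 1.0, 1.0]}), halflife=1))
```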

calculate_cumulative_weights(df, halflife)[source]#

Calculate the cumulative moving exponential weights for a DataFrame.

Parameters:

df (pd.DataFrame) – DataFrame in the wide format for which to calculate weights.
halflife (Number) – The halflife of the exponential decay.
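One plausible reading of "cumulative moving exponential weights" is the running, exponentially decayed count of available observations, that is, the same recursion as an exponentially weighted sum applied to a 1/0 indicator of non-NaN values. The sketch below implements that reading only as an assumption. It uses a per-period decay of 0.5 ** (1 / halflife), and the name cumulative_weights_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def cumulative_weights_sketch(df, halflife):
    """Hypothetical sketch: decayed running count of non-NaN observations."""
    decay = 0.5 ** (1.0 / halflife)
    present = df.notna().to_numpy(dtype=float)  # 1.0 where data exists, else 0.0
    out = np.empty_like(present)
    running = np.zeros(present.shape[1])
    for i, row in enumerate(present):
        running = row + decay * running
        out[i] = running
    return pd.DataFrame(out, index=df.index, columns=df.columns)


print(cumulative_weights_sketch(pd.DataFrame({"a": [1.0, np.nan, 1.0]}), halflife=1))
```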

concat_categorical(df1, df2)[source]#

Concatenate two DataFrames with categorical columns. The dtypes of the second DataFrame will be cast to the dtypes of the first. The columns of the DataFrames must be identical.

Parameters:

df1 (pd.DataFrame) – The first DataFrame.
df2 (pd.DataFrame) – The second DataFrame.

Returns:

The concatenated DataFrame with the same columns as the input.

Return type:

pd.DataFrame
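A common pitfall is that pd.concat falls back to object dtype when categorical columns have different categories. The sketch below keeps such columns categorical by extending the first frame's categories with any values that appear only in the second. This is a design choice of the sketch and may differ from the library, which may cast strictly. The function concat_categorical_sketch is hypothetical.

```python
import pandas as pd


def concat_categorical_sketch(df1, df2):
    """Hypothetical sketch: concatenate while keeping categorical columns categorical."""
    if list(df1.columns) != list(df2.columns):
        raise ValueError("DataFrames must have identical columns")
    df1, df2 = df1.copy(), df2.copy()
    for col in df1.columns:
        if isinstance(df1[col].dtype, pd.CategoricalDtype):
            # Extend categories so values present only in df2 are not lost as NaN.
            extra = pd.Index(df2[col].dropna().astype(object).unique())
            dtype = pd.CategoricalDtype(df1[col].cat.categories.union(extra))
            df1[col] = df1[col].astype(dtype)
            df2[col] = df2[col].astype(dtype)
        else:
            # Non-categorical columns follow the first frame's dtype.
            df2[col] = df2[col].astype(df1[col].dtype)
    # Identical dtypes on both sides let pd.concat preserve the categorical dtype.
    return pd.concat([df1, df2], ignore_index=True)


df1 = pd.DataFrame({"cid": pd.Categorical(["AUD"]), "value": [1.0]})
df2 = pd.DataFrame({"cid": ["USD"], "value": [2]})
print(concat_categorical_sketch(df1, df2).dtypes)
```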

forward_fill_wide_df(df, blacklist=None, n=1)[source]#

Forward-fills NaN values in a wide DataFrame using the last valid value in each column. Interior gaps in the data are not filled; only the n periods immediately following the last valid value are filled.

Parameters:

df (pd.DataFrame) – The DataFrame to be forward-filled, in wide format, where each column represents a cross-section and the index contains dates.
blacklist (dict, optional) – A dictionary where keys are column names and values are lists of two elements, representing the start and end dates of periods to be excluded from filling.
n (int, optional) – The number of periods to fill forward. Default is 1, meaning only the next period is filled.
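The "tail-only" behaviour described above differs from DataFrame.ffill(limit=n), which would also fill interior gaps. The sketch below illustrates the described behaviour under the assumption that the index is a DatetimeIndex and that blacklist periods are inclusive of both endpoints. The function forward_fill_tail_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def forward_fill_tail_sketch(df, blacklist=None, n=1):
    """Hypothetical sketch: fill only the n periods after each column's last valid value."""
    out = df.copy()
    blacklist = blacklist or {}
    for col in out.columns:
        last_valid = out[col].last_valid_index()
        if last_valid is None:
            continue  # nothing to carry forward
        pos = out.index.get_loc(last_valid)
        fill_idx = out.index[pos + 1 : pos + 1 + n]
        if col in blacklist:
            # Drop dates inside the (assumed inclusive) blacklisted period.
            start, end = (pd.Timestamp(d) for d in blacklist[col])
            fill_idx = fill_idx[(fill_idx < start) | (fill_idx > end)]
        out.loc[fill_idx, col] = out.at[last_valid, col]
    return out


idx = pd.date_range("2024-01-01", periods=5, freq="D")
df = pd.DataFrame({"a": [1.0, np.nan, 2.0, np.nan, np.nan]}, index=idx)
print(forward_fill_tail_sketch(df, n=1))  # the interior NaN on 2024-01-02 stays NaN
```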

infer_release_frequency(eop, window=3, freqs=('D', 'W', 'M', 'Q', 'A'))[source]#

Classify the release frequency of each observation from its local eop cadence.

The gap (in days) between consecutive distinct eop dates is smoothed with a rolling median (window, min_periods=1) and snapped to the nearest supported frequency by log-distance to the reference period length (365.25 / ANNUALIZATION_FACTORS). Observations sharing an eop (revisions) inherit that period’s frequency.

Parameters:

eop (pd.Series) – per-observation end-of-period dates (datetime); the index is preserved.
window (int) – rolling-median window over distinct-eop gaps. Default 3.
freqs (Tuple[str, ...]) – candidate frequency labels. Default (“D”, “W”, “M”, “Q”, “A”).

Returns:

per-observation frequency labels, aligned to the input index.

Return type:

pd.Series

Raises:

ValueError – if there are fewer than two distinct eop dates, so no gap can be computed to estimate a release frequency.
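The classification steps above (distinct-eop gaps, rolling median, log-distance snapping) can be sketched as follows. The sketch assumes ANNUALIZATION_FACTORS values of D=252, W=52, M=12, Q=4 and A=1. It also assumes that the first distinct eop, which has no predecessor, borrows the following gap; the library may handle that edge differently. The function infer_release_frequency_sketch is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed values; the library's ANNUALIZATION_FACTORS may differ.
ANNUALIZATION_FACTORS = {"D": 252, "W": 52, "M": 12, "Q": 4, "A": 1}


def infer_release_frequency_sketch(eop, window=3, freqs=("D", "W", "M", "Q", "A")):
    """Hypothetical sketch: per-observation frequency labels from eop cadence."""
    eop = pd.to_datetime(eop)
    distinct = pd.Series(eop.dropna().unique()).sort_values().reset_index(drop=True)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct eop dates to estimate a frequency")
    gaps = distinct.diff().dt.days.astype(float)
    gaps.iloc[0] = gaps.iloc[1]  # first eop has no predecessor (assumption)
    smoothed = gaps.rolling(window, min_periods=1).median()
    # Reference period length in days for each candidate frequency.
    ref = np.array([365.25 / ANNUALIZATION_FACTORS[f] for f in freqs])
    # Snap each smoothed gap to the nearest reference length in log space.
    labels = [freqs[int(np.argmin(np.abs(np.log(g) - np.log(ref))))] for g in smoothed]
    # Revisions sharing an eop inherit that period's label; the index is preserved.
    return eop.map(dict(zip(distinct, labels)))


eop = pd.Series(pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31", "2020-03-31"]))
print(infer_release_frequency_sketch(eop).tolist())
```

Comparing in log space treats a gap that is twice the reference length and one that is half of it as equally far away, which suits period lengths spanning several orders of magnitude.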

macrosynergy.management.utils#

Submodules#