macrosynergy.panel.panel_imputer
- class BasePanelImputer(df, xcats, cids, start=None, end=None, min_cids=None, postfix='F')
Bases: object
Base class for imputing missing values in a panel DataFrame. It defines the overall structure of the imputation but not the imputation method itself. Each imputation method should be implemented in its own subclass, which defines the imputation technique.
- Parameters:
df (DataFrame) – DataFrame containing the panel data.
xcats (List[str]) – List of extended categories.
cids (List[str]) – List of cross sections.
start (str) – Start date in ISO format.
end (str) – End date in ISO format.
min_cids (int) – Minimum number of cross sections required to perform imputation on a specific real date. Default is len(cids) // 2.
postfix (str) – Postfix to add to the extended categories after imputation. Default is “F”.
- get_impute_function(group)
Abstract method that should be implemented in a subclass. Defines the imputation technique to be used on a group of values.
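The subclassing pattern described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the class names `SketchBaseImputer` and `SketchZeroImputer`, the `impute` helper, the wide date-by-cross-section layout, and the zero-fill rule are all hypothetical and chosen only to keep the example short.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class SketchBaseImputer(ABC):
    """Hypothetical stand-in for a base imputer; not the library's code."""

    def __init__(self, min_cids: int):
        self.min_cids = min_cids

    @abstractmethod
    def get_impute_function(self, group: pd.Series) -> pd.Series:
        """Return the group with its missing values filled."""

    def impute(self, wide: pd.DataFrame) -> pd.DataFrame:
        # Assumed layout: one row per date, one column per cross section.
        return wide.apply(self.get_impute_function, axis=1)


class SketchZeroImputer(SketchBaseImputer):
    """Hypothetical subclass that fills gaps with zero."""

    def get_impute_function(self, group: pd.Series) -> pd.Series:
        # Leave the group untouched if too few values are observed.
        if group.count() < self.min_cids:
            return group
        return group.fillna(0.0)


wide = pd.DataFrame(
    {"AUD": [1.0, np.nan], "CAD": [np.nan, np.nan], "GBP": [3.0, 5.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)
out = SketchZeroImputer(min_cids=2).impute(wide)
```

In this sketch the first date has two observed values, so its gap is filled, while the second date has only one and is left unchanged.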
- generate_blacklist(diff)
Generates a dictionary of the cross sections and dates that were imputed, in the same format as a blacklist dictionary. For each cross section, it stores a date range over which imputation was performed.
- Parameters:
diff (DataFrame) – DataFrame containing the differences between the original and imputed DataFrames.
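As a rough illustration of the idea, the sketch below builds a dictionary of imputed date ranges per cross section. It rests on several assumptions that the text above does not confirm: that `diff` has `cid` and `real_date` columns with one row per imputed value, and that each dictionary value is a `(start, end)` tuple of timestamps. The function name `sketch_blacklist` is hypothetical, and the sketch collapses all imputed dates of a cross section into a single range, which may differ from how the library handles gaps.

```python
import pandas as pd


def sketch_blacklist(diff: pd.DataFrame) -> dict:
    """Map each cross section to the (first, last) imputed date.

    Assumes `diff` has 'cid' and 'real_date' columns, with one row
    per imputed value. Not the library's implementation.
    """
    blacklist = {}
    for cid, dates in diff.groupby("cid")["real_date"]:
        blacklist[cid] = (dates.min(), dates.max())
    return blacklist


diff = pd.DataFrame(
    {
        "cid": ["AUD", "AUD", "CAD"],
        "real_date": pd.to_datetime(["2024-01-02", "2024-01-05", "2024-01-03"]),
    }
)
blacklist = sketch_blacklist(diff)
```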
- class MeanPanelImputer(df, xcats, cids, start=None, end=None, min_cids=None, postfix='F')
Bases: BasePanelImputer
Imputer class that fills missing values with the global cross-sectional mean. If the group has fewer than min_cids non-missing values, the group is left as is.
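The rule can be illustrated with plain pandas. The sketch below is not the library's code: the function name `sketch_mean_impute` and the wide layout (one row per date, one column per cross section) are assumptions made for the example. It uses the documented default threshold `len(cids) // 2`.

```python
import numpy as np
import pandas as pd


def sketch_mean_impute(wide: pd.DataFrame, min_cids: int) -> pd.DataFrame:
    """Fill gaps on each date with that date's cross-sectional mean."""
    out = wide.copy()
    for date, row in wide.iterrows():
        # Only impute when enough cross sections are observed on this date.
        if row.count() >= min_cids:
            out.loc[date] = row.fillna(row.mean())
    return out


cids = ["AUD", "CAD", "GBP", "USD"]
wide = pd.DataFrame(
    [[1.0, np.nan, 3.0, 2.0], [np.nan, np.nan, np.nan, 4.0]],
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
    columns=cids,
)
imputed = sketch_mean_impute(wide, min_cids=len(cids) // 2)
```

On the first date three of four values are observed, so the missing CAD value becomes the mean of 1.0, 3.0 and 2.0. On the second date only one value is observed, which is below the threshold of 2, so the row is left as is.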
- class MedianPanelImputer(df, xcats, cids, start=None, end=None, min_cids=None, postfix='F')
Bases: BasePanelImputer
Imputer class that fills missing values with the global cross-sectional median. If the group has fewer than min_cids non-missing values, the group is left as is.
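To show how a median fill can differ from a mean fill, the short sketch below applies both to a single date's cross section containing one outlier. The values and cross-section names are made up for illustration, and the code uses only plain pandas rather than the library's classes.

```python
import numpy as np
import pandas as pd

# One date's values across cross sections; USD is an outlier, GBP is missing.
row = pd.Series({"AUD": 1.0, "CAD": 2.0, "GBP": np.nan, "USD": 100.0})

mean_fill = row.fillna(row.mean())      # pulled towards the outlier
median_fill = row.fillna(row.median())  # robust to the outlier
```

Here the median fill for GBP is 2.0, while the mean fill is the average of 1.0, 2.0 and 100.0, so the choice between the two imputers matters most when a few cross sections have extreme values.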