add_equity_data
site.SiteProblem.add_equity_data(
equity_data,
equity_col,
common_col,
label,
direction='higher_is_worse',
continuous_measure=False,
n_bins=10,
reverse=False,
verbose=True,
)
Add a dataframe containing equity data into your problem.
This method associates demand points with an equity metric (such as the Index of Multiple Deprivation). If a continuous measure is provided, it is automatically discretized into deciles (or into as many quantile bins as the data supports) for categorical plotting and comparative equity analysis.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| equity_data | str, pandas.DataFrame, or geopandas.GeoDataFrame | The input data containing the equity metrics. Can be a filepath or an already loaded dataframe object. | required |
| equity_col | str | The name of the column in equity_data containing the equity values or categories to be used. | required |
| common_col | str | The name of the ID column used to join this data to the primary demand/spatial data in the SiteProblem. | required |
| label | str | A human-readable label for the equity metric (e.g., ‘IMD Decile’, ‘Age Group’). This is used internally for auto-generating plot titles and table headers. | required |
| direction | (higher_is_better, higher_is_worse) | Indicates whether higher values of equity_col represent a more or less advantaged group. This is stored as metadata and applied at analysis time; it does not modify the stored data. "higher_is_better": higher values indicate a more favourable equity position (e.g. IMD decile 10 = least deprived under the standard DLUHC 1–10 scale). "higher_is_worse": higher values indicate greater disadvantage (e.g. a raw IMD score, where a higher score means more deprived, or a custom scale where 1 = least deprived). Note: IMD deciles as published by DLUHC run from 1 (most deprived) to 10 (least deprived), so for pre-binned IMD decile columns use direction="higher_is_better". For raw IMD scores (higher = more deprived) use direction="higher_is_worse". | "higher_is_better" |
| continuous_measure | bool | If True, treats equity_col as continuous numerical data and uses quantile-based discretization to convert it into deciles (1-10). The raw continuous data is preserved in a new column named {equity_col}_raw. | False |
| reverse | bool | Only used when continuous_measure=True. Controls the direction of bin labelling relative to the raw values. False (default): lower raw values receive lower bin numbers. True: lower raw values receive higher bin numbers (i.e. the labelling is inverted). This is purely a binning convenience, for instance to convert a raw IMD score (where lower = less deprived) into a decile where 10 = least deprived, matching the DLUHC convention. It is independent of direction, which governs downstream analysis rather than how bins are labelled. | False |
| verbose | bool | If True, outputs additional warnings and messages. | True |
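The interaction between continuous_measure and reverse described in the table can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the library's implementation: the helper `discretize`, the column names `lsoa` and `imd_score`, and the data are all hypothetical.

```python
import pandas as pd


def discretize(df, equity_col, n_bins=10, reverse=False):
    """Hypothetical helper: quantile-bin a continuous column into labels 1..k."""
    out = df.copy()
    # Keep the untouched continuous values in "{equity_col}_raw".
    out[f"{equity_col}_raw"] = out[equity_col]
    # labels=False yields 0-based bin codes; add 1 so bins start at 1.
    codes = pd.qcut(out[equity_col], q=n_bins, labels=False, duplicates="drop") + 1
    if reverse:
        # Invert the labelling: the lowest raw values get the highest bin.
        codes = codes.max() + 1 - codes
    out[equity_col] = codes.astype(int)
    return out


# Made-up data: 20 areas with distinct scores, where higher = more deprived.
demo = pd.DataFrame(
    {
        "lsoa": [f"A{i:02d}" for i in range(20)],
        "imd_score": [float(s) for s in range(1, 21)],
    }
)

default_bins = discretize(demo, "imd_score")
reversed_bins = discretize(demo, "imd_score", reverse=True)
```

With reverse=False the lowest score lands in bin 1; with reverse=True the same row lands in the highest bin. In both cases the original scores remain available in `imd_score_raw`, and the choice of direction is a separate decision about how the bins are interpreted later.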
Raises
| Name | Type | Description |
|---|---|---|
| | ValueError | If continuous_measure is True but the data cannot be meaningfully binned due to too many identical values. |
Notes
When continuous_measure is True, pandas.qcut is used with duplicates='drop'. If the data is highly skewed and contains many duplicate values, some quantile edges coincide and are dropped, which may result in fewer than 10 bins. The method handles this dynamically so that the resulting categories always start at 1.
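The effect of duplicates='drop' on heavily tied data can be seen directly with pandas.qcut. The sketch below uses made-up values and only shows the pandas behaviour the note relies on; how the method itself relabels bins is not shown here and the `+ 1` step is an assumption about one way to make categories start at 1.

```python
import pandas as pd

# Made-up, heavily tied data: 15 of the 20 values are identical.
skewed = pd.Series([0] * 15 + [1, 2, 3, 4, 5], dtype=float)

# Ten quantiles are requested, but most quantile edges coincide at 0.
# duplicates="drop" discards the repeated edges instead of raising ValueError.
codes = pd.qcut(skewed, q=10, labels=False, duplicates="drop")

# Codes are 0-based and contiguous, so adding 1 gives categories starting at 1.
labels = codes + 1
n_bins_found = labels.nunique()

print(f"requested 10 bins, got {n_bins_found}")
print(labels.value_counts().sort_index())
```

Calling the same pandas.qcut without duplicates='drop' on this series raises a ValueError because the bin edges are not unique.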