hh_profiles

Household electricity demand time series for scenarios eGon2035 and eGon100RE at census cell level are set up.

Electricity demand data for households in Germany in 1-hourly resolution for an entire year. Spatially, the data is resolved to 100 x 100 m cells and provides individual and distinct time series for each household in a cell. The cells are defined by the dataset Zensus 2011.

class EgonDestatisZensusHouseholdPerHaRefined(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Class definition of table society.egon_destatis_zensus_household_per_ha_refined.

cell_id
characteristics_code
grid_id
hh_10types
hh_5types
hh_type
id
nuts1
nuts3
class EgonEtragoElectricityHouseholds(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Class definition of table demand.egon_etrago_electricity_households.

The table contains household electricity demand profiles aggregated at MV grid district level in MWh.

bus_id
p_set
q_set
scn_name
class HouseholdDemands(dependencies)[source]

Bases: egon.data.datasets.Dataset

Household electricity demand time series for scenarios eGon2035 and eGon100RE at census cell level are set up.

Electricity demand data for households in Germany in 1-hourly resolution for an entire year. Spatially, the data is resolved to 100 x 100 m cells and provides individual and distinct time series for each household in a cell. The cells are defined by the dataset Zensus 2011.

Dependencies
Resulting tables

The following datasets are used for creating the data:

  • Electricity demand time series for household categories produced by demand profile generator (DPG) from Fraunhofer IEE (see get_iee_hh_demand_profiles_raw())

  • Spatial information about people living in households by Zensus 2011 at federal state level

    • Type of household (family status)
    • Age
    • Number of people
  • Spatial information about number of households per ha, categorized by type of household (family status) with 5 categories (also from Zensus 2011)

  • Demand-Regio annual household demand at NUTS3 level

What is the goal?

To use the electricity demand time series from the demand profile generator to create spatially referenced household demand time series for Germany at a resolution of 100 x 100 m cells.

What is the challenge?

The electricity demand time series produced by demand profile generator offer 12 different household profile categories. To use most of them, the spatial information about the number of households per cell (5 categories) needs to be enriched by supplementary data to match the household demand profile categories specifications. Hence, 10 out of 12 different household profile categories can be distinguished by increasing the number of categories of cell-level household data.

How are these datasets combined?

  • Spatial information about people living in households by zensus (2011) at federal state (NUTS1) level, df_zensus, is aggregated to be compatible with the IEE household profile specifications.

    • exclude kids and reduce to adults and seniors
    • group as defined in HH_TYPES
    • convert data from people living in households to number of households by mapping_people_in_households
    • calculate fraction of fine household types (10) within subgroup of rough household types (5) df_dist_households
  • Spatial information about number of households per ha df_census_households_nuts3 is mapped to NUTS1 and NUTS3 level. Data is refined with household subgroups via df_dist_households to df_census_households_grid_refined.

  • The enriched 100 x 100 m household dataset is used to sample and aggregate household profiles. A table including the individual profile IDs for each cell and a scaling factor to match the Demand-Regio annual sum projections for 2035 and 2050 at NUTS3 level is created in the database as demand.egon_household_electricity_profile_in_census_cell.
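
The refinement from 5 to 10 household types can be illustrated with a minimal sketch. All numbers, the column name single_households and the use of a plain multiplication are assumptions made for illustration; the module itself draws household types with proportionate weighting (see proportionate_allocation()). The sketch only shows how a NUTS-1 share is applied to cell-level counts and rounded with np.rint().

```python
import numpy as np
import pandas as pd

# Hypothetical NUTS-1 household counts for the two fine types that belong
# to the rough type "single-person households" (codes SR and SO).
households_nuts1 = pd.Series({"SR": 300.0, "SO": 700.0})

# Share of each fine type within its rough type.
share_within_rough = households_nuts1 / households_nuts1.sum()

# Hypothetical cells with the number of single-person households per cell.
cells = pd.DataFrame({"cell_id": [1, 2, 3], "single_households": [10, 3, 4]})

# Apply the shares to every cell and round to whole households.
refined = pd.DataFrame(
    np.rint(np.outer(cells["single_households"], share_within_rough)),
    index=cells["cell_id"],
    columns=share_within_rough.index,
).astype(int)

print(refined)
```

Because np.rint() rounds each type separately, the rounded counts of a cell do not necessarily add up to the original total in every case.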

What are central assumptions during the data processing?

  • Mapping zensus data to IEE household categories is not trivial. In the conversion from persons in households to number of households, the number of inhabitants in multi-person households is estimated as a weighted average in OO_factor
  • The distribution used to refine household types at cell level is the same for each federal state
  • Refining household types leads to a non-integer number of profiles drawn at cell level, which is rounded to the nearest integer by np.rint()
  • 100 x 100 m cells are matched to NUTS via cells centroid location
  • Cells with households in unpopulated areas are removed

Drawbacks and limitations of the data

  • The distribution used to refine household types at cell level is the same for each federal state
  • The aggregated annual demand of the household profiles matches the Demand-Regio demand at NUTS-3 level, but the profiles do not match the Demand-Regio time series profile
  • Due to secrecy, some census data are highly modified under certain attributes (quantity_q = 2). This cell data is not corrected, but excluded.
  • There is deviation in the Census data from table to table. The statistical methods are not stringent. Hence, there are cases in which data contradicts.
  • Census data with attribute ‘HHTYP_FAM’ is missing for some cells with small amount of households. This data is generated using the average share of household types for cells with similar household number. For some cells the summed amount of households per type deviates from the total number with attribute ‘INSGESAMT’. As the profiles are scaled with demand-regio data at nuts3-level the impact at a higher aggregation level is negligible. For sake of simplicity, the data is not corrected.
  • There are cells without household data but a population. A randomly chosen household distribution is taken from a subgroup of cells with same population value and applied to all cells with missing household distribution and the specific population value.

Helper functions

  • To access the DB, select specific profiles at various aggregation levels

name = 'Household Demands'
version = '0.0.10'
class HouseholdElectricityProfilesInCensusCells(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Class definition of table demand.egon_household_electricity_profile_in_census_cell.

Lists references and scaling parameters of time series data for each household in a cell by identifiers. This table is fundamental for creating subsequent data like demand profiles on MV grid level or for determining the peak load at load area level.

cell_id
cell_profile_ids
factor_2035
factor_2050
grid_id
nuts1
nuts3
class IeeHouseholdLoadProfiles(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Class definition of table demand.iee_household_load_profiles.

id
load_in_wh
type
adjust_to_demand_regio_nuts3_annual(df_hh_profiles_in_census_cells, df_iee_profiles, df_demand_regio)[source]

Computes the profile scaling factor for alignment to demand regio data

The scaling factor can be used to re-scale each load profile such that the sum of all load profiles within one NUTS-3 area equals the annual demand of demand regio data.

Parameters:
  • df_hh_profiles_in_census_cells (pd.DataFrame) – Result of assign_hh_demand_profiles_to_cells().
  • df_iee_profiles (pd.DataFrame) – Household load profile data
    • Index: Time steps as serial integers
    • Columns: pd.MultiIndex with (HH_TYPE, id)
  • df_demand_regio (pd.DataFrame) – Annual demand by demand regio for each NUTS-3 region and scenario year. Index is pd.MultiIndex with tuple(scenario_year, nuts3_code).
Returns:

pd.DataFrame – Returns the same data as assign_hh_demand_profiles_to_cells(), but with filled columns factor_2035 and factor_2050.
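
A minimal sketch of the scaling idea, not the function's actual implementation: the column annual_sum_wh, the NUTS-3 codes, the units and all values are hypothetical. One factor is computed per NUTS-3 region as the ratio of the Demand-Regio annual demand to the unscaled annual sum of all profiles in that region, and every cell inherits the factor of its region.

```python
import pandas as pd

# Hypothetical annual sums (Wh) of the unscaled profiles assigned to each cell.
cells = pd.DataFrame(
    {
        "cell_id": [1, 2, 3],
        "nuts3": ["DEF01", "DEF01", "DEF02"],
        "annual_sum_wh": [2.0e6, 3.0e6, 4.0e6],
    }
)

# Hypothetical Demand-Regio annual demand per NUTS-3 region in MWh.
demand_regio_mwh = pd.Series({"DEF01": 10.0, "DEF02": 2.0})

# Unscaled annual demand per NUTS-3 region, converted from Wh to MWh.
unscaled_mwh = cells.groupby("nuts3")["annual_sum_wh"].sum() / 1e6

# One factor per region; each cell gets the factor of its region.
factor = demand_regio_mwh / unscaled_mwh
cells["factor_2035"] = cells["nuts3"].map(factor)

# Check: the scaled regional sums reproduce the Demand-Regio values.
scaled_mwh = (
    (cells["annual_sum_wh"] * cells["factor_2035"]).groupby(cells["nuts3"]).sum() / 1e6
)
print(scaled_mwh)
```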

assign_hh_demand_profiles_to_cells(df_zensus_cells, df_iee_profiles)[source]

Assign household demand profiles to each census cell.

A table including the demand profile ids for each cell is created by using get_cell_demand_profile_ids(). Household profiles are randomly sampled for each cell. Within a cell, profiles are drawn without replacement; the pool is restored before the next cell is processed.

Parameters:
  • df_zensus_cells (pd.DataFrame) – Household type parameters. Each row representing one household. Hence, multiple rows per zensus cell.
  • df_iee_profiles (pd.DataFrame) – Household load profile data
    • Index: Time steps as serial integers
    • Columns: pd.MultiIndex with (HH_TYPE, id)
Returns:

pd.DataFrame – Tabular data in which each row represents one zensus cell. The column cell_profile_ids contains a list of tuples (see get_cell_demand_profile_ids()) providing a reference to the actual load profiles that are associated with this cell.

clean(x)[source]

Clean zensus household data row-wise

Clean dataset by

  • converting ‘.’ and ‘-’ to str(0)
  • removing brackets

Table can be converted to int/floats afterwards

Parameters:x (pd.Series) – It is meant to be used with df.applymap()
Returns:pd.Series – Re-formatted data row
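
The cleaning rules listed above can be sketched as follows. The helper clean_value and the sample data are hypothetical and written element-wise; Series.map is applied per column here because DataFrame.applymap is deprecated in recent pandas versions.

```python
import pandas as pd


def clean_value(value):
    """Hypothetical element-wise cleaner following the two rules above."""
    text = str(value).strip()
    if text in {".", "-"}:
        # Placeholders for missing or suppressed values become "0".
        return "0"
    # Remove brackets around values.
    return text.replace("(", "").replace(")", "")


raw = pd.DataFrame({"a": ["(12)", "-"], "b": [".", "7"]})

# Clean every element, then convert the whole table to integers.
cleaned = raw.apply(lambda column: column.map(clean_value)).astype(int)
print(cleaned)
```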
create_missing_zensus_data(df_households_typ, df_missing_data, missing_cells)[source]

Generate missing data as average share of the household types for cell groups with the same amount of households.

Data for specific attributes is missing in the zensus dataset for secrecy reasons. For some cells with only a small number of households, data with attribute HHTYP_FAM is missing. However, the total number of households is known from attribute INSGESAMT. The missing data is generated as the average share of the household types for cell groups with the same number of households.

Parameters:
  • df_households_typ (pd.DataFrame) – Zensus households data
  • df_missing_data (pd.DataFrame) – number of missing cells of group of amount of households
  • missing_cells (dict) – dictionary with list of grids of the missing cells grouped by amount of households in cell
Returns:

df_average_split (pd.DataFrame) – generated dataset of missing cells

get_cell_demand_metadata_from_db(attribute, list_of_identifiers)[source]

Retrieve selection of household electricity demand profile mapping

Parameters:
  • attribute (str) – attribute to filter the table
    • nuts3
    • nuts1
    • cell_id
  • list_of_identifiers (list of str/int) – nuts3/nuts1 identifiers need to be str; cell_id identifiers need to be int
Returns:pd.DataFrame – Selection of mapping of household demand profiles to zensus cells
get_cell_demand_profile_ids(df_cell, pool_size)[source]

Generates tuple of hh_type and zensus cell ids

Takes a random sample of profile ids for given cell:
  • if pool size >= sample size: without replacement
  • if pool size < sample size: with replacement
Parameters:
  • df_cell (pd.DataFrame) – Household type information for a single zensus cell
  • pool_size (int) – Number of available profiles to select from
Returns:

list of tuple – List of (hh_type, cell_id)
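
The two sampling modes can be sketched with the standard library. The helper sample_profile_ids and its arguments are hypothetical and only illustrate the rule stated above; they are not the function's actual code.

```python
import random


def sample_profile_ids(sample_size, pool_size, rng):
    """Draw profile ids from range(pool_size) (hypothetical helper)."""
    pool = range(pool_size)
    if pool_size >= sample_size:
        # Enough profiles available: each profile is used at most once.
        return rng.sample(pool, k=sample_size)
    # Pool too small: profiles may be drawn more than once.
    return rng.choices(pool, k=sample_size)


rng = random.Random(42)
without_replacement = sample_profile_ids(3, 10, rng)
with_replacement = sample_profile_ids(5, 2, rng)
print(without_replacement, with_replacement)
```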

get_census_households_grid()[source]

Retrieves and adjusts census household data at 100x100m grid level, accounting for missing or divergent data.

Query census household data at 100x100m grid level from the database. The census household data diverges depending on which attribute is used, and there are also cells without household data but with population data. The missing data in these cases is substituted. First, census household data with attribute ‘HHTYP_FAM’ is missing for some cells with a small number of households. This data is generated using the average share of household types for cells with a similar number of households. For some cells, the summed number of households per type deviates from the total number given by attribute ‘INSGESAMT’. As the profiles are scaled with demand-regio data at nuts3-level, the impact at a higher aggregation level is negligible. For the sake of simplicity, the data is not corrected.

Returns:pd.DataFrame – census household data at 100x100m grid level
get_census_households_nuts1_raw()[source]

Get zensus age x household type data from egon-data-bundle

Dataset about household size with information about the categories:

  • family type
  • age class
  • household size

for Germany in spatial resolution of federal states NUTS-1.

The data was manually selected and retrieved from: https://ergebnisse2011.zensus2022.de/datenbank/online

To reproduce the data selection:

  • Search for: “1000A-3016”
  • or choose topic: “Bevölkerung kompakt”
  • Choose table code: “1000A-3016” with title “Personen: Alter (11 Altersklassen) - Größe des privaten Haushalts - Typ des privaten Haushalts (nach Familien/Lebensform)”
  • Change setting “GEOLK1” to “Bundesländer (16)”

Data would be available in higher resolution (“Landkreise und kreisfreie Städte (412)”), but only after registration.

The downloaded file is called ‘Zensus2011_Personen.csv’.

Returns:pd.DataFrame – Pre-processed zensus household data
get_hh_profiles_from_db(profile_ids)[source]

Retrieve selection of household electricity demand profiles

Parameters:profile_ids (list of str (str, int)) – (type)a00..(profile number) with number having exactly 4 digits
Returns:pd.DataFrame – Selection of household demand profiles
get_houseprofiles_in_census_cells()[source]

Retrieve household electricity demand profile mapping from database

Returns:pd.DataFrame – Mapping of household demand profiles to zensus cells
get_iee_hh_demand_profiles_raw()[source]

Gets and returns household electricity demand profiles from the egon-data-bundle.

Household electricity demand profiles generated by Fraunhofer IEE. Methodology is described in Erzeugung zeitlich hochaufgelöster Stromlastprofile für verschiedene Haushaltstypen. It is used and further described in the following theses by:

  • Jonas Haack: “Auswirkungen verschiedener Haushaltslastprofile auf PV-Batterie-Systeme” (confidential)
  • Simon Ruben Drauz “Synthesis of a heat and electrical load profile for single and multi-family houses used for subsequent performance tests of a multi-component energy system”, http://dx.doi.org/10.13140/RG.2.2.13959.14248

Notes

The household electricity demand profiles have been generated for 2016, which is a leap year (8784 hours) starting on a Friday. The weather year is 2011, and the heat time series are generated for 2011 too (cf. dataset egon.data.datasets.heat_demand_timeseries.HTS), having 8760 hours and starting on a Saturday. To align the profiles, the first day of the IEE profiles is deleted, resulting in 8760 hours starting on a Saturday.

Returns:pd.DataFrame – Table with profiles in columns and time as index. A pd.MultiIndex is used to distinguish load profiles from different EUROSTAT household types.
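
The alignment described in the notes can be checked with a short sketch. The series below is a hypothetical stand-in for one IEE profile; only the length and the weekday arithmetic are of interest.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one IEE profile: hourly values for leap year 2016.
hours_2016 = 366 * 24
profile = pd.Series(np.arange(hours_2016, dtype=float))

# Drop the first 24 hours (1 January 2016) and restart the serial index at 0.
aligned = profile.iloc[24:].reset_index(drop=True)

# Weekdays of the relevant dates.
first_day_2016 = pd.Timestamp("2016-01-01").day_name()
second_day_2016 = pd.Timestamp("2016-01-02").day_name()
first_day_2011 = pd.Timestamp("2011-01-01").day_name()

print(len(profile), len(aligned), first_day_2016, second_day_2016, first_day_2011)
```

After the cut, the profile has the same length as a non-leap year and starts on the same weekday as 2011.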
get_load_timeseries(df_iee_profiles, df_hh_profiles_in_census_cells, cell_ids, year, aggregate=True, peak_load_only=False)[source]

Get peak load for one load area in MWh

The peak load is calculated in aggregated manner for a group of zensus cells that belong to one load area (defined by cell_ids).

Parameters:
  • df_iee_profiles (pd.DataFrame) – Household load profile data in Wh

    • Index: Time steps as serial integers
    • Columns: pd.MultiIndex with (HH_TYPE, id)

    Used to calculate the peak load from.

  • df_hh_profiles_in_census_cells (pd.DataFrame) – Return value of adjust_to_demand_regio_nuts3_annual().

  • cell_ids (list) – Zensus cell ids that define one group of zensus cells that belong to the same load area.

  • year (int) – Scenario year. Is used to consider the scaling factor for aligning annual demand to NUTS-3 data.

  • aggregate (bool) – If true, all profiles are aggregated

  • peak_load_only (bool) – If true, only the peak load value is returned (the type of the return value is float). Defaults to False which returns the entire time series as pd.Series.

Returns:

pd.Series or float – Aggregated time series for given cell_ids or peak load of this time series in MWh.
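
A minimal sketch of the aggregation, assuming the table layout described in this module (profiles in Wh with (HH_TYPE, id) columns, a cell table with cell_profile_ids and a factor column per scenario year). The helper load_timeseries and all data are hypothetical; the actual function may differ in its details.

```python
import pandas as pd

# Hypothetical profiles in Wh: rows are time steps, columns are (HH_TYPE, id).
profiles_wh = pd.DataFrame(
    [[4.0e6, 1.0e6], [1.0e6, 3.0e6], [3.0e6, 2.0e6]],
    columns=pd.MultiIndex.from_tuples([("PO", 1), ("SO", 0)]),
)

# Hypothetical cell table: profile references plus the annual scaling factor.
cells = pd.DataFrame(
    {
        "cell_id": [1, 2],
        "cell_profile_ids": [[("SO", 0)], [("SO", 0), ("PO", 1)]],
        "factor_2035": [2.0, 0.5],
    }
).set_index("cell_id")


def load_timeseries(cell_ids, year, peak_load_only=False):
    """Sum the scaled profiles of the given cells and convert Wh to MWh."""
    total_wh = pd.Series(0.0, index=profiles_wh.index)
    for cell_id in cell_ids:
        row = cells.loc[cell_id]
        # Sum all profiles referenced by this cell, then apply its factor.
        cell_sum = profiles_wh.loc[:, row["cell_profile_ids"]].sum(axis=1)
        total_wh += cell_sum * row[f"factor_{year}"]
    total_mwh = total_wh / 1e6
    return total_mwh.max() if peak_load_only else total_mwh


timeseries = load_timeseries([1, 2], 2035)
peak = load_timeseries([1, 2], 2035, peak_load_only=True)
print(timeseries.tolist(), peak)
```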

get_scaled_profiles_from_db(attribute, list_of_identifiers, year, aggregate=True, peak_load_only=False)[source]

Retrieve selection of scaled household electricity demand profiles

Parameters:
  • attribute (str) – attribute to filter the table
    • nuts3
    • nuts1
    • cell_id
  • list_of_identifiers (list of str/int) – nuts3/nuts1 identifiers need to be str; cell_id identifiers need to be int
  • year (int) –
    • 2035
    • 2050
  • aggregate (bool) – If True, all profiles are summed. This uses a lot of RAM if a high attribute level is chosen
  • peak_load_only (bool) – If True, only peak load value is returned

Notes

Aggregate == False option can use a lot of RAM if many profiles are selected

Returns:pd.Series or float – Aggregated time series for given cell_ids or peak load of this time series in MWh.
houseprofiles_in_census_cells()[source]

Allocate household electricity demand profiles for each census cell.

Creates table demand.egon_household_electricity_profile_in_census_cell that maps household electricity demand profiles to census cells. Each row represents one cell and contains a list of profile IDs. This table is fundamental for creating subsequent data like demand profiles on MV grid level or for determining the peak load at load area level.

Use get_houseprofiles_in_census_cells() to retrieve the data from the database as a pandas DataFrame.

impute_missing_hh_in_populated_cells(df_census_households_grid)[source]

Fills in missing household data in populated cells based on a random selection from a subgroup of cells with the same population value.

There are cells without household data but with a population. A randomly chosen household distribution is taken from a subgroup of cells with the same population value and applied to all cells with a missing household distribution and that specific population value. If there is no subgroup with household data for the respective population value, the fallback is the subgroup with the next smaller population value.

Parameters:df_census_households_grid (pd.DataFrame) – census household data at 100x100m grid level
Returns:pd.DataFrame – substituted census household data at 100x100m grid level
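
The imputation rule can be sketched as follows. The table layout (a single households column instead of a full household distribution), the helper donor_population and all values are simplifying assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical cells: population plus household count (NaN = missing).
cells = pd.DataFrame(
    {"population": [4, 4, 4, 7, 6], "households": [2.0, 2.0, np.nan, np.nan, 3.0]},
    index=pd.Index([1, 2, 3, 4, 5], name="cell_id"),
)

rng = np.random.default_rng(0)
known = cells.dropna(subset=["households"])
known_populations = np.sort(known["population"].unique())


def donor_population(population):
    """Same population value if available, otherwise the next smaller one.

    Raises ValueError if no known value is less than or equal to `population`.
    """
    candidates = known_populations[known_populations <= population]
    return candidates.max()


filled = cells.copy()
missing = cells[cells["households"].isna()]
for population, group in missing.groupby("population"):
    donors = known[known["population"] == donor_population(population)]
    # One randomly chosen donor is applied to all missing cells of the group.
    donor = donors.iloc[rng.integers(len(donors))]
    filled.loc[group.index, "households"] = donor["households"]

print(filled)
```

In this example, cell 3 receives its value from a cell with the same population (4), while cell 4 (population 7) falls back to the subgroup with population 6.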
inhabitants_to_households(df_hh_people_distribution_abs)[source]

Convert number of inhabitants to number of household types

Takes the distribution of people living in types of households to calculate a distribution of household types by using a people-in-household mapping. Results are not rounded to int as they will be used to calculate a relative distribution anyway. The data of category ‘HHGROESS_KLASS’ in census households at grid level is used to determine an average wherever the number of people is not trivial (OR, OO). Kids are not counted.

Parameters:df_hh_people_distribution_abs (pd.DataFrame) – Grouped census household data on NUTS-1 level in absolute values
Returns:df_dist_households (pd.DataFrame) – Distribution of households type
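
A minimal sketch of the conversion, with made-up numbers: the mapping values (in particular the 2.5 for OO, which stands in for the weighted average) and the people counts are assumptions, not values from the dataset.

```python
import pandas as pd

# Hypothetical people-in-household mapping (adults and seniors per household).
# The value for "OO" stands in for a weighted average and is made up.
people_per_household = pd.Series({"SR": 1.0, "PO": 2.0, "OO": 2.5})

# Hypothetical number of adults and seniors living in each household type.
people = pd.Series({"SR": 1000.0, "PO": 3000.0, "OO": 500.0})

# Number of households per type; deliberately not rounded.
households = people / people_per_household

# Relative distribution of household types.
share = households / households.sum()
print(households, share, sep="\n")
```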
mv_grid_district_HH_electricity_load(scenario_name, scenario_year, drop_table=False)[source]

Aggregated household demand time series at HV/MV substation level

Calculate the aggregated demand time series based on the demand profiles of each zensus cell inside each MV grid district. Profiles are read from local hdf5-file. Creates table demand.egon_etrago_electricity_households with Household electricity demand profiles aggregated at MV grid district level in MWh. Primarily used to create the eTraGo data model.

Parameters:
  • scenario_name (str) – Scenario name identifier, e.g. “eGon2035”
  • scenario_year (int) – Scenario year according to scenario_name
  • drop_table (bool) – Set to True to drop the table at the beginning of this function. Be careful: this deletes any existing data.
Returns:

pd.DataFrame – Multiindexed dataframe with timestep and bus_id as indexers. Demand is given in kWh.

process_nuts1_census_data(df_census_households_raw)[source]

Make data compatible with household demand profile categories

Removes and reorders categories which are not needed to fit data to household types of IEE electricity demand time series generated by demand-profile-generator (DPG).

  • Kids (<15) are excluded as they are also excluded in DPG origin dataset
  • Adults (15<65)
  • Seniors (>=65)
Parameters:df_census_households_raw (pd.DataFrame) – cleaned zensus household type x age category data
Returns:pd.DataFrame – Aggregated zensus household data on NUTS-1 level
proportionate_allocation(df_group, dist_households_nuts1, hh_10types_cluster)[source]

Household distribution at nuts1 is applied at census cell within group

To refine the hh_5types and keep the distribution at nuts1 level, the household types are clustered and drawn with proportionate weighting. The resulting pool is split into subgroups with sizes according to the number of households of clusters in cells.

Parameters:
  • df_group (pd.DataFrame) – Census household data at grid level for specific hh_5type cluster in a federal state
  • dist_households_nuts1 (pd.Series) – Household distribution of hh_10types in a federal state
  • hh_10types_cluster (list of str) – Cluster of household types to be refined to
Returns:

pd.DataFrame – Refined household data with hh_10types of cluster at nuts1 level
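
The drawing and splitting described above can be sketched with NumPy. The distribution, the cell ids and the household counts are hypothetical, and the sketch is a simplified reading of the description rather than the function's code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical NUTS-1 distribution of the fine types in one cluster.
dist_nuts1 = pd.Series({"P1": 50.0, "P2": 30.0, "P3": 20.0})
weights = dist_nuts1 / dist_nuts1.sum()

# Hypothetical number of households of this cluster in each cell.
households_per_cell = pd.Series({101: 2, 102: 3, 103: 1})

# Draw one fine type per household for the whole group ...
pool = rng.choice(
    weights.index.to_numpy(),
    size=households_per_cell.sum(),
    p=weights.to_numpy(),
)

# ... and split the pool into consecutive chunks, one chunk per cell.
split_points = households_per_cell.cumsum().to_numpy()[:-1]
chunks = np.split(pool, split_points)
refined = {
    cell: list(chunk) for cell, chunk in zip(households_per_cell.index, chunks)
}
print(refined)
```

Drawing for the whole group first keeps the result close to the NUTS-1 distribution even though individual cells contain only a few households.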

refine_census_data_at_cell_level(df_census_households_grid, df_census_households_nuts1)[source]

Processes and merges census data to specify household numbers and types per census cell according to IEE profiles.

The census data is processed to define the number and type of households per zensus cell. Two subsets of the census data are merged to fit the IEE profiles specifications. To do this, proportionate allocation is applied at nuts1 level and within household type clusters.

Header: “characteristics_code”, “characteristics_text”, “mapping”

“1”, “Einpersonenhaushalte (Singlehaushalte)”, “SR; SO”
“2”, “Paare ohne Kind(er)”, “PR; PO”
“3”, “Paare mit Kind(ern)”, “P1; P2; P3”
“4”, “Alleinerziehende Elternteile”, “SK”
“5”, “Mehrpersonenhaushalte ohne Kernfamilie”, “OR; OO”

Parameters:
  • df_census_households_grid (pd.DataFrame) – Aggregated zensus household data on 100x100m grid level
  • df_census_households_nuts1 (pd.DataFrame) – Aggregated zensus household data on NUTS-1 level
Returns:

pd.DataFrame – Number of hh types per census cell

regroup_nuts1_census_data(df_census_households_nuts1)[source]

Regroup census data and map according to demand-profile types.

For more information look at the respective publication: https://www.researchgate.net/publication/273775902_Erzeugung_zeitlich_hochaufgeloster_Stromlastprofile_fur_verschiedene_Haushaltstypen

Parameters:df_census_households_nuts1 (pd.DataFrame) – census household data on NUTS-1 level in absolute values
Returns:df_dist_households (pd.DataFrame) – Distribution of households type
set_multiindex_to_profiles(hh_profiles)[source]

The profile id is split into type and number and set as multiindex.

Parameters:hh_profiles (pd.DataFrame) – Profiles
Returns:hh_profiles (pd.DataFrame) – Profiles with Multiindex
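
A minimal sketch of the split, assuming a hypothetical id format of household type, the letter "a" and a four-digit profile number (for example "SOa0001"). The real column names in the data bundle may differ, and the regular expression is an illustration, not the function's code.

```python
import pandas as pd

# Hypothetical profile ids: household type, the letter "a", four-digit number.
hh_profiles = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]], columns=["SOa0001", "PRa0012"]
)

# Split every column name into its type and its numeric id.
parts = hh_profiles.columns.str.extract(r"^(?P<type>.+?)a(?P<number>\d{4})$")

hh_profiles.columns = pd.MultiIndex.from_arrays(
    [parts["type"], parts["number"].astype(int)], names=["type", "id"]
)
print(hh_profiles)
```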
write_hh_profiles_to_db(hh_profiles)[source]

Write HH demand profiles of IEE into db. One row per profile type. The annual load profile timeseries is an array.

schema: demand tablename: iee_household_load_profiles

Parameters:hh_profiles (pd.DataFrame) – It is meant to be used with df.applymap()
write_refinded_households_to_db(df_census_households_grid_refined)[source]