mastr
Download Marktstammdatenregister (MaStR) from Zenodo.
- extract_and_preprocess_mastr()
Extract the downloaded MaStR dump and create cleaned, schema-aligned CSVs.
This routine expects a MaStR ZIP archive (downloaded by download_mastr_data()) to be present in WORKING_DIR_MASTR_NEW. It unpacks the archive, reads the raw CSV files shipped in the dump, applies a set of harmonization steps (column renaming, categorical normalization, data enrichments), and writes cleaned CSVs.
The function performs the following steps:
- Locate and extract the MaStR ZIP
- Read raw CSVs from the extracted dump folder:
  - bnetza_mastr_wind_raw.csv
  - bnetza_mastr_solar_raw.csv
  - bnetza_mastr_biomass_raw.csv
  - bnetza_mastr_hydro_raw.csv
  - bnetza_mastr_gsgk_raw.csv
  - bnetza_mastr_storage_raw.csv
  - bnetza_mastr_combustion_raw.csv
  - bnetza_mastr_nuclear_raw.csv
  - bnetza_mastr_locations_extended_raw.csv
  - bnetza_mastr_grid_connections_raw.csv
- Voltage-level enrichment for locations
- Solar-specific fixes
- Common harmonization across technologies
- Write cleaned outputs (UTF-8, no index) to WORKING_DIR_MASTR_NEW:
  - bnetza_mastr_wind_cleaned.csv
  - bnetza_mastr_solar_cleaned.csv
  - bnetza_mastr_biomass_cleaned.csv
  - bnetza_mastr_hydro_cleaned.csv
  - bnetza_mastr_gsgk_cleaned.csv
  - bnetza_mastr_storage_cleaned.csv
  - bnetza_mastr_combustion_cleaned.csv
  - bnetza_mastr_nuclear_cleaned.csv
- Returns:
None – Results are written to disk as CSV files (see list above).
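The extract, read, rename, and write flow described for extract_and_preprocess_mastr() can be illustrated with a minimal sketch. This is not the package's implementation: the function name extract_and_clean, its parameters, and the rename_map argument are hypothetical. It uses only the standard library's csv module, and it omits the voltage-level enrichment, the solar-specific fixes, and the categorical normalization. Only the file naming pattern and the UTF-8 output are taken from the description above.

```python
import csv
import zipfile
from pathlib import Path

# Technologies for which the description above lists cleaned outputs.
TECHNOLOGIES = [
    "wind", "solar", "biomass", "hydro",
    "gsgk", "storage", "combustion", "nuclear",
]


def extract_and_clean(zip_path, working_dir, rename_map=None):
    """Hypothetical sketch: unpack a dump and write renamed CSV copies.

    rename_map maps raw column names to harmonized ones (an assumption;
    the real harmonization rules are not shown in this documentation).
    """
    rename_map = rename_map or {}
    working_dir = Path(working_dir)

    # Step 1: extract the archive into the working directory.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(working_dir)

    written = []
    for tech in TECHNOLOGIES:
        # Step 2: locate the raw CSV anywhere below the working directory.
        matches = sorted(working_dir.rglob(f"bnetza_mastr_{tech}_raw.csv"))
        if not matches:
            continue  # this sketch skips technologies missing from the dump

        with open(matches[0], newline="", encoding="utf-8") as src:
            reader = csv.DictReader(src)
            fieldnames = [rename_map.get(c, c) for c in (reader.fieldnames or [])]
            # Step 3: apply the column renaming to every row.
            rows = [
                {rename_map.get(key, key): value for key, value in row.items()}
                for row in reader
            ]

        # Step 4: write the cleaned file as UTF-8 without an index column.
        out_path = working_dir / f"bnetza_mastr_{tech}_cleaned.csv"
        with open(out_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(out_path)

    return written
```

Unlike the documented function, which returns None, the sketch returns the list of written paths so that the result is easy to inspect.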
- class mastr_data_setup(dependencies)
Bases: Dataset
Download Marktstammdatenregister (MaStR) from Zenodo.
- Dependencies
The downloaded data incorporates two different datasets:
- Dump 2021-04-30
Used technologies: PV plants, wind turbines, biomass, hydro plants, combustion, nuclear, gsgk, storage
Data is further processed in the PowerPlants dataset
- Dump 2022-11-17
Used technologies: PV plants, wind turbines, biomass, hydro plants
Data is further processed in module mastr and PowerPlants
See documentation section Marktstammdatenregister for more information.
- name: str = 'MastrData'
- sources: DatasetSources = DatasetSources(tables={}, files={}, urls={'mastr': {'zenodo': {'deposit_id': '14783581', 'file_basename': 'bnetza_mastr', 'dump_name': 'bnetza_open_mastr_2025-02-09', 'technologies': ['biomass', 'combustion', 'gsgk', 'hydro', 'nuclear', 'solar', 'storage', 'wind']}}, 'geocoding': {'dump_name': 'mastr_geocoding_dump_2025-02-09_14783581.gpkg', 'deposit_id': 17279317}})
The sources used by the dataset. These can be tables, files, and URLs.
- targets: DatasetTargets = DatasetTargets(tables={}, files={'mastr': {'download_dir': {'path': './bnetza_mastr/dump_2025-02-09'}}, 'geocoding': 'mastr_geocoding'})
The targets created by the dataset. These can be tables and files.
- tasks: Tasks = (<function download_mastr_data>, <function extract_and_preprocess_mastr>, <function download_mastr_geocoding>)
- version: str = '0.0.4'
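The file_basename and technologies entries in sources line up with the raw and cleaned file names listed for extract_and_preprocess_mastr(). The sketch below shows how such names could be derived from the configuration. The helper expected_filenames and the MASTR_SOURCE dictionary are hypothetical, and whether the package composes the names this way is an assumption. The values are copied from the sources attribute above.

```python
# Subset of the 'zenodo' entry from the sources attribute (values copied as-is).
MASTR_SOURCE = {
    "file_basename": "bnetza_mastr",
    "technologies": [
        "biomass", "combustion", "gsgk", "hydro",
        "nuclear", "solar", "storage", "wind",
    ],
}


def expected_filenames(source, suffix):
    """Hypothetical helper: build '<basename>_<technology>_<suffix>.csv' names."""
    return [
        f"{source['file_basename']}_{tech}_{suffix}.csv"
        for tech in source["technologies"]
    ]


raw_files = expected_filenames(MASTR_SOURCE, "raw")
cleaned_files = expected_filenames(MASTR_SOURCE, "cleaned")
```

The two location-related raw files (locations_extended and grid_connections) are not part of the technologies list, so this helper does not produce them.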