Getting Started

Pre-requisites

Besides the Python packages, some non-Python packages are required. Currently these are:

  • Docker: Docker is used to provide a PostgreSQL database (in the default case).

    Docker provides extensive installation instructions. Consult their documentation and choose the install method appropriate for your OS.

    Docker is not required if you use a local PostgreSQL installation.

  • The psql executable. On Ubuntu, this is provided by the postgresql-client-common package.

  • Header files for the libpq5 PostgreSQL library. These are necessary to build the psycopg2 package from source and are provided by the libpq-dev package on Ubuntu.

  • osm2pgsql: On recent Ubuntu versions, you can install it via sudo apt install osm2pgsql.

  • postgis: On recent Ubuntu versions, you can install it via sudo apt install postgis.

  • osmTGmod, or more precisely osmosis, needs Java. On recent Ubuntu versions, you can install it via sudo apt install default-jre and sudo apt install default-jdk.

  • conda is needed for the subprocess that runs pypsa-eur-sec. To install Miniconda, see the conda installation guide.

  • pypsa-eur-sec, or more precisely Fiona, needs the additional library libtbb2. On recent Ubuntu versions, you can install it via sudo apt install libtbb2.

  • gdal: On recent Ubuntu versions, you can install it via sudo apt install gdal-bin.

  • curl is required. You can install it via sudo apt install curl.

  • To download ERA5 weather data, you need to register at the CDS registration page and install the CDS API key as described in the CDS documentation. You also have to agree to the terms of use.

  • Make sure you have enough free disk space (~350 GB) in your working directory.
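
The checks implied by the list above can be partly automated. The following is a minimal sketch, not part of egon.data: it looks for the command line tools on the PATH and compares the free disk space against the ~350 GB estimate. The executable names are assumptions (for example, gdalinfo stands in for the gdal-bin package), and docker can be dropped from the list if you use a local PostgreSQL installation. Header files, libtbb2 and the CDS API key are not checked.

```python
import shutil

# Assumed executable names for the tools listed above; adjust to your system.
REQUIRED_EXECUTABLES = [
    "docker",
    "psql",
    "osm2pgsql",
    "java",
    "conda",
    "curl",
    "gdalinfo",  # assumption: used here as an indicator for gdal-bin
]
REQUIRED_FREE_GB = 350  # disk space estimate from the list above


def missing_executables(names=REQUIRED_EXECUTABLES):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


def has_enough_disk_space(path=".", required_gb=REQUIRED_FREE_GB):
    """Check whether the file system containing `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    for name in missing_executables():
        print(f"not found on PATH: {name}")
    if not has_enough_disk_space():
        print(f"less than {REQUIRED_FREE_GB} GB free in the working directory")
```

Run the script from your intended working directory so that the disk space check refers to the right file system.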

Installation

No release is available on PyPI, and most installations are probably used for development. We therefore recommend cloning the repository via

git clone git@github.com:openego/eGon-data.git

and installing it in editable mode via

pip install -e eGon-data/

To keep the package installation isolated, we recommend installing the package in a dedicated virtual environment. Both an external tool and a builtin module can help with this. We also highly recommend taking the time to set up virtualenvwrapper to manage your virtual environments once you have to keep several of them around.
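
As an illustration of the builtin route, here is a minimal sketch that creates a virtual environment with Python's standard venv module. The function name create_environment and the directory name in the usage comment are hypothetical. On the command line, the equivalent is python3 -m venv followed by the target directory.

```python
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment in `env_dir` and return its path."""
    env_path = Path(env_dir).expanduser()
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(env_path, with_pip=with_pip)
    return env_path


# Example usage ("egon-data-env" is a hypothetical directory name):
# create_environment("egon-data-env")
```

After activating the environment (on Linux, by sourcing bin/activate inside the environment directory), pip install -e eGon-data/ installs the package into it rather than into the system Python.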

If you run into any problems during the installation of egon.data, check the list of known installation problems we have collected. We may already know of your problem and of a solution to it.

Run the workflow

The egon.data package installs a command line application called egon-data, with which you can control the workflow. Once the installation is successful, you can explore the command line interface starting with egon-data --help.

The most useful subcommand is probably egon-data serve. After running this command, point your browser to localhost:8080 to see the web interface of Apache Airflow, with which you can control the \(eGo^n\) data processing pipeline.

If running egon-data results in an error, we have also collected a list of known runtime errors, which you can consult in search of a solution.

To run the workflow from the CLI without using egon-data serve you can use

egon-data airflow scheduler
egon-data airflow dags trigger egon-data-processing-pipeline

For further details on how to use the CLI, see the Apache Airflow CLI Reference.
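
If you prefer to drive the two commands above from a script, the following minimal sketch wraps them with Python's subprocess module. It assumes that egon-data is on the PATH, that the scheduler should keep running after the pipeline has been triggered, and that both commands are started from the same working directory. The function names are hypothetical.

```python
import subprocess

# DAG name taken from the commands above.
DAG_ID = "egon-data-processing-pipeline"


def scheduler_command():
    """Command line that starts the Airflow scheduler via egon-data."""
    return ["egon-data", "airflow", "scheduler"]


def trigger_command(dag_id=DAG_ID):
    """Command line that triggers a run of the given DAG."""
    return ["egon-data", "airflow", "dags", "trigger", dag_id]


def run_workflow():
    """Start the scheduler in the background, then trigger the pipeline."""
    scheduler = subprocess.Popen(scheduler_command())
    try:
        subprocess.run(trigger_command(), check=True)
        # The scheduler keeps running until it exits or is interrupted.
        scheduler.wait()
    finally:
        # Make sure the background process does not outlive the script.
        scheduler.terminate()
```

Calling run_workflow() blocks until the scheduler exits or the script is interrupted, for example with Ctrl+C.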

Warning

A complete run of the workflow requires a lot of computing power and can’t be run on a laptop. Use the test mode for experimenting.

Warning

A complete run of the workflow needs a lot of free disk space (~350 GB) to store (temporary) files.

Test mode

The workflow can be tested on a smaller subset of data, using the federal state of Schleswig-Holstein as an example. During execution of the workflow, the data is reduced to represent only this area.

Warning

Right now, the test mode is set in egon.data/airflow/pipeline.py.