Workflow

Workflow management

Execution

In principle, egon-data is not limited to a specific programming language, because the workflow integrates different scripts using Apache Airflow; in practice, Python and SQL are widely used within the process. Apache Airflow organizes the order in which processing steps are executed through so-called operators. By default, the SQL processing is executed on a local PostgreSQL database that runs in a Docker container. For further information on Docker and its installation, please refer to the Docker documentation. The connection information for the local Docker database is defined in the corresponding docker-compose.yml.

The egon-data workflow is composed of four sections: database setup, data import, data processing and data export to the Open Energy Platform. Each section consists of several tasks, which are managed by Apache Airflow and communicate with the local database. Only final datasets that serve as input for the optimization tools, and selected interim results, are uploaded to the Open Energy Platform. The data processing in egon-data needs to be performed locally, as calculations on the Open Energy Platform are prohibited. More information on how to run the workflow can be found in the getting started section of our documentation.
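The ordering of the four sections can be pictured as a small dependency graph. The following is only a minimal sketch using the Python standard library, not Apache Airflow itself and not the actual egon-data task definitions; the task names and the helper execution_order are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical task names, one per workflow section (assumption: the real
# egon-data workflow contains many tasks per section with other names).
# Each key maps to the set of tasks that must finish before it can start.
dependencies = {
    "database_setup": set(),
    "data_import": {"database_setup"},
    "data_processing": {"data_import"},
    "data_export": {"data_processing"},
}


def execution_order(deps):
    """Return the task names in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(" -> ".join(order))
```

A workflow manager such as Apache Airflow resolves this kind of graph on its own and additionally handles scheduling, retries and logging, which this sketch leaves out.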

Versioning

Warning

Please note that the following is not implemented yet, but we are working on it.

Source code and data are versioned independently of each other. Every data table uploaded to the Open Energy Platform contains a column ‘version’, which is used to identify different versions of the same dataset. The version number is maintained separately for every table. This is a major difference from the versioning concept applied in the former data processing, where all (interim) results were versioned under the same version number.
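To make the per-table versioning idea concrete, here is a minimal sketch under stated assumptions. It uses an in-memory SQLite database as a stand-in for the PostgreSQL database, and the table names, columns other than ‘version’, version strings and helper functions are all hypothetical. Since the concept is not implemented yet, the final design may look different.

```python
import sqlite3

# In-memory SQLite stand-in (assumption: the real workflow uses PostgreSQL).
conn = sqlite3.connect(":memory:")

# Two hypothetical tables, each carrying its own 'version' column.
conn.execute("CREATE TABLE demand (region TEXT, value REAL, version TEXT)")
conn.execute("CREATE TABLE generation (region TEXT, value REAL, version TEXT)")

# 'demand' already exists in two versions, 'generation' only in one:
# the version number of each table evolves independently.
conn.executemany(
    "INSERT INTO demand VALUES (?, ?, ?)",
    [("A", 1.0, "0.1.0"), ("A", 1.2, "0.2.0")],
)
conn.executemany(
    "INSERT INTO generation VALUES (?, ?, ?)",
    [("A", 5.0, "0.1.0")],
)


def versions(connection, table):
    """List the distinct versions stored in one table.

    The table name is interpolated directly, so pass trusted names only.
    """
    query = f"SELECT DISTINCT version FROM {table} ORDER BY version"
    return [row[0] for row in connection.execute(query)]


def rows_for_version(connection, table, version):
    """Select the rows of one table that belong to a given version."""
    query = f"SELECT region, value FROM {table} WHERE version = ?"
    return connection.execute(query, (version,)).fetchall()


print(versions(conn, "demand"))
print(versions(conn, "generation"))
print(rows_for_version(conn, "demand", "0.2.0"))
```

Keeping the version in a column of each table means a single table can be updated and re-released without touching the version of any other table.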