Modeling

Modeling Technique

As mentioned before, the Grid Planners are looking for a data-driven forecast with an uncertainty indication. This is conceptualised in the figure below.

_images/autumn_spring_load.png

A possible output of the model visualized.

Input Features

The Grid Planners want a result that is based on real, observed data; more explicitly, data that has been measured, not data from possible scenarios with a wide range of uncertainty.

Another suggestion was to use (historic) weather data as an exogenous feature / input for the model. However, this data is not available with reasonable confidence over the forecast horizon (in the future), and seasonality can also be captured from the observed data itself by various time series models.

Therefore, the model only uses historic DALI measurement data.

Fit-Predict Process

A model is fitted / tuned for each DALI measurement extreme, since every transformer has its own unique signal (src.forecast.forecast.determine_estimates()).

A model is trained on observed data, a forecast is made and stored, and finally the trained model is discarded (src.forecast.forecast.forecast()).

_images/process_modeling.png

The weekly extremes are loaded for tuning the model. The model forecasts are stored in a Snowflake database afterwards.

The next time a forecast is required, new data is available and a new model will be fitted and used for forecasting.
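The fit-predict cycle above can be sketched as a plain loop. The helper functions below are hypothetical stand-ins, not the actual src.forecast implementations; the trivial last-value "model" merely marks where the real PyMC3 GAM is fitted and discarded.

```python
# Hypothetical sketch of the fit-predict cycle: one model per DALI
# measurement extreme, trained, used once for a forecast, then discarded.

def fit_predict(observations, horizon_weeks):
    """Fit a fresh model on the observed extremes and return a forecast.

    `observations` is a list of weekly extreme values; the 'model' here is
    a trivial last-value stand-in for the real PyMC3 GAM.
    """
    model = {"last": observations[-1]}          # fit / tune on observed data
    forecast = [model["last"]] * horizon_weeks  # forecast the horizon
    del model                                   # trained model is discarded
    return forecast

def run_cycle(per_transformer_extremes, horizon_weeks=27):
    """Fit-predict for every transformer's weekly-extreme series; the
    returned dict stands in for the Snowflake storage step."""
    stored = {}
    for transformer_id, series in per_transformer_extremes.items():
        stored[transformer_id] = fit_predict(series, horizon_weeks)
    return stored
```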

Model Environment

A probabilistic approach has been implemented to fulfill the wish to forecast and display uncertainty rather than a point estimate.

From the main probabilistic toolboxes (PyMC3, Stan, Edward2, Pyro, TensorFlow Probability), PyMC3 was chosen for its extensive documentation.

A future step could be to port the model to TensorFlow Probability, since PyMC3 is no longer the most recent toolbox.

Generalized Additive Model Concept

To address the explainability of the model, an approach similar to Facebook Prophet is used.

The model is a Generalized Additive Model (GAM) that consists of a trend / drift component and a yearly seasonality component (plus an error component).

The PyMC3 translation by Ritchie Vink was used as a basis.

The GAM approach makes it easy to decompose the different components of the time series and show the Grid Planners the effect of the drift and seasons separately.

Test Design

To evaluate the model, the observed data is split into a train and a test set based on the forecast horizon (src.preprocess.preprocess.split_last()).

After splitting, the train set is checked for not being too short (at least two years: src.preprocess.preprocess.too_short()).

_images/train_test.png

The split of measurement data into train and test data.

The test set is only used to validate forecasting results.

The train set is used to train / fit / tune the model.
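A minimal sketch of the split and length check, assuming weekly data. The function names mirror, but do not reproduce, the src.preprocess implementations:

```python
WEEKS_PER_YEAR = 52.1775  # weeks in a year, matching the seasonality period

def split_last(series, horizon_weeks):
    """Split a weekly series so the last `horizon_weeks` points form the test set."""
    return series[:-horizon_weeks], series[-horizon_weeks:]

def too_short(train, min_years=2):
    """True when the train set covers less than `min_years` of weekly data."""
    return len(train) < min_years * WEEKS_PER_YEAR
```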

Model

Generalized Additive Model

The GAM model used is (src.model.model.create_model()):

\[\sigma_\epsilon \sim Uniform(lower=0, \:upper=1)\]
\[\Sigma\:|\:drift, yearly, \sigma_\epsilon \sim Normal(\mu=drift + yearly, \:sd=\sigma_\epsilon)\]

The additive naming is explicit in this notation.

The error component has a bandwidth of \(\sigma_\epsilon\).

Drift Component

According to the Grid Planners, accelerating load growth is increasingly common due to the energy transition.

Therefore, a stable drift model (src.model.model.drift_model()) is used that can capture such growth.

An exponential growth function resulted in divergence during model tuning, but a second-order Taylor series keeps the model convergent and stable.

The drift component with a Taylor series of order \(n\) is described as:

\[X_{drift}(t) = [t^0, ..., t^n]\]
\[\beta_{drift} \sim Normal(\mu=0, \:sd=0.5)\]
\[drift\:|\:\beta_{drift} = X_{drift}(t)\:\beta_{drift}\]

For modelling a drift with the described growth, a polynomial of order \(n=2\) is used.
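The drift design matrix can be built directly from the definition above; the following is a sketch, not the project's code:

```python
def drift_design(timestamps, order=2):
    """Design matrix X_drift(t) = [t^0, ..., t^n] for each timestamp."""
    return [[t ** k for k in range(order + 1)] for t in timestamps]
```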

Yearly Component

Since the data has been aggregated into weekly extremes, the only seasonality to model is the yearly pattern.

The yearly seasonality is modeled with an \(n\)-th order Fourier series (src.model.model.seasonality_model()).

This is based on the work of Ritchie Vink.

The yearly seasonality model is described as:

\[X_{yearly}(t) = [cos(\frac{2 \pi 1 t}{T}), ..., sin(\frac{2 \pi n t}{T})]\]
\[\beta_{yearly} \sim Normal(\mu=0, \:sd=1)\]
\[yearly\:|\:\beta_{yearly} = X_{yearly}(t)\:\beta_{yearly}\]

Here, \(T\) is the period of the seasonality in the unit of time of the data.

The unit of time in this case is a week for the data and a year in weeks is \(T=52.1775\).

The order used for the Fourier series is \(n=5\).
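The Fourier design matrix for the yearly pattern can be sketched as below, using the period \(T\) and order \(n\) from the text (a sketch, not the project's code):

```python
import math

def fourier_design(timestamps, period=52.1775, order=5):
    """X_yearly(t): cos/sin pairs for harmonics 1..order of the yearly period."""
    rows = []
    for t in timestamps:
        row = []
        for k in range(1, order + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend([math.cos(angle), math.sin(angle)])
        rows.append(row)
    return rows
```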

Enabling Forecasts

The model parameters (the \(\beta\)'s) can now be tuned to obtain the most likely model given the observed (measurement) data.

To forecast, the model also needs to produce samples beyond the timestamps it has been tuned on.

The PyMC3 model can cope with this by feeding it with timestamps that are extrapolated for the forecasting horizon (src.preprocess.preprocess.extrapolate_timestamps()).

The matching observations (measurements) can intentionally be filled with NaNs; PyMC3 will treat them as missing values and name them \(\Sigma_{missing}\). This characteristic also makes the model robust against missing data.

By sampling the posterior predictive after tuning, samples are also generated for the extrapolated forecast timestamps (src.forecast.forecast.determine_estimates()).
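The timestamp extrapolation and NaN padding can be sketched as follows, assuming evenly spaced weekly timestamps. The helpers mirror, but do not reproduce, src.preprocess.preprocess.extrapolate_timestamps():

```python
import math

def extrapolate_timestamps(timestamps, horizon):
    """Extend an evenly spaced timestamp index by `horizon` extra steps."""
    step = timestamps[1] - timestamps[0]
    extra = [timestamps[-1] + step * (i + 1) for i in range(horizon)]
    return timestamps + extra

def pad_observations(observations, horizon):
    """NaN-fill the forecast horizon so PyMC3 treats it as missing data."""
    return observations + [math.nan] * horizon
```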

Total model Σ

_images/graph_model.png

The total model visualized.

Two separate GAM models \(\Sigma\) (src.model.model.create_model()) are used for the weekly minimum and maximum.

The visual above shows the total GAM model with a polynomial drift of order \(n=2\) (including the order-0 bias, this gives \(n+1=3\) parameters) and a Fourier order of \(n=5\) (the sine and cosine parts give \(2n=10\) parameters).

The number of observations (weeks of measurements for this case) is 121 and the forecasting horizon is just more than six months (27 weeks).

Formatting Results

From the model, posterior predictive samples are drawn for all timestamps (including the measurement timestamps; 1000 samples per timestamp).

From the posterior samples, the quantile bands are determined (src.forecast.format.make_quantile_bands()). This reduces the data storage.

The upper and lower limits of the quantile bands are then stored in the same format as the input (src.forecast.format.format_model_estimates()).

The input of the model and the output are then concatenated together. This eases the visualization later.
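Reducing posterior samples to quantile bands can be sketched with the standard library. This is a nearest-rank illustration of the idea, not the src.forecast.format implementation:

```python
def make_quantile_bands(samples_per_timestamp, quantiles=(0.25, 0.5)):
    """Reduce per-timestamp posterior samples to a few quantile values,
    so only the band limits need to be stored instead of all samples."""
    bands = []
    for samples in samples_per_timestamp:
        ordered = sorted(samples)
        row = {}
        for q in quantiles:
            # nearest-rank quantile; the real implementation may interpolate
            idx = min(int(q * len(ordered)), len(ordered) - 1)
            row[q] = ordered[idx]
        bands.append(row)
    return bands
```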

_images/data_stored.png

The concatenated input and result.

Model Assessment

The following model findings are most salient:

  • The model converges during tuning and gives feasible results.

    • Exponential drift function tuning will not converge.

  • The computational burden on a CPU to tune and forecast both extremes is 1:24.

    • CPU: 2 GHz Quad-Core Intel Core i5

    • RAM: 16 GB

  • The model is fairly insensitive to outliers and missing data.

  • The splitting of observations into train and test set works.

  • The extrapolation with the forecasting horizon works.

  • A pure additive model may not be sufficient.

    • Growth also increases the yearly component (see visualization below).

    • A pure multiplicative model diverges during tuning.

    • A hybrid model (addition of a small fraction of a multiplicative model) might be an option.
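One way to write such a hybrid is with a small mixing weight \(\alpha\) on a multiplicative cross term (a hypothetical formulation, not an implemented model):

\[\Sigma = drift + yearly + \alpha \cdot drift \cdot yearly + \epsilon\]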

A visualization of the results is shown in the figure below which shows most of the aforementioned points:

_images/additive_model.png

A visualization of the measurements (history) and the forecast (estimates). Measurements from both the train and test set are plotted.

Improvement suggestions

The following ideas could result in a better model:

  • Implementing a hybrid additive-multiplicative model for dealing with the growing seasonality.

  • Adding an extra component to detect temporary bypass switching of loads of other transformers.

    This could be implemented by estimating parameters of a rectangular function.

  • Making more recent observations more relevant for slowly changing loading patterns.

    Possibilities are to mimic weights with pm.Potential or pm.<Distribution>(tau=weights).

  • Using the population seasonality as a prior in case of a short history of observations.

  • Using a by-pass dummy model for outlier robustness.
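The recency-weighting suggestion could, for example, use exponentially decaying precision weights fed into the observation noise. The `tau` (precision) argument is a real PyMC3 distribution parameter, but the weighting scheme below is a hypothetical sketch of the idea only:

```python
def recency_tau(n_observations, base_tau=1.0, half_life=52):
    """Precision weights that halve every `half_life` weeks into the past,
    so recent observations constrain the fit more strongly."""
    return [base_tau * 0.5 ** ((n_observations - 1 - i) / half_life)
            for i in range(n_observations)]
```

The resulting list could be passed as `tau` to the observation distribution, giving older weeks wider effective noise and therefore less influence on the fit.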