Using NeuralODEs in real life applications
Tutorial by Tobias Thummerer | Last edit: September 24 2024
This workshop was held at the JuliaCon 2023 | July 25 2023 | MIT (Boston, USA)
Keywords: #NeuralODE, #NeuralFMU, #SciML, #PeNODE, #HybridModeling
Workshop Video
License
# Copyright (c) 2023 Tobias Thummerer, Lars Mikelsons
# Licensed under the MIT license.
# See LICENSE (https://github.com/thummeto/FMIFlux.jl/blob/main/LICENSE) file in the project root for details.
# This workshop was held at the JuliaCon2023 @ MIT (Boston)
Introduction
NeuralODEs lead to amazing results in academic examples, but expectations are often disappointed as soon as one tries to adapt this concept to real-life use cases. Bad convergence behavior, handling of discontinuities and/or instabilities are just some of the stumbling blocks that might pop up during the first steps. During the workshop, we want to show how to integrate real-life industrial models into NeuralODEs using FMI and present sophisticated training strategies.
This tutorial can be used in two ways:
- As a single script, showing how a NeuralFMU can be set up and trained. Results can be loaded from a precomputed hyperparameter optimization.
- As a module (see section Optional: Organize as module) together with the file juliacon_2023_distributedhyperopt.jl to perform your own distributed hyperparameter optimization.
This workshop is divided into five sections:
- Installing / Loading the Packages
- Loading FMU & Data
- NeuralFMU setup
- Training the NeuralFMU
- Results
1. Installing / Loading the Packages
Before we start modeling our NeuralODE, we load all required packages. If some packages are still missing, install them by typing import Pkg; Pkg.add("[PKG-NAME]").
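For reference, a minimal installation sketch could look like this (the package names are simply taken from the using/import statements below; Flux comes bundled with FMIFlux and Random is part of the standard library):
# install missing packages (only needed once)
import Pkg
Pkg.add(["FMI", "FMIFlux", "FMIZoo", "DifferentialEquations", "JLD2", "Plots"])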
# Loading the required libraries
using FMI # import FMUs into Julia
using FMIFlux # for NeuralFMUs
using FMIZoo # a collection of demo models, including the VLDM
using FMIFlux.Flux # Machine Learning in Julia
using DifferentialEquations: Tsit5 # for picking a NeuralFMU solver
import JLD2 # data format for saving/loading parameters
import Random # for fixing the random seed
using Plots # plotting results
Besides the packages, we use a little helper script that includes some nice plotting functions specifically for this workshop.
# a helper file with some predefined functions that make "things look nicer", but are not really relevant to the topic
include(joinpath(@__DIR__, "juliacon_2023_helpers.jl"));
Because notebooks can't handle progress bars, we disable progress bar printing - but feel free to enable it if you are using the code outside of a Jupyter notebook. The progress bar gives further helpful information, like the estimated remaining computation time for simulation and training.
# disable progress bars in jupyter notebook
showProgress=false;
2. Loading FMU & Data
Before starting with hybrid modeling, we load the training data and our FMU of the VLDM. We simulate the FMU, plot the results and compare them to data.
2.1 Loading measurement data
We start by loading in the data (training and validation) used in this tutorial from FMIZoo.jl - a container library for different system model FMUs and corresponding data.
Note: Two measurements were taken, so the data is a mean value with some deviation around it (not an exact line).
# FMIZoo.jl supports different data sampling rates (by interpolation), sample length for data is 0.1s
dt = 0.1
# load data (training) from FMIZoo.jl
data = VLDM(:train, dt=dt)
# plot the velocity consumption (training data)
plot(data.speed_t, # the time points the speed was captured (from data)
data.speed_val; # the speeds at the considered time points (from data)
ribbon=data.speed_dev, # a `ribbon` for the speed deviation - so the `uncertainty` because we made two measurements - but don't expect too much to see (very little uncertainty)
fillalpha=0.3, # alpha value for the ribbon
label="Data", # the plot label
title="WLTC (first 40%)", # plot title
xlabel="t [s]", # plot x-label
ylabel="velocity [m/s]") # plot y-label
Further, we load the validation data and have a look at it, too.
# load data (validation) from FMIZoo.jl
data_validation = VLDM(:validate, dt=dt)
# plot the velocity consumption (validation data)
plot(data_validation.speed_t, data_validation.speed_val; label="Data", ribbon=data_validation.speed_dev, fillalpha=0.3, title="WLTC (complete)", xlabel="t [s]", ylabel="velocity [m/s]")
Let's extract a simulation start time tStart and stop time tStop from data - so we simulate as far as data is available. tSave holds the points in time at which we want our ODE solution to be saved later.
# start (`tStart`) and stop time (`tStop`) for simulation, saving time points for ODE solver (`tSave`)
tStart = data.consumption_t[1]
tStop = data.consumption_t[end]
tSave = data.consumption_t
5838-element Vector{Float64}:
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
⋮
582.6
582.7
582.8
582.9
583.0
583.1
583.2
583.3
583.4
583.5
583.6
583.7
So you can see that the time points are sampled with dt=0.1 as specified and the cycle ranges from $0.0s$ to $583.7s$.
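As a quick sanity check (a small sketch using only the variables defined above), we can verify the sampling rate and print the cycle length:
# sanity check: the saving grid matches the requested sampling rate `dt`
@assert isapprox(tSave[2] - tSave[1], dt)
println("cycle from $(tSave[1]) s to $(tSave[end]) s with $(length(tSave)) samples")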
Next, we get a value for the start state x0, i.e. the initial state for solving the FMU and NeuralFMU.
# get start state vector from data (FMIZoo)
x0 = FMIZoo.getStateVector(data, # the data container
tStart) # the point in time where we want the state
6-element Vector{Float64}:
0.0
0.0
0.0
0.0
0.0
0.0
In this special case, it's all zeros, but this is not the default for other systems!
Further, we can check the loaded FMU parameters, which are the paths to the characteristic maps used in the model.
# have a look on the FMU parameters (these are the file paths to the characteristic maps, remaining parameters are set to default by the FMU)
display(data.params)
Dict{String, Any} with 3 entries:
"peFileName" => "C:\\Users\\runneradmin\\.julia\\packages\\FMIZoo\\WtBM9\\src…
"edFileName" => "C:\\Users\\runneradmin\\.julia\\packages\\FMIZoo\\WtBM9\\src…
"dcFileName" => "C:\\Users\\runneradmin\\.julia\\packages\\FMIZoo\\WtBM9\\src…
After that, we load the FMU and have a look at its model metadata.
# load our FMU of the VLDM (we take it from the FMIZoo.jl, exported with Dymola 2020x)
fmu = loadFMU("VLDM", "Dymola", "2020x"; type=:ME)
# let's have a look on the model meta data
info(fmu)
#################### Begin information for FMU ####################
Model name: Longitudinaldynamic.LongitudinaldynamicmodelContinuous
FMI-Version: 2.0
GUID: {669889ab-7ab7-4fac-be92-96b6cd0b86a6}
Generation tool: Dymola Version 2020x (64-bit), 2019-10-10
Generation time: 2022-07-22T09:32:50Z
Var. naming conv.: structured
Event indicators: 28
Inputs: 0
Outputs: 0
States: 6
33554432 ["driver.accelerationPedalController.PI.x"]
33554433 ["driver.brakePedalController.PI.x"]
33554434 ["drivingCycle.s"]
33554435 ["dynamics.accelerationCalculation.integrator.y"]
33554436 ["dynamics.accelerationCalculation.limiter.u", "dynamics.accelerationCalculation.limIntegrator.y", "dynamics.accelerationCalculation.limiter.simplifiedExpr"]
33554437 ["result.integrator.y"]
Parameters: 102
16777216 ["i_gear"]
16777217 ["rho_L"]
16777218 ["vehMass"]
16777219 ["payload"]
...
16777303 ["dynamics.accelerationCalculation.limIntegrator.y_start"]
16777304 ["dynamics.accelerationCalculation.limiter.uMax"]
16777305 ["dynamics.accelerationCalculation.limiter.uMin"]
16777306 ["result.integrator.k"]
16777307 ["result.integrator.y_start"]
Supports Co-Simulation: true
Model identifier: Longitudinaldynamic_LongitudinaldynamicmodelContinuous
Get/Set State: true
Serialize State: true
Dir. Derivatives: true
Var. com. steps: true
Input interpol.: true
Max order out. der.: 1
Supports Model-Exchange: true
Model identifier: Longitudinaldynamic_LongitudinaldynamicmodelContinuous
Get/Set State: true
Serialize State: true
Dir. Derivatives: true
##################### End information for FMU #####################
One can find many useful things, like the number of states (6), inputs (0) and outputs (0), their names and information about supported features.
2.2 Simulating the FMU
Simulating is as easy as calling simulate. Note that we pass the parameter dictionary data.params from above. This FMU has many events, which are detected and handled automatically by FMI.jl.
# let's run a simulation from `tStart` to `tStop`, use the parameters we just viewed for the simulation run
resultFMU = simulate(fmu, # the loaded FMU of the VLDM
(tStart, tStop); # the simulation time range
parameters=data.params, # the parameters for the VLDM
showProgress=showProgress, # show (or don't show) the progress bar
recordValues=:derivatives, # record all state derivatives
saveat=tSave) # save solution points at `tSave`
display(resultFMU)
Model name:
Longitudinaldynamic.LongitudinaldynamicmodelContinuous
Success:
true
f(x)-Evaluations:
In-place: 441149
Out-of-place: 0
Jacobian-Evaluations:
∂ẋ_∂p: 0
∂ẋ_∂x: 0
∂ẋ_∂u: 0
∂y_∂p: 0
∂y_∂x: 0
∂y_∂u: 0
∂e_∂p: 0
∂e_∂x: 0
∂e_∂u: 0
∂xr_∂xl: 0
Gradient-Evaluations:
∂ẋ_∂t: 0
∂y_∂t: 0
∂e_∂t: 0
Callback-Evaluations:
Condition (event-indicators): 756903
Time-Choice (event-instances): 58371
Affect (event-handling): 58409
Save values: 5838
Steps completed: 62982
States [5838]:
0.0 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
0.1 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 14.260000000955804]
0.2 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 28.52000000095581]
0.3 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 42.78000000095581]
0.4 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 57.04000000095582]
0.5 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 71.30000000095582]
0.6 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 85.56000000095582]
0.7 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 99.82000000095582]
0.8 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 114.08000000095585]
...
583.7 [-0.001916469855443799, -0.054120690516540136, 3131.826061088337, 3131.346886892768, -4.101183271996665e-5, 1.425987553834784e6]
Values [5838]:
0.0 (0.0, 0.0, 0.0, 0.0, -0.1773972602739726, 142.6)
0.1 (0.0, 0.0, 0.0, 0.0, 0.0, 142.6)
0.2 (0.0, 0.0, 0.0, 0.0, 0.0, 142.6)
0.3 (0.0, 0.0, 0.0, 0.0, 0.0, 142.6)
0.4 (0.0, 0.0, 0.0, 0.0, 0.0, 142.6)
0.5 (0.0, 0.0, 0.0, 0.0, 0.0, 142.6)
0.6 (0.0, 0.0, 0.0, 0.0, 0.0, 142.6)
0.7 (0.0, 0.0, 0.0, 0.0, 0.0, 142.6)
0.8 (0.0, 0.0, 0.0, 0.0, 0.0, 142.6)
...
583.7 (0.0, 0.0, 0.0, 0.0, 0.0, 142.6)
Events [58409]:
Time-Event @ 0.01s (state-change: false)
Time-Event @ 0.02s (state-change: false)
Time-Event @ 0.03s (state-change: false)
Time-Event @ 0.04s (state-change: false)
Time-Event @ 0.05s (state-change: false)
Time-Event @ 0.06s (state-change: false)
Time-Event @ 0.07s (state-change: false)
Time-Event @ 0.08s (state-change: false)
Time-Event @ 0.09s (state-change: false)
...
Time-Event @ 583.7s (state-change: false)
This way, you can see interesting metadata on the solution process, like the number of evaluations of the ODE-function, sensitivity or callback evaluations.
We can use the plot command to plot simulation results from FMUs, too!
# Plot the simulation results
fig = plot(resultFMU; # the simulation result from above
values=false, # don't plot values (:derivatives)
stateIndices=6:6, # only plot states 6 to 6 -> so state 6 ;-)
ylabel="Cumulative consumption [Ws]", # set the title for the y-label
label="FMU") # title the plot line
# further plot the (measurement) data values `consumption_val` and deviation between measurements `consumption_dev`
plot!(fig, data.cumconsumption_t, data.cumconsumption_val; label="Data", ribbon=data.cumconsumption_dev, fillalpha=0.3)
[ Info: fmiPlot(...): Number of time events (58370) exceeding 100, disabling automatic plotting of time events (can be forced with keyword `timeEvents=true`).
This is the simulation result we already know from the introduction!
3. NeuralFMU setup
Equipped with data and a simulation model, we can set up the NeuralFMU as introduced in the workshop.
3.1 Pre- and Post-Processing
We gather the three derivative values from the last simulation run to initialize the pre- and post-processing layers.
# variable we want to manipulate - why we are picking exactly these three is shown a few lines later ;-)
manipulatedDerVars = ["der(dynamics.accelerationCalculation.integrator.y)",
"der(dynamics.accelerationCalculation.limIntegrator.y)",
"der(result.integrator.y)"]
manipulatedDerVals = getValue(resultFMU, manipulatedDerVars)
# what happens without proper transformation between FMU- and ANN-domain?
plot(resultFMU.values.t, manipulatedDerVals[1,:][1]; label="original", xlabel="t [s]", ylabel="velocity [m/s]")
But what happens if we put the velocity into the hyperbolic tangent function?
plot!(resultFMU.values.t, tanh.(manipulatedDerVals[1,:][1]); label="tanh(velocity)")
It gets saturated drastically! That's why we need shift- and scale layers for online pre- and post-processing!
We introduce the ShiftScale layer for pre-processing our data.
# pre- and post-processing
preProcess = ShiftScale(manipulatedDerVals); # we put in the derivatives recorded above, FMIFlux shifts and scales so we have a data mean of 0 and a standard deviation of 1 (other activation functions / ranges are possible!)
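To get an intuition for what the ShiftScale layer does, here is a small conceptual sketch (illustrative only, not the FMIFlux implementation): the signal is shifted by its mean and scaled by the inverse of its standard deviation.
using Statistics # for `mean` and `std`
# conceptual sketch of the standardization performed by a shift/scale layer (illustrative only)
naive_shiftscale(x::AbstractVector) = (x .- mean(x)) ./ std(x)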
How does the velocity look after pushing it through the ShiftScale layer?
testVals = collect(tanh(preProcess(collect(val[t] for val in manipulatedDerVals))[1]) for t in 1:length(resultFMU.values.t))
plot!(resultFMU.values.t,
testVals;
label="tanh(preProcess(velocity))")
You can clearly see that after pre-processing, the trajectory (green) still mirrors the dynamical behavior of the original system (blue), while the non-pre-processed option (orange) just saturates.
# we add some additional "buffer" - this is not necessary but helps to preserve peaks
preProcess.scale[:] *= 0.25;
# initialize the postProcess as inverse of the preProcess, but only take indices 2 and 3 (we don't need 1, the vehicle velocity)
postProcess = ScaleShift(preProcess; indices=2:3);
3.2 Building the NeuralFMU
To make this more reusable, we put the entire NeuralFMU building process (including the pre- and post-processing we just had a detailed look at) into a dedicated function build_NFMU.
# function that builds the considered NeuralFMU on basis of a given FMU (FMI-Version 2.0) `f`
function build_NFMU(f::FMU2)
# pre- and post-processing
preProcess = ShiftScale(manipulatedDerVals) # we put in the derivatives recorded above, FMIFlux shifts and scales so we have a data mean of 0 and a standard deviation of 1
preProcess.scale[:] *= 0.25 # add some additional "buffer"
postProcess = ScaleShift(preProcess; indices=2:3) # initialize the postProcess as inverse of the preProcess, but only take indices 2 and 3 (we don't need 1, the vehicle velocity)
# cache
cache = CacheLayer() # allocate a cache layer
cacheRetrieve = CacheRetrieveLayer(cache) # allocate a cache retrieve layer, link it to the cache layer
# we have two signals (acceleration, consumption) and two sources (ANN, FMU), so four gates:
# (1) acceleration from FMU (gate=1.0 | open)
# (2) consumption from FMU (gate=1.0 | open)
# (3) acceleration from ANN (gate=0.0 | closed)
# (4) consumption from ANN (gate=0.0 | closed)
# the accelerations [1,3] and consumptions [2,4] are paired
gates = ScaleSum([1.0, 1.0, 0.0, 0.0], [[1,3], [2,4]]) # gates with sum
# setup the NeuralFMU topology
model = Chain(x -> f(; x=x, dx_refs=:all), # take `x`, put it into the FMU, retrieve all derivatives `dx`
dx -> cache(dx), # cache `dx`
dx -> dx[4:6], # forward only dx[4, 5, 6]
preProcess, # pre-process `dx`
Dense(3, 32, tanh), # Dense Layer 3 -> 32 with `tanh` activation
Dense(32, 2, tanh), # Dense Layer 32 -> 2 with `tanh` activation
postProcess, # post process `dx`
dx -> cacheRetrieve(5:6, dx), # dynamics FMU | dynamics ANN
gates, # compute resulting dx from ANN + FMU
dx -> cacheRetrieve(1:4, dx)) # stack together: dx[1,2,3,4] from cache + dx[5,6] from gates
solver = Tsit5()
# new NeuralFMU
neuralFMU = ME_NeuralFMU(f, # the FMU used in the NeuralFMU
model, # the model we specified above
(tStart, tStop), # a default start and stop time for solving the NeuralFMU
solver;
saveat=tSave) # the time points to save the solution at
neuralFMU.modifiedState = false # speed optimization (NeuralFMU state equals FMU state)
return neuralFMU
end
build_NFMU (generic function with 1 method)
Let's test the NeuralFMU: First, load the FMU and build a NeuralFMU from it.
# build NeuralFMU
neuralFMU = build_NFMU(fmu);
Next, do a simulation for a given start state x0 from FMIZoo.jl.
# simulate and plot the (uninitialized) NeuralFMU
resultNFMU = neuralFMU(x0, # the start state to solve the ODE
(tStart, tStop); # the simulation range
parameters=data.params, # the parameters for the VLDM
showProgress=showProgress, # show progress (or not)
saveat=tSave) # the time points to save the solution at
display(resultNFMU)
Model name:
Longitudinaldynamic.LongitudinaldynamicmodelContinuous
Success:
true
f(x)-Evaluations:
In-place: 409926
Out-of-place: 0
Jacobian-Evaluations:
∂ẋ_∂p: 0
∂ẋ_∂x: 0
∂ẋ_∂u: 0
∂y_∂p: 0
∂y_∂x: 0
∂y_∂u: 0
∂e_∂p: 0
∂e_∂x: 0
∂e_∂u: 0
∂xr_∂xl: 0
Gradient-Evaluations:
∂ẋ_∂t: 0
∂y_∂t: 0
∂e_∂t: 0
Callback-Evaluations:
Condition (event-indicators): 702407
Time-Choice (event-instances): 58371
Affect (event-handling): 58409
Save values: 0
Steps completed: 58440
States [5838]:
0.0 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
0.1 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 14.260000000955804]
0.2 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 28.52000000095581]
0.3 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 42.78000000095581]
0.4 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 57.04000000095582]
0.5 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 71.30000000095582]
0.6 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 85.56000000095582]
0.7 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 99.82000000095582]
0.8 [-1.0313822985067192e-12, -1.5470734477600789e-12, 0.0, 7.219676089547036e-12, -1.0224379404543445e-5, 114.08000000095585]
...
583.7 [-0.0019167094095930257, -0.05412068948173644, 3131.826061088355, 3131.346887534511, -4.3267310775923446e-5, 1.4259877656765594e6]
Events [58409]:
Time-Event @ 0.01s (state-change: false)
Time-Event @ 0.02s (state-change: false)
Time-Event @ 0.03s (state-change: false)
Time-Event @ 0.04s (state-change: false)
Time-Event @ 0.05s (state-change: false)
Time-Event @ 0.06s (state-change: false)
Time-Event @ 0.07s (state-change: false)
Time-Event @ 0.08s (state-change: false)
Time-Event @ 0.09s (state-change: false)
...
Time-Event @ 583.7s (state-change: false)
As for the FMU, we can display the NeuralFMU simulation result and check some statistics.
Now, let's have a look at the cumulative consumption plot ...
# plot the NeuralFMU, original FMU and data (cumulative consumption)
fig = plot(resultNFMU; stateIndices=6:6, stateEvents=false, timeEvents=false, label="NeuralFMU (untrained)", ylabel="cumulative consumption [Ws]")
plot!(fig, resultFMU; stateIndices=6:6, values=false, stateEvents=false, timeEvents=false, label="FMU")
plot!(fig, data.cumconsumption_t, data.cumconsumption_val, label="Data")
As you can see, the FMU and NeuralFMU results look identical - and this is what we expect for a fully open FMU gate and a fully closed ANN gate!
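If you prefer a numerical check over a visual one, a small sketch using the getState accessor (the same accessor used in the loss function later on) could look like this:
# numerical check (sketch): compare the cumulative consumption (state 6) of FMU and NeuralFMU
fmu_cc = getState(resultFMU, 6; isIndex=true)
nfmu_cc = getState(resultNFMU, 6; isIndex=true)
@info "maximum absolute deviation between FMU and NeuralFMU: $(maximum(abs.(fmu_cc .- nfmu_cc)))"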
Finally, unload the FMU and invalidate the NeuralFMU.
# unload FMU / invalidate NeuralFMU
unloadFMU(fmu)
neuralFMU = nothing
4. Training the NeuralFMU
An untrained NeuralFMU is not that impressive - so let's train it a bit.
We start by defining a time sequence (the time points of data measurements) and the cumulative consumption values we want to train for.
# prepare training data
train_t = data.consumption_t
# data is required as an "array of arrays" (often we have multidimensional data)
train_data = collect([d] for d in data.cumconsumption_val)
5838-element Vector{Vector{Float64}}:
[0.0]
[-0.41296068176650935]
[0.26787411983582043]
[0.7202168791949798]
[1.0714482470335085]
[1.390037422822217]
[2.1200151652794643]
[2.5196535613914306]
[2.656369007464336]
[2.993187294279602]
[3.4693116134235407]
[4.049369938809381]
[4.673174216401814]
⋮
[1.3879359188013095e6]
[1.3879515067827937e6]
[1.3879669882976608e6]
[1.3879825294049252e6]
[1.3879980607748663e6]
[1.3880134565080018e6]
[1.3880287579379592e6]
[1.388044098663902e6]
[1.388059371012591e6]
[1.388074504338062e6]
[1.3880896849414955e6]
[1.3881049434185931e6]
The data sequence is too long to train on it all at once - so we need to batch our data.
First, we introduce some hyperparameters. Training success always depends on a good choice of hyperparameters; we use the following ones in this workshop (a concrete example set is sketched after the list):
- BATCHDUR: the duration of a single batch element (length) in seconds.
- TRAINDUR: the training duration (measured on data) in seconds.
- ETA: the update rate $\eta$ of the Adam optimizer.
- BETA1: the first momentum coefficient $\beta_1$ of the Adam optimizer.
- BETA2: the second momentum coefficient $\beta_2$ of the Adam optimizer.
- LASTWEIGHT: a weighting factor between the last solution point and all remaining solution points.
- SCHEDULER: an identifier for the batch scheduler, can be :Sequential, :Random or :LossAccumulation.
- LOSS: an identifier for the loss function to use, :MAE or :MSE.
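As a concrete example, the following sketch shows one such hyperparameter set in the order later expected by the train! function (the variable name example_hyper_params is just for illustration; the values are the ones used for the demonstration run at the end of this section):
# example hyperparameter set in the order [ETA, BETA1, BETA2, BATCHDUR, LASTWEIGHT, SCHEDULER, LOSS]
example_hyper_params = [0.0001,  # ETA        (Adam update rate)
                        0.9,     # BETA1      (Adam first momentum coefficient)
                        0.999,   # BETA2      (Adam second momentum coefficient)
                        4.0,     # BATCHDUR   (batch element duration in seconds)
                        0.7,     # LASTWEIGHT (weight of the last solution point)
                        :Random, # SCHEDULER  (batch scheduler identifier)
                        :MSE];   # LOSS       (loss function identifier)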
Next, the loss function is defined. The loss is computed on the basis of a given solution and data. Depending on the hyperparameter LOSS, either :MAE or :MSE is used to compute the loss. The hyperparameter LASTWEIGHT determines how much the last solution point is weighted against the remaining solution points. For example, a value of $0.3$ means that the last point of the solution contributes $30\%$ to the loss, whereas all remaining solution points contribute $70\%$ in total.
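To make the weighting explicit, here is a minimal sketch of the idea (illustrative only - the actual FMIFlux losses used below additionally take the measurement deviation into account):
# illustrative sketch: the last residual gets weight `lastweight`, all others share (1 - lastweight)
function weighted_mae_sketch(x::AbstractVector, y::AbstractVector, lastweight::Real)
    res = abs.(x .- y)
    return lastweight * res[end] + (1 - lastweight) * sum(res[1:end-1]) / (length(res) - 1)
end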
function _lossFct(solution::FMUSolution, data::VLDM_Data, LOSS::Symbol, LASTWEIGHT::Real=1.0/length(data.consumption_t) )
# determine the start/end indices `ts` and `te` in the data array (sampled with 10Hz)
ts = dataIndexForTime(solution.states.t[1])
te = dataIndexForTime(solution.states.t[end])
# retrieve the NeuralFMU solution ("where we are"), the measurement data ("where we want to be") and the allowed deviation ("how unsure we are")
nfmu_cumconsumption = getState(solution, 6; isIndex=true)
cumconsumption = data.cumconsumption_val[ts:te]
cumconsumption_dev = data.cumconsumption_dev[ts:te]
Δcumconsumption = 0.0
if LOSS == :MAE
Δcumconsumption = FMIFlux.Losses.mae_last_element_rel_dev(nfmu_cumconsumption, # NeuralFMU
cumconsumption, # data target
cumconsumption_dev, # data uncertainty
LASTWEIGHT) # how much do we scale the last point compared to the remaining ones?
elseif LOSS == :MSE
Δcumconsumption = FMIFlux.Losses.mse_last_element_rel_dev(nfmu_cumconsumption,
cumconsumption,
cumconsumption_dev,
LASTWEIGHT)
else
@assert false "Unknown LOSS: `$(LOSS)`"
end
return Δcumconsumption
end
_lossFct (generic function with 2 methods)
Finally, the function train! is defined, which triggers a new training run for a given set of hyperparameters hyper_params, a training resource resource and the current training index ind.
# resource = training time horizon (duration of data seen)
function train!(hyper_params, resource, ind)
# make the runs deterministic by fixing the random seed
Random.seed!(1234)
# training duration (in seconds) equals the given resource
TRAINDUR = resource
# unpack the hyperparameters
ETA, BETA1, BETA2, BATCHDUR, LASTWEIGHT, SCHEDULER, LOSS = hyper_params
# compute the number of training steps TRAINDUR / BATCHDUR, but do at least one step
steps = max(round(Int, TRAINDUR/BATCHDUR), 1)
# print a bit of info
@info "--------------\nStarting run $(ind) with parameters: $(hyper_params) and resource $(resource) doing $(steps) step(s).\n--------------------"
# load our FMU (we take one from the FMIZoo.jl, exported with Dymola 2020x)
fmu = loadFMU("VLDM", "Dymola", "2020x"; type=:ME)
# built the NeuralFMU on basis of the loaded FMU `fmu`
neuralFMU = build_NFMU(fmu)
# a more efficient execution mode
singleInstanceMode(fmu, true)
# batch the data (time, targets), train only on model output index 6, plot batch elements
batch = batchDataSolution(neuralFMU, # our NeuralFMU model
t -> FMIZoo.getStateVector(data, t), # a function returning a start state for a given time point `t`, to determine start states for batch elements
train_t, # data time points
train_data; # data cumulative consumption
batchDuration=BATCHDUR, # duration of one batch element
indicesModel=6:6, # model indices to train on (6 equals the state `cumulative consumption`)
plot=false, # don't show intermediate plots (try this outside of Jupyter)
parameters=data.params, # use the parameters (map file paths) from *FMIZoo.jl*
showProgress=showProgress) # show or don't show progress bar, as specified at the very beginning
# limit the maximum number of solver steps to 1000 * BATCHDUR (longer batch elements get more steps)
# this allows the NeuralFMU to do 10x more steps (on average) than the original FMU, but more should not be tolerated (too stiff system)
solverKwargsTrain = Dict{Symbol, Any}(:maxiters => round(Int, 1000*BATCHDUR))
# a smaller dispatch for our custom loss function, only taking the solution object
lossFct = (solution::FMUSolution) -> _lossFct(solution, data, LOSS, LASTWEIGHT)
# selecting a scheduler for training
scheduler = nothing
if SCHEDULER == :Random
# a scheduler that picks a random batch element
scheduler = RandomScheduler(neuralFMU, batch; applyStep=1, plotStep=0)
elseif SCHEDULER == :Sequential
# a scheduler that picks one batch element after another (in chronological order)
scheduler = SequentialScheduler(neuralFMU, batch; applyStep=1, plotStep=0)
elseif SCHEDULER == :LossAccumulation
# a scheduler that picks the element with largest accumulated loss:
# - after every training step, the accumulated loss for every batch element is increased by the current loss value
# - when picking a batch element, the accumulated loss is reset to zero
# - this promotes selecting elements with larger losses more often, but also prevents starving of elements with small losses
scheduler = LossAccumulationScheduler(neuralFMU, batch, lossFct; applyStep=1, plotStep=0, updateStep=1)
else
@error "Unknown SCHEDULER: ´$(SCHEDULER)´."
return nothing
end
# loss for training, do a simulation run on a batch element taken from the scheduler
loss = p -> FMIFlux.Losses.loss(neuralFMU, # the NeuralFMU to simulate
batch; # the batch to take an element from
p=p, # the NeuralFMU training parameters (given as input)
parameters=data.params, # the FMU parameters
lossFct=lossFct, # our custom loss function
batchIndex=scheduler.elementIndex, # the index of the batch element to take, determined by the chosen scheduler
logLoss=true, # log losses after every evaluation
showProgress=showProgress, # show progress bar (or don't)
solverKwargsTrain...) # the solver kwargs defined above
# gather the parameters from the NeuralFMU
params = FMIFlux.params(neuralFMU)
# initialize the scheduler, keywords are passed to the NeuralFMU
FMIFlux.initialize!(scheduler; parameters=data.params, p=params[1], showProgress=showProgress)
# initialize Adam optimizer with our hyperparameters
optim = Adam(ETA, (BETA1, BETA2))
# the actual training
FMIFlux.train!(loss, # the loss function for training
neuralFMU, # the neural FMU including the parameters to train
Iterators.repeated((), steps), # an iterator repeating `steps` times
optim; # the optimizer to train
gradient=:ReverseDiff, # gradient mode (:ForwardDiff also leads to good results for multi-event systems)
chunk_size=32, # ForwardDiff chunk_size (=number of parameter estimations per run) - only if ForwardDiff is used
cb=() -> FMIFlux.update!(scheduler), # update the scheduler after every step
proceed_on_assert=true) # proceed, even if assertions are thrown, with the next step
# the default execution mode
singleInstanceMode(fmu, false)
# save our result parameters
FMIFlux.saveParameters(neuralFMU, joinpath(@__DIR__, "params", "$(ind).jld2"))
# simulate the NeuralFMU on a validation trajectory
resultNFMU = neuralFMU(x0, (data_validation.consumption_t[1], data_validation.consumption_t[end]); parameters=data_validation.params, showProgress=showProgress, maxiters=1e7, saveat=data_validation.consumption_t)
# determine loss on validation data (if the simulation was successful)
validation_loss = nothing
if resultNFMU.success
# compute the loss on VALIDATION data
validation_loss = _lossFct(resultNFMU, # the NeuralFMU
data_validation, # the validation data set
:MSE) # use MSE
end
# unload FMU
unloadFMU(fmu)
# return the loss (or `nothing` if no loss can be determined)
return validation_loss
end
train! (generic function with 1 method)
If you want to do hyperparameter optimization, uncomment/remove all code that comes from here on. The following is for demonstration purposes only.
Let's check if the train function is working for a given set of hyperparameters.
# check if the train function is working for a set of given (random) hyperparameters
# ([ ETA, BETA1, BETA2, BATCHDUR, LASTWEIGHT, SCHEDULER, LOSS], RESOURCE, INDEX)
train!([0.0001, 0.9, 0.999, 4.0, 0.7, :Random, :MSE], 8.0, 1)
┌ Info: --------------
│ Starting run 1 with parameters: Any[0.0001, 0.9, 0.999, 4.0, 0.7, :Random, :MSE] and resource 8.0 doing 2 step(s).
└ --------------------
[ Info: Current step: 0 | Current element=0 | Next element=92
[ Info: Current step: 1 | Current element=92 | Next element=55
[ Info: AVG: 1.20e+04 | MAX: 1.75e+06 | SUM: 1.75e+06
[ Info: Current step: 2 | Current element=55 | Next element=106
[ Info: AVG: 1.21e+04 | MAX: 1.75e+06 | SUM: 1.75e+06
9.046692788787865e9
5. Results
After training with a set of good hyperparameters, results can be loaded (one set is already prepared if you skipped the optimization).
# load our FMU (we take one from the FMIZoo.jl, exported with Dymola 2020x)
fmu = loadFMU("VLDM", "Dymola", "2020x"; type=:ME)
# build NeuralFMU
neuralFMU = build_NFMU(fmu)
# load parameters from hyperparameter optimization
FMIFlux.loadParameters(neuralFMU, joinpath(@__DIR__, "juliacon_2023.jld2"))
# simulate and plot the NeuralFMU
resultNFMU = neuralFMU(x0, (tStart, tStop); parameters=data.params, showProgress=showProgress, saveat=tSave)
resultFMU = simulate(fmu, (tStart, tStop); parameters=data.params, showProgress=showProgress, saveat=tSave)
# plot the NeuralFMU, original FMU and data (cumulative consumption)
fig = plot(resultNFMU; stateIndices=6:6, stateEvents=false, timeEvents=false, label="NeuralFMU", ylabel="cumulative consumption [Ws]")
plot!(fig, resultFMU; stateIndices=6:6, values=false, stateEvents=false, timeEvents=false, label="FMU")
plot!(fig, data.cumconsumption_t, data.cumconsumption_val, label="Data")
We also have a ready-to-use function that calculates different errors and plots them.
plotCumulativeConsumption(resultNFMU, resultFMU, data; filename=joinpath(@__DIR__, "comparison_train_100.png"))
Because the deviation is small, let's take a closer look at the last 10% of the WLTC, so from 90% to 100%.
plotCumulativeConsumption(resultNFMU, resultFMU, data; range=(0.9, 1.0), filename=joinpath(@__DIR__, "comparison_train_10.png"))
Finally, we should check the results on validation data: The full WLTC cycle.
# get start and stop for the validation cycle (full WLTC)
tStart_validation = data_validation.cumconsumption_t[1]
tStop_validation = data_validation.cumconsumption_t[end]
tSave_validation = data_validation.cumconsumption_t
# simulate the NeuralFMU on validation data
resultNFMU = neuralFMU(x0, (tStart_validation, tStop_validation); parameters=data_validation.params, showProgress=showProgress, saveat=tSave_validation)
resultFMU = simulate(fmu, (tStart_validation, tStop_validation); parameters=data_validation.params, showProgress=showProgress, saveat=tSave_validation)
plotCumulativeConsumption(resultNFMU, resultFMU, data_validation; filename=joinpath(@__DIR__, "comparison_validation_100.png"))
... and the last 10% ...
plotCumulativeConsumption(resultNFMU, resultFMU, data_validation; range=(0.9, 1.0), filename=joinpath(@__DIR__, "comparison_validation_10.png"))
Check out the error values in the legend: This is an improvement by a factor of 326 on MSE, 22 on MAE and 11 on MAX error, wow!
Finally, some plotting "sugar": a plot showing for which locations in derivative-space the model enhanced the cumulative consumption prediction the most:
plotEnhancements(neuralFMU, fmu, data; filename=joinpath(@__DIR__, "gif_1.gif"))
[ Info: Saved animation to D:\a\FMIFlux.jl\FMIFlux.jl\examples\jupyter-src\gif_1.gif
Now that we are finished, let's unload the FMU and invalidate the NeuralFMU.
# unload FMU / invalidate NeuralFMU
unloadFMU(fmu)
neuralFMU = nothing
But: We looked at some results, but did not talk about where the used hyperparameters came from ...
They come from hyperparameter optimization - and this step is necessary for NeuralODEs, too!
Optional: Organize as module
If you want, you can place all code inside of a module named NODE_Training; this simplifies hyperparameter optimization (if you want to perform one).
# for hyper parameter optimization, place the code in a `module`
# uncomment the following three lines and place them at the very beginning
#module NODE_Training
#using DistributedHyperOpt
#using DistributedHyperOpt.Distributed
# ... and uncomment the following line
#end # NODE_Training
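If you don't want to set up DistributedHyperOpt, a plain manual sweep over a few candidate sets is also possible. The following is just an illustrative sketch using the train! function defined above (the candidate values and variable names are arbitrary examples; call NODE_Training.train! instead if you organized the code as a module):
# purely illustrative: manually evaluate a few hyperparameter candidates with `train!`
candidates = [[0.0001, 0.9, 0.999, 4.0, 0.7, :Random,           :MSE],
              [0.001,  0.9, 0.999, 8.0, 0.5, :LossAccumulation, :MAE]]
resource   = 8.0 # training duration in seconds of data (kept very small here)
losses     = [train!(c, resource, i) for (i, c) in enumerate(candidates)]
best       = argmin(map(l -> l === nothing ? Inf : l, losses))
@info "best candidate: $(candidates[best]) with validation loss $(losses[best])"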