This section explains the design of the xarray-simlab modeling framework. It is useful mostly for users who want to create new models from scratch or customize existing models. Users who only want to run simulations from existing models may skip this section.
For more practical details on how to use the API to create, inspect and run models, see the relevant sections of this user guide.
The xarray-simlab framework is built on a few concepts that allow great flexibility in model customization:
- models,
- processes,
- variables.
These are detailed below.
Models are instances of the
Model class. They
consist of ordered, immutable collections of processes. The
ordering is inferred automatically from the given processes (see below).
The Model class also implements specific methods for:
- running simulations,
- easy creation of new Model objects from existing ones by dropping, adding or replacing one or more processes.
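To make the idea of an "ordered, immutable collection" concrete, here is a minimal pure-Python sketch. `ToyModel` and its methods are hypothetical stand-ins, not xarray-simlab's implementation; the method names only echo the drop/update operations listed above. The sketch keeps processes in a read-only mapping and returns a new object instead of mutating the existing one. It also simply preserves insertion order rather than inferring an order from dependencies.

```python
from types import MappingProxyType


class ToyModel:
    """Hypothetical toy: an immutable, ordered mapping of process names to classes."""

    def __init__(self, processes):
        # copy into a dict (insertion-ordered) and expose it read-only
        self._processes = MappingProxyType(dict(processes))

    @property
    def processes(self):
        return self._processes

    def drop_processes(self, keys):
        # build a new model without the given processes; self is unchanged
        return ToyModel({k: v for k, v in self._processes.items() if k not in keys})

    def update_processes(self, processes):
        # build a new model with processes added or replaced; self is unchanged
        merged = dict(self._processes)
        merged.update(processes)
        return ToyModel(merged)


# hypothetical placeholder process classes
class Grid: ...
class Diffusion: ...
class Erosion: ...


base = ToyModel({"grid": Grid, "diffusion": Diffusion})
extended = base.update_processes({"erosion": Erosion})
reduced = extended.drop_processes(["diffusion"])
```

Because every variant is a new object, a model remains valid after it has served as the starting point for other models.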
Processes are defined as Python classes that are decorated by
process(). The role of a process is twofold:
- declare a given subset of the variables used in a model,
- define a specific set of instructions that use or compute values for these variables during a model run.
Conceptually, a process is a logical component of a computational model. It may for example represent a particular physical mechanism that is described in terms of one or more state variables (e.g., scalar or vector fields) and one or more operations – with or without parameters – that modify those state variables through time. Note that some processes may be time-independent or may even be used to declare variables without implementing any computation.
xarray-simlab does not provide any built-in logic for tasks like generating computational meshes or setting boundary conditions. Such logic should rather be implemented as processes in 3rd-party libraries. Even such common tasks may be too specialized to justify including them in this framework, which aims to be as general as possible.
A process-ified class behaves mostly like any other regular Python class, i.e., there is a priori nothing that prevents you from using the common object-oriented features as you like. The only difference is that here you can create classes in a very succinct way without boilerplate, i.e., you don’t need to implement dunder methods like __repr__ as this is handled by the framework. In fact, this framework uses and extends the attrs package: process() is a wrapper around attr.s() and the functions used to create variables (see below) are thin wrappers around attr.attrib().
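The wrapper idea can be sketched without attrs. In the hypothetical stand-ins below, the standard-library dataclasses module plays the role of attrs. `process` delegates to the class builder, and `variable` wraps `dataclasses.field()` to store declaration metadata. One visible difference is that dataclasses require a type annotation on each field. This illustrates the pattern only; it is not xarray-simlab's code.

```python
from dataclasses import dataclass, field, fields


def process(cls):
    """Hypothetical stand-in for process(): delegate to a class builder."""
    # the builder generates __init__, __repr__ and __eq__, so no boilerplate
    return dataclass(cls)


def variable(description="", **attrs):
    """Hypothetical stand-in for variable(): a thin wrapper around field()."""
    return field(default=None, metadata={"description": description, **attrs})


@process
class Uplift:
    rate: float = variable(description="uplift rate", units="m/yr")

    def total_uplift(self, dt):
        # regular methods work as in any Python class
        return self.rate * dt


u = Uplift(rate=1.5)
print(u)  # the generated __repr__ shows the field values
```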
Variables are the most basic elements of a model. They are declared in processes as class attributes, using variable().
Declaring variables mainly consists of defining useful metadata such as:
- labeled dimensions (or no dimension for scalars),
- predefined meta-data attributes, e.g., a short description,
- user-defined meta-data attributes, e.g., units or math symbol,
- the intent for a variable, i.e., whether the process needs (intent='in'), updates (intent='inout') or computes (intent='out') a value for that variable.
xarray-simlab does not distinguish between model parameters, input and output variables. All can be declared using variable().
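The metadata listed above fits naturally in a small record. In this hypothetical sketch, `VariableSpec` and `variable` are illustrative names, not xarray-simlab's internals. Each declaration keeps its dimension labels, intent, description and user-defined attributes, and an invalid intent is rejected as soon as the variable is declared. Parameters, state variables and outputs are all declared the same way.

```python
from dataclasses import dataclass, field

VALID_INTENTS = ("in", "inout", "out")


@dataclass(frozen=True)
class VariableSpec:
    """Hypothetical record of the metadata declared for one variable."""
    dims: tuple = ()          # labeled dimensions; () means a scalar
    intent: str = "in"        # 'in' (needs), 'inout' (updates) or 'out' (computes)
    description: str = ""     # predefined metadata attribute
    attrs: dict = field(default_factory=dict)  # user-defined metadata, e.g. units

    def __post_init__(self):
        if self.intent not in VALID_INTENTS:
            raise ValueError(f"invalid intent {self.intent!r}")


def variable(dims=(), intent="in", description="", attrs=None):
    # accept a single dimension label as a shortcut for a 1-tuple
    if isinstance(dims, str):
        dims = (dims,)
    return VariableSpec(tuple(dims), intent, description, dict(attrs or {}))


class Diffusion:
    # a parameter, a state variable and an output, declared alike
    kappa = variable(description="diffusivity", attrs={"units": "m2/s"})
    temperature = variable(dims="x", intent="inout", attrs={"units": "K"})
    flux = variable(dims="x", intent="out", description="heat flux")
```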
Just as different physical mechanisms may involve common state variables (e.g., temperature or pressure), different processes may operate on common variables.
In xarray-simlab, a variable is declared at a unique place, i.e., within one and only one process. Using common variables across processes is achieved by declaring foreign() variables. These are simply references to variables that are declared in other processes.
You can use foreign variables for almost any computation inside a process, just like original variables. The only difference is that intent='inout' is not supported for a foreign variable. A process may either need or compute a value of a foreign variable but may not update it. Otherwise, it would not be possible to unambiguously determine process dependencies (see below). For the same reason, only one process in a model may compute a value of a variable (i.e., intent='out').
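Both rules stated above can be checked mechanically. Here is a minimal sketch that assumes a hypothetical model description in which each process lists `((owner, variable), intent)` pairs. A pair whose owner is another process stands for a foreign variable. The function name and data layout are illustrative only.

```python
from collections import defaultdict


def check_variable_rules(model):
    """Raise ValueError if a model description breaks the two rules.

    ``model`` maps process names to lists of ((owner, variable), intent)
    pairs; entries whose owner differs from the process are foreign variables.
    """
    computed_by = defaultdict(list)
    for proc, declarations in model.items():
        for (owner, var), intent in declarations:
            # rule 1: a foreign variable may be needed or computed, not updated
            if owner != proc and intent == "inout":
                raise ValueError(f"{proc}: foreign variable {owner}.{var} cannot be updated")
            if intent == "out":
                computed_by[(owner, var)].append(proc)
    # rule 2: at most one process computes a given variable
    for (owner, var), procs in computed_by.items():
        if len(procs) > 1:
            raise ValueError(f"{owner}.{var} is computed by several processes: {procs}")


valid = {
    "grid": [(("grid", "x"), "out")],
    "profile": [(("grid", "x"), "in"), (("profile", "u"), "inout")],
}
check_variable_rules(valid)  # passes silently
```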
The great advantage of declaring variables at unique places is that all their meta-data are defined once. However, a downside of this approach is that foreign variables may add many hard-coded links between processes, which makes it harder to reuse these processes independently of each other.
In some cases, using group variables may provide an elegant alternative to hard-coded links between processes.
The membership of variables to a group is defined via their group attribute. If you want to use all the variables of a group in a separate process, you can declare a group() variable instead of explicitly declaring foreign variables. The latter behaves like an iterable of foreign variables pointing to each of the variables that are members of the group, across the model.
Note that group variables only support intent='in', i.e., group variables should only be used to get the values of multiple foreign variables of the same group.
Group variables are particularly useful when you want to combine (aggregate) different processes that act on the same variable. In landscape evolution modeling, for example, they can combine the effects of different erosion processes on the evolution of the surface elevation. This way you can easily add processes to or remove them from a model without creating missing or broken links between processes.
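The aggregation pattern can be illustrated in plain Python. This toy sketch makes several assumptions: a hypothetical `group_values` helper, variables identified by `(process, variable)` keys tagged with an optional group label, and a dict standing in for the model's value store. In xarray-simlab, group() provides this iteration for you. The aggregating code only asks for the "erosion" group, so it never names the individual erosion processes.

```python
def group_values(declarations, store, group):
    """Hypothetical helper: yield the value of every variable tagged with ``group``.

    ``declarations`` maps (process, variable) keys to a group label (or None);
    ``store`` maps the same keys to their current values.
    """
    for key, label in declarations.items():
        if label == group:
            yield store[key]


declarations = {
    ("hillslope", "erosion"): "erosion",
    ("river", "erosion"): "erosion",
    ("uplift", "rate"): None,
}
store = {
    ("hillslope", "erosion"): 0.5,
    ("river", "erosion"): 1.25,
    ("uplift", "rate"): 2.0,
}

# a surface process aggregating all erosion contributions, whatever they are
total_erosion = sum(group_values(declarations, store, "erosion"))
elevation_change = store[("uplift", "rate")] - total_erosion
```

Adding a third erosion process only requires declaring another member of the group. The aggregating code stays unchanged.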
On-demand variables are like regular variables, except that their value is not intended to be computed systematically (e.g., at the beginning or at each time step of a simulation) but only at a few given times, or not at all. They are declared using on_demand(). Each on-demand variable requires a dedicated method in the same process-ified class that returns its value. This method is decorated with @foo.compute, where foo is the name of the variable. On-demand variables always have intent='out'.
On-demand variables are useful, e.g., for optional model diagnostics.
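Here is a rough sketch of on-demand evaluation using a plain Python descriptor. `OnDemand` is a hypothetical class, not xarray-simlab's on_demand(). It mimics a registration pattern in which a method decorated with `@<name>.compute` provides the value. The value is computed each time it is accessed and is never stored.

```python
class OnDemand:
    """Hypothetical descriptor mimicking an on-demand variable."""

    def __init__(self, description=""):
        self.description = description
        self._compute = None

    def __set_name__(self, owner, name):
        self.name = name

    def compute(self, func):
        # register the method that returns the value, and keep it unchanged
        self._compute = func
        return func

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if self._compute is None:
            raise AttributeError(f"no compute method registered for {self.name!r}")
        # evaluated only when the value is requested
        return self._compute(instance)

    def __set__(self, instance, value):
        # the value comes from the compute method only
        raise AttributeError(f"{self.name!r} cannot be set directly")


class Profile:
    mean = OnDemand(description="mean value of the profile (diagnostic)")

    def __init__(self, values):
        self.values = values
        self.n_computations = 0

    @mean.compute
    def _compute_mean(self):
        self.n_computations += 1
        return sum(self.values) / len(self.values)


profile = Profile([1.0, 2.0, 3.0])
```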
A model run is divided into four successive stages:
1. initialization
2. run step
3. finalize step
4. finalization
During a simulation, stages 1 and 4 are run only once, while stages 2 and 3 are repeated for a given number of (time) steps.
Each process-ified class may provide its own computation instructions by implementing specific methods named .initialize(), .run_step(), .finalize_step() and .finalize() for each stage above, respectively. Note that this is entirely optional. For example, time-independent processes (e.g., for setting model grids) usually implement stage 1 only. In a few cases, the role of a process may even consist of just declaring some variables that are used by other processes.
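The four stages can be pictured as a simple driver loop. This hypothetical sketch assumes stage methods named initialize, run_step, finalize_step and finalize. `run_toy_simulation` walks over already-ordered process instances, calls whichever stage methods each one implements, and skips the missing ones. Time-step arguments and everything else a real run needs are left out.

```python
def run_toy_simulation(processes, nsteps):
    """Hypothetical driver running the four stages over ordered process instances."""

    def run_stage(name):
        for proc in processes:
            method = getattr(proc, name, None)
            if method is not None:  # stage methods are optional
                method()

    run_stage("initialize")          # stage 1: once
    for _ in range(nsteps):
        run_stage("run_step")        # stage 2: every step
        run_stage("finalize_step")   # stage 3: every step
    run_stage("finalize")            # stage 4: once


log = []


class Grid:
    # time-independent process: implements stage 1 only
    def initialize(self):
        log.append("grid.initialize")


class Counter:
    def initialize(self):
        self.count = 0
        log.append("counter.initialize")

    def run_step(self):
        self.count += 1
        log.append("counter.run_step")

    def finalize(self):
        log.append("counter.finalize")


counter = Counter()
run_toy_simulation([Grid(), counter], nsteps=2)
```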
Get / set variable values inside a process
Once you have declared a variable as a class attribute in a process, you can get and/or set its value as if it were a property of that class. For example, if you declare a variable foo, you can use self.foo to get/set its value inside a method of that class.
This is exactly what the process() decorator does: it takes all variables declared as class attributes and turns them into properties, which may be read-only depending on the intent set for the variable.
Basically, the getter (setter) methods of these properties read (write) values from (into) a simple key-value store (except for on-demand variables). Currently the store is fully in-memory, but it could easily be replaced by an on-disk or a distributed store. xarray-simlab’s modeling framework can thus be viewed as a thin object-oriented layer built on top of an abstract key-value store.
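Here is a minimal sketch of store-backed properties, using a plain dict as the key-value store. The `Var` marker and `store_backed` decorator are hypothetical, not xarray-simlab's API. The decorator replaces each declared class attribute with a property that reads from and writes to the store under a `(process, variable)` key. It omits the setter when the intent is 'in'.

```python
class Var:
    """Hypothetical variable declaration; only the intent matters here."""

    def __init__(self, intent="in"):
        self.intent = intent


def _store_property(store, key, intent):
    def getter(self):
        return store[key]

    def setter(self, value):
        store[key] = value

    # variables with intent 'in' are read-only inside the process
    if intent == "in":
        return property(getter)
    return property(getter, setter)


def store_backed(store):
    """Hypothetical decorator turning Var class attributes into properties."""
    def decorate(cls):
        for name, decl in list(vars(cls).items()):
            if isinstance(decl, Var):
                key = (cls.__name__, name)
                setattr(cls, name, _store_property(store, key, decl.intent))
        return cls
    return decorate


store = {("Decay", "rate"): 0.5, ("Decay", "mass"): 10.0}


@store_backed(store)
class Decay:
    rate = Var("in")
    mass = Var("inout")

    def run_step(self):
        # reads both values from the store and writes the new mass back
        self.mass = self.mass * (1.0 - self.rate)


decay = Decay()
decay.run_step()
```

Swapping the dict for another mapping-like object (e.g., one backed by files) would not change the process class itself. This is the flexibility the paragraph above refers to.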
Process dependencies and ordering
The order in which processes are executed during a simulation is critical. For example, if the role of a process is to compute a value for a given variable, then the execution of this process must happen before the execution of all other processes that use the same variable in their computation.
In a model, the processes and their dependencies together form the
nodes and the edges of a Directed Acyclic Graph (DAG). The graph
topology is fully determined by the
intent set for each variable
or foreign variable declared in each process. An ordering that is
computationally consistent can then be obtained using topological
sorting. This is done at Model object creation. The same ordering is
used at every stage of a model run.
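The ordering step can be sketched with the standard library's graphlib (Python 3.9+). This sketch rests on two assumptions. First, it uses a hypothetical description that maps each process to `{(owner, variable): intent}`. Second, it applies a simplified dependency rule: a process that reads a variable (intent 'in') depends on the process that computes it (intent 'out'). xarray-simlab's actual rules may cover more cases.

```python
from graphlib import TopologicalSorter


def order_processes(processes):
    """Return process names sorted so that computing processes run first."""
    computed_by = {
        key: proc
        for proc, variables in processes.items()
        for key, intent in variables.items()
        if intent == "out"
    }
    sorter = TopologicalSorter()
    for proc, variables in processes.items():
        sorter.add(proc)  # also registers processes without dependencies
        for key, intent in variables.items():
            provider = computed_by.get(key)
            if intent == "in" and provider is not None and provider != proc:
                sorter.add(proc, provider)  # provider must run before proc
    # static_order() raises graphlib.CycleError if the graph is not acyclic
    return list(sorter.static_order())


processes = {
    "profile": {("profile", "u"): "inout", ("diffusion", "flux"): "in"},
    "diffusion": {("grid", "x"): "in", ("profile", "u"): "in", ("diffusion", "flux"): "out"},
    "grid": {("grid", "x"): "out"},
}
order = order_processes(processes)
```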
In principle, the DAG structure would also allow running the processes in parallel at every stage of a model run. This is not yet implemented, though.
In a model, inputs are variables that need a value to be set by the user before running a simulation.
Like process ordering, inputs are automatically retrieved at Model object creation by looking at the intent set for all variables and foreign variables in the model. A variable is a model input if its intent is set to 'in' or 'inout' and if it has no linked foreign variable with intent='out'.
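The input rule can be written as a short function. This minimal sketch assumes a hypothetical description mapping each process to `{(owner, variable): intent}`, where owner is the process that declares the variable. A variable counts as an input when its own declaration has intent 'in' or 'inout' and no process computes it (intent 'out'). `find_inputs` is an illustrative name, not part of xarray-simlab.

```python
def find_inputs(processes):
    """Return the sorted (process, variable) keys that are model inputs."""
    # every variable that some process (original or foreign) computes
    computed = {
        key
        for variables in processes.values()
        for key, intent in variables.items()
        if intent == "out"
    }
    inputs = set()
    for proc, variables in processes.items():
        for key, intent in variables.items():
            is_original = key[0] == proc  # skip foreign references
            if is_original and intent in ("in", "inout") and key not in computed:
                inputs.add(key)
    return sorted(inputs)


processes = {
    "grid": {("grid", "length"): "in", ("grid", "x"): "out"},
    "diffusion": {("grid", "x"): "in", ("diffusion", "kappa"): "in",
                  ("diffusion", "flux"): "out"},
    "profile": {("profile", "u"): "inout", ("diffusion", "flux"): "in"},
    # computes the initial profile through a foreign variable with intent 'out'
    "init": {("profile", "u"): "out", ("init", "amplitude"): "in"},
}
inputs = find_inputs(processes)
```

Without the "init" process, nothing computes the profile, so ("profile", "u") becomes an input that the user must set.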