This section explains the design of the xarray-simlab modeling framework. It is useful mostly for users who want to create new models from scratch or customize existing models. Users who only want to run simulations from existing models may skip this section.
For more practical details on how to use the API to create, inspect and run models, see the relevant sections of this user guide.
The xarray-simlab framework is built on very few concepts that allow great flexibility in model customization: models, processes and variables.
These are detailed below.
Models are instances of the
Model class. They
consist of ordered, immutable collections of processes. The
ordering is inferred automatically from the given processes (see below).
The Model class also implements specific methods for the easy creation of new Model objects from existing ones by dropping, adding or replacing one or more processes.
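As a rough illustration of the "immutable collection" idea, the following plain-Python sketch stores processes in a read-only mapping and returns a new object whenever processes are added, replaced or dropped. It is not xarray-simlab's implementation: the class ModelSketch and its method names are hypothetical, process classes are represented by plain strings, and insertion order stands in for the automatically inferred ordering.

```python
from types import MappingProxyType


class ModelSketch:
    """Hypothetical stand-in for a model: a read-only name -> process mapping."""

    def __init__(self, processes):
        # Copy the input so later changes to it cannot leak into this object.
        self._processes = MappingProxyType(dict(processes))

    @property
    def processes(self):
        return self._processes

    def with_processes(self, processes):
        """Return a new object with the given processes added or replaced."""
        return ModelSketch({**self._processes, **processes})

    def without_processes(self, names):
        """Return a new object without the named processes."""
        kept = {k: v for k, v in self._processes.items() if k not in names}
        return ModelSketch(kept)


base = ModelSketch({"grid": "UniformGrid", "erosion": "StreamPower"})
# Derive a variant; `base` itself is left untouched.
variant = base.with_processes({"erosion": "Diffusion"}).without_processes({"grid"})
```

Because every change produces a new object, an existing model can be reused safely as the starting point for several variants.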
Processes are defined as Python classes that are decorated by
process(). The role of a process is twofold:
declare a given subset of the variables used in a model,
define a specific set of instructions that use or compute values for these variables during a model run.
Conceptually, a process is a logical component of a computational model. It may for example represent a particular physical mechanism that is described in terms of one or more state variables (e.g., scalar or vector fields) and one or more operations – with or without parameters – that modify those state variables through time. Note that some processes may be time-independent or may even be used to declare variables without implementing any computation.
xarray-simlab does not provide any built-in logic for tasks like generating computational meshes or setting boundary conditions, which should rather be implemented in 3rd-party libraries as processes. Even those tasks may be too specialized to justify including them in this framework, which aims to be as general as possible.
A process-ified class behaves mostly like any other regular Python class, i.e., there is a priori nothing that prevents you from using the common object-oriented features as you like. The only difference is that you can create classes here in a very succinct way without boilerplate, i.e., you don’t need to implement dunder methods like __repr__ as this is handled by the framework. In fact, this framework uses and extends the attrs package: process() is a wrapper around attr.s() and the functions used to create variables (see below) are thin wrappers around attr.ib().
Variables are the most basic elements of a model. They are declared in processes as class attributes, using variable().
Declaring variables mainly consists of defining useful metadata such as:
labeled dimensions (or no dimension for scalars),
predefined metadata attributes, e.g., a short description,
user-defined metadata attributes, e.g., units or math symbol,
the intent for a variable, i.e., whether the process needs (intent='in'), updates (intent='inout') or computes (intent='out') a value for that variable.
It is also possible to set a default value as well as value validator(s). See attrs’ validators for more details.
xarray-simlab does not distinguish between model parameters, input and output variables. All can be declared using variable().
Just as different physical mechanisms involve some common state variables (e.g., temperature or pressure), different processes may operate on common variables.
In xarray-simlab, a variable is declared at a unique place, i.e., within one and only one process. Using common variables across processes is achieved by declaring foreign() variables. These are simply references to variables that are declared in other processes.
You can use foreign variables for almost any computation inside a process just like original variables. The only difference is that intent='inout' is not supported for a foreign variable, i.e., a process may either need or compute a value of a foreign variable but may not update it (otherwise it would not be possible to unambiguously determine process dependencies, see below). For the same reason, only one process in a model may compute a value of a variable (i.e., intent='out').
The great advantage of declaring variables at unique places is that all their metadata are defined once. However, a downside of this approach is that foreign variables may add many hard-coded links between processes, which makes it harder to reuse these processes independently of each other. Global and/or group variables may circumvent those hard-coded links.
Like foreign variables, global variables can be used for accessing (read / write) common variables in multiple processes. Unlike foreign variables, linking global variables is implicit and only based on their “global name”, which must identify each variable unambiguously in a model.
Most variables in xarray-simlab accept a global_name argument. References to these variables can be declared in other processes using global_ref(). Those references are resolved when creating a new Model object. xarray-simlab also implements some safeguards against possible inconsistencies in a model (e.g., unresolved or ambiguous global names).
Global variables are a good alternative to foreign variables in the presence of naming standards. They may increase the potential to reuse process classes and build more flexible model structures. One major limitation, however, is that variable metadata (e.g., description, dimensions, etc.) is not known for global_ref() declarations until a model is created, which limits introspection of the processes taken individually.
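To make the name-based linking more concrete, here is a minimal plain-Python sketch of how references could be resolved by global name when a model is assembled. It does not reproduce xarray-simlab's internals: the function resolve_global_refs, its tuple-based inputs and the two error cases are assumptions made for this example.

```python
def resolve_global_refs(declared, refs):
    """Map each (process, global_name) reference to the variable it names.

    declared: iterable of (process, variable, global_name) tuples.
    refs: iterable of (process, global_name) tuples.
    """
    # Index declared variables by their global name.
    by_name = {}
    for process, variable, global_name in declared:
        by_name.setdefault(global_name, []).append((process, variable))

    resolved = {}
    for process, global_name in refs:
        targets = by_name.get(global_name, [])
        if not targets:
            raise KeyError(f"no variable with global name {global_name!r}")
        if len(targets) > 1:
            raise ValueError(f"global name {global_name!r} is ambiguous")
        resolved[(process, global_name)] = targets[0]
    return resolved
```

Note that the referencing process never names the process that owns the variable, which is what removes the hard-coded link; the price is that nothing is known about the target until the whole model is available.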
In some cases, using group variables may provide an elegant alternative to hard-coded links between processes.
The membership of variables in one or several groups is defined via their groups attribute. If you want to reuse in a separate process all the variables of a given group, instead of explicitly declaring each of them as foreign variables you can simply declare a group() (or group_dict()) variable. The latter behaves like an iterable (or mapping) of foreign variables pointing to each of the variables (model-wise) that are members of the same group.
Note that group variables implicitly have intent='in', i.e., they can only be used to get the values of multiple foreign variables, not to set their values.
Group variables are particularly useful when you want to combine (aggregate) different processes that act on the same variable, e.g., in landscape evolution modeling, to combine the effects of different erosion processes on the evolution of the surface elevation. This way you can easily add or remove processes to/from a model and avoid missing or broken links between processes.
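The aggregation pattern described above can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions: variables are represented by hypothetical dictionaries carrying a "groups" tuple and a scalar "value", and group_values is a made-up helper, not part of xarray-simlab.

```python
def group_values(variables, group):
    """Yield the value of every variable that belongs to `group`."""
    for var in variables:
        if group in var["groups"]:
            yield var["value"]


# Two erosion processes contribute to the same group; uplift does not.
variables = [
    {"process": "stream_power", "name": "erosion", "groups": ("erosion",), "value": 0.25},
    {"process": "diffusion", "name": "erosion", "groups": ("erosion",), "value": 0.5},
    {"process": "uplift", "name": "rate", "groups": ("uplift",), "value": 1.0},
]

# The aggregating process only knows the group name, not the member processes.
total_erosion = sum(group_values(variables, "erosion"))
```

Removing one of the erosion entries changes the total without touching the aggregation code, which is the property that makes it easy to add or remove processes.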
On-demand variables are like regular variables, except that their value is not intended to be computed systematically, e.g., at the beginning or at each time step of a simulation, but instead only at a given few times (or not at all). These are declared using on_demand() and must implement in the same process-ified class a dedicated method that returns their value, i.e., a method decorated with @foo.compute where foo is the name of the variable. They implicitly have intent='out'.
On-demand variables are useful, e.g., for optional model diagnostics.
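The idea of computing a value only when it is requested can be sketched without the framework. In the example below, the Profile class, its compute_mean method and the save_outputs helper are all hypothetical; the call counter is there only to show that nothing is computed unless the diagnostic is asked for.

```python
class Profile:
    """Hypothetical process holding some state and one optional diagnostic."""

    def __init__(self, values):
        self.values = values
        self.compute_calls = 0

    def compute_mean(self):
        # Runs only when the diagnostic is actually requested.
        self.compute_calls += 1
        return sum(self.values) / len(self.values)


def save_outputs(process, requested):
    """Evaluate only the on-demand diagnostics listed in `requested`."""
    on_demand = {"mean": process.compute_mean}
    return {name: on_demand[name]() for name in requested}
```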
Index variables are intended for indexing data of other variables in a model
like, e.g., coordinate labels of grid nodes. They are declared using
index(). They implicitly have
intent='out', although their
values could be computed from other input variables.
Sometimes we need to share between processes one or more arbitrary objects, e.g., callables or instances of custom classes that have no array-like interface. Those objects should be declared in process-decorated classes using any_object().
Within a model, those ‘object’ variables are reserved for internal use only, i.e., they never require an input value (they implicitly have intent='out') and they can’t be saved as outputs as their value may not be compatible with the xarray data model. Of course, it is still possible to create those objects using data from other (input) variables declared in the process. Likewise, their data could still be coerced into a scalar or an array and be saved as output via another variable.
A model run is divided into four successive stages: (1) initialization, (2) run step, (3) finalize step and (4) finalization.
During a simulation, stages 1 and 4 are run only once while stages 2 and 3 are repeated for a given number of (time) steps. Stage 4 is always run even when an exception is raised during stage 1, 2 or 3.
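The control flow described above (stages 1 and 4 run once, stages 2 and 3 repeated, stage 4 run even after an exception) maps naturally onto a try/finally block. The following is a minimal sketch, not the framework's actual driver: run_simulation and Recorder are hypothetical, and the runtime method names other than finalize are assumed from the stage names and should be checked against the API reference.

```python
def run_simulation(processes, nsteps):
    """Run all four stages for an already ordered list of processes."""
    try:
        for p in processes:          # stage 1, once
            p.initialize()
        for _ in range(nsteps):
            for p in processes:      # stage 2, every step
                p.run_step()
            for p in processes:      # stage 3, every step
                p.finalize_step()
    finally:
        for p in processes:          # stage 4, always
            p.finalize()


class Recorder:
    """Hypothetical process that logs each stage and can fail at a given step."""

    def __init__(self, log, fail_at_step=None):
        self.log = log
        self.fail_at_step = fail_at_step
        self.step = 0

    def initialize(self):
        self.log.append("initialize")

    def run_step(self):
        if self.step == self.fail_at_step:
            raise RuntimeError("failure during run step")
        self.log.append("run_step")

    def finalize_step(self):
        self.log.append("finalize_step")
        self.step += 1

    def finalize(self):
        self.log.append("finalize")
```

The exception still propagates to the caller after the finalization stage has run, so cleanup code placed in stage 4 is not skipped.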
Each process() decorated class may provide its own computation instructions by implementing specific “runtime” methods named .initialize(), .run_step(), .finalize_step() and .finalize() for each stage above, respectively. Note that this is entirely optional. For example, time-independent processes (e.g., for setting model grids) usually implement stage 1 only. In a few cases, the role of a process may even consist of just declaring some variables that are used elsewhere.
Runtime methods may be decorated by
runtime(). This is useful if
one needs to access the value of some runtime-specific variables like the
current step, time step duration, etc. from within those methods. Runtime
methods may also return a
RuntimeSignal() to control the
workflow, e.g., break the execution of the current stage.
It is also possible to monitor and/or control simulations independently of any model, using runtime hooks. See Section Monitor Model Runs.
Get / set variable values inside a process
Once you have declared a variable as a class attribute in a process, you can
further get and/or set its value like a regular instance attribute. For example,
if you declare a variable
foo you can just use
self.foo to get/set its
value inside one method of that class.
The process() decorator takes all variables declared as class attributes and turns them into properties, which may be read-only depending on the intent set for the variables. For all variables except on-demand variables, the getter/setter methods of those properties read/write values via a simple dictionary that is common to a simulation. Note that those properties are created only for the case where a process() decorated class is used within a Model object.
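The mechanism of properties backed by a shared dictionary can be illustrated with the standard library alone. The sketch below is an approximation under stated assumptions: make_property and attach_properties are hypothetical helpers, the store is keyed by (class name, variable name), and only intent='in' is treated as read-only.

```python
def make_property(key, intent):
    """Build a property that reads (and maybe writes) `key` in a shared store."""

    def getter(self):
        return self._store[key]

    def setter(self, value):
        self._store[key] = value

    if intent == "in":
        return property(getter)        # read-only: no setter attached
    return property(getter, setter)


def attach_properties(cls, intents):
    """Replace declared names on `cls` with store-backed properties."""
    for name, intent in intents.items():
        setattr(cls, name, make_property((cls.__name__, name), intent))
    return cls


class Erosion:
    def __init__(self, store):
        self._store = store            # dictionary shared by the whole simulation

    def run_step(self):
        self.rate = 2.0 * self.k       # reads k, writes rate through properties


attach_properties(Erosion, {"k": "in", "rate": "out"})
```

With this layout, a value written by one process becomes visible to any other process whose property points at the same key in the shared dictionary.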
Process dependencies and ordering
The order in which processes are executed during a simulation is critical. For example, if the role of a process is to compute a value for a given variable, then the execution of this process must happen before the execution of all other processes that use the same variable in their computation.
In a model, the processes and their dependencies together form the
nodes and the edges of a Directed Acyclic Graph (DAG). The graph
topology is fully determined by the
intent set for each variable
or foreign variable declared in each process. An ordering that is
computationally consistent can then be obtained using topological
sorting. This is done at Model object creation. The same ordering is
used at every stage of a model run.
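As an illustration of deriving an execution order from variable intents, the sketch below builds a dependency graph and sorts it with the standard library's graphlib module (Python 3.9+). It is deliberately simplified and is not the framework's algorithm: order_processes is a hypothetical function, variables are identified by a single model-wide name rather than by foreign references, and only the 'in' and 'out' intents create edges.

```python
from graphlib import TopologicalSorter


def order_processes(declarations):
    """Return process names in an order consistent with their dependencies.

    declarations: {process_name: {variable_name: intent}}
    """
    # Find the single process that computes each variable.
    producers = {}
    for process, variables in declarations.items():
        for variable, intent in variables.items():
            if intent == "out":
                if variable in producers:
                    raise ValueError(f"{variable!r} is computed by several processes")
                producers[variable] = process

    # A process that needs a variable depends on the process computing it.
    graph = {process: set() for process in declarations}
    for process, variables in declarations.items():
        for variable, intent in variables.items():
            producer = producers.get(variable)
            if intent == "in" and producer is not None and producer != process:
                graph[process].add(producer)

    # Raises graphlib.CycleError if the dependencies are not acyclic.
    return list(TopologicalSorter(graph).static_order())
```

The duplicate-producer check mirrors the rule that only one process may compute a given variable; without it, the direction of some edges would be ambiguous.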
The DAG structure also allows running the processes in parallel at every stage of a model run, see Section Single-model parallelism.
In a model, inputs are variables that need a value to be set by the user before running a simulation.
Like process ordering, inputs are automatically retrieved at Model object creation by looking at the intent set for all variables and foreign variables in the model. A variable is a model input if it has intent set to 'in' or 'inout' and if it has no linked foreign variable with intent='out'.
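The input rule can be sketched as a small filter over declared intents. This example rests on two assumptions that should be checked against the framework: a variable qualifies when its intent is 'in' or 'inout', and it is excluded when some foreign variable pointing at it has intent='out'. The function find_inputs and its data layout are hypothetical.

```python
def find_inputs(variables, foreign_refs):
    """Return the sorted (process, name) keys of variables the user must set.

    variables: {(process, name): intent} for variables declared in the model.
    foreign_refs: iterable of ((process, name), intent) pairs, one per
        foreign variable, where the key identifies the referenced variable.
    """
    # Variables whose value is computed by another process via a foreign link.
    computed_elsewhere = {
        target for target, intent in foreign_refs if intent == "out"
    }
    return sorted(
        key
        for key, intent in variables.items()
        if intent in ("in", "inout") and key not in computed_elsewhere
    )
```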