xarray.Dataset.xsimlab.run

Dataset.xsimlab.run(model=None, batch_dim=None, check_dims='strict', validate='inputs', store=None, encoding=None, decoding=None, hooks=None, parallel=False, scheduler=None, safe_mode=True)

Run the model.
- Parameters

model (xsimlab.Model object, optional) – Reference model. If None, tries to get the model from context.

batch_dim (str, optional) – Dimension label in the input dataset used to run a batch of simulations.
check_dims ({'strict', 'transpose'} or None, optional) – Check the dimension(s) of each input variable given in the Dataset. It may be one of the following options:

  - 'strict': the dimension labels must exactly correspond to (one of) the label sequences defined by their respective model variables (default)
  - 'transpose': input variables may be transposed in order to match (one of) the label sequences defined by their respective model variables

  Note that batch_dim (if any) and clock dimensions are excluded from this check. If None is given, no check is performed.

validate ({'inputs', 'all'} or None, optional) – Define what will be validated using the variables' validators defined in model's processes (if any). It may be one of the following options:

  - 'inputs': validate only values given as inputs (default)
  - 'all': validate both input values and values set through foreign variables in process classes

  The latter may significantly impact performance, but it may be useful for debugging. If None is given, no validation is performed.
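The effect of the 'transpose' option can be pictured with plain xarray (the dimension labels ('y', 'x') below are hypothetical): under 'strict', an input given with dims ('y', 'x') would be rejected if the model variable declares ('x', 'y'); under 'transpose', it is reordered to match, roughly equivalent to:

```python
import numpy as np
import xarray as xr

# Hypothetical input given with dims ('y', 'x') while the model
# variable declares the label sequence ('x', 'y').
da = xr.DataArray(np.zeros((2, 3)), dims=("y", "x"))

# check_dims='strict' would reject this ordering; check_dims='transpose'
# effectively does the equivalent of:
matched = da.transpose("x", "y")
print(matched.dims)  # ('x', 'y')
```

Only the dimension order changes; the data itself is untouched.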
store (str, collections.abc.MutableMapping or zarr.Group object, optional) – If a string (path) is given, simulation I/O data will be saved in that directory in the file system. If None is given (default), all data will be saved in memory. This parameter also directly accepts a zarr group object or (most of) the zarr store objects for more storage options (see notes below).

encoding (dict, optional) – Nested dictionary with variable names as keys and dictionaries of variable-specific encodings as values, e.g., {'my_variable': {'dtype': 'int16', 'fill_value': -9999}, ...}. Encoding options provided here override encoding options defined in model variables (see variable() for a full list of available options). Additionally, the 'chunks' and 'synchronizer' options are supported here.

decoding (dict, optional) – Options passed as keyword arguments to xarray.open_zarr() to load the simulation outputs from the zarr store as a new xarray dataset.

hooks (list, optional) – One or more runtime hooks, i.e., functions decorated with runtime_hook() or instances of RuntimeHook. The latter can also be used via the with statement or via their register() method.

parallel (bool, optional) – If True, run the simulation(s) in parallel using Dask (default: False). If a dimension label is set for batch_dim, each simulation in the batch will be run in parallel. Otherwise, the processes in model will be executed in parallel for each simulation stage.

scheduler (str, optional) – Dask scheduler used to run the simulation(s) in parallel. See dask.compute(). It also accepts any instance of distributed.Client.

safe_mode (bool, optional) – If True (default), a clone of model will be used to run each simulation so that it is safe to run multiple simulations simultaneously (provided that the code executed in model is thread-safe too). Generally, safe mode shouldn't be disabled, except in a few cases (e.g., debugging).
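As a sketch, the encoding and decoding parameters are plain nested dicts; the variable name 'profile__u' and the values below are illustrative, not taken from a real model:

```python
# Hypothetical encoding for one output variable; the inner keys follow
# the encoding options mentioned above ('chunks' and 'synchronizer' are
# only accepted here, not in model variable definitions).
encoding = {
    "profile__u": {
        "dtype": "int16",     # cast values before writing to the store
        "fill_value": -9999,  # missing-data marker in the zarr store
        "chunks": (1, 100),   # zarr chunk shape
    }
}

# Decoding options are forwarded as keyword arguments to
# xarray.open_zarr(), e.g.:
decoding = {"mask_and_scale": False}
```

Both dicts would then be passed as the encoding and decoding arguments of run().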
- Returns

output – Another Dataset with both model inputs and outputs. The data is (lazily) loaded from the zarr store used to save inputs and outputs.

- Return type

Dataset
Notes

xarray-simlab uses the zarr library (https://zarr.readthedocs.io) to save model inputs and outputs during a simulation. zarr provides a common interface to multiple storage solutions (e.g., in-memory, on-disk, cloud-based storage, databases, etc.). Some stores may not work well with xarray-simlab, though. For example, zarr.storage.ZipStore is not supported because it is not possible to write data to a dataset after it has been created.

xarray-simlab uses the dask library (https://docs.dask.org) to run the simulation(s) in parallel. Dask is a powerful library that allows running tasks (either simulations or model processes) on a single machine (multi-threaded or multi-process) or on a distributed architecture (HPC, cloud). Even though xarray-simlab includes some safeguards against race conditions, those might still occur under some circumstances and thus require extra care. In particular:
- The code implemented in the process classes of model must be thread-safe if a dask multi-threaded scheduler is used, and must be serializable if a multi-process or distributed scheduler is used.

- Multi-process or distributed schedulers are not well supported or may have poor performance when running the model processes in parallel (i.e., single-model parallelism), depending on the amount of data shared between the processes. See xsimlab.Model.execute() for more details.

- Not all zarr stores are safe to write to from multiple threads or processes. For example, zarr.storage.MemoryStore, used by default, is safe to write to from multiple threads but not from multiple processes.

- If chunks are specified in encoding with a chunk size > 1 for batch_dim, then one of the zarr synchronizers should be used too, otherwise model output values will not be saved properly. Pick zarr.sync.ThreadSynchronizer or zarr.sync.ProcessSynchronizer depending on which dask scheduler is used. Also, check that the (distributed) scheduler doesn't use both multiple processes and multiple threads.