xarray.Dataset.xsimlab.run

Dataset.xsimlab.run(model=None, batch_dim=None, check_dims='strict', validate='inputs', store=None, encoding=None, decoding=None, hooks=None, parallel=False, scheduler=None, safe_mode=True)

Run the model.

Parameters
  • model (xsimlab.Model object, optional) – Reference model. If None, tries to get model from context.

  • batch_dim (str, optional) – Dimension label in the input dataset used to run a batch of simulations.

  • check_dims ({'strict', 'transpose'}, optional) –

    Check the dimension(s) of each input variable given in the Dataset. It may be one of the following options:

    • 'strict': the dimension labels must exactly correspond to (one of) the label sequences defined by their respective model variables (default)

    • 'transpose': input variables might be transposed in order to match (one of) the label sequences defined by their respective model variables

    Note that batch_dim (if any) and clock dimensions are excluded from this check. If None is given, no check is performed.

  • validate ({'inputs', 'all'}, optional) –

    Define what will be validated using the validators (if any) defined for variables in model’s processes. It may be one of the following options:

    • 'inputs': validate only values given as inputs (default)

    • 'all': validate both input values and values set through foreign variables in process classes

    The latter may significantly impact performance, but it may be useful for debugging. If None is given, no validation is performed.

  • store (str or collections.abc.MutableMapping or zarr.Group object, optional) – If a string (path) is given, simulation I/O data will be saved in that specified directory in the file system. If None is given (default), all data will be saved in memory. This parameter also directly accepts a zarr group object or most zarr store objects for more storage options (see notes below).

  • encoding (dict, optional) – Nested dictionary with variable names as keys and dictionaries of variable-specific encodings as values, e.g., {'my_variable': {'dtype': 'int16', 'fill_value': -9999}, ...}. Encoding options provided here override encoding options defined in model variables (see variable() for a full list of available options). Additionally, the 'chunks' and 'synchronizer' options are supported here.

  • decoding (dict, optional) – Options passed as keyword arguments to xarray.open_zarr() to load the simulation outputs from the zarr store as a new xarray dataset.

  • hooks (list, optional) – One or more runtime hooks, i.e., functions decorated with runtime_hook() or instances of RuntimeHook. The latter may also be activated with the with statement or via their register() method (see the sketch after this parameter list).

  • parallel (bool, optional) – If True, run the simulation(s) in parallel using Dask (default: False). If a dimension label is set for batch_dim, each simulation in the batch will be run in parallel. Otherwise, the processes in model will be executed in parallel for each simulation stage.

  • scheduler (str, optional) – Dask’s scheduler used to run the simulation(s) in parallel. See dask.compute(). It also accepts any instance of distributed.Client.

  • safe_mode (bool, optional) – If True (default), a clone of model will be used to run each simulation so that it is safe to run multiple simulations simultaneously (provided that the code executed in model is thread-safe too). Generally safe mode shouldn’t be disabled, except in a few cases (e.g., debugging).
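
As a minimal usage sketch, here is how a setup dataset is typically created and run. The Counter process, variable names, and values below are illustrative only and are not part of the library:

    import numpy as np
    import xsimlab as xs

    @xs.process
    class Counter:
        """Toy process that accumulates a fixed increment at each step."""
        step = xs.variable(description="increment added at each step")
        total = xs.variable(intent="out", description="running total")

        def initialize(self):
            self.total = 0.0

        def run_step(self):
            self.total += self.step

    # one-process model, used only for this example
    model = xs.Model({"counter": Counter})

    in_ds = xs.create_setup(
        model=model,
        clocks={"clock": np.arange(5)},
        input_vars={"counter__step": 2.0},
        output_vars={"counter__total": "clock"},
    )

    # run the simulation; outputs are kept in memory by default
    out_ds = in_ds.xsimlab.run(model=model)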
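
For the hooks parameter, a short sketch using the progress bar shipped in xarray-simlab's monitoring module (availability depends on the installed version; the model and setup are assumed to exist as in the sketch above):

    from xsimlab.monitoring import ProgressBar

    # pass a RuntimeHook instance explicitly...
    out_ds = in_ds.xsimlab.run(model=model, hooks=[ProgressBar()])

    # ...or activate it through the with statement, which is equivalent
    with ProgressBar():
        out_ds = in_ds.xsimlab.run(model=model)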

Returns

output – Another Dataset with both model inputs and outputs. The data is (lazily) loaded from the zarr store used to save inputs and outputs.

Return type

Dataset
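
Because the returned dataset is lazily backed by the zarr store, output data can be pulled into memory explicitly when needed. A sketch reusing the hypothetical names from the parameter section above:

    out_ds = in_ds.xsimlab.run(model=model, store="my_run.zarr")

    # force loading all output data into memory (standard xarray method)
    out_ds.load()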

Notes

xarray-simlab uses the zarr library (https://zarr.readthedocs.io) to save model inputs and outputs during a simulation. zarr provides a common interface to multiple storage solutions (e.g., in memory, on disk, cloud-based storage, databases, etc.). Some stores may not work well with xarray-simlab, though. For example, zarr.storage.ZipStore is not supported because it is not possible to write data to a dataset after it has been created.
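
A few ways of passing a store, shown as a sketch (paths are arbitrary, the zarr store class assumes zarr 2.x, and the model and setup are the hypothetical ones from the sketches above):

    import zarr

    # directory store on disk, given as a path
    out_ds = in_ds.xsimlab.run(model=model, store="my_run.zarr")

    # any MutableMapping, e.g. a plain dict kept in memory
    out_ds = in_ds.xsimlab.run(model=model, store={})

    # an explicit zarr store object
    out_ds = in_ds.xsimlab.run(model=model, store=zarr.DirectoryStore("my_run.zarr"))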

xarray-simlab uses the dask library (https://docs.dask.org) to run the simulation(s) in parallel. Dask is a powerful library that allows running tasks (either simulations or model processes) on a single machine (multi-threads or multi-processes) or even on a distributed architecture (HPC, Cloud). Even though xarray-simlab includes some safeguards against race conditions, those might still occur under some circumstances and thus require extra care. In particular:

  • The code implemented in the process classes of model must be thread-safe if a dask multi-threaded scheduler is used, and must be serializable if a multi-process or distributed scheduler is used.

  • Multi-process or distributed schedulers are not well supported or may have poor performance when running the model processes in parallel (i.e., single-model parallelism), depending on the amount of data shared between the processes. See xsimlab.Model.execute() for more details.

  • Not all zarr stores are safe to write in multiple threads or processes. For example, zarr.storage.MemoryStore used by default is safe to write in multiple threads but not in multiple processes.

  • If chunks are specified in encoding with chunk size > 1 for batch_dim, then one of the zarr synchronizers should be used too, otherwise model output values will not be saved properly. Pick zarr.sync.ThreadSynchronizer or zarr.sync.ProcessSynchronizer depending on which dask scheduler is used. Also, check that the (distributed) scheduler doesn’t use both multiple processes and multiple threads.
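
As a sketch of a batch run that combines the points above (the 'batch' dimension, variable names, and store path are illustrative assumptions, reusing the hypothetical model and setup from the earlier sketches):

    import numpy as np
    import zarr

    # give one input variable a batch dimension: one simulation is run per value
    in_batch = in_ds.copy()
    in_batch["counter__step"] = ("batch", np.array([1.0, 2.0, 3.0]))

    out_ds = in_batch.xsimlab.run(
        model=model,
        batch_dim="batch",
        parallel=True,
        scheduler="threads",
        store="my_batch_run.zarr",
        # if chunk size > 1 is set along the batch dimension, also add a zarr synchronizer
        encoding={"counter__total": {"synchronizer": zarr.ThreadSynchronizer()}},
    )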