macrostat.sample package

Submodules

macrostat.sample.sampler module

Class designed to facilitate sampling of the model’s parameter space.

class macrostat.sample.sampler.Sampler(model: macrostat.models.model.Model, worker_function: callable = <function timeseries_worker>, output_folder: str = 'samples', cpu_count: int = 1, batchsize: int = None)

Bases: object

extract(columns: list = None, indices: list = None, chunksize: int = 100000)

Extract the results from the output file.

The function uses a pandas chunkreader to extract the data from the output file. It is possible to extract only a subset of the columns, parameter IDs, or indices. This reduces the memory footprint when dealing with a large number of parameterizations.

Parameters:
  • columns (list) – List of columns to extract

  • pids (list) – List of parameter IDs to extract, i.e. the batch number. This parameter is not listed in the signature above.

  • indices (list) – List of indices to extract

  • chunksize (int (default 100000)) – Chunksize to read in the data
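The chunked-reading idea can be illustrated with plain pandas. This is a minimal sketch, not the package's implementation: the file layout (a CSV indexed by `pid` and `time`), the column names, and the helper `extract_subset` are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical stand-in for the sampler's output file: a CSV with a two-level
# row index (parameter ID, time step) and two output columns.
csv_text = (
    "pid,time,gdp,inflation\n"
    "0,0,1.0,0.02\n0,1,1.1,0.03\n"
    "1,0,0.9,0.01\n1,1,1.0,0.02\n"
    "2,0,1.2,0.04\n2,1,1.3,0.05\n"
)


def extract_subset(buffer, columns=None, pids=None, chunksize=100000):
    """Read the file chunk by chunk, keeping only the requested subset."""
    index_cols = ["pid", "time"]
    # Restricting usecols means unwanted columns are never loaded into memory.
    usecols = None if columns is None else index_cols + list(columns)
    pieces = []
    reader = pd.read_csv(
        buffer, usecols=usecols, index_col=index_cols, chunksize=chunksize
    )
    for chunk in reader:
        if pids is not None:
            # Filter each chunk before storing it, so memory use stays bounded.
            chunk = chunk[chunk.index.get_level_values("pid").isin(pids)]
        pieces.append(chunk)
    return pd.concat(pieces)


subset = extract_subset(
    io.StringIO(csv_text), columns=["gdp"], pids=[0, 2], chunksize=2
)
print(subset)
```

Filtering inside the loop is what keeps the footprint small: only the rows and columns that survive the filter are ever held at once.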

generate_tasks(*args, **kwargs) -> list[tuple]

Generate tasks for the parallel processor.

This method should return a list of tuples that will be passed to the worker function. By default, the first item in the tuple is the model object, and all remaining items are the arguments that will be passed to the model.simulate() function.
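The task layout described above can be sketched with stand-in objects. `ToyModel`, `toy_worker`, and the `scales`/`steps` arguments are hypothetical and exist only for this illustration; the real worker and model interfaces may differ.

```python
class ToyModel:
    """Hypothetical stand-in for a macrostat Model; not part of the package."""

    def simulate(self, scale, steps):
        return [scale * t for t in range(steps)]


def generate_tasks(model, scales, steps):
    # Each task is (model, *simulate_args): the model comes first, followed by
    # the arguments that the worker forwards to model.simulate().
    return [(model, scale, steps) for scale in scales]


def toy_worker(task):
    # Unpack the tuple as the layout implies: model first, arguments after.
    model, *args = task
    # Assumption: return the arguments together with the simulation output.
    return (*args, model.simulate(*args))


tasks = generate_tasks(ToyModel(), scales=[1.0, 2.0], steps=3)
results = [toy_worker(task) for task in tasks]
print(results[1])
```

Keeping the model inside each tuple makes every task self-contained, which is convenient when tasks are handed to separate worker processes.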

classmethod load(filename)

Class method to load an instance of Sampler. Usage:

sampler = Sampler.load(filename)

Parameters:

filename (str or Path) – path to the targeted Sampler

sample(tqdm_info: str = 'Sampling')

Sample the model’s parameter space by generating a set of tasks and executing them in parallel.

Parameters:

tqdm_info (str (default "Sampling")) – Information to be displayed in the tqdm progress bar

save(name: str = 'sampler')

Save the Sampler object as a pickle (PKL) file for later use.

save_outputs(raw_outputs: list, batch: int)

Save the raw outputs to disk.

The model’s outputs are in the form of a pandas DataFrame. This method should save the outputs to disk in a format that can easily be read back in later. By default, it writes a CSV file with the outputs in a MultiIndex format. Subclasses can override this method to save in a different format.

Parameters:
  • raw_outputs (list) – List of outputs from the parallel processing. By default, batchprocessing.timeseries_worker returns a tuple of (*task_arguments, output)

  • batch (int) – Batch number under which to save the outputs. Assumes that the batch size is constant.
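One way to write batched outputs as a MultiIndex CSV is sketched below. It is an illustration under stated assumptions, not the package's code: the helper `save_batch`, the `pid`/`time` level names, and the rule `pid = batch * batchsize + position` are all hypothetical, although the last one shows why a constant batch size matters.

```python
import io

import pandas as pd


def save_batch(raw_outputs, batch, batchsize, buffer):
    """Append one batch of outputs to a CSV buffer with a (pid, time) index."""
    frames = {}
    for position, result in enumerate(raw_outputs):
        # Assumption: a global parameter ID is derived from the batch number,
        # which only works if every batch has the same size.
        pid = batch * batchsize + position
        # Assumption: the output DataFrame is the last item of each tuple.
        frames[pid] = result[-1]
    # The dict keys become the outer index level, giving a two-level MultiIndex.
    combined = pd.concat(frames, names=["pid", "time"])
    # Write the header only for the first batch so later batches can append.
    combined.to_csv(buffer, header=(batch == 0))
    return combined


# Toy outputs shaped like (*task_arguments, output).
outputs = [
    (1.0, pd.DataFrame({"gdp": [1.0, 1.1]})),
    (2.0, pd.DataFrame({"gdp": [2.0, 2.2]})),
]
buffer = io.StringIO()
combined = save_batch(outputs, batch=1, batchsize=2, buffer=buffer)
print(combined)
```

A file written this way can be read back with `pd.read_csv(..., index_col=[0, 1])` to restore the two index levels.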

macrostat.sample.sobol module

Class designed to facilitate sampling of the model’s parameter space.

class macrostat.sample.sobol.SobolSampler(model: macrostat.models.model.Model, bounds: dict, sample_power: int = 10, seed: int = 0, logspace: bool = False, worker_function: callable = <function timeseries_worker>, simulation_args: tuple = (), output_folder: str = 'sobol_samples', cpu_count: int = 1, batchsize: int = None)

Bases: Sampler

generate_tasks()

Generate tasks for the parallel processor from a Sobol sequence over the model’s parameter space, using the bounds set in the class.

The Sobol sequence is generated with the scipy.stats.qmc.Sobol class. Specifically, the random_base2 method is used to draw the samples, because a sample size that is a power of two has slightly better space-filling properties than a custom number of samples.

Returns:

List of tuples containing the model and the task to be processed

Return type:

list[tuple]
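The SciPy calls named above can be combined as in the following sketch. The parameter names in `bounds`, the helper `sobol_points`, the log-space handling, and the `(model, params)` task layout are assumptions made for illustration; how SobolSampler actually applies `logspace` or assembles its tasks is not stated here.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical parameter bounds as (lower, upper) pairs.
bounds = {"alpha": (0.1, 0.9), "beta": (1.0, 100.0)}


def sobol_points(bounds, sample_power, seed=0, logspace=False):
    """Draw 2**sample_power Sobol points and rescale them to the bounds."""
    names = list(bounds)
    lower = np.array([bounds[n][0] for n in names], dtype=float)
    upper = np.array([bounds[n][1] for n in names], dtype=float)
    if logspace:
        # Assumption: sample uniformly in log10 space, then transform back.
        lower, upper = np.log10(lower), np.log10(upper)
    engine = qmc.Sobol(d=len(names), scramble=True, seed=seed)
    unit = engine.random_base2(m=sample_power)  # 2**m points in the unit cube
    points = qmc.scale(unit, lower, upper)
    if logspace:
        points = 10.0 ** points
    return [dict(zip(names, row)) for row in points]


points = sobol_points(bounds, sample_power=4, seed=0)

# Assumed task layout: the model first, then the sampled parameters.
model = object()  # placeholder for a real model instance
tasks = [(model, params) for params in points]
print(len(tasks))
```

With `sample_power=4` the engine returns 16 points; the class default of `sample_power=10` would correspond to 1024.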

Module contents