BaseSampler

class macrostat.sample.sampler.BaseSampler(model: Model, bounds: dict | None = None, logspace: bool = False, worker_function: callable = <function timeseries_worker>, simulation_args: tuple = (), output_folder: str = 'samples', cpu_count: int = 1, batchsize: int = None, save_to_disk: bool = True, output_filetype: str = 'csv', output_compression: str | None = None)

Bases: object

generate_parameters()

Generate parameters for the parallel processor.

generate_tasks(points: DataFrame)

Generate tasks for the parallel processor based on the parameters generated by the generate_parameters method.

Parameters:

points (pd.DataFrame) – DataFrame containing the points to be processed

Returns:

List of tuples containing the model and the task to be processed

Return type:

list[tuple]
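As an illustration of the return shape described above, the following sketch turns each row of a points DataFrame into a (model, task) tuple. The `DummyModel` class, the `build_tasks` helper and the layout of each task are hypothetical and are not taken from macrostat; the actual task layout may differ.

```python
import pandas as pd


class DummyModel:
    """Hypothetical stand-in for a model object; not macrostat's Model."""

    def __init__(self, name: str):
        self.name = name


def build_tasks(model, points: pd.DataFrame) -> list[tuple]:
    """Pair the model with one task per row of `points`.

    Assumption: a task is (row_label, {parameter: value}).
    """
    tasks = []
    for label, row in points.iterrows():
        tasks.append((model, (label, row.to_dict())))
    return tasks


# Two points in a two-parameter space (illustrative values)
points = pd.DataFrame({"alpha": [0.1, 0.2], "beta": [1.0, 2.0]})
tasks = build_tasks(DummyModel("demo"), points)
```

Each element of `tasks` can then be handed to a worker function independently of the others.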

sample(verbose: bool = False, points: DataFrame = None)

Sample the model’s parameter space by generating a set of tasks and executing them in parallel.

Parameters:

verbose (bool (default False)) – Whether to print progress information
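The generate-then-execute pattern can be sketched with the standard library alone. The names `toy_worker` and `run_in_batches` are hypothetical, and a thread pool is used here only to keep the sketch self-contained; it is not a claim about how macrostat distributes work across `cpu_count` workers.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_worker(task: tuple) -> tuple:
    """Hypothetical worker: returns (*task_arguments, output)."""
    name, value = task
    return (name, value, value ** 2)


def run_in_batches(tasks: list, batchsize: int, workers: int = 2) -> list:
    """Execute tasks batch by batch, spreading each batch over a pool."""
    batches = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(tasks), batchsize):
            batch = tasks[start:start + batchsize]
            # pool.map returns results in the order the tasks were submitted
            batches.append(list(pool.map(toy_worker, batch)))
    return batches


tasks = [("a", 1), ("b", 2), ("c", 3)]
results = run_in_batches(tasks, batchsize=2)
```

Processing in fixed-size batches keeps the number of results held in memory bounded, which is presumably why a `batchsize` argument exists.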

save_outputs(data: DataFrame, batch: int)

Save the raw outputs to disk.

The model’s outputs are in the form of a pandas DataFrame. This method should save the outputs to disk in a format that can be easily read back in later. By default, it writes a CSV file with the outputs in a MultiIndex format. Subclasses can override this method to save in a different format.

Parameters:
  • data (pd.DataFrame) – The samples run in this dataset

  • batch (int) – Batch number to save the outputs. Assumes that the batchsize is constant.
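A minimal sketch of a CSV round trip for a MultiIndex DataFrame is shown below. The `save_batch` helper, the `batch_<n>.csv` file naming and the index level names are assumptions made for illustration and are not taken from macrostat.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_batch(data: pd.DataFrame, batch: int, folder: Path) -> Path:
    """Write one batch to <folder>/batch_<n>.csv (hypothetical naming)."""
    path = folder / f"batch_{batch}.csv"
    data.to_csv(path)
    return path


# Two samples with three time steps each (illustrative values)
index = pd.MultiIndex.from_product([[0, 1], [0, 1, 2]], names=["sample", "time"])
data = pd.DataFrame({"gdp": [1.0, 1.1, 1.2, 2.0, 2.1, 2.2]}, index=index)

with tempfile.TemporaryDirectory() as tmp:
    path = save_batch(data, batch=0, folder=Path(tmp))
    # Naming both index columns restores the MultiIndex on read
    restored = pd.read_csv(path, index_col=["sample", "time"])
```

CSV stores the index levels as ordinary columns, so the reader has to be told which columns form the index.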

transform_outputs(raw_outputs: list, batch: int)

Concatenate the raw outputs into a single pandas DataFrame.

Parameters:
  • raw_outputs (list) – List of outputs from the parallel processing. By default, batchprocessing.timeseries_worker returns a tuple of (*task_arguments, output)

  • batch (int) – Batch number to save the outputs. Assumes that the batchsize is constant.

Returns:

The raw outputs concatenated into a single DataFrame

Return type:

pd.DataFrame
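The concatenation step might look like the sketch below. It assumes each raw output is a tuple whose first element identifies the task and whose last element is a DataFrame; the `concat_outputs` helper and the `task` level name are hypothetical.

```python
import pandas as pd


def concat_outputs(raw_outputs: list) -> pd.DataFrame:
    """Stack per-task DataFrames, keyed by the first task argument.

    Assumption: each item is (task_id, ..., output) with output a DataFrame.
    """
    frames = {item[0]: item[-1] for item in raw_outputs}
    # Passing a dict makes its keys the outer level of the resulting index
    return pd.concat(frames, names=["task"])


raw_outputs = [
    (0, {"alpha": 0.1}, pd.DataFrame({"gdp": [1.0, 1.1]})),
    (1, {"alpha": 0.2}, pd.DataFrame({"gdp": [2.0, 2.1]})),
]
combined = concat_outputs(raw_outputs)
```

The result has a two-level index, so a single task's series can be recovered with `combined.xs(1, level="task")`.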

verify_bounds(bounds: dict) → None

Verify that the bounds are correctly set. In particular, check that:

  0. The parameters are in the model.
  1. There is a lower and an upper bound for each parameter.
  2. The lower bound is smaller than the upper bound.
  3. The bounds are in the correct order.
  4. If the bounds are in logspace, the bounds are either both positive or both negative.
  5. If the bounds are in logspace, neither bound is zero.

Parameters:
  • bounds (dict[str, tuple]) – Dictionary containing the bounds for each parameter to be sampled

  • logspace (bool) – Whether to sample the parameters in logspace

Return type:

None

Raises:

ValueError – If the bounds are not correctly set
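The checks listed for this method can be sketched as a standalone function. This is an illustration under stated assumptions, not macrostat's implementation: `check_bounds` and its `known_parameters` argument are hypothetical names, and the error messages are invented.

```python
def check_bounds(bounds: dict, known_parameters: set, logspace: bool = False) -> None:
    """Raise ValueError if any bound fails the checks described above."""
    for name, bound in bounds.items():
        if name not in known_parameters:
            raise ValueError(f"{name!r} is not a model parameter")
        if len(bound) != 2:
            raise ValueError(f"{name!r} needs exactly a lower and an upper bound")
        lower, upper = bound
        if lower >= upper:
            raise ValueError(f"{name!r}: lower bound must be smaller than upper bound")
        if logspace:
            # log(0) is undefined, so neither bound may be zero
            if lower == 0 or upper == 0:
                raise ValueError(f"{name!r}: bounds cannot be zero in logspace")
            # a logarithmic range cannot cross zero, so the signs must match
            if (lower < 0) != (upper < 0):
                raise ValueError(f"{name!r}: bounds must share a sign in logspace")


# Passes silently: known parameter, ordered bounds, both positive
check_bounds({"alpha": (0.1, 10.0)}, {"alpha", "beta"}, logspace=True)
```

A call such as `check_bounds({"alpha": (-1.0, 1.0)}, {"alpha"}, logspace=True)` raises `ValueError`, because the range crosses zero.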