Differentiability and Jacobian Tools#

This page describes the macrostat.diff module, which provides utilities for computing and validating Jacobians of MacroStat models using PyTorch.

Overview#

The differentiability tools are designed for two use cases:

Development-time checks that a model is differentiable and that gradients are sensible (no NaNs/Infs, forward vs reverse AD match, etc.).
Calibration and analysis that require Jacobians or gradient information with respect to model parameters.

The core pieces live in macrostat.diff:

macrostat.diff.JacobianBase
macrostat.diff.JacobianAutograd
macrostat.diff.JacobianNumerical
macrostat.diff.check_model_differentiability()
macrostat.diff.compare_jacobian_dicts()

Quick start#

Given a MacroStat model instance (for example macrostat.models.GL06SIMEX or the small linear test model macrostat.models.LINEAR2D), you can define a scalar loss and compute gradients as follows:

import torch
from macrostat.models import get_model
from macrostat.diff import JacobianAutograd, JacobianNumerical

# Build a model
ModelClass = get_model("LINEAR2D")
model = ModelClass()

# Define a loss on the model outputs
target = torch.tensor([1.0, 0.0])

def loss_fn(output: dict[str, torch.Tensor]) -> torch.Tensor:
    y = output["State"][-1]  # final 2D state
    return torch.nn.functional.mse_loss(y, target)

# Autograd-based gradients
jac_auto = JacobianAutograd(model)
grads_rev = jac_auto.compute(loss_fn=loss_fn, mode="rev")

# Numerical finite-difference gradients (default: log-space, eps=1e-3)
jac_num_log = JacobianNumerical(model)
grads_num_log = jac_num_log.compute(loss_fn=loss_fn, mode="central")

# Or in direct space, useful when any parameter equals zero
jac_num_direct = JacobianNumerical(
    model, epsilon=1e-5, parameter_space="direct"
)
grads_num_direct = jac_num_direct.compute(loss_fn=loss_fn, mode="central")

Direct vs log-space perturbations#

JacobianNumerical supports two perturbation schemes, controlled by the parameter_space argument.

Direct space parameter_space="direct" perturbs each parameter additively:

\[\frac{\partial f}{\partial p} \approx \frac{f(p + \varepsilon) - f(p - \varepsilon)}{2\varepsilon}.\]

This is the textbook central difference. It works for any parameter value, including zero, but the absolute step size has no relation to the parameter scale, so the same epsilon is too small for parameters with magnitudes near unity and too large for parameters near zero.

Log space parameter_space="log" (the default) perturbs each parameter multiplicatively:

\[\frac{\partial f}{\partial p} \approx \frac{f(p \cdot e^{\varepsilon}) - f(p \cdot e^{-\varepsilon})} {p \cdot (e^{\varepsilon} - e^{-\varepsilon})}.\]

The perturbation is now a fixed fraction of the parameter value, so the same epsilon (default 1e-3, roughly a 0.1% relative step) is appropriate across parameters of very different magnitudes. This is the recommended default for SFC and behavioral models, where parameter scales differ by many orders of magnitude.

Choice of epsilon is a tradeoff: large values amplify truncation error from the finite-difference formula, small values amplify floating-point roundoff. For float32 in log space, values between 1e-4 and 1e-2 typically work well; 1e-3 is the package default.

Zero-parameter caveat. Log-space perturbation is undefined when a parameter equals zero (you cannot take the log of zero, and the denominator vanishes). JacobianNumerical detects this case during compute and raises ValueError: Cannot use log-space with zero parameters: [...]. For example, the LINEAR2D test model has a parameter x0_2 that defaults to zero, so its tests must explicitly pass parameter_space="direct". Negative parameters are handled in log space using a sign-preserving transformation sign(p) * exp(log(abs(p)) +/- eps).

High-level differentiability check#

For a more complete diagnostic, use macrostat.diff.check_model_differentiability():

from macrostat.diff import check_model_differentiability

report = check_model_differentiability(
    model=model,
    loss_fn=loss_fn,
    scenario=0,
    rtol=1e-5,
    atol=1e-8,
    compare_forward_reverse=True,
    compare_numerical=True,
    numerical_mode="central",
    epsilon=1e-3,
    parameter_space="log",
)

print(report.summary())

The resulting macrostat.diff.DifferentiabilityReport contains:

nan_or_inf – whether any gradient contains NaNs or Infs.
rel_err_fwd_rev – relative discrepancy between forward- and reverse-mode autograd gradients.
rel_err_autodiff_num – relative discrepancy between autograd and numerical finite-difference gradients.

Rather than only returning a pass/fail flag, the report is intended to help you interpret how accurate the gradients are (e.g. “accurate to about 1e-3 relative error” on a given model and loss).

Set parameter_space="direct" if your model has any parameter equal to zero. The default is "log".

Per-parameter Jacobian comparison#

When the high-level check reports a disagreement, the next question is which parameter is responsible. compare_jacobian_dicts() performs an element-wise, per-parameter comparison of two Jacobian dictionaries and returns a JacobianComparisonReport with the following fields per parameter:

max_abs_diff, mean_abs_diff – absolute differences across all elements of the parameter’s Jacobian slice.
max_rel_diff, mean_rel_diff – the same, normalised by max(|a|, |b|) clipped to one.
num_close / num_elements – how many entries are within the atol/rtol tolerance.
has_nan_inf – whether either Jacobian contains NaNs or Infs.

A typical workflow:

from macrostat.diff import (
    JacobianAutograd, JacobianNumerical, compare_jacobian_dicts,
)

jac_auto = JacobianAutograd(model).compute(loss_fn=loss_fn, mode="rev")
jac_num = JacobianNumerical(model).compute(loss_fn=loss_fn, mode="central")

report = compare_jacobian_dicts(
    jac_auto, jac_num, method_a="autograd", method_b="numerical"
)
print(report.summary())

The summary prints one row per parameter, sorted by descending relative disagreement, so the offending parameters are at the top. See Jacobian diagnostics for a runnable walkthrough that applies this workflow to LINEAR2D and several GL06 variants.

Expected disagreements#

Two classes of “disagreement” between autograd and numerical Jacobians are expected and not bugs.

Constraint-derived parameters. When a model declares a LinearConstraint, the derived (residual) parameter is no longer free: its value is fixed by the adding-up identity. Autograd correctly reports a zero gradient for the derived parameter and absorbs the sensitivity into the free parameters of the same group. Numerical Jacobians computed by perturbing the free parameters agree with autograd; numerical Jacobians computed by perturbing the derived parameter directly will disagree, because such a perturbation would violate the constraint. See Constraints for the residual parameterization and Parameter constraints with LinearConstraint for a worked example.

Step-function operations. Operations like torch.where(x > 0, a, b), torch.clamp, and any other discontinuous switch are non-differentiable through the condition argument. Autograd reports zero gradient at the discontinuity, while finite differences see a finite jump whenever the perturbation straddles it. The disagreement is real but local, and it disappears when the step function is replaced by a smooth approximation. The Behavior base class provides diffwhere(), tanhmask(), diffmin(), and diffmax() for this purpose. Jacobian diagnostics shows how to localize this kind of disagreement to a specific parameter, timestep, and output.

Test model: LINEAR2D#

To validate the implementation itself, MacroStat ships with a tiny linear test model macrostat.models.LINEAR2D.LINEAR2D. Its dynamics are:

\[x_{t+1} = A x_t,\]

where \(x_t\) is a 2D state vector and \(A\) is a fully parameterised 2x2 matrix. This model has an analytical Jacobian, so it is used in the test suite to confirm that:

Forward and reverse autograd agree to high precision, and
Autograd gradients match finite-difference gradients up to a small relative error.

Because LINEAR2D has a parameter (x0_2) that defaults to zero, log-space perturbation is undefined for it, and its differentiability tests set parameter_space="direct" explicitly.