API Reference

multigrad

class diffopt.multigrad.OnePointModel(aux_data: Any = None, comm: Any = None, loss_func_has_aux: bool = False, sumstats_func_has_aux: bool = False)[source]

Allows differentiable one-point calculations to be performed on separate MPI ranks, and automatically sums over each rank controlled by the comm. This is an abstract base class only. The user must personally define the calc_partial_sumstats_from_params and calc_loss_from_sumstats methods

Parameters:
  • aux_data (Any (default=None)) – Any auxiliary data for easy access within sumstats or loss functions

  • comm (Comm (default=COMM_WORLD)) – MPI communicator

  • loss_func_has_aux (bool (default=False)) – If true, calc_partial_sumstats_from_params(x) -> (y, aux) and calc_loss_from_sumstats(y, aux) -> … signatures will be assumed

  • sumstats_func_has_aux (bool (default=False)) – If true, calc_loss_from_sumstats(…) -> (loss, aux) signature will be assumed

calc_dloss_dparams(params, randkey=None)[source]

Calculate the gradient of the loss w.r.t. model parameters given

Parameters:
  • params (array-like) – Model parameters

  • randkey (PRNGKey | int (default=None)) – If set to a value other than None, the “randkey” kwarg will be passed to user-defined methods

Returns:

Gradient of the loss with respect to each parameter

Return type:

array

calc_loss_and_grad_from_params(params, randkey=None)[source]

Calculate the loss and its gradient.

This function returns the equivalent of (calc_loss_from_params(x), calc_dloss_dparams(x)) but it is significantly cheaper than calling them separately

Parameters:
  • params (array-like) – Model parameters

  • randkey (PRNGKey | int (default=None)) – If set to a value other than None, the “randkey” kwarg will be passed to user-defined methods

Returns:

  • float – The loss evaluated at the parameters given

  • array – Gradient of the loss with respect to each parameter

calc_loss_from_params(params, randkey=None)[source]

Calculate the loss evaluated at a given set of parameters

Parameters:
  • params (array-like) – Model parameters

  • randkey (PRNGKey | int (default=None)) – If set to a value other than None, the “randkey” kwarg will be passed to user-defined methods

Returns:

The loss evaluated at the parameters given

Return type:

float

calc_loss_from_sumstats(sumstats, sumstats_aux=None, randkey=None)[source]

Custom method to map summary statistics to loss

calc_partial_sumstats_from_params(params, randkey=None)[source]

Custom method to map parameters to summary statistics

calc_sumstats_from_params(params, total=True, randkey=None)[source]

Compute summary statistics at given parameters

Parameters:
  • params (array-like) – Model parameters

  • total (bool (default=True)) – If true (default), sumstats will be summed over all MPI ranks

  • randkey (PRNGKey | int (default=None)) – If set to a value other than None, the “randkey” kwarg will be passed to user-defined methods

Returns:

Summary statistics evaluated at given parameters

Return type:

array

run_adam(guess, nsteps=100, param_bounds=None, learning_rate=0.01, randkey=None, const_randkey=False, thin=1, progress=True, comm=None)[source]

Run adam to descend the gradient and optimize the model parameters, given an initial guess. Stochasticity is allowed if randkey is passed.

Parameters:
  • guess (array-like) – The starting parameters.

  • nsteps (int (default=100)) – The number of steps to take.

  • param_bounds (Sequence, optional) – Lower and upper bounds of each parameter of “shape” (ndim, 2). Pass None as the bound for each unbounded parameter, by default None

  • learning_rate (float (default=0.001)) – The adam learning rate.

  • randkey (int | PRNG Key (default=None)) – If given, a new PRNG Key will be generated at each iteration and be passed to calc_loss_and_grad_from_params() as the “randkey” kwarg

  • const_randkey (bool (default=False)) – By default, randkey is regenerated at each gradient descent iteration. Remove this behavior by setting const_randkey=True

  • thin (int, optional) – Return parameters for every thin iterations, by default 1. Set thin=0 to only return final parameters

  • progress (bool, optional) – Display tqdm progress bar, by default True

Returns:

  • params (jnp.array) – The trial parameters at each iteration.

  • losses (jnp.array) – The loss values at each iteration.

run_bfgs(guess, maxsteps=100, param_bounds=None, randkey=None, thin=1, progress=True, comm=None)[source]

Run BFGS to descend the gradient and optimize the model parameters, given an initial guess. Stochasticity must be held fixed via a random key

Parameters:
  • guess (array-like) – The starting parameters.

  • maxsteps (int (default=100)) – The number of steps to take.

  • param_bounds (Sequence, optional) – Lower and upper bounds of each parameter of “shape” (ndim, 2). Pass None as the bound for each unbounded parameter, by default None

  • randkey (int | PRNG Key (default=None)) – Since BFGS requires a deterministic function, this key will be passed to calc_loss_and_grad_from_params() as the “randkey” kwarg as a constant at every iteration

  • thin (int, optional) – Return parameters for every thin iterations, by default 1. Set thin=0 to only return final parameters

  • progress (bool, optional) – Display tqdm progress bar, by default True

Returns:

  • params (jnp.array) – The trial parameters at each iteration.

  • losses (jnp.array) – The loss values at each iteration.

  • result (OptimizeResult (contains the following attributes):) – message : str, describes reason of termination success : boolean, True if converged fun : float, minimum loss found x : array of parameters at minimum loss found jac : array of gradient of loss at minimum loss found nfev : int, number of function evaluations nit : int, number of gradient descent iterations

run_lhs_param_scan(xmins, xmaxs, n_dim, num_evaluations, seed=None, randkey=None)[source]

Compute sumstat and loss values over a Latin Hypercube sample

Parameters:
  • xmins (float | array-like) – Lower bound on each parameter

  • xmaxs (float | array-like) – Upper bound on each parameter

  • n_dim (int) – Number of parameters

  • num_evaluations (int) – Number of Latin Hypercube samples to draw and evaluate

  • seed (int (default=None)) – Seed to make LHD draws reproducible, randomized by default

  • randkey (PRNGKey | int (default=None)) – Random key passed to each sumstat and loss evaluation

Returns:

  • params (array-like) – Parameters (drawn in Latin Hypercube shape)

  • sumstats (array-like) – Sumstats evaluated at each draw of parameters

  • losses (array-like) – Loss evaluated at each draw of parameters

run_simple_grad_descent(guess, nsteps=100, learning_rate=0.01, thin=1, progress=True)[source]

Descend the gradient with a fixed learning rate to optimize parameters, given an initial guess. Stochasticity not allowed.

Parameters:
  • guess (array-like) – The starting parameters.

  • nsteps (int (default=100)) – The number of steps to take.

  • learning_rate (float (default=0.001)) – The fixed learning rate.

  • thin (int, optional) – Return parameters for every thin iterations, by default 1. Set thin=0 to only return final parameters

  • progress (bool, optional) – Display tqdm progress bar, by default True

Returns:

loss : array of loss values returned at each iteration params : array of trial parameters at each iteration aux : array of aux values returned at each iteration

Return type:

GradientDescentResult (contains the following attributes)

class diffopt.multigrad.OnePointGroup(models: tuple[OnePointModel, ...] | OnePointModel, main_comm: Any = None)[source]

Allows different OnePointModels to simultaneously perform their calc_loss_and_grad_from_params method. The results are summed.

Parameters:
  • models (tuple[OnePointModel]) – Sequence of models, each providing a loss component to be summed.

  • main_comm (Comm (default=COMM_WORLD)) – MPI communicator for the entire group (each model should be assigned its own sub-communicator)

diffopt.multigrad.split_subcomms(num_groups=None, ranks_per_group=None, comm=None)[source]

Split comm into sub-comms (not grouped by nodes)

Parameters:
  • num_groups (int, optional) – Specify the number of evenly divided groups of subcomms

  • ranks_per_group (list[int], optional) – Specify the number of ranks given to each sub-comm

  • comm (MPI.Comm, optional) – Specify a sub-communicator to split into sub-sub-communicators

Returns:

  • subcomm (MPI.Comm) – The sub-comm that now controls this process

  • num_groups (int) – The number of groups of subcomms (same as input if not None)

  • group_rank (int) – The rank of this group (0 <= subcomm_rank < num_subcomms)

diffopt.multigrad.split_subcomms_by_node(comm=None)[source]

Split comm into sub-comms (grouped by nodes)

Parameters:

comm (MPI.Comm, optional) – Specify a sub-communicator to split into sub-sub-communicators

Returns:

  • subcomm (MPI.Comm) – The sub-comm that now controls this process

  • num_groups (int) – The number of groups of subcomms (= number of nodes)

  • group_rank (int) – The rank of this group (0 <= subcomm_rank < num_subcomms)

diffopt.multigrad.reduce_sum(value, root=None, comm=None)[source]

Returns the sum of value across all MPI processes

Parameters:
  • value (np.ndarray | float | int) – value input by each MPI process to be summed

  • root (int, optional) – rank of the process to receive and sum the values, by default None (broadcast result to all ranks)

  • comm (MPI.Intracomm (default = MPI.COMM_WORLD)) – option to pass a sub-communicator in case the operation is not performed by all MPI ranks

Returns:

Sum of values given by each rank of the communicator

Return type:

np.ndarray | float

kdescent

class diffopt.kdescent.KPretrainer(kernel_centers: ndarray, fourier_positions: ndarray, kernelcov: ndarray, kde_counts: ndarray, kde_err: ndarray, fourier_counts: ndarray, fourier_err: ndarray, num_eval_kernels: int, num_eval_fourier_positions: int, num_pretrain_kernels: int, num_pretrain_fourier_positions: int, bandwidth_factor: float, fourier_range_factor: float, covariant_kernels: bool, inverse_density_weight_power: float, training_sum_of_weights: float, seed: int)[source]

Stores precomputed kernel and Fourier counts for training data, with kernel centers sampled from the training data PDF (via gaussian_kde). Provides save/load functionality.

classmethod from_training_data(training_x, training_weights=None, num_eval_kernels=None, num_eval_fourier_positions=None, num_pretrain_kernels=None, num_pretrain_fourier_positions=None, bandwidth_factor=1.0, fourier_range_factor=1.0, covariant_kernels=False, inverse_density_weight_power=0.0, num_idw_draws=None, chunk_size=None, seed=0, comm=None)[source]

Create a pre-trained KPretrainer object from training data.

Parameters:
  • training_x (array-like) – Training data of shape (n_data, n_features)

  • training_weights (array-like, optional) – Training weights of shape (n_data,), by default None

  • num_eval_kernels (int, optional) – Number of KDE kernels to appriximate the PDF, by default 10*ndim

  • num_eval_fourier_positions (int, optional) – Number of points to evaluate the ECF, by default 10*ndim

  • num_pretrain_kernels (int, optional) – Number of KDE kernels to precompute training data PDF, by default 300*num_eval_kernels

  • num_pretrain_fourier_positions (int, optional) – Number of points to precompute training data ECF, by default 300*num_eval_fourier_positions

  • bandwidth_factor (float, optional) – Increase or decrease the kernel bandwidth, by default 1.0

  • fourier_range_factor (float, optional) – Increase or decrease the Fourier search space, by default 1.0

  • covariant_kernels (bool, optional) – If True, kernels will align with the principle components of the training data, which can blow up kernel count values if cov matrix has near-zero eigenvalues. By default False

  • inverse_density_weight_power (float, optional) – At 1.0, this will weight the kernel selection by the inverse density of the training data. This is useful for selecting kernels in low-density regions. No selection weighting by default

  • num_idw_draws (int, optional) – Number of KDE draws + evaluations in total for the importance resampling to determine kernel selection with inverse density weighting. By default 100*num_pretrain_kernels

  • chunk_size (int, optional) – Chunk size for pre-computation of training KDE counts, to prevent memory overflow. If None, chunk_size will default to max(num_eval_kernels, num_eval_fourier_positions)

  • seed (int, optional) – Random seed for reproducibility, by default 0

  • comm (MPI Communicator, optional) – Distribute pre-computation of training kernel counts across ranks, assuming full training data is loaded and identical across ranks.

classmethod load(filename)[source]

Load a pre-trained object from disk .npz file

save(filename)[source]

Save the pre-trained object to disk as a .npz numpy zip file

class diffopt.kdescent.KCalc(pretrainer)[source]
__init__(pretrainer)[source]

This KDE object is the fundamental building block of kdescent. It can be used to compare randomized evaluations of the PDF and ECF by training data to model predictions.

Parameters:

pretrainer (KPretrainer) – A pre-trained KPretrainer object that precomputes possible kernel centers and their associated training data counts.

compare_fourier_counts(randkey: Any, x: Any, weights: Any = None, return_err: Literal[False] = False, comm: Any = None) Tuple[Array, Array][source]
compare_fourier_counts(randkey: Any, x: Any, weights: Any = None, return_err: Literal[True] = True, comm: Any = None) Tuple[Array, Array, Array]

Return randomly-placed evaluations of the ECF (Empirical Characteristic Function = Fourier-transformed PDF)

Parameters:
  • x (array-like) – Model data of shape (n_model_data, n_features)

  • weights (array-like, optional) – Effective counts with shape (n_model_data,). If supplied, the ECF will be weighted as sum(weights * exp^(…)) at each evaluation in k-space instead of simply sum(exp^(…))

  • return_err (bool) – If true, also return the uncertainty of all training Fourier counts values according to the effective sample size (ESS) in each kernel

  • comm (MPI Communicator, optional) – For parallel computing, this guarantees consistent kernel placements by all MPI ranks within the comm, by default None. WARNING: Do not pass in an MPI communicator here if you plan on JIT compiling; just pass identical randkeys for each MPI rank

Returns:

  • prediction (jnp.ndarray (complex-valued)) – CF evaluations measured on x. Has shape (num_kernels,)

  • truth (jnp.ndarray (complex-valued)) – CF evaluations measured on training_x. This is always different due to the random evaluation kernels. Has shape (num_kernels,)

  • err (jnp.ndarray) – Returned if return_err=True, uncertainties of each Fourier count in truth equal to truth/sqrt(ESS)

compare_kde_counts(randkey: Any, x: Any, weights: Any = None, return_err: Literal[False] = False, comm: Any = None) Tuple[Array, Array][source]
compare_kde_counts(randkey: Any, x: Any, weights: Any = None, return_err: Literal[True] = True, comm: Any = None) Tuple[Array, Array, Array]

Realize kernel centers and return all kernel-weighted counts

Parameters:
  • x (array-like) – Model data of shape (n_model_data, n_features)

  • weights (array-like, optional) – Effective counts with shape (n_model_data,). If supplied, function will return sum(weights * kernel_weights) within each kernel instead of simply sum(kernel_weights)

  • return_err (bool) – If true, also return the uncertainty of all training KDE counts values according to the effective sample size (ESS) in each kernel

  • comm (MPI Communicator, optional) – For parallel computing, this guarantees consistent kernel placements by all MPI ranks within the comm, by default None. WARNING: Do not pass in an MPI communicator here if you plan on JIT compiling; just pass identical randkeys for each MPI rank

Returns:

  • prediction (jnp.ndarray) – KDE counts measured on x. Has shape (num_kernels,)

  • truth (jnp.ndarray) – KDE counts measured on training_x. This is always different due to the random kernel placements. Has shape (num_kernels,)

  • err (jnp.ndarray) – Returned if return_err=True, uncertainties of each KDE count in truth equal to truth/sqrt(ESS)

diffopt.kdescent.adam(lossfunc, guess, nsteps=100, param_bounds=None, learning_rate=0.01, randkey=1, const_randkey=False, thin=1, progress=True, **other_kwargs)[source]

Perform gradient descent

Parameters:
  • lossfunc (callable) – Function to be minimized via gradient descent. Must be compatible with jax.jit and jax.grad. Must have signature f(params, **other_kwargs)

  • guess (array-like) – The starting parameters.

  • nsteps (int, optional) – Number of gradient descent iterations to perform, by default 100

  • param_bounds (Sequence, optional) – Lower and upper bounds of each parameter of “shape” (ndim, 2). Pass None as the bound for each unbounded parameter, by default None

  • learning_rate (float, optional) – Initial Adam learning rate, by default 0.05

  • randkey (int, optional) – Random seed or key, by default 1. If not None, lossfunc must accept the “randkey” keyword argument, e.g. lossfunc(params, randkey=key)

  • const_randkey (bool, optional) – By default (False), randkey is regenerated at each gradient descent iteration. Remove this behavior by setting const_randkey=True

  • thin (int, optional) – Return parameters for every thin iterations, by default 1. Set thin=0 to only return final parameters

  • progress (bool, optional) – Display tqdm progress bar, by default True

Returns:

  • params (jnp.array) – The trial parameters at each iteration.

  • losses (jnp.array) – The loss values at each iteration.

diffopt.kdescent.bfgs(lossfunc, guess, maxsteps=100, param_bounds=None, randkey=None, thin=1, progress=True)[source]

Run BFGS to descend the gradient and optimize the model parameters, given an initial guess. Stochasticity must be held fixed via a random key

Parameters:
  • lossfunc (callable) – Function to be minimized via gradient descent. Must be compatible with jax.jit and jax.grad. Must have signature f(params, **other_kwargs)

  • guess (array-like) – The starting parameters.

  • maxsteps (int, optional) – The maximum number of steps to take, by default 100.

  • param_bounds (Sequence, optional) – Lower and upper bounds of each parameter of “shape” (ndim, 2). Pass None as the bound for each unbounded parameter, by default None

  • randkey (int | PRNG Key, optional) – Since BFGS requires a deterministic function, this key will be passed to calc_loss_and_grad_from_params() as the “randkey” kwarg as a constant at every iteration, by default None

  • thin (int, optional) – Return parameters for every thin iterations, by default 1. Set thin=0 to only return final parameters

  • progress (bool, optional) – Display tqdm progress bar, by default True

Returns:

  • params (jnp.array) – The trial parameters at each iteration.

  • losses (jnp.array) – The loss values at each iteration.

  • result (OptimizeResult (contains the following attributes):) – message : str, describes reason of termination success : boolean, True if converged fun : float, minimum loss found x : array of parameters at minimum loss found jac : array of gradient of loss at minimum loss found nfev : int, number of function evaluations nit : int, number of gradient descent iterations

multiswarm

class diffopt.multiswarm.ParticleSwarm(nparticles, ndim, xlow, xhigh, seed=0, inertial_weight=1.0, cognitive_weight=0.21, social_weight=0.07, vmax_frac=0.4, ranks_per_particle=None, comm=None)[source]
__init__(nparticles, ndim, xlow, xhigh, seed=0, inertial_weight=1.0, cognitive_weight=0.21, social_weight=0.07, vmax_frac=0.4, ranks_per_particle=None, comm=None)[source]

Initialize particles and MPI communicators to be used for PSO

Parameters:
  • nparticles (int) – Number of particles (~100+ recommended)

  • ndim (int) – Dimensionality (i.e. number of model parameters to fit)

  • xlow (int | Array[int]) – Lower bounds on each parameter

  • xhigh (int | Array[int]) – Upper bounds on each parameter

  • seed (int | PRNGKey, optional) – Seed for all pseudo-randomness, by default 0

  • inertial_weight (float, optional) – Retain this fraction of the velocity from previous timestep, by default 1.0

  • cognitive_weight (float, optional) – Weight pulling particles towards their personal best location ever found, by default 0.21

  • social_weight (float, optional) – Weight pulling particles towards the global best location ever found, recommended ~1/3 of cognitive_weight, by default 0.07

  • vmax_frac (float, optional) – Maximum velocity particles are allowed to travel, as a fraction of their box width per dimension, by default 0.4

  • ranks_per_particle (int, optional) – Set this to manually control intra-particle parallelization, even if there are not enough ranks for nparticles * ranks_per_particle. By default (None), inter-particle parallelization is prioritized

  • comm (MPI.Comm, optional) – MPI Communicator, by default COMM_WORLD

run_pso(lossfunc, nsteps=100, progress=True, keep_init_random_state=False)[source]

Run particle swarm optimization (PSO)

Parameters:
  • lossfunc (callable) – The function we want to find the global minimum of. To be called with signature lossfunc(x) where x is an array of shape (ndim,)

  • nsteps (int, optional) – Number of time step iterations, by default 100

  • progress (bool, optional) – Display tqdm progress bar, by default True

  • keep_init_random_state (bool, optional) – Set True to be able to rerun an identical run, or False (default) to continue a run by manually setting swarm.x_init and swarm.v_init

Returns:

“swarm_x_history”np.ndarray of shape (nsteps, nparticles, ndim)

Position of all particles (trial params) at each time step

”swarm_v_history”: np.ndarray of shape (nsteps, nparticles, ndim)

Velocity of all particles at each time step

”swarm_loss_history”: np.ndarray of shape (nsteps, nparticles)

Loss of all particles at each time step

”runtime”: float

Time in seconds, as measured on each rank, to perform PSO

Return type:

Results dictionary with the following keys

diffopt.multiswarm.get_best_loss_and_params(loss_history, params_history)[source]

Return the best loss and its corresponding parameters from the full results arrays returned by run_pso()

Parameters:
  • loss_history (Array[float] of shape (nsteps, nparticles)) – Loss of all particles at each time, given by “swarm_loss_history”

  • params_history (Array[float] of shape (nsteps, nparticles, ndim)) – Position of all particles at each time, given by “swarm_x_history”

Returns:

  • float – Minimum loss value

  • nd.ndarray[float] – Parameters that produced the minimum loss

diffopt.multiswarm.get_subcomm(nparticles, ranks_per_particle=None, comm=None, return_particles_on_this_rank=False)[source]

Initialize MPI communicators to be used for PSO

Parameters:
  • nparticles (int) – Number of particles

  • ranks_per_particle (int, optional) – Set this to manually control intra-particle parallelization, even if there are not enough ranks for nparticles * ranks_per_particle. By default (None), inter-particle parallelization is prioritized

  • comm (MPI.Comm, optional) – MPI Communicator, by default COMM_WORLD

  • return_particles_on_this_rank (bool, optional) – If true, return tuple (subcomm, particles_on_this_rank). By default, only subcomm is returned

Returns:

  • subcomm (MPI.Comm) – This rank’s subcommunicator, which can only talk to its “group”

  • particles_on_this_rank (list) – If return_particles_on_this_rank=True this list will be returned, specifying the indices of particles this group is responsible for