API Reference
multigrad
- class diffopt.multigrad.OnePointModel(aux_data: Any = None, comm: Any = None, loss_func_has_aux: bool = False, sumstats_func_has_aux: bool = False)[source]
Allows differentiable one-point calculations to be performed on separate MPI ranks, and automatically sums over each rank controlled by the comm. This is an abstract base class only. The user must personally define the calc_partial_sumstats_from_params and calc_loss_from_sumstats methods
- Parameters:
aux_data (Any (default=None)) – Any auxiliary data for easy access within sumstats or loss functions
comm (Comm (default=COMM_WORLD)) – MPI communicator
loss_func_has_aux (bool (default=False)) – If true, calc_partial_sumstats_from_params(x) -> (y, aux) and calc_loss_from_sumstats(y, aux) -> … signatures will be assumed
sumstats_func_has_aux (bool (default=False)) – If true, calc_loss_from_sumstats(…) -> (loss, aux) signature will be assumed
- calc_dloss_dparams(params, randkey=None)[source]
Calculate the gradient of the loss w.r.t. model parameters given
- Parameters:
params (array-like) – Model parameters
randkey (PRNGKey | int (default=None)) – If set to a value other than None, the “randkey” kwarg will be passed to user-defined methods
- Returns:
Gradient of the loss with respect to each parameter
- Return type:
array
- calc_loss_and_grad_from_params(params, randkey=None)[source]
Calculate the loss and its gradient.
This function returns the equivalent of (calc_loss_from_params(x), calc_dloss_dparams(x)) but it is significantly cheaper than calling them separately
- Parameters:
params (array-like) – Model parameters
randkey (PRNGKey | int (default=None)) – If set to a value other than None, the “randkey” kwarg will be passed to user-defined methods
- Returns:
float – The loss evaluated at the parameters given
array – Gradient of the loss with respect to each parameter
- calc_loss_from_params(params, randkey=None)[source]
Calculate the loss evaluated at a given set of parameters
- Parameters:
params (array-like) – Model parameters
randkey (PRNGKey | int (default=None)) – If set to a value other than None, the “randkey” kwarg will be passed to user-defined methods
- Returns:
The loss evaluated at the parameters given
- Return type:
float
- calc_loss_from_sumstats(sumstats, sumstats_aux=None, randkey=None)[source]
Custom method to map summary statistics to loss
- calc_partial_sumstats_from_params(params, randkey=None)[source]
Custom method to map parameters to summary statistics
- calc_sumstats_from_params(params, total=True, randkey=None)[source]
Compute summary statistics at given parameters
- Parameters:
params (array-like) – Model parameters
total (bool (default=True)) – If true (default), sumstats will be summed over all MPI ranks
randkey (PRNGKey | int (default=None)) – If set to a value other than None, the “randkey” kwarg will be passed to user-defined methods
- Returns:
Summary statistics evaluated at given parameters
- Return type:
array
- run_adam(guess, nsteps=100, param_bounds=None, learning_rate=0.01, randkey=None, const_randkey=False, thin=1, progress=True, comm=None)[source]
Run adam to descend the gradient and optimize the model parameters, given an initial guess. Stochasticity is allowed if randkey is passed.
- Parameters:
guess (array-like) – The starting parameters.
nsteps (int (default=100)) – The number of steps to take.
param_bounds (Sequence, optional) – Lower and upper bounds of each parameter of “shape” (ndim, 2). Pass None as the bound for each unbounded parameter, by default None
learning_rate (float (default=0.001)) – The adam learning rate.
randkey (int | PRNG Key (default=None)) – If given, a new PRNG Key will be generated at each iteration and be passed to calc_loss_and_grad_from_params() as the “randkey” kwarg
const_randkey (bool (default=False)) – By default, randkey is regenerated at each gradient descent iteration. Remove this behavior by setting const_randkey=True
thin (int, optional) – Return parameters for every thin iterations, by default 1. Set thin=0 to only return final parameters
progress (bool, optional) – Display tqdm progress bar, by default True
- Returns:
params (jnp.array) – The trial parameters at each iteration.
losses (jnp.array) – The loss values at each iteration.
- run_bfgs(guess, maxsteps=100, param_bounds=None, randkey=None, thin=1, progress=True, comm=None)[source]
Run BFGS to descend the gradient and optimize the model parameters, given an initial guess. Stochasticity must be held fixed via a random key
- Parameters:
guess (array-like) – The starting parameters.
maxsteps (int (default=100)) – The number of steps to take.
param_bounds (Sequence, optional) – Lower and upper bounds of each parameter of “shape” (ndim, 2). Pass None as the bound for each unbounded parameter, by default None
randkey (int | PRNG Key (default=None)) – Since BFGS requires a deterministic function, this key will be passed to calc_loss_and_grad_from_params() as the “randkey” kwarg as a constant at every iteration
thin (int, optional) – Return parameters for every thin iterations, by default 1. Set thin=0 to only return final parameters
progress (bool, optional) – Display tqdm progress bar, by default True
- Returns:
params (jnp.array) – The trial parameters at each iteration.
losses (jnp.array) – The loss values at each iteration.
result (OptimizeResult (contains the following attributes):) – message : str, describes reason of termination success : boolean, True if converged fun : float, minimum loss found x : array of parameters at minimum loss found jac : array of gradient of loss at minimum loss found nfev : int, number of function evaluations nit : int, number of gradient descent iterations
- run_lhs_param_scan(xmins, xmaxs, n_dim, num_evaluations, seed=None, randkey=None)[source]
Compute sumstat and loss values over a Latin Hypercube sample
- Parameters:
xmins (float | array-like) – Lower bound on each parameter
xmaxs (float | array-like) – Upper bound on each parameter
n_dim (int) – Number of parameters
num_evaluations (int) – Number of Latin Hypercube samples to draw and evaluate
seed (int (default=None)) – Seed to make LHD draws reproducible, randomized by default
randkey (PRNGKey | int (default=None)) – Random key passed to each sumstat and loss evaluation
- Returns:
params (array-like) – Parameters (drawn in Latin Hypercube shape)
sumstats (array-like) – Sumstats evaluated at each draw of parameters
losses (array-like) – Loss evaluated at each draw of parameters
- run_simple_grad_descent(guess, nsteps=100, learning_rate=0.01, thin=1, progress=True)[source]
Descend the gradient with a fixed learning rate to optimize parameters, given an initial guess. Stochasticity not allowed.
- Parameters:
guess (array-like) – The starting parameters.
nsteps (int (default=100)) – The number of steps to take.
learning_rate (float (default=0.001)) – The fixed learning rate.
thin (int, optional) – Return parameters for every thin iterations, by default 1. Set thin=0 to only return final parameters
progress (bool, optional) – Display tqdm progress bar, by default True
- Returns:
loss : array of loss values returned at each iteration params : array of trial parameters at each iteration aux : array of aux values returned at each iteration
- Return type:
GradientDescentResult (contains the following attributes)
- class diffopt.multigrad.OnePointGroup(models: tuple[OnePointModel, ...] | OnePointModel, main_comm: Any = None)[source]
Allows different OnePointModels to simultaneously perform their calc_loss_and_grad_from_params method. The results are summed.
- Parameters:
models (tuple[OnePointModel]) – Sequence of models, each providing a loss component to be summed.
main_comm (Comm (default=COMM_WORLD)) – MPI communicator for the entire group (each model should be assigned its own sub-communicator)
- diffopt.multigrad.split_subcomms(num_groups=None, ranks_per_group=None, comm=None)[source]
Split comm into sub-comms (not grouped by nodes)
- Parameters:
num_groups (int, optional) – Specify the number of evenly divided groups of subcomms
ranks_per_group (list[int], optional) – Specify the number of ranks given to each sub-comm
comm (MPI.Comm, optional) – Specify a sub-communicator to split into sub-sub-communicators
- Returns:
subcomm (MPI.Comm) – The sub-comm that now controls this process
num_groups (int) – The number of groups of subcomms (same as input if not None)
group_rank (int) – The rank of this group (0 <= subcomm_rank < num_subcomms)
- diffopt.multigrad.split_subcomms_by_node(comm=None)[source]
Split comm into sub-comms (grouped by nodes)
- Parameters:
comm (MPI.Comm, optional) – Specify a sub-communicator to split into sub-sub-communicators
- Returns:
subcomm (MPI.Comm) – The sub-comm that now controls this process
num_groups (int) – The number of groups of subcomms (= number of nodes)
group_rank (int) – The rank of this group (0 <= subcomm_rank < num_subcomms)
- diffopt.multigrad.reduce_sum(value, root=None, comm=None)[source]
Returns the sum of value across all MPI processes
- Parameters:
value (np.ndarray | float | int) – value input by each MPI process to be summed
root (int, optional) – rank of the process to receive and sum the values, by default None (broadcast result to all ranks)
comm (MPI.Intracomm (default = MPI.COMM_WORLD)) – option to pass a sub-communicator in case the operation is not performed by all MPI ranks
- Returns:
Sum of values given by each rank of the communicator
- Return type:
np.ndarray | float
kdescent
- class diffopt.kdescent.KPretrainer(kernel_centers: ndarray, fourier_positions: ndarray, kernelcov: ndarray, kde_counts: ndarray, kde_err: ndarray, fourier_counts: ndarray, fourier_err: ndarray, num_eval_kernels: int, num_eval_fourier_positions: int, num_pretrain_kernels: int, num_pretrain_fourier_positions: int, bandwidth_factor: float, fourier_range_factor: float, covariant_kernels: bool, inverse_density_weight_power: float, training_sum_of_weights: float, seed: int)[source]
Stores precomputed kernel and Fourier counts for training data, with kernel centers sampled from the training data PDF (via gaussian_kde). Provides save/load functionality.
- classmethod from_training_data(training_x, training_weights=None, num_eval_kernels=None, num_eval_fourier_positions=None, num_pretrain_kernels=None, num_pretrain_fourier_positions=None, bandwidth_factor=1.0, fourier_range_factor=1.0, covariant_kernels=False, inverse_density_weight_power=0.0, num_idw_draws=None, chunk_size=None, seed=0, comm=None)[source]
Create a pre-trained KPretrainer object from training data.
- Parameters:
training_x (array-like) – Training data of shape (n_data, n_features)
training_weights (array-like, optional) – Training weights of shape (n_data,), by default None
num_eval_kernels (int, optional) – Number of KDE kernels to appriximate the PDF, by default 10*ndim
num_eval_fourier_positions (int, optional) – Number of points to evaluate the ECF, by default 10*ndim
num_pretrain_kernels (int, optional) – Number of KDE kernels to precompute training data PDF, by default 300*num_eval_kernels
num_pretrain_fourier_positions (int, optional) – Number of points to precompute training data ECF, by default 300*num_eval_fourier_positions
bandwidth_factor (float, optional) – Increase or decrease the kernel bandwidth, by default 1.0
fourier_range_factor (float, optional) – Increase or decrease the Fourier search space, by default 1.0
covariant_kernels (bool, optional) – If True, kernels will align with the principle components of the training data, which can blow up kernel count values if cov matrix has near-zero eigenvalues. By default False
inverse_density_weight_power (float, optional) – At 1.0, this will weight the kernel selection by the inverse density of the training data. This is useful for selecting kernels in low-density regions. No selection weighting by default
num_idw_draws (int, optional) – Number of KDE draws + evaluations in total for the importance resampling to determine kernel selection with inverse density weighting. By default 100*num_pretrain_kernels
chunk_size (int, optional) – Chunk size for pre-computation of training KDE counts, to prevent memory overflow. If None, chunk_size will default to max(num_eval_kernels, num_eval_fourier_positions)
seed (int, optional) – Random seed for reproducibility, by default 0
comm (MPI Communicator, optional) – Distribute pre-computation of training kernel counts across ranks, assuming full training data is loaded and identical across ranks.
- class diffopt.kdescent.KCalc(pretrainer)[source]
- __init__(pretrainer)[source]
This KDE object is the fundamental building block of kdescent. It can be used to compare randomized evaluations of the PDF and ECF by training data to model predictions.
- Parameters:
pretrainer (KPretrainer) – A pre-trained KPretrainer object that precomputes possible kernel centers and their associated training data counts.
- compare_fourier_counts(randkey: Any, x: Any, weights: Any = None, return_err: Literal[False] = False, comm: Any = None) Tuple[Array, Array][source]
- compare_fourier_counts(randkey: Any, x: Any, weights: Any = None, return_err: Literal[True] = True, comm: Any = None) Tuple[Array, Array, Array]
Return randomly-placed evaluations of the ECF (Empirical Characteristic Function = Fourier-transformed PDF)
- Parameters:
x (array-like) – Model data of shape (n_model_data, n_features)
weights (array-like, optional) – Effective counts with shape (n_model_data,). If supplied, the ECF will be weighted as sum(weights * exp^(…)) at each evaluation in k-space instead of simply sum(exp^(…))
return_err (bool) – If true, also return the uncertainty of all training Fourier counts values according to the effective sample size (ESS) in each kernel
comm (MPI Communicator, optional) – For parallel computing, this guarantees consistent kernel placements by all MPI ranks within the comm, by default None. WARNING: Do not pass in an MPI communicator here if you plan on JIT compiling; just pass identical randkeys for each MPI rank
- Returns:
prediction (jnp.ndarray (complex-valued)) – CF evaluations measured on x. Has shape (num_kernels,)
truth (jnp.ndarray (complex-valued)) – CF evaluations measured on training_x. This is always different due to the random evaluation kernels. Has shape (num_kernels,)
err (jnp.ndarray) – Returned if return_err=True, uncertainties of each Fourier count in truth equal to truth/sqrt(ESS)
- compare_kde_counts(randkey: Any, x: Any, weights: Any = None, return_err: Literal[False] = False, comm: Any = None) Tuple[Array, Array][source]
- compare_kde_counts(randkey: Any, x: Any, weights: Any = None, return_err: Literal[True] = True, comm: Any = None) Tuple[Array, Array, Array]
Realize kernel centers and return all kernel-weighted counts
- Parameters:
x (array-like) – Model data of shape (n_model_data, n_features)
weights (array-like, optional) – Effective counts with shape (n_model_data,). If supplied, function will return sum(weights * kernel_weights) within each kernel instead of simply sum(kernel_weights)
return_err (bool) – If true, also return the uncertainty of all training KDE counts values according to the effective sample size (ESS) in each kernel
comm (MPI Communicator, optional) – For parallel computing, this guarantees consistent kernel placements by all MPI ranks within the comm, by default None. WARNING: Do not pass in an MPI communicator here if you plan on JIT compiling; just pass identical randkeys for each MPI rank
- Returns:
prediction (jnp.ndarray) – KDE counts measured on x. Has shape (num_kernels,)
truth (jnp.ndarray) – KDE counts measured on training_x. This is always different due to the random kernel placements. Has shape (num_kernels,)
err (jnp.ndarray) – Returned if return_err=True, uncertainties of each KDE count in truth equal to truth/sqrt(ESS)
- diffopt.kdescent.adam(lossfunc, guess, nsteps=100, param_bounds=None, learning_rate=0.01, randkey=1, const_randkey=False, thin=1, progress=True, **other_kwargs)[source]
Perform gradient descent
- Parameters:
lossfunc (callable) – Function to be minimized via gradient descent. Must be compatible with jax.jit and jax.grad. Must have signature f(params, **other_kwargs)
guess (array-like) – The starting parameters.
nsteps (int, optional) – Number of gradient descent iterations to perform, by default 100
param_bounds (Sequence, optional) – Lower and upper bounds of each parameter of “shape” (ndim, 2). Pass None as the bound for each unbounded parameter, by default None
learning_rate (float, optional) – Initial Adam learning rate, by default 0.05
randkey (int, optional) – Random seed or key, by default 1. If not None, lossfunc must accept the “randkey” keyword argument, e.g. lossfunc(params, randkey=key)
const_randkey (bool, optional) – By default (False), randkey is regenerated at each gradient descent iteration. Remove this behavior by setting const_randkey=True
thin (int, optional) – Return parameters for every thin iterations, by default 1. Set thin=0 to only return final parameters
progress (bool, optional) – Display tqdm progress bar, by default True
- Returns:
params (jnp.array) – The trial parameters at each iteration.
losses (jnp.array) – The loss values at each iteration.
- diffopt.kdescent.bfgs(lossfunc, guess, maxsteps=100, param_bounds=None, randkey=None, thin=1, progress=True)[source]
Run BFGS to descend the gradient and optimize the model parameters, given an initial guess. Stochasticity must be held fixed via a random key
- Parameters:
lossfunc (callable) – Function to be minimized via gradient descent. Must be compatible with jax.jit and jax.grad. Must have signature f(params, **other_kwargs)
guess (array-like) – The starting parameters.
maxsteps (int, optional) – The maximum number of steps to take, by default 100.
param_bounds (Sequence, optional) – Lower and upper bounds of each parameter of “shape” (ndim, 2). Pass None as the bound for each unbounded parameter, by default None
randkey (int | PRNG Key, optional) – Since BFGS requires a deterministic function, this key will be passed to calc_loss_and_grad_from_params() as the “randkey” kwarg as a constant at every iteration, by default None
thin (int, optional) – Return parameters for every thin iterations, by default 1. Set thin=0 to only return final parameters
progress (bool, optional) – Display tqdm progress bar, by default True
- Returns:
params (jnp.array) – The trial parameters at each iteration.
losses (jnp.array) – The loss values at each iteration.
result (OptimizeResult (contains the following attributes):) – message : str, describes reason of termination success : boolean, True if converged fun : float, minimum loss found x : array of parameters at minimum loss found jac : array of gradient of loss at minimum loss found nfev : int, number of function evaluations nit : int, number of gradient descent iterations
multiswarm
- class diffopt.multiswarm.ParticleSwarm(nparticles, ndim, xlow, xhigh, seed=0, inertial_weight=1.0, cognitive_weight=0.21, social_weight=0.07, vmax_frac=0.4, ranks_per_particle=None, comm=None)[source]
- __init__(nparticles, ndim, xlow, xhigh, seed=0, inertial_weight=1.0, cognitive_weight=0.21, social_weight=0.07, vmax_frac=0.4, ranks_per_particle=None, comm=None)[source]
Initialize particles and MPI communicators to be used for PSO
- Parameters:
nparticles (int) – Number of particles (~100+ recommended)
ndim (int) – Dimensionality (i.e. number of model parameters to fit)
xlow (int | Array[int]) – Lower bounds on each parameter
xhigh (int | Array[int]) – Upper bounds on each parameter
seed (int | PRNGKey, optional) – Seed for all pseudo-randomness, by default 0
inertial_weight (float, optional) – Retain this fraction of the velocity from previous timestep, by default 1.0
cognitive_weight (float, optional) – Weight pulling particles towards their personal best location ever found, by default 0.21
social_weight (float, optional) – Weight pulling particles towards the global best location ever found, recommended ~1/3 of cognitive_weight, by default 0.07
vmax_frac (float, optional) – Maximum velocity particles are allowed to travel, as a fraction of their box width per dimension, by default 0.4
ranks_per_particle (int, optional) – Set this to manually control intra-particle parallelization, even if there are not enough ranks for nparticles * ranks_per_particle. By default (None), inter-particle parallelization is prioritized
comm (MPI.Comm, optional) – MPI Communicator, by default COMM_WORLD
- run_pso(lossfunc, nsteps=100, progress=True, keep_init_random_state=False)[source]
Run particle swarm optimization (PSO)
- Parameters:
lossfunc (callable) – The function we want to find the global minimum of. To be called with signature lossfunc(x) where x is an array of shape (ndim,)
nsteps (int, optional) – Number of time step iterations, by default 100
progress (bool, optional) – Display tqdm progress bar, by default True
keep_init_random_state (bool, optional) – Set True to be able to rerun an identical run, or False (default) to continue a run by manually setting swarm.x_init and swarm.v_init
- Returns:
- “swarm_x_history”np.ndarray of shape (nsteps, nparticles, ndim)
Position of all particles (trial params) at each time step
- ”swarm_v_history”: np.ndarray of shape (nsteps, nparticles, ndim)
Velocity of all particles at each time step
- ”swarm_loss_history”: np.ndarray of shape (nsteps, nparticles)
Loss of all particles at each time step
- ”runtime”: float
Time in seconds, as measured on each rank, to perform PSO
- Return type:
Results dictionary with the following keys
- diffopt.multiswarm.get_best_loss_and_params(loss_history, params_history)[source]
Return the best loss and its corresponding parameters from the full results arrays returned by run_pso()
- Parameters:
loss_history (Array[float] of shape (nsteps, nparticles)) – Loss of all particles at each time, given by “swarm_loss_history”
params_history (Array[float] of shape (nsteps, nparticles, ndim)) – Position of all particles at each time, given by “swarm_x_history”
- Returns:
float – Minimum loss value
nd.ndarray[float] – Parameters that produced the minimum loss
- diffopt.multiswarm.get_subcomm(nparticles, ranks_per_particle=None, comm=None, return_particles_on_this_rank=False)[source]
Initialize MPI communicators to be used for PSO
- Parameters:
nparticles (int) – Number of particles
ranks_per_particle (int, optional) – Set this to manually control intra-particle parallelization, even if there are not enough ranks for nparticles * ranks_per_particle. By default (None), inter-particle parallelization is prioritized
comm (MPI.Comm, optional) – MPI Communicator, by default COMM_WORLD
return_particles_on_this_rank (bool, optional) – If true, return tuple (subcomm, particles_on_this_rank). By default, only subcomm is returned
- Returns:
subcomm (MPI.Comm) – This rank’s subcommunicator, which can only talk to its “group”
particles_on_this_rank (list) – If return_particles_on_this_rank=True this list will be returned, specifying the indices of particles this group is responsible for