NCGR

ncgr

Created on Tue Feb 25 15:55:31 2020

@author: arlan

class ncgr.ncgr_fullfield(fcst_netcdf, hc_netcdf, obs_netcdf, out_netcdf, a, b, model_dict, obs_dict, clim_netcdf=None, terc_interp=None, sigma_eqn='s3', es_tol=0.05, pred_pval=0.05, options=None)
  • Performs non-homogeneous censored gaussian regression (NCGR) [1] on a forecast ice-free date (IFD) or freeze-up date (FUD) field using input NetCDF files.

  • An output NetCDF file is created that contains several quantities relevant to calibration.

Args:
fcst_netcdf (str):

Path of the NetCDF file containing the ensemble forecast IFDs or FUDs to be calibrated. Requirements for the file are as follows:

  • The structure/shape of the IFD/FUD variable should be: (time, ensemble members, latitude, longitude), although the actual names of those dimensions and corresponding variables (if they are included in the NetCDF files) can be anything.

  • Relevant variable names and dimension names are to be specified by the user with the model_dict argument provided to this function.

  • The variable for the time coordinate, which should represent the forecast initialization date, should follow CF conventions (https://cfconventions.org/). In particular, it’s necessary for the initialization year be retrievable from the time variable using the netCDF4.num2date() function (https://unidata.github.io/netcdf4-python/netCDF4/index.html).

  • Masked locations will be ignored in calibration and will be written as masked locations in the output NetCDF file.

hc_netcdf (str):

Path of the NetCDF file containing the model ensemble forecast IFDs or FUDs used to train NCGR. This file should contain several years of model re-forecasts (hindcasts), and may in fact include data for years after the year of the forecast being calibrated (e.g. if leave-one-out cross-validation is being performed). The requirements for the file are the same as for fcst_netcdf, except that the variable corresponding to the time coordinate should contain several dates (i.e. the initialization date for each forecast).

obs_netcdf (str):

Path of the NetCDF file containing the observed IFDs or FUDs used to train NCGR. This file should contain several years of observed IFDs/FUDs (corresponding to what was verified for the model hindcasts in hc_netcdf). Requirements for the file are as follows:

  • Relevant variable names and dimension names are to be specified by the user with the obs_dict argument provided to this function.

  • The structure/shape of the IFD/FUD variable should be: (time, latitude, longitude), although the actual names of those dimensions and corresponding variables (if they are included in the NetCDF files) can be anything.

  • The variable for the time coordinate, which should represent the initialization dates of the re-forecasts in hc_netcdf, should follow CF conventions (https://cfconventions.org/). In particular, it’s necessary for the initialization year be retrievable from the time variable using the netCDF4.num2date() function (https://unidata.github.io/netcdf4-python/netCDF4/index.html).

  • Masked locations will be ignored in calibration and will be written as masked locations in the output NetCDF file.

out_netcdf (str):

The absolute path of the NetCDF file to be written that contains relevant calibrated forecast quantities. For more information on this file see ncgr_fullfield.write_output().

a (float or int):

Minimum possible date for the event in day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). See definition in [1]. IFDs or FUDs provided in input NetCDF files should be set to the value provided to a when the event has occurred at the time of forecast initialization.

b (float or int):

Maximum possible date for the event in day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). See definition in [1]. IFDs or FUDs provided in input NetCDF files should be set to the value provided to b when the event does not occur by the end of the season.

model_dict (tuple(dict,dict)):

A tuple of two dictionaries. The first specifies the relevant variables in the NetCDF files assigned to fcst_netcdf and hc_netcdf. The second specifies the relevant demension names in those same files

For the first dictionary, its keys (which must follow the naming conventions used below) and values (specified by user) are:

‘event_vn’: str: Variable name for the ice-free date or freeze-up date field(s)

‘time_vn’: str: Variable name for the forecast initialization date

For the second dictionary, its keys (which must follow the naming conventions used below) and values (specified by user) are:

‘time_dn’: str: Dimension name for the forecast initialization date

‘ens_dn’: str: Dimension name for the forecast realization/ensemble member

Example:
model_dict = ({'event_vn' : 'ifd',
               'time_vn' : 'time'},
              {'time_dn' : 'time',
               'ens_dn' : 'ensemble'})
obs_dict (tuple(dict,dict)):

A tuple of two dictionaries. The first specifies the relevant variables in the NetCDF file assigned to obs_netcdf. The second specifies the relevant demension names in that same file.

For the first dictionary, its keys (which must follow the naming conventions used below) and values (specified by user) are:

‘event_vn’: str: Variable name for ice-free date or freeze-up date field(s)

‘time_vn’: str: Variable name for the forecast initialization date

For the second dictionary, its keys (which must follow the naming conventions used below) and values (specified by user) are:

‘time_dn’: str: Dimension name for the forecast initialization date

Example:
obs_dict = ({'event_vn' : 'ifd', 
               'time_vn' : 'time'},
              {'time_dn' : 'time'}) 
clim_netcdf (str, optional):

The absolute path of the NetCDF file that contains several years of observed IFDs or FUDs used to construct the climatology that forecast IFDs/FUDs will be in reference to. If this is included, forecast probabilities for each the early, near-normal, and late events, as well as ensemble-mean anomalies are included in the out_netcdf file. Requirements for this file are the same as those for obs_netcdf.

terc_interp (str or None, optional):

Interpolation scheme used to compute the terciles for the observed climatology. Default is None. Can be one of the following:

  • None: By default y_clim data are fit to the DCNORM distribution,

and terciles are computed using its _ppf() method.

  • ‘HD’: Estimate terciles using the Harrell-Davis estimator (see :py:class:scipy.stats.mstats.hdquantiles)

  • ‘nearest-rank’: Nearest rank or rank order method (see

https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method)

  • Any of the interpolation arguments for numpy.percentile.

sigma_eqn (str, optional):

Refers to the regression equation to be used for the \(\sigma\) parameter in the NCGR model. This can be one of ‘s1’, ‘s2’, or ‘s3’. These are defined by the regression equations below as \(\sigma_{I}\), \(\sigma_{II}\), and \(\sigma_{III}\), respectively. By default, sigma_eqn='s3' (i.e. \(\sigma_{III}\)).

es_tol (float or None, optional):

Early stopping threshold used for minimizing the CRPS. By default es_tol=0.05. Specifically, this argument sets the tol argument in scipy.optimize.minimize(method=’SLSQP’).

pred_pval (float, optional):

The p-value for determining statistical significance of the second predictor for the \(\sigma_{II}\) or \(\sigma_{III}\) regression equations. By default, pred_pval=0.05.

options (dict, optional):

A dictionary of options to pass to :py:method:’scipy.optimize.minimize’ corresponding to its options argument (see scipy.optimize.minimize(method=’SLSQP’)).

The following provides a brief description of NCGR; for a full description, see [1].

NCGR assumes the observed IFD or FUD, \(Y(t)\) (a random variable), conditioned on the ensemble forecast \(x_1(t),...,x_n(t)\) follows a DCNORM distribution – i.e. \(Y(t)|x_1(t),...,x_n(t)\sim N_{dc}(\mu(t),\sigma(t))\). The parameter \(\mu\) is modelled as

\[\mu(t) = \alpha_1\mu_{c}(t) + \alpha_2 x_{\langle i \rangle}^{d}(t)\]

The user can choose one of the following equations for modelling the paremter \(\sigma\)

\[\begin{split}\sigma_{I}(t) &=\beta_1\sigma_{c}, \\ \sigma_{II}(t) &=\beta_1\sigma_{c}+\beta_2 s_x(t), \\ \sigma_{III}(t) &=\beta_1\sigma_{c}+\beta_2 x_{\langle i \rangle}^{tc}(t)\end{split}\]

through the sigma argument, but by default \(\sigma=\sigma_{III}\).

The relevant methods contained in this class are:

calibrate_fullfield()

Performs NCGR on the forecast IFD or FUD field.

write_output()

Creates a NetCDF file containing several relevant fields to the NCGR-calibrated forecast.

1

Dirkson, A.​, B. Denis., M.,Sigmond., & Merryfield, W.J. (2020). Development and Calibration of SeasonalProbabilistic Forecasts of Ice-free Dates and Freeze-up Dates. ​Weather and Forecasting​. doi:10.1175/WAF-D-20-0066.1.

calibrate_fullfield()

Loops through the spatial grid and performs NCGR at each grid point.

Returns:
result (ndarray object):

An object array. If clim_netcdf is included as an argument to ncgr_fullfield, results is an object containing six arrays corresponding to the following variables:

mu_cal (ndarray), shape (nrow, ncol):

Calibrated \(\mu\) parameter for the predictive DCNORM distribution, where nrow is the number of rows and ncol is the number of columns provided as spatial coordinates in the NetCDF files given to ncgr_fullfield.

sigma_cal (ndarray), shape (nrow, ncol):

Calibrated \(\sigma\) parameter for the predictive DCNORM distribution, where nrow and ncol are defined as in mu_cal.

fcst_probs (ndarray), shape (3, nrow, ncol):

The three forecast probabilities for early, near-normal, and late sea-ice retreat/advance, where nrow and ncol are defined as in mu_cal.

fcst_pre (ndarray), shape (nrow, ncol):

The probability for the pre-occurrence of the event.

fcst_non (ndarray), shape (nrow, ncol):

The probability for the non-occurrence of the event.

clim_terc (ndarray), shape (2, nrow, ncol):

The two terciles (i.e. 1/3 and 2/3 quantiles) for the observed climatology, where nrow and ncol are defined as in mu_cal.

mean (float):

Expected value of the forecast DCNORM distribution.

mean_anom (float):

Anomaly of mean relative to climatology.

If climatology is not included as an argument to ncgr_fullfield, results contains two arrays corresponding to the following variables:

mu_cal (ndarray), shape (nrow, ncol):

Calibrated \(\mu\) parameter for the predictive DCNORM distribution, where nrow is the number of rows and ncol is the number of columns provided as spatial coordinates in the NetCDF files given to ncgr_fullfield.

sigma_cal (ndarray), shape (nrow, ncol):

Calibrated \(\sigma\) parameter for the predictive DCNORM distribution, where nrow and ncol are defined as in mu_cal.

fcst_pre (ndarray), shape (nrow, ncol):

The probability for the pre-occurrence of the event.

fcst_non (ndarray), shape (nrow, ncol):

The probability for the non-occurrence of the event.

write_output(result)

Writes the variables provided as arguments to this function to the NetCDF file out_netcdf, an argument provided to ncgr_fullfield. Note that this will remove and replace any previous version of out_netcdf.

Args:
mu_cal (ndarray), shape (nrow, ncol):

Calibrated \(\mu\) parameter for the predictive DCNORM distribution, where nrow is the number of rows and ncol is the number of columns provided as spatial coordinates in the NetCDF files given to ncgr_fullfield().

sigma_cal (ndarray), shape (nrow, ncol):

Calibrated \(\sigma\) parameter for the predictive DCNORM distribution, where nrow and ncol are defined as in mu_cal.

fcst_probs (ndarray), shape (3, nrow, ncol):

The three forecast probabilities for early, near-normal, and late sea-ice retreat/advance, where nrow and ncol are defined as in mu_cal.

fcst_pre (ndarray), shape (nrow, ncol):

Probability for the pre-occurrence of the event.

fcst_non (ndarray), shape (nrow, ncol):

Probability for the non-occurrence of the event.

clim_terc (ndarray), shape (2, nrow, ncol):

The two terciles (i.e. 1/3 and 2/3 quantiles) for the observed climatology, where nrow and ncol are defined as in mu_cal.

ens_mean (float):

Expected value of the forecast DCNORM distribution.

ens_mean_anom (float):

Anomaly of ens_mean relative to the climatological mean.


class ncgr.ncgr_gridpoint(a, b)
Args:
a (float or int):

Minimum possible date for the event in day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). See definition in [1]. IFDs or FUDs provided in input NetCDF files should be set to the value provided to a when the event has occurred at the time of forecast initialization.

b (float or int):

Maximum possible date for the event in day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). See definition in [1]. IFDs or FUDs provided in input NetCDF files should be set to the value provided to b when the event does not occur by the end of the season.

The following provides a brief description of NCGR; for a full description, see [1]_.

NCGR assumes the observed IFD or FUD, \(Y(t)\) (a random variable), conditioned on the ensemble forecast \(x_1(t),...,x_n(t)\) follows a DCNORM distribution – i.e. \(Y(t)|x_1(t),...,x_n(t)\sim N_{dc}(\mu(t),\sigma(t))\). The parameter \(\mu\) is modelled as

\[\mu(t) = lpha_1\mu_{c}(t) + lpha_2 x_{\langle i \]

angle}^{d}(t)

The user can choose one of the following equations for modelling the paremter \(\sigma\)

\[\sigma_{I}(t) &=eta_1\sigma_{c}, \ \sigma_{II}(t) &=eta_1\sigma_{c}+eta_2 s_x(t), \ \sigma_{III}(t) &=eta_1\sigma_{c}+eta_2 x_{\langle i \]

angle}^{tc}(t)

through the sigma argument, but by default \(\sigma=\sigma_{III}\).

The relevant method contained in this class is:

calibrate_gridpoint()

Performs NCGR on the forecast IFD or FUD at a single gridpoint.

%(after_notes)s

1

Dirkson, A.​, B. Denis., M.,Sigmond., & Merryfield, W.J. (2020). Development and Calibration of SeasonalProbabilistic Forecasts of Ice-free Dates and Freeze-up Dates. ​Weather and Forecasting​. doi:10.1175/WAF-D-20-0066.1.

build_model(X_t, X, Y, tau_t, t, sigma_eqn='s3', pred_pval=0.05)
Args:
X_t (ndarray), shape (n,):

Forecast to be calibrated, where n is the ensemble size.

X (ndarray), shape (N,n):

Forecasts for training NCGR, where N is the number of years for training and n is the ensemble size for a given forecast.

Y (ndarray), shape (N,):

Observations for training NCGR, where N is the number of years for training.

tau_t (ndarray), shape (N,):

Years corresponding to those used for training period (this should not contain the forecast year). The year should be based on the initialization date, not the date of the IFD or FUD event.

t (float or int):

The year in which the forecast is initialized.

sigma_eqn (str, optional):

Refers to the regression equation to be used for the \(\sigma\) parameter in the NCGR model. This can be one of ‘s1’, ‘s2’, or ‘s3’. These are defined by the regression equations below as \(\sigma_{I}\), \(\sigma_{II}\), and \(\sigma_{III}\), respectively. By default, sigma='s3' (i.e. \(\sigma_{III}\)).

pred_pval (float, optional):

The p-value for determining statistical significance of the second predictor for the \(\sigma_{II}\) or \(\sigma_{III}\) regression equations. By default, pred_pval=0.05.

Returns:
predictors_tau (ndarray or ndarray object), shape (2,2,N) or (2,)

Contains the predictors for both mu and sigma over the training period. It’s an object if mu and sigma have different numbers of predictors (i.e. when there is no second term in the equation for sigma). It’s an array if mu and sigma have the same number of predictors. Regardless, the following is always true for the first two entries:

predictors_tau[0] (ndarray), shape (2, N):

The predictors for mu over the training period

predictors_tau[1] (ndarray), shape (1, N) or (2, N):

The predictor(s) for sigma over the training period

predictors_t (ndarray or object array), shape (2,2) or shape (2,)

Contains the predictors for both mu and sigma for the forecast year t. It’s an object if mu and sigma have different numbers of predictors (i.e. when there is no second term in the equation for sigma). It’s an array if mu and sigma have the same number of predictors. Regardless, the following is always true for the first two entries:

predictors_tau[0] (ndarray), shape (2,):

The predictors for mu for the forecast year t

predictors_tau[1] (ndarray), shape (1,) or (2,):

The predictor(s) for sigma for the forecast year t

coeffs0 (array), shape (3,) or (4,):

Initial guesses for the coefficients in the NCGR regression equations.

optimizer(Y, predictors_tau, coeffs0, es_tol=0.05, options=None)

Function for minimizing the CRPS over N forecasts. For this, scipy.optimize.minimize(method=’SLSQP’) is called to minimize the analytic expression for the CRPS when the distribution under consideration for the forecast is the DCNORM distribution.

Args:
Y (ndarray), shape (N,):

Observations for training NCGR, where N is the number of years for training.

predictors_tau (ndarray or object of ndarray), shape (2,) or (2,4,N)

Predictors over the training period. If an object of ndarray, then predictors_tau[0] has shape (2,N) corresponding to predictors for \(mu\), and predictors_tau[1] has shape (N,) corresponding to the single predictor for \(sigma\).

coeffs0 (array), shape (3,) or (4,):

Initial guesses for the coefficients in the NCGR regression equations.

es_tol (float or None, optional):

Early stopping threshold used for minimizing the CRPS. By default es_tol=0.05. Specifically, this argument sets the tol argument in scipy.optimize.minimize(method=’SLSQP’).

options (dict, optional):

A dictionary of options to pass to :py:method:’scipy.optimize.minimize’ corresponding to its options argument (see scipy.optimize.minimize(method=’SLSQP’)).

Returns:
coeffs (array):

Optimized regression coefficients.


class ncgr.crps_funcs(a, b)

This class contains functions needed to perform CRPS minimization for the DCNORM distribution. It also contains a function for computing the CRPS when the forecast distribution takes the form of a DCNORM distribution (as it does for NCGR).

Args:
a (float or int):

Minimum possible date for the event in non leap year day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). A value larger than 365 is regarded as a date for the following year.

b (float or int):

Maximum possible date for the event in non leap year day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). A value larger than 365 is regarded as a date for the following year. The b argument must be larger than the a argument.

The methods contained in this class are:

crps_dcnorm()

Computes the CRPS for a set of forecsts and observations when the predictive distribution takes the form of a DCNORM distribution.

crps_ncgr()

The cost function used when executing scipy.optimize.mimize in the calibrate method. Computes the mean CRPS as a function of a set of hindcast CDFs (modelled by NCGR) and observed dates.

crps_ncgr_jac()

Called on in the calibrate method. Computes the jacobian matrix for the CRPS cost function.

crps_singleyear()

Called on in the calibrate method. Computes the CRPS for a single forecast CDF (modelled as a DCNORM distribution) and observation.

crps_dcnorm(y, mu, sigma)

Time mean continuous rank probability score (CRPS) when the distribution takes the form of a DCNORM distribution.

Args:
y (ndarray), shape (n,):

Observed dates, where n is the number of forecast/observation pairs.

mu (ndarray), shape (n,):

DCNORM parameter \(\mu\) for each of the 1,…,n forecast distributions.

sigma (ndarray), shape (n,):

DCNORM parameter \(\sigma\) for each of the 1,…,n forecast distributions.

Returns:
result (float):

Time mean CRPS.

crps_dcnorm_single(y, mu, sigma)

Continuous rank probability score (CRPS) for a single forecast when the distribution takes the form of a DCNORM distribution.

Args:
y (float or int):

Observed date.

mu (float or int):

DCNORM parameter \(\mu\).

sigma (float or int):

DCNORM parameter \(\sigma\)

Returns:
result (float):

CRPS

crps_ncgr(coeffs, predictors, y)
Args:
coeffs (list), shape (m,):

Coefficients in the NCGR regression equations, where m is the total number of coefficients/predictors. The first two values correspond to those for \(\mu\) and the remaining values correspond to those for \(\sigma\).

predictors (object), shape (n,):

Object containing the predictors, where n=2 is the number of distribution parameters. The shape of either predictors[0] or predictors[1] is (m,p), where m is the number of coefficients/predictors for the corresponding parameter, and p is the number of years in the training period self.tau_t.

y (ndarray), shape (p,):

Array of observed dates, where p is the number of years in the training period self.tau_t.

Returns:

The time-averaged continuous rank probability score (CRPS).

crps_ncgr_jac(coeffs, predictors, y)
Args:
coeffs (list), shape (n+m):

Coefficients in the NCGR regression equations, where n=2 is the number of distribution parameters and m is the number of predictors for a given parameter. The first two values are the coefficients for \(\mu\) and the remaining values are the coefficeints for \(\sigma\).

predictors (object), shape (n,):

Object containing the predictors, where n=2 is the number of distribution parameters. The shape of either predictors[0] or predictors[1] is (m,p), where m is the number of coefficients/predictors for the corresponding parameter, and p is the number of years in the training period self.tau_t.

y (ndarray), shape (p):

Array of observed dates, where p is the number of years in the training period self.tau_t.

Returns:
(ndarray), shape (m,):

The jacobian matrix of the time-averaged continuous rank probability score.

crps_ncgr_sy(params, y)

Computes the continuous rank probability score for a single forecast with DCNORM distribution.

Args:
params (list), shape (2,):

List containing the two DCNORM distribution parameters \(\mu\) and \(\sigma\).

y (float):

The observation.

Returns:
result (float):

The CRPS.


class ncgr.fcst_vs_clim(a, b)

Contains functions for computing forecast quantities relative to observed climataology.

Args:
a (float or int):

Minimum possible date for the event in non leap year day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). A value larger than 365 is regarded as a date for the following year.

b (float or int):

Maximum possible date for the event in non leap year day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). A value larger than 365 is regarded as a date for the following year. The b argument must be larger than the a argument.

fill_value (float):

The flag value given to an event probability when it doesn’t make sense to compute one; this occurs when the climatological terciles are equal to each other, for example.

event_probs(mu, sigma, y_clim, terc_interp=None)

Computes the forecast probabilities for an early, normal, or late event relative to some defined climatology.

Args:
mu (float):

The mu parameter for the DCNORM distribution

sigma (float):

The sigma parameter for the DCNORM distribution

y_clim (ndarray), shape (N,):

Array of climatological dates, where N is the number of dates (equiavalently years) used to compute climatological statistics.

terc_interp (str or None, optional):

Interpolation scheme used to compute the terciles for the observed climatology. Default is None. Can be one of the following:

  • None: By default y_clim data are fit to the DCNORM distribution,

and terciles are computed using its _ppf() method.

  • ‘HD’: Estimate terciles using the Harrell-Davis estimator (see :py:class:scipy.stats.mstats.hdquantiles)

  • ‘nearest-rank’: Nearest rank or rank order method (see

https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method)

  • Any of the interpolation arguments for numpy.percentile.

Returns:
result (object ndarray):

An object array containing 2 arrays. The first array has shape (3,) and contains the forecast probabilities for being the event occuring early, near-normal, or late, respectively. The second array has shape (2,) and contains the climatological terciles deliniating the event categories.

fcst_deterministic(mu, sigma, y_clim)

Computes the calibrated forecast ensemble mean and ensemble mean anomaly relative to climatology. Means for the forecast are calculated as the expected value of the calibrated DCNORM distribution.

Args:
mu (float):

The mu parameter for the DCNORM distribution

sigma (float):

The sigma parameter for the DCNORM distribution

y_clim (array):

Array of climatological dates.

Returns:
fcst_mean (float):

The expected value of the forecast DCNORM distribution round to the nearest day.

fcst_mean_anom (float):

The anomaly of mean relative to the mean of y_clim.

fcst_prenon(mu, sigma)

Computes the probabilities for the pre-occurrence and non-occurrence of the IFD/FUD event.

Args:
mu (float):

The mu parameter for the DCNORM distribution

sigma (float):

The sigma parameter for the DCNORM distribution

Returns:
pre_occur (float):

Probability for the pre-occurrence of the IFD/FUD.

non_occur (float):

Probability for the pre-occurrence of the IFD/FUD.


dcnorm

class dcnorm.dcnorm_gen(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, extradoc=None, seed=None)

A doubly-censored normal (DCNORM) random variable, censored numerically below \(a\) and above \(b\).

The probability density function for a DCNORM random variable is:

\[f(x;\mu,\sigma)= \Phi\left( \frac{a-\mu}{\sigma}\right)\delta(x-a) + \frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right)1_{(a,b)} + \left[1 - \Phi\left( \frac{b-\mu}{\sigma}\right)\right]\delta(x-b),\]

where \(1_{(a,b)}=1\) when \(a<x<b\) and \(1_{(a,b)}=0\) otherwise; \(\delta(x)\) is the delta function; \(\phi(\cdot)\) and \(\Phi(\cdot)\) are respectively the PDf and CDF for a standard normal distribution with zero mean and unit variance. The support is \(a\leq X \leq b\), and requirements are \(\infty<\mu<\infty\) and \(\sigma>0\).

dcnorm_gen is an instance of a subclass of rv_continuous, and therefore inherits all of the methods within rv_continuous. Some of those methods have been subclassed here:

_argcheck

_cdf

_pdf

_ppf

_stats

Additional methods added to dcnorm are:

ecdf(x, data)

The empirical distribution function for a sample data at values x.

fit(data)

This replaces the fit method in rv_continuous. Computes the maximum likelihood estimates for the DCNORM distribution parameters.

>>> from dcnorm import dcnorm_gen
>>> import numpy as np
>>> import matplotlib.pyplot as plt

Set the minimum and maximum values allowed for the DCNORM distribution

>>> a=120
>>> b=273

Instantiate the DCNORM distribution class with the given min/max values

>>> dcnorm = dcnorm_gen(a=a,b=b)

Variables for the \(\mu\) and \(\sigma\) parameters for the DCNORM distribution

>>> m=132.
>>> s=20.

Create a distribution object (see rv_continuous for details on available methods)

>>> rv = dcnorm(m,s)

As an example, create a sample of 100 random draws from this distribution

>>> X = rv.rvs(size=100)

Now fit the sample to a DCNORM distribution (this is done using maximum likelihood (ML) estimation)

>>> m_f, s_f = dcnorm.fit(X)

Make a new distribution object with the ML estimates computed previously

>>> rv_f = dcnorm(m_f, s_f)

Make range of values for plotting

>>> x = np.linspace(a, b, 1000)

Plot the emperical CDF for the sample X, the true distribution defined originally, and the distribution obtained by fitting to X.

>>> plt.figure()    
>>> plt.plot(x, dcnorm.ecdf(x,X), 'k--', label='empirical CDF')
>>> plt.plot(x, rv.cdf(x), 'k-', label='true CDF')
>>> plt.plot(x, rv_f.cdf(x), 'r-', label='fitted CDF')   
>>> plt.legend()    
>>> plt.show()
cdf(x, m, s)

Cumulative distribution function of the given RV.

xarray_like

quantiles

arg1, arg2, arg3,…array_like

The shape parameter(s) for the distribution (see docstring of the instance object for more information)

locarray_like, optional

location parameter (default=0)

scalearray_like, optional

scale parameter (default=1)

cdfndarray

Cumulative distribution function evaluated at x

ecdf(x, data)

For computing the empirical cumulative distribution function (ecdf) of a given sample.

Args:
x (float or ndarray):

The value(s) at which the ecdf is evaluated

data (float or ndarray):

A sample for which to compute the ecdf.

Returns: ecdf_vals (ndarray):

The ecdf for X_samp, evaluated at x.

fit(y)

Return MLEs for shape (if applicable), location, and scale parameters from data.

MLE stands for Maximum Likelihood Estimate. Starting estimates for the fit are given by input arguments; for any arguments not provided with starting estimates, self._fitstart(data) is called to generate such.

One can hold some parameters fixed to specific values by passing in keyword arguments f0, f1, …, fn (for shape parameters) and floc and fscale (for location and scale parameters, respectively).

dataarray_like

Data to use in calculating the MLEs.

argsfloats, optional

Starting value(s) for any shape-characterizing arguments (those not provided will be determined by a call to _fitstart(data)). No default value.

kwdsfloats, optional

Starting values for the location and scale parameters; no default. Special keyword arguments are recognized as holding certain parameters fixed:

  • f0…fn : hold respective shape parameters fixed. Alternatively, shape parameters to fix can be specified by name. For example, if self.shapes == "a, b", fa``and ``fix_a are equivalent to f0, and fb and fix_b are equivalent to f1.

  • floc : hold location parameter fixed to specified value.

  • fscale : hold scale parameter fixed to specified value.

  • optimizer : The optimizer to use. The optimizer must take func, and starting position as the first two arguments, plus args (for extra arguments to pass to the function to be optimized) and disp=0 to suppress output as keyword arguments.

mle_tupletuple of floats

MLEs for any shape parameters (if applicable), followed by those for location and scale. For most random variables, shape statistics will be returned, but there are exceptions (e.g. norm).

This fit is computed by maximizing a log-likelihood function, with penalty applied for samples outside of range of the distribution. The returned answer is not guaranteed to be the globally optimal MLE, it may only be locally optimal, or the optimization may fail altogether. If the data contain any of np.nan, np.inf, or -np.inf, the fit routine will throw a RuntimeError.

Generate some data to fit: draw random variates from the beta distribution

>>> from scipy.stats import beta
>>> a, b = 1., 2.
>>> x = beta.rvs(a, b, size=1000)

Now we can fit all four parameters (a, b, loc and scale):

>>> a1, b1, loc1, scale1 = beta.fit(x)

We can also use some prior knowledge about the dataset: let’s keep loc and scale fixed:

>>> a1, b1, loc1, scale1 = beta.fit(x, floc=0, fscale=1)
>>> loc1, scale1
(0, 1)

We can also keep shape parameters fixed by using f-keywords. To keep the zero-th shape parameter a equal 1, use f0=1 or, equivalently, fa=1:

>>> a1, b1, loc1, scale1 = beta.fit(x, fa=1, floc=0, fscale=1)
>>> a1
1

Not all distributions return estimates for the shape parameters. norm for example just returns estimates for location and scale:

>>> from scipy.stats import norm
>>> x = norm.rvs(a, b, size=1000, random_state=123)
>>> loc1, scale1 = norm.fit(x)
>>> loc1, scale1
(0.92087172783841631, 2.0015750750324668)

sitdates

Created on Mon Jan 28 10:56:59 2019

@author: arlan

class sitdates.sitdates(event, min_dates=None, max_dates=None)

This module contains several functions that are useful for setting and getting minimium and maximum possible dates, and for converting between day-of-year and date formats for plotting.

Args:
event (str) {‘ifd’,’fud’}:

Can be either ‘ifd’ or ‘fud’.

min_dates (array, optional), shape=(12,):

If provided, this array defines the minimum date (\(a\) value) allowed for the ice-free date or freeze-up date for each calendar month corresponding to the initialization month. If not provided, then time_functions.min_dates will be set to the values used in [1]. To only change a subset of those dates from [1], rather than including this argument it may be faster to use the set_min_date() function.

max_dates (array, optional), shape=(12,):

If provided, this array defines the maximum date (\(b\) value) allowed for the ice-free date or freeze-up date for each calendar month corresponding to the initialization month. If not provided, then time_functions.min_dates will be set to the values used in [1]. To only change a subset of those dates from [1], rather than including this argument it may be faster to use the set_min_date() function.

1

To be filled in with reference following publication.

date_to_doy(date)

Compute date in day-of-year format for a given date.

Args:
date (object):

A ‘’date object’’ made with the datetime.date() function (see: https://docs.python.org/3/library/datetime.html#date-objects).

Returns:

Day-of-year value

dates_to_doys(dates)

Compute dates in day-of-year format for a given list of dates.

Args:
dates (list):

A list of ‘’date objects’’ made with the datetime.date() function (see: https://docs.python.org/3/library/datetime.html#date-objects).

Returns:

List of days-of-year values

doy_to_date(doy, format='%m/%d')

Return a date for a given day-of-year value.

Args:
doy (int or float):

Day-of-year value

format (optional, ‘str’):

An acceptable format given to strftime() (default=’%m%d’; see: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior)

Returns:

Day of year integer

get_max(month)

Returns the value of the maximum possible ice-free/freeze-up date for a given initialization month.

Args:
month (int or array(N,dtype=’int’)):

Value(s) between 1 and 12 corresponding to the calendar month(s) for which the maximum date is to be returned.

Returns:

Day of year or days of year representing the maximum possible date for month.

get_min(month)

Returns the value of the minimum possible ice-free/freeze-up date for a given initialization month.

Args:
month (int or array(N,dtype=’int’)):

Value(s) between 1 and 12 corresponding to the calendar month(s) for which the minimum date is to be returned.

Returns:

Day of year or days of year representing the maximum possible date for month.

set_max(month, value)

Override pre-existing maximum date value(s) in time_functions.min_dates array.

Args:
month (int or array(N,dtype=’int’)):

Value(s) between 1 and 12 corresponding to the calendar month(s) for which the maximum date is to be overriden.

value (int, float, or array(N,)):

The date(s) in day-of-year format that will override the pre-existing maximum date(s) for the calendar month(s) provided in the month argument.

Returns:

None

set_min(month, value)

Override pre-existing minimum date value(s) in time_functions.min_dates array.

Args:
month (int or array(N,dtype=’int’)):

Value(s) between 1 and 12 corresponding to the calendar month(s) for which the minimum date is to be overriden.

value (int, float, or array(N,)):

The date(s) in day-of-year format that will override the pre-existing minimum date(s) for the calendar month(s) provided in the month argument.

Returns:

None