NCGR¶
ncgr¶
Created on Tue Feb 25 15:55:31 2020
@author: arlan
-
class
ncgr.
ncgr_fullfield
(fcst_netcdf, hc_netcdf, obs_netcdf, out_netcdf, a, b, model_dict, obs_dict, clim_netcdf=None, terc_interp=None, sigma_eqn='s3', es_tol=0.05, pred_pval=0.05, options=None)¶ Performs non-homogeneous censored gaussian regression (NCGR) [1] on a forecast ice-free date (IFD) or freeze-up date (FUD) field using input NetCDF files.
An output NetCDF file is created that contains several quantities relevant to calibration.
- Args:
- fcst_netcdf (str):
Path of the NetCDF file containing the ensemble forecast IFDs or FUDs to be calibrated. Requirements for the file are as follows:
The structure/shape of the IFD/FUD variable should be: (time, ensemble members, latitude, longitude), although the actual names of those dimensions and corresponding variables (if they are included in the NetCDF files) can be anything.
Relevant variable names and dimension names are to be specified by the user with the
model_dict
argument provided to this function.The variable for the time coordinate, which should represent the forecast initialization date, should follow CF conventions (https://cfconventions.org/). In particular, it’s necessary for the initialization year be retrievable from the time variable using the
netCDF4.num2date()
function (https://unidata.github.io/netcdf4-python/netCDF4/index.html).Masked locations will be ignored in calibration and will be written as masked locations in the output NetCDF file.
- hc_netcdf (str):
Path of the NetCDF file containing the model ensemble forecast IFDs or FUDs used to train NCGR. This file should contain several years of model re-forecasts (hindcasts), and may in fact include data for years after the year of the forecast being calibrated (e.g. if leave-one-out cross-validation is being performed). The requirements for the file are the same as for
fcst_netcdf
, except that the variable corresponding to the time coordinate should contain several dates (i.e. the initialization date for each forecast).- obs_netcdf (str):
Path of the NetCDF file containing the observed IFDs or FUDs used to train NCGR. This file should contain several years of observed IFDs/FUDs (corresponding to what was verified for the model hindcasts in
hc_netcdf
). Requirements for the file are as follows:Relevant variable names and dimension names are to be specified by the user with the
obs_dict
argument provided to this function.The structure/shape of the IFD/FUD variable should be: (time, latitude, longitude), although the actual names of those dimensions and corresponding variables (if they are included in the NetCDF files) can be anything.
The variable for the time coordinate, which should represent the initialization dates of the re-forecasts in
hc_netcdf
, should follow CF conventions (https://cfconventions.org/). In particular, it’s necessary for the initialization year be retrievable from the time variable using thenetCDF4.num2date()
function (https://unidata.github.io/netcdf4-python/netCDF4/index.html).Masked locations will be ignored in calibration and will be written as masked locations in the output NetCDF file.
- out_netcdf (str):
The absolute path of the NetCDF file to be written that contains relevant calibrated forecast quantities. For more information on this file see
ncgr_fullfield.write_output()
.- a (float or int):
Minimum possible date for the event in day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). See definition in [1]. IFDs or FUDs provided in input NetCDF files should be set to the value provided to
a
when the event has occurred at the time of forecast initialization.- b (float or int):
Maximum possible date for the event in day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). See definition in [1]. IFDs or FUDs provided in input NetCDF files should be set to the value provided to
b
when the event does not occur by the end of the season.- model_dict (tuple(dict,dict)):
A tuple of two dictionaries. The first specifies the relevant variables in the NetCDF files assigned to
fcst_netcdf
andhc_netcdf
. The second specifies the relevant demension names in those same filesFor the first dictionary, its keys (which must follow the naming conventions used below) and values (specified by user) are:
‘event_vn’:
str
: Variable name for the ice-free date or freeze-up date field(s)‘time_vn’:
str
: Variable name for the forecast initialization dateFor the second dictionary, its keys (which must follow the naming conventions used below) and values (specified by user) are:
‘time_dn’:
str
: Dimension name for the forecast initialization date‘ens_dn’:
str
: Dimension name for the forecast realization/ensemble member- Example:
model_dict = ({'event_vn' : 'ifd', 'time_vn' : 'time'}, {'time_dn' : 'time', 'ens_dn' : 'ensemble'})
- obs_dict (tuple(dict,dict)):
A tuple of two dictionaries. The first specifies the relevant variables in the NetCDF file assigned to
obs_netcdf
. The second specifies the relevant demension names in that same file.For the first dictionary, its keys (which must follow the naming conventions used below) and values (specified by user) are:
‘event_vn’:
str
: Variable name for ice-free date or freeze-up date field(s)‘time_vn’:
str
: Variable name for the forecast initialization dateFor the second dictionary, its keys (which must follow the naming conventions used below) and values (specified by user) are:
‘time_dn’:
str
: Dimension name for the forecast initialization date- Example:
obs_dict = ({'event_vn' : 'ifd', 'time_vn' : 'time'}, {'time_dn' : 'time'})
- clim_netcdf (str, optional):
The absolute path of the NetCDF file that contains several years of observed IFDs or FUDs used to construct the climatology that forecast IFDs/FUDs will be in reference to. If this is included, forecast probabilities for each the early, near-normal, and late events, as well as ensemble-mean anomalies are included in the
out_netcdf
file. Requirements for this file are the same as those forobs_netcdf
.- terc_interp (str or None, optional):
Interpolation scheme used to compute the terciles for the observed climatology. Default is None. Can be one of the following:
None: By default y_clim data are fit to the DCNORM distribution,
and terciles are computed using its
_ppf()
method.‘HD’: Estimate terciles using the Harrell-Davis estimator (see :py:class:scipy.stats.mstats.hdquantiles)
‘nearest-rank’: Nearest rank or rank order method (see
https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method)
Any of the interpolation arguments for
numpy.percentile
.
- sigma_eqn (str, optional):
Refers to the regression equation to be used for the \(\sigma\) parameter in the NCGR model. This can be one of ‘s1’, ‘s2’, or ‘s3’. These are defined by the regression equations below as \(\sigma_{I}\), \(\sigma_{II}\), and \(\sigma_{III}\), respectively. By default,
sigma_eqn='s3'
(i.e. \(\sigma_{III}\)).- es_tol (float or None, optional):
Early stopping threshold used for minimizing the CRPS. By default
es_tol=0.05
. Specifically, this argument sets thetol
argument inscipy.optimize.minimize(method=’SLSQP’)
.- pred_pval (float, optional):
The p-value for determining statistical significance of the second predictor for the \(\sigma_{II}\) or \(\sigma_{III}\) regression equations. By default,
pred_pval=0.05
.- options (dict, optional):
A dictionary of options to pass to :py:method:’scipy.optimize.minimize’ corresponding to its
options
argument (seescipy.optimize.minimize(method=’SLSQP’)
).
The following provides a brief description of NCGR; for a full description, see [1].
NCGR assumes the observed IFD or FUD, \(Y(t)\) (a random variable), conditioned on the ensemble forecast \(x_1(t),...,x_n(t)\) follows a DCNORM distribution – i.e. \(Y(t)|x_1(t),...,x_n(t)\sim N_{dc}(\mu(t),\sigma(t))\). The parameter \(\mu\) is modelled as
\[\mu(t) = \alpha_1\mu_{c}(t) + \alpha_2 x_{\langle i \rangle}^{d}(t)\]The user can choose one of the following equations for modelling the paremter \(\sigma\)
\[\begin{split}\sigma_{I}(t) &=\beta_1\sigma_{c}, \\ \sigma_{II}(t) &=\beta_1\sigma_{c}+\beta_2 s_x(t), \\ \sigma_{III}(t) &=\beta_1\sigma_{c}+\beta_2 x_{\langle i \rangle}^{tc}(t)\end{split}\]through the
sigma
argument, but by default \(\sigma=\sigma_{III}\).The relevant methods contained in this class are:
calibrate_fullfield()
Performs NCGR on the forecast IFD or FUD field.
write_output()
Creates a NetCDF file containing several relevant fields to the NCGR-calibrated forecast.
- 1
Dirkson, A., B. Denis., M.,Sigmond., & Merryfield, W.J. (2020). Development and Calibration of SeasonalProbabilistic Forecasts of Ice-free Dates and Freeze-up Dates. Weather and Forecasting. doi:10.1175/WAF-D-20-0066.1.
-
calibrate_fullfield
()¶ Loops through the spatial grid and performs NCGR at each grid point.
- Returns:
- result (ndarray object):
An object array. If
clim_netcdf
is included as an argument toncgr_fullfield
,results
is an object containing six arrays corresponding to the following variables:- mu_cal (ndarray), shape (nrow, ncol):
Calibrated \(\mu\) parameter for the predictive DCNORM distribution, where nrow is the number of rows and ncol is the number of columns provided as spatial coordinates in the NetCDF files given to
ncgr_fullfield
.- sigma_cal (ndarray), shape (nrow, ncol):
Calibrated \(\sigma\) parameter for the predictive DCNORM distribution, where nrow and ncol are defined as in
mu_cal
.- fcst_probs (ndarray), shape (3, nrow, ncol):
The three forecast probabilities for early, near-normal, and late sea-ice retreat/advance, where nrow and ncol are defined as in
mu_cal
.- fcst_pre (ndarray), shape (nrow, ncol):
The probability for the pre-occurrence of the event.
- fcst_non (ndarray), shape (nrow, ncol):
The probability for the non-occurrence of the event.
- clim_terc (ndarray), shape (2, nrow, ncol):
The two terciles (i.e. 1/3 and 2/3 quantiles) for the observed climatology, where nrow and ncol are defined as in
mu_cal
.- mean (float):
Expected value of the forecast DCNORM distribution.
- mean_anom (float):
Anomaly of
mean
relative to climatology.
If climatology is not included as an argument to
ncgr_fullfield
,results
contains two arrays corresponding to the following variables:- mu_cal (ndarray), shape (nrow, ncol):
Calibrated \(\mu\) parameter for the predictive DCNORM distribution, where nrow is the number of rows and ncol is the number of columns provided as spatial coordinates in the NetCDF files given to
ncgr_fullfield
.- sigma_cal (ndarray), shape (nrow, ncol):
Calibrated \(\sigma\) parameter for the predictive DCNORM distribution, where nrow and ncol are defined as in
mu_cal
.- fcst_pre (ndarray), shape (nrow, ncol):
The probability for the pre-occurrence of the event.
- fcst_non (ndarray), shape (nrow, ncol):
The probability for the non-occurrence of the event.
-
write_output
(result)¶ Writes the variables provided as arguments to this function to the NetCDF file
out_netcdf
, an argument provided toncgr_fullfield
. Note that this will remove and replace any previous version ofout_netcdf
.- Args:
- mu_cal (ndarray), shape (nrow, ncol):
Calibrated \(\mu\) parameter for the predictive DCNORM distribution, where nrow is the number of rows and ncol is the number of columns provided as spatial coordinates in the NetCDF files given to
ncgr_fullfield()
.- sigma_cal (ndarray), shape (nrow, ncol):
Calibrated \(\sigma\) parameter for the predictive DCNORM distribution, where nrow and ncol are defined as in
mu_cal
.- fcst_probs (ndarray), shape (3, nrow, ncol):
The three forecast probabilities for early, near-normal, and late sea-ice retreat/advance, where nrow and ncol are defined as in
mu_cal
.- fcst_pre (ndarray), shape (nrow, ncol):
Probability for the pre-occurrence of the event.
- fcst_non (ndarray), shape (nrow, ncol):
Probability for the non-occurrence of the event.
- clim_terc (ndarray), shape (2, nrow, ncol):
The two terciles (i.e. 1/3 and 2/3 quantiles) for the observed climatology, where nrow and ncol are defined as in
mu_cal
.- ens_mean (float):
Expected value of the forecast DCNORM distribution.
- ens_mean_anom (float):
Anomaly of
ens_mean
relative to the climatological mean.
-
class
ncgr.
ncgr_gridpoint
(a, b)¶ - Args:
- a (float or int):
Minimum possible date for the event in day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). See definition in [1]. IFDs or FUDs provided in input NetCDF files should be set to the value provided to
a
when the event has occurred at the time of forecast initialization.- b (float or int):
Maximum possible date for the event in day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). See definition in [1]. IFDs or FUDs provided in input NetCDF files should be set to the value provided to
b
when the event does not occur by the end of the season.
The following provides a brief description of NCGR; for a full description, see [1]_.
NCGR assumes the observed IFD or FUD, \(Y(t)\) (a random variable), conditioned on the ensemble forecast \(x_1(t),...,x_n(t)\) follows a DCNORM distribution – i.e. \(Y(t)|x_1(t),...,x_n(t)\sim N_{dc}(\mu(t),\sigma(t))\). The parameter \(\mu\) is modelled as
\[\mu(t) = lpha_1\mu_{c}(t) + lpha_2 x_{\langle i \]
angle}^{d}(t)
The user can choose one of the following equations for modelling the paremter \(\sigma\)
\[\sigma_{I}(t) &=eta_1\sigma_{c}, \ \sigma_{II}(t) &=eta_1\sigma_{c}+eta_2 s_x(t), \ \sigma_{III}(t) &=eta_1\sigma_{c}+eta_2 x_{\langle i \]angle}^{tc}(t)
through the
sigma
argument, but by default \(\sigma=\sigma_{III}\).The relevant method contained in this class is:
calibrate_gridpoint()
Performs NCGR on the forecast IFD or FUD at a single gridpoint.
%(after_notes)s
- 1
Dirkson, A., B. Denis., M.,Sigmond., & Merryfield, W.J. (2020). Development and Calibration of SeasonalProbabilistic Forecasts of Ice-free Dates and Freeze-up Dates. Weather and Forecasting. doi:10.1175/WAF-D-20-0066.1.
-
build_model
(X_t, X, Y, tau_t, t, sigma_eqn='s3', pred_pval=0.05)¶ - Args:
- X_t (ndarray), shape (n,):
Forecast to be calibrated, where n is the ensemble size.
- X (ndarray), shape (N,n):
Forecasts for training NCGR, where N is the number of years for training and n is the ensemble size for a given forecast.
- Y (ndarray), shape (N,):
Observations for training NCGR, where N is the number of years for training.
- tau_t (ndarray), shape (N,):
Years corresponding to those used for training period (this should not contain the forecast year). The year should be based on the initialization date, not the date of the IFD or FUD event.
- t (float or int):
The year in which the forecast is initialized.
- sigma_eqn (str, optional):
Refers to the regression equation to be used for the \(\sigma\) parameter in the NCGR model. This can be one of ‘s1’, ‘s2’, or ‘s3’. These are defined by the regression equations below as \(\sigma_{I}\), \(\sigma_{II}\), and \(\sigma_{III}\), respectively. By default,
sigma='s3'
(i.e. \(\sigma_{III}\)).- pred_pval (float, optional):
The p-value for determining statistical significance of the second predictor for the \(\sigma_{II}\) or \(\sigma_{III}\) regression equations. By default,
pred_pval=0.05
.
- Returns:
- predictors_tau (ndarray or ndarray object), shape (2,2,N) or (2,)
Contains the predictors for both mu and sigma over the training period. It’s an object if mu and sigma have different numbers of predictors (i.e. when there is no second term in the equation for sigma). It’s an array if mu and sigma have the same number of predictors. Regardless, the following is always true for the first two entries:
- predictors_tau[0] (ndarray), shape (2, N):
The predictors for mu over the training period
- predictors_tau[1] (ndarray), shape (1, N) or (2, N):
The predictor(s) for sigma over the training period
- predictors_t (ndarray or object array), shape (2,2) or shape (2,)
Contains the predictors for both mu and sigma for the forecast year t. It’s an object if mu and sigma have different numbers of predictors (i.e. when there is no second term in the equation for sigma). It’s an array if mu and sigma have the same number of predictors. Regardless, the following is always true for the first two entries:
- predictors_tau[0] (ndarray), shape (2,):
The predictors for mu for the forecast year t
- predictors_tau[1] (ndarray), shape (1,) or (2,):
The predictor(s) for sigma for the forecast year t
- coeffs0 (array), shape (3,) or (4,):
Initial guesses for the coefficients in the NCGR regression equations.
-
optimizer
(Y, predictors_tau, coeffs0, es_tol=0.05, options=None)¶ Function for minimizing the CRPS over N forecasts. For this,
scipy.optimize.minimize(method=’SLSQP’)
is called to minimize the analytic expression for the CRPS when the distribution under consideration for the forecast is the DCNORM distribution.- Args:
- Y (ndarray), shape (N,):
Observations for training NCGR, where N is the number of years for training.
- predictors_tau (ndarray or object of ndarray), shape (2,) or (2,4,N)
Predictors over the training period. If an object of ndarray, then predictors_tau[0] has shape (2,N) corresponding to predictors for \(mu\), and predictors_tau[1] has shape (N,) corresponding to the single predictor for \(sigma\).
- coeffs0 (array), shape (3,) or (4,):
Initial guesses for the coefficients in the NCGR regression equations.
- es_tol (float or None, optional):
Early stopping threshold used for minimizing the CRPS. By default
es_tol=0.05
. Specifically, this argument sets thetol
argument inscipy.optimize.minimize(method=’SLSQP’)
.- options (dict, optional):
A dictionary of options to pass to :py:method:’scipy.optimize.minimize’ corresponding to its
options
argument (seescipy.optimize.minimize(method=’SLSQP’)
).
- Returns:
- coeffs (array):
Optimized regression coefficients.
-
class
ncgr.
crps_funcs
(a, b)¶ This class contains functions needed to perform CRPS minimization for the DCNORM distribution. It also contains a function for computing the CRPS when the forecast distribution takes the form of a DCNORM distribution (as it does for NCGR).
- Args:
- a (float or int):
Minimum possible date for the event in non leap year day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). A value larger than 365 is regarded as a date for the following year.
- b (float or int):
Maximum possible date for the event in non leap year day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). A value larger than 365 is regarded as a date for the following year. The
b
argument must be larger than thea
argument.
The methods contained in this class are:
crps_dcnorm()
Computes the CRPS for a set of forecsts and observations when the predictive distribution takes the form of a DCNORM distribution.
crps_ncgr()
The cost function used when executing
scipy.optimize.mimize
in thecalibrate
method. Computes the mean CRPS as a function of a set of hindcast CDFs (modelled by NCGR) and observed dates.crps_ncgr_jac()
Called on in the
calibrate
method. Computes the jacobian matrix for the CRPS cost function.crps_singleyear()
Called on in the
calibrate
method. Computes the CRPS for a single forecast CDF (modelled as a DCNORM distribution) and observation.
-
crps_dcnorm
(y, mu, sigma)¶ Time mean continuous rank probability score (CRPS) when the distribution takes the form of a DCNORM distribution.
- Args:
- y (ndarray), shape (n,):
Observed dates, where n is the number of forecast/observation pairs.
- mu (ndarray), shape (n,):
DCNORM parameter \(\mu\) for each of the 1,…,n forecast distributions.
- sigma (ndarray), shape (n,):
DCNORM parameter \(\sigma\) for each of the 1,…,n forecast distributions.
- Returns:
- result (float):
Time mean CRPS.
-
crps_dcnorm_single
(y, mu, sigma)¶ Continuous rank probability score (CRPS) for a single forecast when the distribution takes the form of a DCNORM distribution.
- Args:
- y (float or int):
Observed date.
- mu (float or int):
DCNORM parameter \(\mu\).
- sigma (float or int):
DCNORM parameter \(\sigma\)
- Returns:
- result (float):
CRPS
-
crps_ncgr
(coeffs, predictors, y)¶ - Args:
- coeffs (list), shape (m,):
Coefficients in the NCGR regression equations, where m is the total number of coefficients/predictors. The first two values correspond to those for \(\mu\) and the remaining values correspond to those for \(\sigma\).
- predictors (object), shape (n,):
Object containing the predictors, where n=2 is the number of distribution parameters. The shape of either predictors[0] or predictors[1] is (m,p), where m is the number of coefficients/predictors for the corresponding parameter, and p is the number of years in the training period
self.tau_t
.- y (ndarray), shape (p,):
Array of observed dates, where p is the number of years in the training period
self.tau_t
.
- Returns:
The time-averaged continuous rank probability score (CRPS).
-
crps_ncgr_jac
(coeffs, predictors, y)¶ - Args:
- coeffs (list), shape (n+m):
Coefficients in the NCGR regression equations, where n=2 is the number of distribution parameters and m is the number of predictors for a given parameter. The first two values are the coefficients for \(\mu\) and the remaining values are the coefficeints for \(\sigma\).
- predictors (object), shape (n,):
Object containing the predictors, where n=2 is the number of distribution parameters. The shape of either predictors[0] or predictors[1] is (m,p), where m is the number of coefficients/predictors for the corresponding parameter, and p is the number of years in the training period
self.tau_t
.- y (ndarray), shape (p):
Array of observed dates, where p is the number of years in the training period
self.tau_t
.
- Returns:
- (ndarray), shape (m,):
The jacobian matrix of the time-averaged continuous rank probability score.
-
crps_ncgr_sy
(params, y)¶ Computes the continuous rank probability score for a single forecast with DCNORM distribution.
- Args:
- params (list), shape (2,):
List containing the two DCNORM distribution parameters \(\mu\) and \(\sigma\).
- y (float):
The observation.
- Returns:
- result (float):
The CRPS.
-
class
ncgr.
fcst_vs_clim
(a, b)¶ Contains functions for computing forecast quantities relative to observed climataology.
- Args:
- a (float or int):
Minimum possible date for the event in non leap year day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). A value larger than 365 is regarded as a date for the following year.
- b (float or int):
Maximum possible date for the event in non leap year day-of-year units; e.g. 1=Jan 1, 91=April 1, 365=Dec 31). A value larger than 365 is regarded as a date for the following year. The
b
argument must be larger than thea
argument.- fill_value (float):
The flag value given to an event probability when it doesn’t make sense to compute one; this occurs when the climatological terciles are equal to each other, for example.
-
event_probs
(mu, sigma, y_clim, terc_interp=None)¶ Computes the forecast probabilities for an early, normal, or late event relative to some defined climatology.
- Args:
- mu (float):
The mu parameter for the DCNORM distribution
- sigma (float):
The sigma parameter for the DCNORM distribution
- y_clim (ndarray), shape (N,):
Array of climatological dates, where N is the number of dates (equiavalently years) used to compute climatological statistics.
- terc_interp (str or None, optional):
Interpolation scheme used to compute the terciles for the observed climatology. Default is None. Can be one of the following:
None: By default y_clim data are fit to the DCNORM distribution,
and terciles are computed using its
_ppf()
method.‘HD’: Estimate terciles using the Harrell-Davis estimator (see :py:class:scipy.stats.mstats.hdquantiles)
‘nearest-rank’: Nearest rank or rank order method (see
https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method)
Any of the interpolation arguments for
numpy.percentile
.
- Returns:
- result (object ndarray):
An object array containing 2 arrays. The first array has shape (3,) and contains the forecast probabilities for being the event occuring early, near-normal, or late, respectively. The second array has shape (2,) and contains the climatological terciles deliniating the event categories.
-
fcst_deterministic
(mu, sigma, y_clim)¶ Computes the calibrated forecast ensemble mean and ensemble mean anomaly relative to climatology. Means for the forecast are calculated as the expected value of the calibrated DCNORM distribution.
- Args:
- mu (float):
The mu parameter for the DCNORM distribution
- sigma (float):
The sigma parameter for the DCNORM distribution
- y_clim (array):
Array of climatological dates.
- Returns:
- fcst_mean (float):
The expected value of the forecast DCNORM distribution round to the nearest day.
- fcst_mean_anom (float):
The anomaly of
mean
relative to the mean ofy_clim
.
-
fcst_prenon
(mu, sigma)¶ Computes the probabilities for the pre-occurrence and non-occurrence of the IFD/FUD event.
- Args:
- mu (float):
The mu parameter for the DCNORM distribution
- sigma (float):
The sigma parameter for the DCNORM distribution
- Returns:
- pre_occur (float):
Probability for the pre-occurrence of the IFD/FUD.
- non_occur (float):
Probability for the pre-occurrence of the IFD/FUD.
dcnorm¶
-
class
dcnorm.
dcnorm_gen
(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, extradoc=None, seed=None)¶ A doubly-censored normal (DCNORM) random variable, censored numerically below \(a\) and above \(b\).
The probability density function for a DCNORM random variable is:
\[f(x;\mu,\sigma)= \Phi\left( \frac{a-\mu}{\sigma}\right)\delta(x-a) + \frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right)1_{(a,b)} + \left[1 - \Phi\left( \frac{b-\mu}{\sigma}\right)\right]\delta(x-b),\]where \(1_{(a,b)}=1\) when \(a<x<b\) and \(1_{(a,b)}=0\) otherwise; \(\delta(x)\) is the delta function; \(\phi(\cdot)\) and \(\Phi(\cdot)\) are respectively the PDf and CDF for a standard normal distribution with zero mean and unit variance. The support is \(a\leq X \leq b\), and requirements are \(\infty<\mu<\infty\) and \(\sigma>0\).
dcnorm_gen
is an instance of a subclass ofrv_continuous
, and therefore inherits all of the methods withinrv_continuous
. Some of those methods have been subclassed here:_argcheck
_cdf
_pdf
_ppf
_stats
Additional methods added to
dcnorm
are:ecdf(x, data)
The empirical distribution function for a sample
data
at valuesx
.fit(data)
This replaces the
fit
method inrv_continuous
. Computes the maximum likelihood estimates for the DCNORM distribution parameters.
>>> from dcnorm import dcnorm_gen >>> import numpy as np >>> import matplotlib.pyplot as plt
Set the minimum and maximum values allowed for the DCNORM distribution
>>> a=120 >>> b=273
Instantiate the DCNORM distribution class with the given min/max values
>>> dcnorm = dcnorm_gen(a=a,b=b)
Variables for the \(\mu\) and \(\sigma\) parameters for the DCNORM distribution
>>> m=132. >>> s=20.
Create a distribution object (see
rv_continuous
for details on available methods)>>> rv = dcnorm(m,s)
As an example, create a sample of 100 random draws from this distribution
>>> X = rv.rvs(size=100)
Now fit the sample to a DCNORM distribution (this is done using maximum likelihood (ML) estimation)
>>> m_f, s_f = dcnorm.fit(X)
Make a new distribution object with the ML estimates computed previously
>>> rv_f = dcnorm(m_f, s_f)
Make range of values for plotting
>>> x = np.linspace(a, b, 1000)
Plot the emperical CDF for the sample X, the true distribution defined originally, and the distribution obtained by fitting to X.
>>> plt.figure() >>> plt.plot(x, dcnorm.ecdf(x,X), 'k--', label='empirical CDF') >>> plt.plot(x, rv.cdf(x), 'k-', label='true CDF') >>> plt.plot(x, rv_f.cdf(x), 'r-', label='fitted CDF') >>> plt.legend() >>> plt.show()
-
cdf
(x, m, s)¶ Cumulative distribution function of the given RV.
- xarray_like
quantiles
- arg1, arg2, arg3,…array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- locarray_like, optional
location parameter (default=0)
- scalearray_like, optional
scale parameter (default=1)
- cdfndarray
Cumulative distribution function evaluated at x
-
ecdf
(x, data)¶ For computing the empirical cumulative distribution function (ecdf) of a given sample.
- Args:
- x (float or ndarray):
The value(s) at which the ecdf is evaluated
- data (float or ndarray):
A sample for which to compute the ecdf.
- Returns: ecdf_vals (ndarray):
The ecdf for X_samp, evaluated at x.
-
fit
(y)¶ Return MLEs for shape (if applicable), location, and scale parameters from data.
MLE stands for Maximum Likelihood Estimate. Starting estimates for the fit are given by input arguments; for any arguments not provided with starting estimates,
self._fitstart(data)
is called to generate such.One can hold some parameters fixed to specific values by passing in keyword arguments
f0
,f1
, …,fn
(for shape parameters) andfloc
andfscale
(for location and scale parameters, respectively).- dataarray_like
Data to use in calculating the MLEs.
- argsfloats, optional
Starting value(s) for any shape-characterizing arguments (those not provided will be determined by a call to
_fitstart(data)
). No default value.- kwdsfloats, optional
Starting values for the location and scale parameters; no default. Special keyword arguments are recognized as holding certain parameters fixed:
f0…fn : hold respective shape parameters fixed. Alternatively, shape parameters to fix can be specified by name. For example, if
self.shapes == "a, b"
,fa``and ``fix_a
are equivalent tof0
, andfb
andfix_b
are equivalent tof1
.floc : hold location parameter fixed to specified value.
fscale : hold scale parameter fixed to specified value.
optimizer : The optimizer to use. The optimizer must take
func
, and starting position as the first two arguments, plusargs
(for extra arguments to pass to the function to be optimized) anddisp=0
to suppress output as keyword arguments.
- mle_tupletuple of floats
MLEs for any shape parameters (if applicable), followed by those for location and scale. For most random variables, shape statistics will be returned, but there are exceptions (e.g.
norm
).
This fit is computed by maximizing a log-likelihood function, with penalty applied for samples outside of range of the distribution. The returned answer is not guaranteed to be the globally optimal MLE, it may only be locally optimal, or the optimization may fail altogether. If the data contain any of np.nan, np.inf, or -np.inf, the fit routine will throw a RuntimeError.
Generate some data to fit: draw random variates from the beta distribution
>>> from scipy.stats import beta >>> a, b = 1., 2. >>> x = beta.rvs(a, b, size=1000)
Now we can fit all four parameters (
a
,b
,loc
andscale
):>>> a1, b1, loc1, scale1 = beta.fit(x)
We can also use some prior knowledge about the dataset: let’s keep
loc
andscale
fixed:>>> a1, b1, loc1, scale1 = beta.fit(x, floc=0, fscale=1) >>> loc1, scale1 (0, 1)
We can also keep shape parameters fixed by using
f
-keywords. To keep the zero-th shape parametera
equal 1, usef0=1
or, equivalently,fa=1
:>>> a1, b1, loc1, scale1 = beta.fit(x, fa=1, floc=0, fscale=1) >>> a1 1
Not all distributions return estimates for the shape parameters.
norm
for example just returns estimates for location and scale:>>> from scipy.stats import norm >>> x = norm.rvs(a, b, size=1000, random_state=123) >>> loc1, scale1 = norm.fit(x) >>> loc1, scale1 (0.92087172783841631, 2.0015750750324668)
sitdates¶
Created on Mon Jan 28 10:56:59 2019
@author: arlan
-
class
sitdates.
sitdates
(event, min_dates=None, max_dates=None)¶ This module contains several functions that are useful for setting and getting minimium and maximum possible dates, and for converting between day-of-year and date formats for plotting.
- Args:
- event (str) {‘ifd’,’fud’}:
Can be either ‘ifd’ or ‘fud’.
- min_dates (array, optional), shape=(12,):
If provided, this array defines the minimum date (\(a\) value) allowed for the ice-free date or freeze-up date for each calendar month corresponding to the initialization month. If not provided, then
time_functions.min_dates
will be set to the values used in [1]. To only change a subset of those dates from [1], rather than including this argument it may be faster to use theset_min_date()
function.- max_dates (array, optional), shape=(12,):
If provided, this array defines the maximum date (\(b\) value) allowed for the ice-free date or freeze-up date for each calendar month corresponding to the initialization month. If not provided, then
time_functions.min_dates
will be set to the values used in [1]. To only change a subset of those dates from [1], rather than including this argument it may be faster to use theset_min_date()
function.
- 1
To be filled in with reference following publication.
-
date_to_doy
(date)¶ Compute date in day-of-year format for a given date.
- Args:
- date (object):
A ‘’date object’’ made with the
datetime.date()
function (see: https://docs.python.org/3/library/datetime.html#date-objects).
- Returns:
Day-of-year value
-
dates_to_doys
(dates)¶ Compute dates in day-of-year format for a given list of dates.
- Args:
- dates (list):
A list of ‘’date objects’’ made with the
datetime.date()
function (see: https://docs.python.org/3/library/datetime.html#date-objects).
- Returns:
List of days-of-year values
-
doy_to_date
(doy, format='%m/%d')¶ Return a date for a given day-of-year value.
- Args:
- doy (int or float):
Day-of-year value
- format (optional, ‘str’):
An acceptable format given to
strftime()
(default=’%m%d’; see: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior)
- Returns:
Day of year integer
-
get_max
(month)¶ Returns the value of the maximum possible ice-free/freeze-up date for a given initialization month.
- Args:
- month (int or array(N,dtype=’int’)):
Value(s) between 1 and 12 corresponding to the calendar month(s) for which the maximum date is to be returned.
- Returns:
Day of year or days of year representing the maximum possible date for
month
.
-
get_min
(month)¶ Returns the value of the minimum possible ice-free/freeze-up date for a given initialization month.
- Args:
- month (int or array(N,dtype=’int’)):
Value(s) between 1 and 12 corresponding to the calendar month(s) for which the minimum date is to be returned.
- Returns:
Day of year or days of year representing the maximum possible date for
month
.
-
set_max
(month, value)¶ Override pre-existing maximum date value(s) in
time_functions.min_dates
array.- Args:
- month (int or array(N,dtype=’int’)):
Value(s) between 1 and 12 corresponding to the calendar month(s) for which the maximum date is to be overriden.
- value (int, float, or array(N,)):
The date(s) in day-of-year format that will override the pre-existing maximum date(s) for the calendar month(s) provided in the
month
argument.
- Returns:
None
-
set_min
(month, value)¶ Override pre-existing minimum date value(s) in
time_functions.min_dates
array.- Args:
- month (int or array(N,dtype=’int’)):
Value(s) between 1 and 12 corresponding to the calendar month(s) for which the minimum date is to be overriden.
- value (int, float, or array(N,)):
The date(s) in day-of-year format that will override the pre-existing minimum date(s) for the calendar month(s) provided in the
month
argument.
- Returns:
None