Documentation

beinf (zero- and one-inflated beta distribution)

class beinf.beinf_gen(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, extradoc=None, seed=None)

A zero- and one-inflated beta (BEINF) random variable.

The probability density function for BEINF is:

\[f(x;a,b,p,q) = p(1-q)\delta(x)+(1-p)f_{\mathrm{beta}}(x;a,b) + pq\delta(x-1),\]

where

\[f_{\mathrm{beta}}(x;a,b)=\frac{1}{\mathrm{B}(a,b)}x^{a-1}(1-x)^{b-1},\]

\(\delta(x)\) is the delta function, and \(\mathrm{B}(a,b)\) is the beta function (scipy.special.beta).

beinf takes \(a>0\), \(b>0\), \(0\leq p \leq 1\), \(0\leq q \leq 1\) as shape parameters.
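The density above is a three-part mixture: a point mass \(p(1-q)\) at 0, a point mass \(pq\) at 1, and a beta density with weight \(1-p\). The following is a minimal sketch of that mixture using only the Python standard library. The names beinf_masses and sample_beinf are hypothetical helpers for illustration and are not part of the beinf module.

```python
import random


def beinf_masses(p, q):
    """Return (mass at 0, weight of the beta part, mass at 1)."""
    return p * (1.0 - q), 1.0 - p, p * q


def sample_beinf(a, b, p, q, rng):
    """Draw one value from BEINF(a, b, p, q) via its mixture form.

    Hypothetical helper, not the library's sampler.
    """
    m0, _, m1 = beinf_masses(p, q)
    u = rng.random()
    if u < m0:
        return 0.0          # point mass at zero
    if u < m0 + m1:
        return 1.0          # point mass at one
    return rng.betavariate(a, b)  # continuous beta part on (0, 1)


rng = random.Random(0)
draws = [sample_beinf(2.0, 5.0, 0.3, 0.25, rng) for _ in range(10000)]
frac_zero = sum(d == 0.0 for d in draws) / len(draws)
frac_one = sum(d == 1.0 for d in draws) / len(draws)
```

With \(p=0.3\) and \(q=0.25\), the fractions of zeros and ones in a large sample should be close to \(p(1-q)=0.225\) and \(pq=0.075\).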

beinf is an instance of a subclass of rv_continuous, and therefore inherits all of the methods of rv_continuous. The following methods are overridden here:

_argcheck

_cdf

_pdf

_ppf

Additional methods added to beinf are:

beta_moments(data_sub): Computes the parameters for the beta distribution using the method of moments.
cdf_eval(x,data_params,data): Chooses the appropriate method for evaluating the cumulative distribution function for data at points x, and computes it.
check_cases(data): Checks to see if data satisfies any of cases 1-4.
ecdf(x, X_samp, p=None, q=None): The empirical distribution function of X_samp at points x.
fit(data): This replaces the fit method in rv_continuous, but is not subclassed. Computes the MLE parameters for the BEINF distribution.
fit_beta(data_sub): Computes the MLE parameters for the beta distribution.

beta_moments(data_sub)

Computes the method of moments estimates of a and b for the beta distribution.

Args:

data_sub (ndarray):: The data to be fit to the beta distribution. The values in data_sub should lie on the open interval (0,1).

Returns: a,b (floats):

The shape parameters for the beta distribution.
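As an illustration of the method of moments for the beta distribution, the sketch below matches the sample mean \(m\) and variance \(v\) to the beta mean \(a/(a+b)\) and variance \(ab/[(a+b)^2(a+b+1)]\). The function name is hypothetical, and the use of the population (biased) variance is an assumption; the library may use a different variance estimator.

```python
from statistics import fmean, pvariance


def beta_method_of_moments(data_sub):
    """Method-of-moments estimates (a, b) for values on the open interval (0, 1).

    Hypothetical sketch; requires 0 < v < m * (1 - m).
    """
    m = fmean(data_sub)
    v = pvariance(data_sub, mu=m)  # assumption: population variance
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


a_hat, b_hat = beta_method_of_moments([0.2, 0.4, 0.5, 0.7])
```

By construction, a beta distribution with the returned parameters reproduces the sample mean and the sample variance used in the fit.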

cdf_eval(x, data_params, data)

Evaluates the cumulative distribution function (empirically or parametrically) for a random variable \(X\sim\mathrm{BEINF}(a,b,p,q)\). When \(a\) and \(b\) are unknown (i.e. when a=np.inf and b=np.inf), the cdf is evaluated using ecdf(). When \(a\) and \(b\) are known or when \(p=1\), the cdf is evaluated using cdf(). This is used after applying TAQM (see the tutorial on using the taqm class).

Args:

x (float or ndarray):: The value(s) at which the cdf or ecdf is evaluated.
data_params (ndarray):: An array containing the four parameters a, b, p, and q.
data (ndarray):: The data which defines the ecdf when \(a\) and \(b\) are unknown (i.e. when a=np.inf and b=np.inf). This array may contain the values np.inf, 0, and 1.

Returns: cdf_vals (ndarray):

The cdf or ecdf for \(X\sim\mathrm{BEINF}(a,b,p,q)\) evaluated at x.

check_cases(data)

Checks whether data satisfies any of cases 1-4 described in Section 3b of Dirkson et al., 2018. When True, the parameters a and b for the beta portion of the BEINF distribution cannot be fit.

Args:

data (ndarray):: The values for which to test if cases 1-4 apply.

Returns: True or False

True if any of the cases is satisfied; False if none of them is.

ecdf(x, X_samp, p=None, q=None)

Computes the empirical cumulative distribution function (ecdf) for a random variable \(X\sim\mathrm{BEINF}(a,b,p,q)\) when the parameters \(a\) and \(b\) (or additionally \(p\) and \(q\)) are unknown.

Args:

x (float or ndarray):: The value(s) at which the ecdf is evaluated.
X_samp (float or ndarray):: A sample that lies on either: the open interval (0,1) when p and q are included as arguments, or the closed interval [0,1] when p and q are not included as arguments.
p (float, optional):: Shape parameter(s) for the beinf distribution.
q (float, optional):: Shape parameter(s) for the beinf distribution.

Returns: ecdf_vals (ndarray):

The ecdf for X_samp, evaluated at x.
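The sketch below illustrates the two situations described for X_samp. The first function is a plain ecdf for a sample on the closed interval [0,1]. The second combines given \(p\) and \(q\) with the ecdf of a sample on the open interval (0,1), following the cdf implied by the BEINF density, \(p(1-q) + (1-p)F(x)\) for \(0 \leq x < 1\). Both function names are hypothetical, and the mixing rule is an assumption about how \(p\) and \(q\) enter; the library's ecdf() may differ in detail.

```python
from bisect import bisect_right


def ecdf_sketch(x, sample):
    """Fraction of sample values that are <= x."""
    ordered = sorted(sample)
    return bisect_right(ordered, x) / len(ordered)


def ecdf_with_masses(x, sample_open, p, q):
    """ecdf for BEINF when p and q are given (assumed mixing rule).

    sample_open holds values on the open interval (0, 1).
    """
    if x < 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # mass at zero plus the weighted ecdf of the continuous part
    return p * (1.0 - q) + (1.0 - p) * ecdf_sketch(x, sample_open)
```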

fit(data)

Computes the maximum likelihood estimates (MLE) of the parameters \(a\), \(b\), \(p\), and \(q\) for the BEINF distribution fitted to the given data.

Args:

data (ndarray):: The data to be fit to the BEINF distribution.

Returns: a,b,p,q (floats):

The shape parameters for the BEINF distribution. When \(a\) and \(b\) cannot be fit, they are returned as a=np.inf and b=np.inf.
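For the density given above, the likelihood factorizes so that \(p\) and \(q\) can be estimated separately from \(a\) and \(b\): the maximizers are the fraction of data equal to 0 or 1, and the fraction of ones among those endpoint values. The sketch below shows only this part of the fit. The function name is hypothetical, and returning 0.0 for \(q\) when the data contain no zeros or ones is a placeholder assumption, not necessarily what fit() does.

```python
def fit_p_q(data):
    """Closed-form MLE of p and q for BEINF data on [0, 1] (sketch)."""
    n = len(data)
    n0 = sum(1 for v in data if v == 0)  # number of zeros
    n1 = sum(1 for v in data if v == 1)  # number of ones
    p_hat = (n0 + n1) / n
    # q is not identifiable without endpoint values; 0.0 is a placeholder
    q_hat = n1 / (n0 + n1) if (n0 + n1) > 0 else 0.0
    return p_hat, q_hat


p_hat, q_hat = fit_p_q([0, 0, 1, 0.3, 0.6, 0.2, 0.9, 0.5])
```

The remaining values on (0,1) would then be passed to a beta fit such as fit_beta().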

fit_beta(data_sub)

Computes the maximum likelihood estimates (MLE) of the parameters \(a\) and \(b\) for the beta distribution fitted to the given data.

Args:

data_sub (ndarray):: The data to be fit to the beta distribution. The values in data_sub should lie on the open interval (0,1).

Returns: a,b (floats):

The shape parameters for the beta distribution.

taqm (trend-adjusted quantile mapping)

class taqm.taqm

Contains the methods needed for performing trend-adjusted quantile mapping (TAQM). It relies on the methods from the beinf class.

calibrate(x_params,y_params,x_t_params,X,Y,X_t,trust_sharp_fcst=False): calibrated forecast BEINF parameters and calibrated forecast ensemble
fit_params(X,Y,X_t): BEINF parameters for the TAMH, TAOH, and the raw forecast
lin(a1,b1,T): linear equation values
piecewise_lin(a1,b1,a2,b2,t_b,T): piece-wise linear equation values
trend_adjust_1p(data_all,tau_t,t): trend-adjusted values using a single period
trend_adjust_2p(data_all,tau_t,t,t_b=1999): trend-adjusted values using two periods
unpack_params(array): the individual parameters stored in array

calibrate(x_params, y_params, x_t_params, X, Y, X_t, trust_sharp_fcst=False)

Calibrates the raw forecast BEINF parameters \(a_{x_t}\), \(b_{x_t}\), \(p_{x_t}\) and \(q_{x_t}\). This method carries out the calibration step described in section 5c of Dirkson et al., 2018.

Args:

x_params (ndarray):: An array containing the four parameters of the BEINF distribution for the TAMH ensemble time series.
y_params (ndarray):: An array containing the four parameters of the BEINF distribution for the TAOH time series.
x_t_params (ndarray):: An array containing the four parameters of the BEINF distribution for the raw forecast ensemble.
trust_sharp_fcst (boolean, optional):: True to revert to the raw forecast when \(p_{x_t}=1\). False to revert to the TAOH distribution when \(p_{x_t}=1\).

Returns: x_t_cal_params (ndarray), X_t_cal_beta (ndarray):

x_t_cal_params contains the four BEINF distribution parameters for the calibrated forecast: \(a_{\hat{x}_t}\), \(b_{\hat{x}_t}\), \(p_{\hat{x}_t}\) and \(q_{\hat{x}_t}\). When \(a_{\hat{x}_t}\) and \(b_{\hat{x}_t}\) could not be fit, they are returned as a=np.inf and b=np.inf.

X_t_cal_beta contains the calibrated forecast ensemble (np.inf replaces 0’s and 1’s in the ensemble). This array consists entirely of np.inf values when any of \(p_y=1\), \(p_x=1\), or \(p_{x_t}=1\), or when all parameters in x_t_cal_params are defined (none are equal to np.inf).

fit_params(X, Y, X_t)

Fits X (the TAMH ensemble time series), Y (the TAOH time series), and X_t (the raw forecast ensemble) to the BEINF distribution. This method carries out the fitting procedure described in section 5b of Dirkson et al., 2018.

Args:

X (ndarray):: The TAMH ensemble time series of size NxM.
Y (ndarray):: The TAOH time series of size N.
X_t (ndarray):: The raw forecast ensemble of size M.

Returns: x_params (ndarray), y_params (ndarray), x_t_params (ndarray):

The shape parameters \(a\), \(b\), \(p\), \(q\) for the BEINF distribution fitted to each of X, Y, and X_t (see beinf.fit()).

lin(a1, b1, T)

Evaluates the linear equation

\[z = a_1 T + b_1 \tag{1}\]

at \(T\).

Args:

a1 (float):: The slope in (1).
b1 (float):: The z-intercept in (1).
T (float or ndarray):: The point(s) at which (1) is evaluated.

Returns: z (float or ndarray):

The value of (1) at each point T. Values less than 0 and greater than 1 are clipped to 0 and 1, respectively.
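A minimal scalar sketch of Eq. (1) with the clipping described above. The function name is hypothetical, and it handles a single float T only, whereas the documented method also accepts arrays.

```python
def lin_sketch(a1, b1, T):
    """Evaluate z = a1 * T + b1 and clip the result to [0, 1]."""
    z = a1 * T + b1
    return min(1.0, max(0.0, z))
```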

piecewise_lin(a1, b1, a2, b2, t_b, T)

Evaluates the piece-wise linear equation

\[z = \begin{cases} a_1 T + b_1, & T<t_b \\ a_2 T + b_2, & T>t_b \end{cases} \tag{2}\]

at \(T\).

Args:

a1, a2 (floats):: The slopes in (2).
b1, b2 (floats):: The z-intercepts in (2).
t_b (float):: The breakpoint for \(z\) in (2).
T (float or ndarray):: The point(s) at which (2) is evaluated.

Returns: z (float or ndarray):

The value of (2) at each point T. Values less than 0 and greater than 1 are clipped to 0 and 1, respectively.
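A minimal scalar sketch of Eq. (2) with the clipping described above. The function name is hypothetical. Eq. (2) does not specify the case \(T=t_b\); assigning it to the second branch here is an assumption, which makes no difference when the two segments meet at the breakpoint.

```python
def piecewise_lin_sketch(a1, b1, a2, b2, t_b, T):
    """Evaluate the two-segment line at a single point T, clipped to [0, 1]."""
    if T < t_b:
        z = a1 * T + b1
    else:
        # assumption: T == t_b is handled by the second segment
        z = a2 * T + b2
    return min(1.0, max(0.0, z))
```

For example, with a flat first segment at 0.8 and a second segment of slope -0.02 that meets it at t_b=1999 (so b2=40.78), the value at T=2009 is 0.6.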

trend_adjust_1p(data_all, tau_t, t)

Linearly detrends data_all and re-centers it about its linear least squares fit evaluated at \(T=t\). This adjustment may be preferable to trend_adjust_2p() when the hindcast record covers only the more recent period.

Args:

data_all (ndarray):: A time series of size N, or an ensemble time series of size NxM, where M is the number of ensemble members.
tau_t (ndarray):: All hindcast years excluding the forecast year.
t (float):: The forecast year.

Returns: data_ta (ndarray):

Trend-adjusted values with same shape as data_all.
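The detrend-and-re-center step can be sketched for a single time series as follows: fit a least squares line to the data, subtract the fitted value at each hindcast year, and add back the fitted value at the forecast year. The function name is hypothetical, the sketch covers only the size-N case, and it applies no clipping to [0,1]; whether and where the library clips is not assumed here.

```python
def trend_adjust_1p_sketch(data_all, tau_t, t):
    """Detrend a 1-D series and re-center it on the linear fit evaluated at t."""
    n = len(tau_t)
    mean_T = sum(tau_t) / n
    mean_z = sum(data_all) / n
    # ordinary least squares slope and intercept
    sxx = sum((T - mean_T) ** 2 for T in tau_t)
    sxz = sum((T - mean_T) * (z - mean_z) for T, z in zip(tau_t, data_all))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_T
    target = slope * t + intercept  # fitted value at the forecast year
    return [z - (slope * T + intercept) + target
            for T, z in zip(tau_t, data_all)]
```

A series that lies exactly on a line is mapped to a constant equal to the line's value at t, and the residuals of any other series about its fit are preserved.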

trend_adjust_2p(data_all, tau_t, t, t_b=1999)

Piece-wise linearly detrends data_all and re-centers it about its non-linear least squares fit to Eq. (2) evaluated at \(T=t\). This method carries out the trend-adjustment technique described in section 5a of Dirkson et al., 2018. The non-linear least squares fit constrains Eq. (2) to be continuous at \(T=t_b\).

Args:

data_all (ndarray):: A time series of size N, or an ensemble time series of size NxM, where M is the number of ensemble members.
tau_t (ndarray):: All hindcast years excluding the forecast year t.
t (float):: The forecast year.
t_b (float):: The breakpoint year in Eq. (2).

Returns: data_ta (ndarray):

Trend-adjusted values with same shape as data_all.
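Continuity of Eq. (2) at \(T=t_b\) means \(a_1 t_b + b_1 = a_2 t_b + b_2\), so \(b_2 = b_1 + (a_1 - a_2)t_b\) and only three parameters remain free. The sketch below shows one way to impose this by eliminating \(b_2\). The function names are hypothetical, and the library's fit may enforce the constraint differently.

```python
def continuous_b2(a1, b1, a2, t_b):
    """Intercept of the second segment that makes Eq. (2) continuous at t_b."""
    return b1 + (a1 - a2) * t_b


def piecewise_continuous(a1, b1, a2, t_b, T):
    """Eq. (2) with b2 eliminated by the continuity constraint (no clipping)."""
    b2 = continuous_b2(a1, b1, a2, t_b)
    return a1 * T + b1 if T < t_b else a2 * T + b2
```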

unpack_params(array)

Unpacks the individual parameters a, b, p, q from array.

Args:

array (ndarray):: Array containing the four parameters a,b,p,q for a BEINF distribution.

Returns: a,b,p,q (floats):

The individual parameters for the BEINF distribution.