genESD module
The genesd module offers the following functions to detect outliers in a univariate array using the generalized extreme Studentized deviate (ESD) test:
Function | Description |
---|---|
genesd | Identifies outliers in a data array. |
find_ri | Computes the R_i statistics for the generalized ESD test. |
find_critvals | Computes the critical values for the generalized ESD test. |
- araucaria.stats.genesd.genesd(data, r, alpha)
Identifies outliers in a data array.
This function uses the generalized extreme Studentized deviate (ESD) test to detect one or more outliers in univariate data [1].
- Parameters
- Return type
- Returns
report – Report of the generalized ESD test.
index – Indices of outliers in the data.
Notes
The identification of outliers considers the following hypothesis test:
- $H_0$: there are no outliers in the data.
- $H_1$: there are up to $r$ outliers in the data.
The algorithm performs the following operations:
- The $R_i$ test statistics are computed for $r$ potential outliers, removing the largest potential outlier from the data at each successive calculation of the test statistic.
- The $\lambda_i$ critical values are computed for $r$ potential outliers, considering a significance level of $\alpha$ for the t-distribution.
- Both values are compared, and the largest number of outliers $i$ for which $R_i > \lambda_i$ is accepted as the number of outliers.
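The comparison in the last step can be illustrated with a minimal sketch. The helper name `count_outliers` is hypothetical and is not part of araucaria; the input values are copied from the report printed in the genesd example on this page.

```python
def count_outliers(ri, lambdas):
    """Return the largest 1-based index i with ri[i-1] > lambdas[i-1], or 0."""
    n_out = 0
    for i, (r_i, lam_i) in enumerate(zip(ri, lambdas), start=1):
        if r_i > lam_i:
            n_out = i  # keep the largest index at which H0 is rejected
    return n_out


# R_i and lambda_i columns from the Rosner (1983) report in the example
ri = [3.1189, 2.943, 3.1794, 2.8102, 2.8156]
lambdas = [3.1588, 3.1514, 3.1439, 3.1362, 3.1282]
print(count_outliers(ri, lambdas))  # 3
```

Only the third comparison rejects the null hypothesis, yet three outliers are reported: the test takes the largest index that satisfies the criterion, not the number of individual rejections.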
References
- [1] Rosner, B. (1983) “Percentage Points for a Generalized ESD Many-Outlier Procedure”, Technometrics, 25(2), pp. 165-172.
Example
>>> # calculating outliers for Rosner data (1983):
>>> from numpy import loadtxt, allclose
>>> from araucaria.testdata import get_testpath
>>> from araucaria.stats import genesd
>>> path = get_testpath('rosner.dat')
>>> data = loadtxt(path)
>>> r = 5
>>> alpha = 0.05
>>> report, index = genesd(data, r, alpha)
>>> print(report)
Generalized ESD test for outliers
H0: there are no outliers in the data
H1: there are up to 5 outliers in the data
Significance level: alpha = 0.05
Critical region: Reject H0 if R_i > lambda_i
=====================================
n outliers   x_i     R_i     lambda_i
=====================================
1            6.01    3.1189  3.1588
2            5.42    2.943   3.1514
3            5.34    3.1794  3.1439 *
4            4.64    2.8102  3.1362
5            -0.25   2.8156  3.1282
=====================================
>>> print(data[index])
[6.01 5.42 5.34]
- araucaria.stats.genesd.find_ri(data, r)
Computes the $R_i$ test statistics for the generalized extreme Studentized deviate (ESD) test.
- Parameters
- Return type
- Returns
Test statistic for the generalized ESD test.
Value of data points furthest from the mean.
Notes
The $R_i$ test statistics are calculated as follows:

$$R_i = \frac{\max_j |x_j - \bar{x}|}{s}, \quad i \in \{1, \dots, r\}$$

Where
- $\bar{x}$: sample mean of the reduced array.
- $s$: sample standard deviation of the reduced array.
- $n$: number of points in the array (the reduced array at step $i$ holds $n - i + 1$ points).
- $r$: maximum number of outliers.

After each calculation the observation that maximizes $|x_j - \bar{x}|$ is removed, so that $R_i$ is computed with $n - i + 1$ observations. This procedure is repeated until $r$ observations have been removed from the array.
Example
>>> # calculating test statistics from Rosner's data (1983):
>>> from numpy import loadtxt
>>> from araucaria.testdata import get_testpath
>>> from araucaria.stats import find_ri
>>> path = get_testpath('rosner.dat')
>>> data = loadtxt(path)
>>> r = 5
>>> ri, xi = find_ri(data, r)
>>> for val in ri:
...     print('%1.3f' % val)
3.119
2.943
3.179
2.810
2.816
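The removal procedure described in the notes can be sketched with the standard library alone. This is an illustration under stated assumptions, not araucaria's implementation: the name `find_ri_sketch` is hypothetical, the sample standard deviation is taken with an n - 1 denominator, and ties are resolved by removing the first maximizer.

```python
from statistics import mean, stdev


def find_ri_sketch(data, r):
    """Return (ri, xi): the r test statistics and the removed observations."""
    reduced = list(data)  # work on a copy; the input is left untouched
    ri, xi = [], []
    for _ in range(r):
        xbar = mean(reduced)
        s = stdev(reduced)  # sample standard deviation (n - 1 denominator)
        # index of the observation furthest from the current mean
        k = max(range(len(reduced)), key=lambda j: abs(reduced[j] - xbar))
        ri.append(abs(reduced[k] - xbar) / s)
        xi.append(reduced.pop(k))  # remove it before the next iteration
    return ri, xi


ri, xi = find_ri_sketch([1.0, 2.0, 3.0, 4.0, 100.0], r=2)
print(xi[0])  # 100.0
```

The sketch needs at least two points left in the reduced array at every step, and it divides by zero if all remaining values are equal.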
- araucaria.stats.genesd.find_critvals(n, r, alpha)
Computes critical values $\lambda_i$ for the generalized extreme Studentized deviate (ESD) test.
- Parameters
- Return type
- Returns
Critical values.
Notes
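The $\lambda_i$ values are calculated as follows:

$$\lambda_i = \frac{(n-i)\, t_{p,\,n-i-1}}{\sqrt{\left(n-i-1+t_{p,\,n-i-1}^2\right)(n-i+1)}}, \quad i \in \{1, \dots, r\}$$

$$p = 1 - \frac{\alpha}{2(n-i+1)}$$

Where
- $n$: number of points in the array.
- $\alpha$: significance level.
- $t_{p,\nu}$: percent point function of the t-distribution at value $p$ and $\nu$ degrees of freedom.
- $r$: maximum number of outliers.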
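As a rough illustration, the critical values can be reproduced without external dependencies. The sketch assumes Rosner's (1983) expression, $\lambda_i = (n-i)\,t_{p,n-i-1} / \sqrt{(n-i-1+t_{p,n-i-1}^2)(n-i+1)}$ with $p = 1 - \alpha / (2(n-i+1))$. All function names are hypothetical. The t quantile is found by Simpson integration and bisection purely to keep the block self-contained; `scipy.stats.t.ppf` would normally be used instead. The value n = 54 is an assumption about the size of the Rosner data set, which this page does not state.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """CDF for x >= 0 via composite Simpson integration on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def find_critvals_sketch(n, r, alpha):
    """Critical values lambda_1..lambda_r under the assumed Rosner formula."""
    lambdas = []
    for i in range(1, r + 1):
        p = 1 - alpha / (2 * (n - i + 1))
        df = n - i - 1
        t = t_ppf(p, df)
        lambdas.append((n - i) * t / math.sqrt((df + t * t) * (n - i + 1)))
    return lambdas


lambdas = find_critvals_sketch(n=54, r=5, alpha=0.05)
for lam in lambdas:
    print(f"{lam:.4f}")
```

Under the n = 54 assumption, the printed values can be compared with the lambda_i column of the report in the genesd example. The fixed step count limits accuracy for very small degrees of freedom, where the quantile becomes large.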
Example