Purpose:
Test for Distributional Adequacy |
The Anderson-Darling test (Stephens, 1974) is used to test whether a sample of data comes from a specific distribution.
It is a modification of the Kolmogorov-Smirnov (K-S) test that gives more weight to the tails than the K-S test does. The
K-S test is distribution-free in the sense that its critical values do
not depend on the specific distribution being tested. The Anderson-Darling
test makes use of the specific distribution in calculating critical values.
This has the advantage of allowing a more sensitive test and the disadvantage
that critical values must be calculated for each distribution. Currently,
tables of critical values are available for the normal, lognormal, exponential,
Weibull, extreme value type I, and logistic distributions. We do not provide the tables
of critical values in this handbook (see Stephens
1974, 1976, 1977, and 1979) since this test is usually applied with
a statistical software program that prints the relevant critical values.
The Anderson-Darling test is an alternative to the chi-square and Kolmogorov-Smirnov goodness-of-fit tests. |
Definition | The Anderson-Darling test is defined as:
|
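For reference, the usual statement of the Anderson-Darling test can be written as below. The notation is assumed here rather than taken from the text: N is the sample size, Y_1 <= ... <= Y_N are the ordered data, and F is the cumulative distribution function of the hypothesized distribution.

```latex
\[
\begin{aligned}
H_0 &: \text{the data follow the specified distribution} \\
H_a &: \text{the data do not follow the specified distribution} \\
A^2 &= -N - S, \qquad
S = \sum_{i=1}^{N} \frac{2i-1}{N}
    \Bigl[ \ln F(Y_i) + \ln\bigl(1 - F(Y_{N+1-i})\bigr) \Bigr]
\end{aligned}
\]
```

The hypothesis is rejected at significance level \alpha when A^2 exceeds the critical value for the hypothesized distribution at that level.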
Sample Output | Dataplot generated the following output for
the Anderson-Darling test. For each of the normal, double exponential,
Cauchy, and lognormal distributions, 1,000 random numbers were generated. In all four cases,
the Anderson-Darling test was applied to test for a normal distribution.
The test statistics show the characteristics of the test: where the data
come from a normal distribution, the test statistic is small and the hypothesis
of normality is not rejected; where the data come from the double exponential, Cauchy, and
lognormal distributions, the statistics are significant, and the hypothesis
of an underlying normal distribution is rejected at significance levels
of 0.10, 0.05, and 0.01.
The normal random numbers were stored in the variable Y1, the double exponential random numbers were stored in the variable Y2, the Cauchy random numbers were stored in the variable Y3, and the lognormal random numbers were stored in the variable Y4.

    ***************************************
    **  anderson darling normal test y1  **
    ***************************************

    ANDERSON DARLING 1-SAMPLE TEST
    THAT THE DATA COME FROM A NORMAL DISTRIBUTION

    1. STATISTICS:
          NUMBER OF OBSERVATIONS                = 1000
          LOCATION PARAMETER                    = 0.4359940E-02
          SCALE PARAMETER                       = 1.001816
          ANDERSON DARLING TEST STATISTIC VALUE = 0.2566688

    2. CRITICAL VALUES:
          90   % POINT = 1.062000
          95   % POINT = 1.321000
          97.5 % POINT = 1.591000
          99   % POINT = 1.959000

    3. CONCLUSION (AT THE 5% LEVEL):
          THE DATA DO COME FROM A NORMAL DISTRIBUTION.

    ***************************************
    **  anderson darling normal test y2  **
    ***************************************

    ANDERSON DARLING 1-SAMPLE TEST
    THAT THE DATA COME FROM A NORMAL DISTRIBUTION

    1. STATISTICS:
          NUMBER OF OBSERVATIONS                = 1000
          LOCATION PARAMETER                    = 0.2034888E-01
          SCALE PARAMETER                       = 1.321627
          ANDERSON DARLING TEST STATISTIC VALUE = 5.827798

    2. CRITICAL VALUES:
          90   % POINT = 1.062000
          95   % POINT = 1.321000
          97.5 % POINT = 1.591000
          99   % POINT = 1.959000

    3. CONCLUSION (AT THE 5% LEVEL):
          THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION.

    ***************************************
    **  anderson darling normal test y3  **
    ***************************************

    ANDERSON DARLING 1-SAMPLE TEST
    THAT THE DATA COME FROM A NORMAL DISTRIBUTION

    1. STATISTICS:
          NUMBER OF OBSERVATIONS                = 1000
          LOCATION PARAMETER                    = 1.503854
          SCALE PARAMETER                       = 35.13059
          ANDERSON DARLING TEST STATISTIC VALUE = 287.4414

    2. CRITICAL VALUES:
          90   % POINT = 1.062000
          95   % POINT = 1.321000
          97.5 % POINT = 1.591000
          99   % POINT = 1.959000

    3. CONCLUSION (AT THE 5% LEVEL):
          THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION.

    ***************************************
    **  anderson darling normal test y4  **
    ***************************************

    ANDERSON DARLING 1-SAMPLE TEST
    THAT THE DATA COME FROM A NORMAL DISTRIBUTION

    1. STATISTICS:
          NUMBER OF OBSERVATIONS                = 1000
          LOCATION PARAMETER                    = 1.518372
          SCALE PARAMETER                       = 1.719969
          ANDERSON DARLING TEST STATISTIC VALUE = 81.64876

    2. CRITICAL VALUES:
          90   % POINT = 1.062000
          95   % POINT = 1.321000
          97.5 % POINT = 1.591000
          99   % POINT = 1.959000

    3. CONCLUSION (AT THE 5% LEVEL):
          THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION. |
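The experiment above can be imitated with a short Python sketch. This is not Dataplot's implementation: the function name `anderson_darling_normal`, the seed, and the random number generators are assumptions made for illustration. The location and scale are estimated here by the sample mean and sample standard deviation, and the statistic is the standard A^2 form. Because the random numbers differ, the values will not match the Dataplot output digit for digit.

```python
import math
import random
import statistics
import sys

_TINY = sys.float_info.min  # floor that keeps log() away from zero


def _log_norm_cdf(z):
    """Log of the standard normal CDF, via erfc so far tails stay nonzero."""
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), _TINY))


def anderson_darling_normal(data):
    """Return A^2 for H0: data are normal, with mean and sd estimated from data."""
    y = sorted(data)
    n = len(y)
    mu = statistics.fmean(y)
    sigma = statistics.stdev(y)  # sample standard deviation (n - 1 divisor)
    z = [(v - mu) / sigma for v in y]
    s = 0.0
    for i in range(1, n + 1):
        # ln F(Y_i) + ln(1 - F(Y_{n+1-i})); note 1 - Phi(z) equals Phi(-z)
        s += (2 * i - 1) * (_log_norm_cdf(z[i - 1]) + _log_norm_cdf(-z[n - i]))
    return -n - s / n


rng = random.Random(12345)  # arbitrary seed, chosen only for repeatability
n = 1000
samples = {
    "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
    # the difference of two unit exponentials is a standard double exponential
    "double exponential": [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)],
    # inverse-CDF draw from a standard Cauchy
    "Cauchy": [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)],
    "lognormal": [rng.lognormvariate(0.0, 1.0) for _ in range(n)],
}

results = {name: anderson_darling_normal(x) for name, x in samples.items()}
for name, a2 in results.items():
    print(f"{name:20s} A^2 = {a2:10.4f}")
```

As in the Dataplot runs, the statistic for the normal sample should be small, while the other three samples should give much larger values.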
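Interpretation of the Sample Output | The output is divided into
three sections: the first prints the number of observations, the estimated location and scale parameters, and the value of the Anderson-Darling test statistic; the second prints the critical values at the 90 %, 95 %, 97.5 %, and 99 % points; the third prints the conclusion at the 5% level.
The output from other statistical software programs may look somewhat different from the output above. |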
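The decision rule behind the conclusion section can be sketched in Python, using the test statistics and critical values copied from the sample output. The helper name `rejected_levels` is hypothetical, and the rule assumed is that normality is rejected at significance level alpha when the statistic exceeds the 100(1 - alpha) % point.

```python
# Critical values copied from the sample output: percent point -> value
CRITICAL_VALUES = {90.0: 1.062, 95.0: 1.321, 97.5: 1.591, 99.0: 1.959}

# Test statistics copied from the four runs in the sample output
STATISTICS = {"Y1": 0.2566688, "Y2": 5.827798, "Y3": 287.4414, "Y4": 81.64876}


def rejected_levels(statistic, critical_values=CRITICAL_VALUES):
    """Return the significance levels at which normality is rejected."""
    return [
        round(1.0 - point / 100.0, 3)  # e.g. the 95 % point gives alpha = 0.05
        for point, value in sorted(critical_values.items())
        if statistic > value
    ]


decisions = {name: rejected_levels(a2) for name, a2 in STATISTICS.items()}
for name, levels in decisions.items():
    verdict = "reject at alpha = " + ", ".join(map(str, levels)) if levels else "do not reject"
    print(name, "->", verdict)
```

For Y1 the statistic lies below every critical value, so normality is not rejected; for Y2, Y3, and Y4 it exceeds all four, so normality is rejected at every listed level.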
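Questions | The Anderson-Darling test can be used to answer
the following questions:
Are the data from a normal distribution?
Are the data from a lognormal distribution?
Are the data from an exponential distribution?
Are the data from a Weibull distribution?
Are the data from a logistic distribution?
|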
Importance | Many statistical tests and procedures are based
on specific distributional assumptions. The assumption of normality is
particularly common in classical statistical tests. Much reliability modeling
is based on the assumption that the data follow a Weibull distribution.
There are many non-parametric and robust techniques that do not make strong distributional assumptions. However, techniques based on specific distributional assumptions are in general more powerful than non-parametric and robust techniques. Therefore, if the distributional assumptions can be validated, techniques based on them are generally preferred. |
Related Techniques | Chi-Square Goodness of Fit Test
Kolmogorov-Smirnov Test
Shapiro-Wilk Normality Test
Probability Plot
Probability Plot Correlation Coefficient Plot |
Case Study | Airplane glass failure time data. |
Software | The Anderson-Darling goodness-of-fit test is available in some general-purpose statistical software programs, including Dataplot. |