Stepwise Discriminant Analysis
The Principal Component Analysis and Correspondence Analysis of the Prairie du Chien data have provided some useful information about the data set, but neither has made it possible to discriminate distinct source areas within the field area. The next statistical tool applied for this purpose is Discriminant Analysis. First, however, it must be decided which variables (elements) to enter into the analysis. In the Principal Component Analysis above, nine elements were chosen as most useful on the basis of their relative variability and their influence on the distribution of the data set, and four element ratios were identified as potentially useful. The nine elements with the largest influence in PCA will not necessarily be the most useful for Discriminant Analysis, however: PCA describes the overall variance of the data without regard to sample location, whereas Discriminant Analysis looks for the variables that best separate the locations.
To determine which variables have the greatest power to discriminate between sample locations, all of the elements and the four element ratios were entered into a Stepwise Discriminant Analysis using the SAS procedure stepdisc. The technique is somewhat misleadingly named, since it does not actually perform a Discriminant Analysis of the data set. Instead, it selects the subset of variables that best discriminates among the sample groups, and that subset can then be used in a full Discriminant Analysis. The selection is iterative. At the first step, the program enters the single variable (element) that best separates the groups. At each later step, it enters the remaining variable that adds the most discriminating power given the variables already chosen, as measured by that variable's partial R² (equivalently, its F statistic), and it may remove a previously entered variable that no longer contributes significantly. The process stops when no remaining variable would significantly improve the discrimination. In this stepwise manner, the program identifies which variables in a data set have the best discriminatory power and which are not useful.
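The stepwise logic described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the SAS implementation: it uses Wilks' lambda (the statistic reported in Tables 7 and 8) as the selection criterion, and it uses fixed, hypothetical F-to-enter and F-to-remove thresholds (4.0 and 3.9) where SAS uses significance levels. The function names `wilks_lambda`, `partial_f`, and `stepwise_select` are illustrative, and the data in the usage example are synthetic.

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' lambda = det(W) / det(T) for the chosen columns of X."""
    if not cols:
        return 1.0
    Z = X[:, cols]
    # Total SSCP matrix: deviations from the grand mean
    Zc = Z - Z.mean(axis=0)
    T = Zc.T @ Zc
    # Within-group SSCP matrix: deviations from each group's own mean
    W = np.zeros_like(T)
    for g in np.unique(y):
        Dg = Z[y == g] - Z[y == g].mean(axis=0)
        W += Dg.T @ Dg
    return np.linalg.det(W) / np.linalg.det(T)

def partial_f(lam_with, lam_without, n, g, p):
    """F for one variable given p others (df = g-1, n-g-p).
    The partial lambda is lam_with / lam_without; partial R**2 = 1 - partial lambda."""
    partial = lam_with / lam_without
    return ((n - g - p) / (g - 1)) * (1 - partial) / partial

def stepwise_select(X, y, f_enter=4.0, f_remove=3.9, max_steps=50):
    n, k = X.shape
    g = len(np.unique(y))
    selected, history = [], []
    for _ in range(max_steps):
        lam_old = wilks_lambda(X, y, selected)
        p = len(selected)
        # Entry step: the candidate with the largest partial F
        best_var, best_f = None, -np.inf
        for j in range(k):
            if j in selected:
                continue
            f = partial_f(wilks_lambda(X, y, selected + [j]), lam_old, n, g, p)
            if f > best_f:
                best_var, best_f = j, f
        if best_var is None or best_f < f_enter:
            break  # no remaining variable improves discrimination enough
        selected.append(best_var)
        history.append(("enter", best_var, best_f))
        # Removal step: drop the weakest variable if it no longer contributes
        lam_full = wilks_lambda(X, y, selected)
        worst_var, worst_f = None, np.inf
        for j in selected:
            others = [v for v in selected if v != j]
            f = partial_f(lam_full, wilks_lambda(X, y, others), n, g, len(others))
            if f < worst_f:
                worst_var, worst_f = j, f
        if worst_f < f_remove:
            selected.remove(worst_var)
            history.append(("remove", worst_var, worst_f))
    return selected, history

# Synthetic example: three groups; column 0 separates them strongly,
# column 2 weakly, and column 1 is pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 3))
X[:, 0] += 3.0 * y
X[:, 2] += 1.0 * (y == 1)
selected, history = stepwise_select(X, y)
print(selected)
```

Setting the removal threshold slightly below the entry threshold keeps a variable from being entered and removed on the same step; the `max_steps` cap is a simple guard against cycling.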
The Principal Component Analysis above revealed that the Prairie du Chien samples as a whole cluster rather tightly geochemically, with much overlap between sample locations. This is expected to make discrimination between locations very difficult. The Stepwise Discriminant Analysis was therefore run twice: first with each individual sample location treated as a separate group, and then with the sample locations combined into six geographic groups (see Figure 8). The first run selected seven elements and three element ratios (see Table 7); the program did not consider any other variable additionally helpful. The second run selected four elements and one element ratio (see Table 8), only one of which (Sc) was not selected in the first run. Together, the two runs selected a total of eleven variables. However, three of the elements (Sc, Sm, and Ce) were effectively chosen twice: once as individual elements and once as part of an element ratio. The variables chosen for Discriminant Analysis were therefore Al, Cr, Cs, Fe, Eu, Ce/Sm, La/Sc, and La/Sm. Notably, all eight of these variables were also among those found to be most useful in the Principal Component Analysis.
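The bookkeeping in the paragraph above can be reproduced with a few lines of Python. The lists come from Tables 7 and 8; the rule applied is the one stated in the text, namely that a single element is dropped when it also appears inside a selected ratio. The variable names are illustrative.

```python
# Variables selected in each run, in step order (Tables 7 and 8)
by_location = ["Fe", "Al", "La/Sm", "Cs", "La/Sc", "Ce/Sm", "Cr", "Sm", "Eu", "Ce"]
by_group = ["Fe", "Cr", "Al", "Sc", "La/Sm"]

# Union of both runs, keeping first-seen order
combined = list(dict.fromkeys(by_location + by_group))

# Elements that already occur inside one of the selected ratios
ratio_elements = {el for v in combined if "/" in v for el in v.split("/")}

# Drop single elements that would otherwise be counted twice
final = [v for v in combined if "/" in v or v not in ratio_elements]
print(len(combined), final)
```

Running it lists eleven combined variables and reduces them to the same eight given in the text.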
| Step | Variable Entered | Variable Removed | Number In | Partial R**2 | F Statistic | Prob > F | Wilks' Lambda | Prob < Lambda | Average Squared Canonical Correlation | Prob > ASCC |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Fe | | 1 | 0.4761 | 8.452 | 0.0001 | 0.52389641 | 0.0001 | 0.02380518 | 0.0001 |
| 2 | Al | | 2 | 0.3775 | 5.61 | 0.0001 | 0.32612303 | 0.0001 | 0.04267943 | 0.0001 |
| 3 | La/Sm | | 3 | 0.3275 | 4.481 | 0.0001 | 0.21931392 | 0.0001 | 0.05865093 | 0.0001 |
| 4 | Cs | | 4 | 0.2923 | 3.778 | 0.0001 | 0.15521867 | 0.0001 | 0.07279750 | 0.0001 |
| 5 | La/Sc | | 5 | 0.2823 | 3.579 | 0.0001 | 0.11140711 | 0.0001 | 0.08629080 | 0.0001 |
| 6 | Ce/Sm | | 6 | 0.2436 | 2.915 | 0.0001 | 0.08426557 | 0.0001 | 0.09752780 | 0.0001 |
| 7 | Cr | | 7 | 0.2719 | 3.362 | 0.0001 | 0.06134996 | 0.0001 | 0.10847646 | 0.0001 |
| 8 | Sm | | 8 | 0.204 | 2.293 | 0.0022 | 0.04883581 | 0.0001 | 0.11813657 | 0.0001 |
| 9 | Eu | | 9 | 0.225 | 2.584 | 0.0005 | 0.03784672 | 0.0001 | 0.12716064 | 0.0001 |
| 10 | Ce | | 10 | 0.1704 | 1.817 | 0.0218 | 0.03139942 | 0.0001 | 0.13508111 | 0.0001 |
Table 7: Results of Stepwise Discriminant Analysis. These ten variables were found to be helpful in discriminating between sample locations.
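Because no variable was removed at any step, the Wilks' Lambda column in Table 7 is cumulative: entering a variable multiplies lambda by that variable's partial lambda, which equals 1 - partial R**2. The short sketch below recomputes the lambda column from the Partial R**2 column as a consistency check; the small differences come from the four-decimal rounding of partial R**2 in the SAS output.

```python
# Partial R**2 and Wilks' lambda columns from Table 7, in step order
partial_r2 = [0.4761, 0.3775, 0.3275, 0.2923, 0.2823,
              0.2436, 0.2719, 0.204, 0.225, 0.1704]
reported_lambda = [0.52389641, 0.32612303, 0.21931392, 0.15521867, 0.11140711,
                   0.08426557, 0.06134996, 0.04883581, 0.03784672, 0.03139942]

# Each entered variable multiplies the running lambda by (1 - partial R**2)
lam = 1.0
recomputed = []
for r2 in partial_r2:
    lam *= 1.0 - r2
    recomputed.append(lam)

for step, (calc, rep) in enumerate(zip(recomputed, reported_lambda), start=1):
    print(f"step {step}: recomputed {calc:.6f}  reported {rep:.6f}")
```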
| Step | Variable Entered | Variable Removed | Number In | Partial R**2 | F Statistic | Prob > F | Wilks' Lambda | Prob < Lambda | Average Squared Canonical Correlation | Prob > ASCC |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Fe | | 1 | 0.2485 | 13.292 | 0.0001 | 0.75151593 | 0.0001 | 0.04969681 | 0.0001 |
| 2 | Cr | | 2 | 0.2135 | 10.857 | 0.0001 | 0.59107602 | 0.0001 | 0.09210010 | 0.0001 |
| 3 | Al | | 3 | 0.1678 | 8.026 | 0.0001 | 0.49188396 | 0.0001 | 0.12125901 | 0.0001 |
| 4 | Sc | | 4 | 0.1033 | 4.561 | 0.0006 | 0.44108327 | 0.0001 | 0.13995664 | 0.0001 |
| 5 | La/Sm | | 5 | 0.0962 | 4.196 | 0.0012 | 0.39863234 | 0.0001 | 0.15668807 | 0.0001 |
Table 8: Results of Stepwise Discriminant Analysis. These five variables were found to be helpful in discriminating between groups of sample locations.
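The Average Squared Canonical Correlation (ASCC) column can be checked at step 1, where only one variable is in the model. There is then a single canonical correlation, and its square equals the partial R**2. Assuming, as the SAS documentation describes, that ASCC is the sum of the squared canonical correlations divided by the number of classes minus one, the ratio of partial R**2 to ASCC at step 1 recovers that denominator. For Table 8 the ratio gives the six geographic groups described in the text, which supports this reading; applied to Table 7, the same arithmetic gives the number of individual sample locations implied by that table. The dictionary keys are illustrative labels.

```python
# Step-1 values (one variable in the model) from Tables 7 and 8
step1 = {
    "Table 7 (individual sample locations)": (0.4761, 0.02380518),
    "Table 8 (six geographic groups)": (0.2485, 0.04969681),
}

implied_classes = {}
for label, (partial_r2, ascc) in step1.items():
    # One variable -> one canonical correlation, whose square is the partial R**2.
    # Assumed: ASCC averages squared canonical correlations over (classes - 1).
    classes_minus_one = partial_r2 / ascc
    implied_classes[label] = round(classes_minus_one) + 1
    print(f"{label}: ratio {classes_minus_one:.3f}, implied classes {implied_classes[label]}")
```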