Discriminant Analysis is a powerful multivariate analytical tool which is very helpful for investigations such as geochemical provenance studies. There are two main types of Discriminant Analysis (DA): predictive and descriptive. Predictive DA is used in cases where a data set contains two (or more) known separate populations and one wishes to be able to attribute new data points to one population or the other. For instance, a person may have geochemical data that define the range of geochemical contents for two chert formations. One could then use predictive DA to calculate a discriminant rule from the data set. Any new, unknown samples of chert could have the calculated discriminant rule applied to their geochemical data in order to assign them to one chert formation or the other. The drawbacks of this method include a necessity for distinctly separate, non-overlapping populations from which to derive the discriminant rule, and also the necessity of knowing beforehand that the unknowns belong to one of the two defined groups, and not a third unknown possibility.
Descriptive DA is a tool for creating discriminant functions that will describe and distinguish previously unknown separate populations within a given data set. A drawback here is that one has to know or guess the number of separate populations hidden within the data set.
Clearly, Predictive Discriminant Analysis is a tool very well-suited to many provenance studies. One can collect sufficient samples to make an accurate geochemical representation of a number of geologic formations, calculate discriminant rules for them, and test the rules by submitting archaeological lithic samples and seeing to which formation they are attributed.
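As a minimal sketch of how a predictive discriminant rule is derived and then applied to unknowns, the example below fits a two-group linear (Fisher) discriminant to two variables. The concentrations, the formation names, and the function names are all hypothetical and chosen only for illustration; they are not data or procedures from this study, and a real analysis would use many more elements and a statistics package.

```python
# Hypothetical concentrations of two elements (arbitrary units) for two
# chert formations. These values are invented for illustration only.
formation_a = [(2.0, 1.0), (2.5, 1.4), (3.0, 1.1), (2.2, 0.8), (2.8, 1.2)]
formation_b = [(5.0, 3.0), (5.5, 3.4), (6.0, 2.9), (5.2, 3.3), (5.8, 3.1)]


def mean(points):
    """Mean of each of the two variables."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Sums of squares and cross-products about the group mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_rule(group_a, group_b):
    """Return (weights, cutoff) of the two-group linear discriminant rule."""
    ma, mb = mean(group_a), mean(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    dof = len(group_a) + len(group_b) - 2
    # Pooled within-group covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx, sxy, syy = ((sa[i] + sb[i]) / dof for i in range(3))
    det = sxx * syy - sxy * sxy
    # Weights w = S^-1 (mean_a - mean_b), using the 2x2 inverse directly.
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    # Equal priors assumed: the cutoff is the score of the midpoint of the means.
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    cutoff = w[0] * mid[0] + w[1] * mid[1]
    return w, cutoff


def classify(sample, w, cutoff):
    """Assign a sample to formation 'A' or 'B' using the fitted rule."""
    score = w[0] * sample[0] + w[1] * sample[1]
    return "A" if score > cutoff else "B"


weights, cutoff = fit_rule(formation_a, formation_b)
for unknown in [(2.4, 1.0), (5.6, 3.2)]:
    print(unknown, "->", classify(unknown, weights, cutoff))
```

Note that the rule always returns one of the two formations, which is why the unknowns must be known beforehand to belong to one of the defined groups.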
Discriminant Analysis was conducted on the Prairie du Chien data set, using the sample locations to divide the data set into pre-defined groups. A first analysis of the data set was made using the samples from each individual sample location to define twenty separate groups. This large number of groups, each with a relatively small number of defining sample points, gave poor results. This is not unexpected, since some of the sample locations are geographically very close together, and it was expected that they would probably overlap geochemically to such an extent as to be indistinguishable statistically. Therefore, the twenty sample locations were subsequently divided into six groups based upon geographic distances from one another (see Figure 8). The DA was again run for the data set, using the six defined sample regions.
Number and percent of samples classified into sample locations:
| | identified as group 1 | identified as group 2 | identified as group 3 | identified as group 4 | identified as group 5 | identified as group 6 | Total |
|---|---|---|---|---|---|---|---|
| from group 1 | 39 (54.93%) | 10 (14.08%) | 15 (21.13%) | 1 (1.41%) | 2 (2.82%) | 4 (5.63%) | 71 (100%) |
| from group 2 | 8 (16.33%) | 20 (40.82%) | 7 (14.29%) | 3 (6.12%) | 6 (12.24%) | 5 (10.20%) | 49 (100%) |
| from group 3 | 13 (26.53%) | 4 (8.16%) | 17 (34.69%) | 5 (10.20%) | 9 (18.37%) | 1 (2.04%) | 49 (100%) |
| from group 4 | 1 (10.00%) | 1 (10.00%) | 0 (0.00%) | 5 (50.00%) | 3 (30.00%) | 0 (0.00%) | 10 (100%) |
| from group 5 | 1 (5.56%) | 0 (0.00%) | 6 (33.33%) | 5 (27.78%) | 5 (27.78%) | 1 (5.56%) | 18 (100%) |
| from group 6 | 1 (10.00%) | 0 (0.00%) | 0 (0.00%) | 1 (10.00%) | 0 (0.00%) | 8 (80.00%) | 10 (100%) |
| total percent | 63 (30.43%) | 35 (16.91%) | 45 (21.74%) | 20 (9.66%) | 25 (12.08%) | 19 (9.18%) | 207 (100%) |

Error counts for classifications:
| | group 1 | group 2 | group 3 | group 4 | group 5 | group 6 | Total |
|---|---|---|---|---|---|---|---|
| error rate | 45.07% | 59.18% | 65.31% | 50.00% | 72.22% | 20.00% | 51.96% |
Table 9: Cross-validation summary from Discriminant Analysis of the six sample location groups. Except for Group 6, the D.A. procedure does a poor job of discriminating between them.
A summary of the Discriminant Analysis results is found in Table 9, which shows the number of samples correctly assigned to each group, the number of misclassifications, and the misclassification rates. The misclassification rate for sample group 6 is low at 20%, but the rates for the remaining sample groups range from 45% to 72%, which is very poor. It appears that Discriminant Analysis does not adequately distinguish among different sample regions of the Prairie du Chien.
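The error rates in Table 9 can be recomputed directly from the classification counts, as in the short sketch below. The variable names are illustrative. The arithmetic also suggests that the Total of 51.96% is the unweighted mean of the six per-group rates, as would result from giving each group equal weight; this is an inference from the numbers rather than something stated in the output, and the rate pooled over all 207 samples is somewhat higher.

```python
# Rows: true group 1-6. Columns: group assigned by the cross-validation
# (counts copied from Table 9).
counts = [
    [39, 10, 15, 1, 2, 4],
    [8, 20, 7, 3, 6, 5],
    [13, 4, 17, 5, 9, 1],
    [1, 1, 0, 5, 3, 0],
    [1, 0, 6, 5, 5, 1],
    [1, 0, 0, 1, 0, 8],
]

group_sizes = [sum(row) for row in counts]
# Misclassified samples are everything off the diagonal in each row.
errors = [size - row[i] for i, (row, size) in enumerate(zip(counts, group_sizes))]
group_rates = [e / n for e, n in zip(errors, group_sizes)]

# Unweighted mean of the per-group rates (each group weighted equally).
mean_rate = sum(group_rates) / len(group_rates)
# Rate pooled over all samples (large groups weigh more).
pooled_rate = sum(errors) / sum(group_sizes)

for g, rate in enumerate(group_rates, start=1):
    print(f"group {g}: {100 * rate:.2f}% misclassified")
print(f"mean of group rates: {100 * mean_rate:.2f}%")
print(f"pooled over all samples: {100 * pooled_rate:.2f}%")
```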
The failure of Discriminant Analysis to distinguish successfully among sample regions is not a surprising development for this study. Discriminant Analysis has been a very successful technique for previous provenance studies because the previous studies were attempting to distinguish between entirely distinct geologic sources. These geologic formations could, in effect, be considered as discrete point sources. The geochemical ranges defining separate chert formations are quite likely to form entirely separate or partially overlapping populations in multivariate space, and this is just the sort of arrangement for which DA is most useful.
In contrast, an intra-formational provenance study like this one consists of a single large population of data with a continuous range of values. When Discriminant Analysis is run on this data set, it is being asked to discriminate between small parts of a single large group, without any separation distance as one would find for multiple distinct populations. In the case of intra-formational data, the pre-defined input groups are immediately adjacent and overlapping, presenting a real difficulty.
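The effect of separation distance can be illustrated with a standard textbook result, used here under simplifying assumptions that do not come from this study: for two equally likely, normally distributed groups with a common covariance matrix, the best linear rule misclassifies a fraction Φ(−Δ/2) of samples, where Δ is the distance between the group means in pooled standard deviation units (the Mahalanobis distance) and Φ is the standard normal cumulative distribution function. The function names below are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_error_rate(separation):
    """Error rate of the best linear rule for two equally likely normal
    groups sharing one covariance matrix, whose means lie `separation`
    pooled standard deviations (Mahalanobis distance) apart."""
    return normal_cdf(-separation / 2.0)


# Adjacent, overlapping groups (small separation) versus distinct
# populations (large separation).
for separation in (0.0, 0.5, 1.0, 2.0, 4.0):
    rate = expected_error_rate(separation)
    print(f"separation {separation:.1f}: expected error {100 * rate:.1f}%")
```

With no separation the rule does no better than a coin toss, and the error falls to a few percent only once the group means are several standard deviations apart. This two-group idealization is not a model of the six-group analysis, but it shows why adjacent parts of a single continuous population are hard to tell apart.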
The situation might not be unsalvageable if it were set up in a particular fashion. If there were, for example, just four or five prehistoric lithic source locations scattered across the Prairie du Chien and archaeologists were reasonably certain that there were no additional undiscovered sources, then the problem could be approached just like an inter-formation provenance study. Samples could be collected to represent each of the 4-5 sites which, by virtue of geographic separation, might quite possibly have distinctly separate geochemical ranges. Then DA could be used to attribute unknown samples to individual sources.
To test and demonstrate this point, another DA was made for the Prairie du Chien data. This time, however, the entire data set was not used; instead only samples from two locations with large geographic separation were chosen. The twenty samples from these two locations were run through the DA process, with the results seen in Table 10.
Number and percent of samples classified into sample locations:
| | identified as 5 | identified as 16 | Total |
|---|---|---|---|
| from location 5 | 9 (90%) | 1 (10%) | 10 (100%) |
| from location 16 | 2 (20%) | 8 (80%) | 10 (100%) |
| total percent | 11 (55%) | 9 (45%) | 20 (100%) |

Error counts for classifications:
| | location 5 | location 16 | Total |
|---|---|---|---|
| error rate | 10% | 20% | 15% |
Table 10: Cross-validation summary from Discriminant Analysis of samples from locations 5 (Winona, MN) and 16 (Shakopee, MN). The D.A. procedure does an effective job of discriminating between these two widely-separated sample locations.
After cross-validation, 90% of the samples from location 5 were correctly identified, and 80% of the samples from location 16 were correctly identified. Another test was made in the same manner, using samples from locations 17 and 19, the results of which are found in Table 11. This time the correct identification rates are 70% and 90%, respectively. Clearly, there is a potential for geochemically discriminating among widely-spaced individual sites within a formation. These results would be very gratifying if there were only a few known prehistoric procurement sites within the Prairie du Chien. Also, more precise identification rates would probably result from sample sizes larger than the 10 samples taken from each location.
Number and percent of samples classified into sample locations:
| | identified as 17 | identified as 19 | Total |
|---|---|---|---|
| from location 17 | 7 (70%) | 3 (30%) | 10 (100%) |
| from location 19 | 1 (10%) | 9 (90%) | 10 (100%) |
| total percent | 8 (40%) | 12 (60%) | 20 (100%) |

Error counts for classifications:
| | location 17 | location 19 | Total |
|---|---|---|---|
| error rate | 30% | 10% | 20% |
Table 11: Cross-validation summary from Discriminant Analysis of samples from locations 17 (Mankato, MN) and 19 (Prairie du Chien, WI). The D.A. procedure does an effective job of discriminating between these two widely-separated sample locations.
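To give a rough sense of how imprecise an identification rate based on 10 samples is, the sketch below computes an approximate 95% Wilson score interval for a proportion. Treating the cross-validated classifications as independent trials is a simplifying assumption, the function name is illustrative, and the case of 100 samples is hypothetical and included only for comparison.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate confidence interval for a proportion (Wilson score).

    z = 1.96 corresponds to roughly 95% confidence.
    """
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Correct identifications out of 10 for locations 5, 16, 17 and 19
# (Tables 10 and 11).
for location, correct in [(5, 9), (16, 8), (17, 7), (19, 9)]:
    low, high = wilson_interval(correct, 10)
    print(f"location {location}: {correct}/10 correct, "
          f"interval {100 * low:.0f}% to {100 * high:.0f}%")

# Hypothetical comparison: the same 90% rate observed on 100 samples.
low, high = wilson_interval(90, 100)
print(f"90/100 correct, interval {100 * low:.0f}% to {100 * high:.0f}%")
```

Under these assumptions an observed rate of 9 out of 10 is compatible with a true identification rate anywhere from about 60% to about 98%, and the interval narrows considerably with a larger sample.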
The true situation with the Prairie du Chien, however, is very different from the above scenario. The evidence indicates that prehistoric peoples did not consistently return to particular locations to collect Prairie du Chien chert. Rather, the material is so ubiquitous and abundant throughout its outcrop area that they could collect it nearly everywhere. The archaeological record in Minnesota and Wisconsin is revealing hundreds of single-visit procurement sites (Theler, 1981; Arzigian, 1981). Therefore a procurement study of the Prairie du Chien should treat it as a single continuous unit, rather than a number of discrete point sources.